Related to the interesting talk at a recent NBRug meeting on Ruport, I’ve been looking at doing data warehousing in Ruby. Fortunately, Anthony Eden has beaten me to the punch with ActiveWarehouse. For those unfamiliar with Data Warehousing, the concepts are basically to create a series of facts that are indexed buy multiple dimensions.

  • A fact is typically an integer (e.g., sales amount) with data relating to the fact expressed in dimensions. (Note that in some cases, you can have a Factless Fact table where the information is entirely in the dimensions).
  • Dimensions provide different types of data relating to the fact (e.g., date & time of entry, user, product, location, etc.), so a fact table has a column for each dimension with a column for the fact scalar value itself.

The dimension tables contain an id index column, a column for the fact value (e.g., timestamp for date & time) and columns for each summary to be associated with the dimension (e.g., day of week, day of month, month, year, hour minute, …). At this point, queries can be performed by joining the dimension and fact tables and summarizing (or counting) the scalar fact value against conditions against the dimension tables (e.g., sales per year by person and region).

Typically, this data is pre-summarized in cubes with summary tables that contain the results of these summary queries. The ActiveWarehouse plugin contains everything necessary to define, populate and report on this data.

I expect to be using this to hold information on user listening habits for MP3 files.