30
Dimensions

This chapter contains:

Dimensions in a Data Warehouse

Dimensions do not have to be defined, but spending time creating them can yield significant benefits because they help query rewrite perform more complex types of rewrite. They are mandatory if you use the advisor to recommend which materialized views to create, drop, or retain.

A business process is an operational process within an organization about which data can be collected. As an example, each store of a video chain might gather and store data regarding sales and rentals of video tapes at the check-out counter. The video chain management can build a data warehouse to analyze the sales of its products across all stores over time and help answer questions such as:

What is the effect of promoting one product on the sale of a related product that is not promoted?
What are the product sales before and after the promotion?

The data in the video chain's data warehouse system has two important components: dimensions and facts. The dimensions are products, locations (stores), promotions, and time. One approach for identifying your dimensions is to review your reference tables, such as a product table which contains everything about a product, or a store table containing all information about a store. The facts are sales (units sold or rented) and profits. A data warehouse contains facts about the sales of each product at each store on a daily basis.

Dimension values are usually organized into hierarchies. Going up a level in the hierarchy is called rolling up the data and going down a level in the hierarchy is called drilling down the data. In the video chain example:

Within the time dimension, months roll up to quarters, quarters roll up to years, and years roll up to all years.
Within the product dimension, products roll up to categories, categories roll up to departments, and departments roll up to all departments.
Within the location dimension, stores roll up to cities, cities roll up to states, states roll up to regions, regions roll up to countries, and countries roll up to all countries, as shown in Figure 30-1.

Figure 30-1 Geography Dimension

Data analysis typically starts at higher levels in the dimensional hierarchy and gradually drills down if the situation warrants such analysis.

You can visualize the dimensions of a business process as an n-dimensional data cube. In the video chain example, the business dimensions product, location, and time can be represented along the three axes of the cube. Each unit along the product axis represents a different product, each unit along the location axis represents a store, and each unit along the time axis represents a month. At the intersection of these values is a cell that contains factual information, such as units sold and profits made. Higher-level analysis consists of selecting and aggregating the factual information within a subcube, such as rentals of comedy videos in California stores during the second quarter of 1998.

Therefore, the first step towards creating a dimension is to identify the dimensions within your data warehouse and then draw the hierarchies as shown in Figure 30-1. For example, city is a child of state (because you can aggregate city-level data up to state), and state. Using this approach, you should find it easier to translate this into an actual dimension.

In the case of normalized or partially normalized dimensions (a dimension that is stored in more than one table), identify how these tables are joined. Note whether the joins between the dimension tables can guarantee that each child-side row joins with one and only one parent-side row. In the case of denormalized dimensions, determine whether the child-side columns uniquely determine the parent-side (or attribute) columns. These constraints can be enabled with the NOVALIDATE and RELY options if the relationships represented by the constraints are guaranteed by other means. Note that if the joins between fact and dimension tables do not support this relationship, you still gain significant performance advantages from defining the dimension with the CREATE DIMENSION statement. Another alternative, subject to certain restrictions, is to use outer joins in the materialized view definition (that is, in the CREATE MATERIALIZED VIEW statement).

You must not create dimensions in any schema that does not satisfy these relationships, incorrect results can be returned from queries otherwise.

Creating a Dimension

Before you can create a dimension, tables must exist in the database which contain this dimension data. For example, if you create a dimension called LOCATION, one or more tables must exist which contains the city, state, and country information. In a data warehouse, these dimension tables already exist. It is therefore a simple task to identify which ones will be used.

You create a dimension with the CREATE DIMENSION statement. Within the CREATE DIMENSION statement, use the LEVEL...IS clause to identify the names of the dimension levels.

The location dimension contains a single hierarchy, with arrows drawn from the child level to the parent level. At the top of this dimension graph is the special level ALL, that represents aggregation over all rows. Each arrow in this graph indicates that for any child there is one and only one parent. For example, each city must be contained in exactly one state and each state must be contained in exactly one country. States that belong to more than one country, or that belong to no country, violate hierarchical integrity. Hierarchical integrity is necessary for the correct operation of management functions for materialized views that include aggregates.

Therefore, using the entities illustrated in Figure 30-1 as an example, you can declare a dimension LOCATION which contains levels CITY, STATE, and COUNTRY:

CREATE DIMENSION location_dim
     
LEVEL city      IS location.city
LEVEL state     IS location.state
LEVEL country   IS location.country

Using your drawing of the dimension, translate each level in the diagram to a LEVEL clause in the CREATE DIMENSION statement. You therefore define 3 levels: city, state, and country. Then each level in the dimension must correspond to one or more columns in a table in the database. Thus, level city is identified by the column city in the table called location and level country is identified by a column called country in the same table.

In this example, the database tables are denormalized and all the columns exist in the same table. However, this is not a prerequisite for creating dimensions. "Using Normalized Dimension Tables" shows how to create a dimension that has a normalized schema design using the JOIN KEY clause.

The next step is to declare the relationship between the levels with the HIERARCHY statement and give that hierarchy a name. A hierarchical relationship is a functional dependency from one level of a hierarchy to the next level in the hierarchy. Using the level names defined previously, the CHILD OF relationship denotes that each child's level value is associated with one and only one parent level value. Again, using the entities in Figure 30-1, the following statements declare a hierarchy LOC_ROLLUP and define the relationship between CITY, STATE, and COUNTRY.

HIERARCHY loc_rollup    (
     
city     CHILD OF
state    CHILD OF
country     )

In addition to the 1:n hierarchical relationships, dimensions also include 1:1 attribute relationships between the hierarchy levels and their dependent dimension attributes. For example, if there are columns governor and mayor, then the ATTRIBUTE...DETERMINES statement would be state to governor and city to mayor.

In our example, suppose a query was issued that queried by mayor instead of city. Since this 1-1 relationship exists between the attribute and the level, city can be used to identify the data.

ATTRIBUTE city DETERMINES mayor

This complete dimension definition is shown below including the creation of the location table.

CREATE TABLE location  (
     
city     VARCHAR2(30),
state    VARCHAR2(30),
country  VARCHAR2(30),
mayor    VARCHAR2(30),
governor  VARCHAR2(30)  );


     CREATE DIMENSION location_dim
     
LEVEL city      IS location.city
LEVEL state     IS location.state
LEVEL country   IS location.country

     HIERARCHY loc_rollup    (
     
city     CHILD OF
state    CHILD OF
country )

     ATTRIBUTE     city      DETERMINES   location.mayor
ATTRIBUTE     state     DETERMINES   location.governor;

The design, creation, and maintenance of dimensions is part of the design, creation, and maintenance of your data warehouse schema. Once the dimension has been created, check that it meets these requirements:

There must be a 1:n relationship between a parent and children. A parent can have one or more children, but a child can have only one parent.
There must be a 1:1 attribute relationship between hierarchy levels and their dependent dimension attributes. For example, if there is a column corporation, then a possible attribute relationship would be corporation to president.
If the columns of a parent level and child level are in different relations, then the connection between them also requires a 1:n join relationship. Each row of the child table must join with one and only one row of the parent table. This relationship is stronger than referential integrity alone because it requires that the child join key must be non-null, that referential integrity must be maintained from the child join key to the parent join key, and that the parent join key must be unique.
Ensure (using database constraints if necessary) that the columns of each hierarchy level are non-null and that hierarchical integrity is maintained.
The hierarchies of a dimension may overlap or be disconnected from each other. However, the columns of a hierarchy level cannot be associated with more than one dimension.
Join relationships that form cycles in the dimension graph are not supported. For example, a hierarchy level cannot be joined to itself either directly or indirectly.

Multiple Hierarchies

A single dimension definition can contain multiple hierarchies as illustrated below. Suppose a department store wants to track the sales of certain items over time. The first step is to define the time dimension over which sales will be tracked. Figure 30-2 illustrates a dimension "Time" with three time hierarchies.

Figure 30-2 Time_dim Dimension with Three Time Hierarchies

From the illustration, you can construct the following denormalized Time dimension statement. The associated CREATE TABLE statement is also shown.

CREATE TABLE time (
     
curDate     DATE,
month       INTEGER,
quarter     INTEGER,
year        INTEGER,
season      INTEGER,
week_num    INTEGER,
dayofweek   VARCHAR2(30),
month_name  VARCHAR2(30)  );


     CREATE DIMENSION time_dim
     
LEVEL curDate    IS time.curDate 
LEVEL month      IS time.month 
LEVEL quarter    IS time.quarter 
LEVEL year       IS time.year
LEVEL season     IS time.season
LEVEL week_num   IS time.week_num


     HIERARCHY calendar_rollup  ( 
     
curDate       CHILD OF
month         CHILD OF
quarter       CHILD OF
year                    )

     HIERARCHY weekly_rollup    (
     
     curDate          CHILD OF
     week_num           )

     HIERARCHY seasonal_rollup  (
     
     curDate          CHILD OF
     season             ) 

     ATTRIBUTE curDate     DETERMINES  time.dayofweek
ATTRIBUTE month       DETERMINES  time.month_name;

Using Normalized Dimension Tables

The tables used to define a dimension may be normalized or denormalized and the individual hierarchies can be normalized or denormalized. If the levels of a hierarchy come from the same table, it is called a fully denormalized hierarchy. For example, CALENDAR_ROLLUP in the Time dimension is a denormalized hierarchy. If levels of a hierarchy come from different tables, such a hierarchy is either a fully or partially normalized hierarchy. This section shows how to define a normalized hierarchy.

Suppose the tracking of products is done by product, brand, and department. This data is stored in the tables PRODUCT, BRAND, and DEPARTMENT. The product dimension is normalized because the data entities ITEM_NAME, BRAND_ID, and DEPT_ID are taken from different tables. The clause JOIN KEY within the dimension definition specifies how to join together the levels in the hierarchy. The dimension statement and the associated CREATE TABLE statements for the PRODUCT, BRAND, and DEPARTMENT tables are shown below.

CREATE TABLE product  (
     
item_name    VARCHAR2(30),
brand_id     INTEGER  );


     CREATE TABLE brand  (
     
brand_id     INTEGER,
brand_name   VARCHAR2(30),
dept_id      INTEGER);


     CREATE TABLE department  (
     
dept_id       INTEGER,
dept_name     VARCHAR2(30),
dept_type     INTEGER);


     CREATE DIMENSION product_dim 
     
LEVEL item      IS product.item_name 
LEVEL brand_id  IS brand.brand_id 
LEVEL dept_id   IS department.dept_id 
 

     HIERARCHY merchandise_rollup      
(
     
item                   CHILD OF  
brand_id               CHILD OF  
dept_id 

JOIN KEY  product.brand_id REFERENCES brand_id 
JOIN KEY  brand.dept_id    REFERENCES dept_id

     ) 
ATTRIBUTE brand_id DETERMINES product.brand_name 
ATTRIBUTE dept_id  DETERMINES (product.dept_name, product.dept_type);

Viewing Dimensions

Two procedures are available which allow you to display the dimensions that have been defined. First, the file smdim.sql must be executed to provide the DEMO_DIM package, which includes:

DEMO_DIM.PRINT_DIM to print a specific dimension
DEMO_DIM.PRINT_ALLDIMS to print all dimensions

The DEMO_DIM.PRINT_DIM procedure has only one parameter, the name of the dimension to display. The example below shows how to display the dimension TIME_PD.

DEMO_DIM.PRINT_DIM  ('TIME_PD');

To display all of the dimensions that have been defined, call the procedure DEMO_DIM.PRINT_ALLDIMS without any parameters as shown below.

DEMO_DIM.PRINT_ALLDIMS ();

Irrespective of which procedure is called, the output is identical. A sample display is shown below.

DIMENSION GROCERY.TIME_PD 
LEVEL FISCAL_QTR IS GROCERY.WEEK.FISCAL_QTR  
LEVEL MONTH IS GROCERY.MONTH.MONTH  
LEVEL QUARTER IS GROCERY.QUARTER.QUARTER  
LEVEL TIME_KEY IS GROCERY.TIME.TIME_KEY  
LEVEL WEEK IS GROCERY.WEEK.WEEK  
LEVEL YEAR IS GROCERY.YEAR.YEAR  
HIERARCHY WEEKLY_ROLLUP (  
      TIME_KEY 
       CHILD OF WEEK 
   JOIN KEY GROCERY.TIME.WEEK REFERENCES WEEK 
  ) 
    HIERARCHY FISCAL_ROLLUP (  
             TIME_KEY 
             CHILD OF WEEK 
             CHILD OF FISCAL_QTR 
    JOIN KEY GROCERY.TIME.WEEK REFERENCES WEEK 
    ) 
   HIERARCHY CALENDAR_ROLLUP (  
             TIME_KEY 
             CHILD OF MONTH 
             CHILD OF QUARTER 
             CHILD OF YEAR 
      JOIN KEY GROCERY.TIME.MONTH REFERENCES MONTH 
      JOIN KEY GROCERY.MONTH.QUARTER REFERENCES QUARTER 
      JOIN KEY GROCERY.QUARTER.YEAR REFERENCES YEAR 
    ) 
  
    ATTRIBUTE TIME_KEY DETERMINES GROCERY.TIME.DAY_NUMBER_IN_MONTH 
    ATTRIBUTE TIME_KEY DETERMINES GROCERY.TIME.DAY_NUMBER_IN_YEAR 
    ATTRIBUTE WEEK DETERMINES GROCERY.WEEK.WEEK_NUMBER_OF_YEAR 
    ATTRIBUTE MONTH DETERMINES GROCERY.MONTH.FULL_MONTH_NAME

Dimensions and Constraints

Constraints plan an important role with dimensions. In most cases, full referential integrity is enforced on the operational databases, and operational procedures can be used to ensure that data flowing into the data warehouse (after data cleansing) never violates referential integrity; so, in practice, referential integrity constraints may or may not be enabled in the data warehouse.

It is recommended that constraints be enabled and, if validation time is a concern, then the NOVALIDATE clause should be used as shown below. Primary and foreign keys should be implemented as described. Referential integrity constraints and NOT NULL constraints on the fact tables provide information that query rewrite can use to extend the usefulness of materialized views.

ENABLE NOVALIDATE CONSTRAINT pk_time;

In addition, the RELY clause should be used to advise query rewrite that it can rely upon the constraints being correct as shown below.

ALTER TABLE time MODIFY CONSTRAINT pk_time RELY;

Validating a Dimension

If the relationships described by the dimensions are incorrect, wrong results could occur. Therefore, you should verify the relationships specified by CREATE DIMENSION using the DBMS_OLAP.VALIDATE_DIMENSION procedure periodically.

This procedure is easy to use and only has four parameters:

dimension name
owner name
set to TRUE to only check the new rows for tables of this dimension
set to TRUE to verify that all columns are not null

The example shown below validates the dimension time_fn in the Grocery schema

DBMS_OLAP.VALIDATE_DIMENSION ('TIME_FN', 'GROCERY', FALSE, TRUE);

All exceptions encountered by the VALIDATE_DIMENSION procedure are placed in the table MVIEW$_EXCEPTIONS, which is created in the user's schema. Querying this table will identify the exceptions that were found. For example:

OWNER    TABLE_NAME  DIMENSION_NAME RELATIONSHIP BAD_ROWID          
-------- ----------- -------------- ------------ ------------------ 
GROCERY  MONTH       TIME_FN        FOREIGN KEY  AAAAuwAAJAAAARwAAA

However, rather than query this table, it may be better to query as follows where the rowid of the invalid row is used to retrieve the actual row that has violated the constraint. In this example, the dimension TIME_FD is checking a table called month. It has found a row that violates the constraints and using the rowid, we can see exactly which row in the month table is causing the problem.

SELECT * FROM month
WHERE rowid IN (select bad_rowid from mview$_exceptions);

MONTH      QUARTER    FISCAL_QTR YEAR       FULL_MONTH_NAME      MONTH_NUMB
---------- ---------- ---------- ---------- -------------------- ----------
199903     19981      19981      1998       March                3

Altering a Dimension

Some modification can be made to the dimension using the ALTER DIMENSION statement. You can add or drop a level, hierarchy, or attribute from the dimension using this command.

Referring to the time dimension in Figure 30-2, you could remove the attribute month, drop the hierarchy weekly_rollup, and remove the level week. In addition, you could add a new level called qtr1.

ALTER DIMENSION time_dim DROP attribute month;
ALTER DIMENSION time_dim DROP hierarchy weekly_rollup;
ALTER DIMENSION time_dim DROP LEVEL week;
ALTER DIMENSION time_dim ADD LEVEL qtr1 IS time.fiscal_qtr;

A dimension becomes invalid if you change any schema object which the dimension is referencing. For example, if the table on which the dimension is defined is altered.

To check the status of a dimension, view the contents of the column invalid in the table ALL_DIMENSIONS.

To revalidate the dimension, use the COMPILE option as shown below.

ALTER DIMENSION time_dim COMPILE;

Deleting a Dimension

A dimension is removed using the DROP DIMENSION command. For example:

DROP DIMENSION time_dim;

30 Dimensions