Oracle8i Tuning Release 8.1.5 A67775-01 |
|
This chapter contains:
One of the major benefits of creating and maintaining materialized views is the ability to take advantage of query rewrite, which transforms a SQL statement expressed in terms of tables or views into a statement accessing one or more materialized views that are defined on the detail tables. The transformation is transparent to the end user or application, requiring no intervention and no reference to the materialized view in the SQL statement. Because query rewrite is transparent, materialized views can be added or dropped just like indexes without invalidating the SQL in the application code.
Before the query is rewritten, it is subjected to several checks to determine whether it is a candidate for query rewrite. If the query fails any of the checks, then the query is applied to the detail tables rather than the materialized view. This can be costly in terms of response time and processing power.
The Oracle optimizer uses two different methods to recognize when to rewrite a query in terms of one or more materialized views. The first method is based on matching the SQL text of the query with the SQL text of the materialized view definition. If the first method fails, the optimizer uses the more general method in which it compares join conditions, data columns, grouping columns, and aggregate functions between the query and a materialized view.
Query rewrite operates on queries and subqueries in the following types of SQL statements:
and on subqueries in the set operators UNION, UNION ALL, INTERSECT, and MINUS.
Several factors affect whether or not a given query is rewritten to use one or more materialized views:
Query rewrite is available with cost-based optimization. Oracle optimizes the input query with and without rewrite and selects the least costly alternative. The optimizer rewrites a query by rewriting one or more query blocks, one at a time.
If the rewrite logic has a choice between multiple materialized views to rewrite a query block, it will select one to optimize the ratio of the sum of the cardinality of the tables in the rewritten query block to that in the original query block. Therefore, the materialized view selected would be the one which can result in reading in the least amount of data.
After a materialized view has been picked for a rewrite, the optimizer performs the
rewrite, and then tests whether the rewritten query can be rewritten further with another materialized view. This process continues until no further rewrites are possible. Then the rewritten query is optimized and the original query is optimized. The optimizer compares these two optimizations and selects the least costly alternative.
Since optimization is based on cost, it is important to collect statistics both on tables involved in the query and on the tables representing materialized views. Statistics are fundamental measures, such as the number of rows in a table, that are used to calculate the cost of a (rewritten) query. They are created with the ANALYZE statement or by using the DBMS_STATISTICS package.
Queries that contain in-line or named views are also candidates for query rewrite. When a query contains a named view, the view name is used to do the matching between a materialized view and the query. That is, the set of named views in a materialized view definition should match exactly with the set of views in the query. When a query contains an inline view, the inline view may be merged into the query before matching between a materialized view and the query occurs.
Several steps must be followed to enable query rewrite:
If step 1 has not been completed, a materialized view will never be eligible for query rewrite. ENABLE QUERY REWRITE can be specified either when the materialized view is created, as illustrated below, or via the ALTER MATERIALIZED VIEW statement.
CREATE MATERIALIZED VIEW store_sales_mv ENABLE QUERY REWRITE AS SELECT s.store_name, SUM(dollar_sales) AS sum_dollar_sales FROM store s, fact f WHERE f.store_key = s.store_key GROUP BY s.store_name;
You can use the initialization parameter QUERY_REWRITE_ENABLED to disable query rewrite for all materialized views, or to enable it again for all materialized views that are individually enabled. However, the QUERY_REWRITE_ENABLED parameter cannot enable query rewrite for materialized views that have disabled it with the CREATE or ALTER statement.
The NOREWRITE hint disables query rewrite in a SQL statement, overriding the QUERY_REWRITE_ENABLED parameter, and the REWRITE (mview_name, ...) hint restricts the eligible materialized views to those named in the hint.
Query rewrite requires the following initialization parameter settings:
The QUERY_REWRITE_INTEGRITY parameter is optional, but must be set to STALE_TOLERATED, TRUSTED, or ENFORCED if it is specified (see "Accuracy of Query Rewrite"). It will default to ENFORCED if it is undefined.
A materialized view is used based not on privileges the user has on that materialized view, but based on privileges the user has on detail tables or views in the query.
The system privilege GRANT REWRITE allows you to enable materialized views in your own schema for query rewrite only if all tables directly referenced by the materialized view are in that schema. The GRANT GLOBAL REWRITE privilege allows you to enable materialized views for query rewrite even if the materialized view references objects in other schemas.
The privileges for using materialized views for query rewrite are similar to those for definer-rights procedures. See Oracle8i Concepts for further information.
A query gets rewritten only when a certain number of conditions are met:
To determine this, the optimizer may depend on some of the data relationships declared by the user via constraints and dimensions. Such data relationships include hierarchies, referential integrity, and uniqueness of key data, and so on.
The following sections use an example schema and a few materialized views to illustrate how the data relationships are used by the optimizer to rewrite queries. A retail database consists of these tables:
STORE (store_key, store_name, store_city, store_state, store_country) PRODUCT (prod_key, prod_name, prod_brand) TIME (time_key, time_day, time_week, time_month) FACT (store_key, prod_key, time_key, dollar_sales)
Two materialized views created on these tables contain only joins:
CREATE MATERIALIZED VIEW join_fact_store_time ENABLE QUERY REWRITE AS SELECT s.store_key, s.store_name, f.dollar_sales, t.time_key, t.time_day, f.prod_key, f.rowid, t.rowid FROM fact f, store s, time t WHERE f.time_key = t.time_key AND f.store_key = s.store_key; CREATE MATERIALIZED VIEW join_fact_store_time_oj ENABLE QUERY REWRITE AS SELECT s.store_key, s.store_name, f.dollar_sales, t.time_key, f.rowid, t.rowid FROM fact f, store s, time t WHERE f.time_key = t.time_key(+) AND f.store_key = s.store_key(+);
and two materialized views contain joins and aggregates:
CREATE MATERIALIZED VIEW sum_fact_store_time_prod ENABLE QUERY REWRITE AS SELECT s.store_name, time_week, p.prod_key, SUM(f.dollar_sales) AS sum_sales, COUNT(f.dollar_sales) AS count_sales FROM fact f, store s, time t, product p WHERE f.time_key = t.time_key AND f.store_key = s.store_key AND f.prod_key = p.prod_key GROUP BY s.store_name, time_week, p.prod_key; CREATE MATERIALIZED VIEW sum_fact_store_prod ENABLE QUERY REWRITE AS SELECT s.store_city, p.prod_name SUM(f.dollar_sales) AS sum_sales, COUNT(f.dollar_sales) AS count_sales FROM fact f, store s, product p WHERE f.store_key = s.store_key AND f.prod_key = p.prod_key GROUP BY store_city, p.prod_name;
You must compute statistics on the materialized views so that the optimizer can determine based on cost whether to rewrite the queries.
ANALYZE TABLE join_fact_store_time COMPUTE STATISTICS; ANALYZE TABLE join_fact_store_time_oj COMPUTE STATISTICS; ANALYZE TABLE sum_fact_store_time_prod COMPUTE STATISTICS; ANALYZE TABLE sum_fact_store_prod COMPUTE STATISTICS;
The optimizer uses a number of different methods to rewrite a query. The first, most important step is to determine if all or part of the results requested by the query can be obtained from the precomputed results stored in a materialized view.
The simplest case occurs when the result stored in a materialized view exactly matches what is requested by a query. The Oracle optimizer makes this type of determination by comparing the SQL text of the query with the SQL text of the materialized view definition. This method is most straightforward and also very limiting.
When the SQL text comparison test fails, the Oracle optimizer performs a series of generalized checks based on the joins, grouping, aggregates, and column data fetched. This is accomplished by individually comparing various clauses (SELECT, FROM, WHERE, GROUP BY) of a query with those of a materialized view.
Two methods are used by the optimizer:
In full SQL text match, the entire SQL text of a query is compared against the entire SQL text of a materialized view definition (that is, the entire SELECT expression), ignoring the white space during SQL text comparison. The following query
SELECT s.store_name, time_week, p.prod_key, SUM(f.dollar_sales) AS sum_sales, COUNT(f.dollar_sales) AS count_sales FROM fact f, store s, time t, product p WHERE f.time_key = t.time_key AND f.store_key = s.store_key AND f.prod_key = p.prod_key GROUP BY s.store_name, time_week, p.prod_key;
which matches sum_fact_store_time_prod (white space excluded) will be rewritten as:
SELECT store_name, time_week, product_key, sum_sales, count_sales FROM sum_fact_store_time_prod;
When full SQL text match fails, the optimizer then attempts a partial SQL text match. In this method, the SQL text starting from the FROM clause of a query is compared against the SQL text starting from the FROM clause of a materialized view definition. Therefore, this query:
SELECT s.store_name, time_week, p.prod_key, AVG(f.dollar_sales) AS avg_sales FROM fact f, store s, time t, product p WHERE f.time_key = t.time_key AND f.store_key = s.store_key AND f.prod_key = p.prod_key GROUP BY s.store_name, time_week, p.prod_key;
will be rewritten as:
SELECT store_name, time_week, prod_key, sum_sales/count_sales AS avg_sales FROM sum_fact_store_time_prod;
Note that under the partial SQL text match rewrite method, the average of sales aggregate required by the query is computed using sum of sales and count of sales aggregates stored in the materialized view.
When neither SQL text match succeeds, the optimizer uses a general query rewrite method.
The general query rewrite methods are much more powerful than SQL text match methods because they can enable the use of a materialized view even if it contains only part of the data requested by a query, or it contains more data than what is requested by a query, or it contains data in a different form which can be converted into a form required by a query. To achieve this, the optimizer compares the SQL clauses (SELECT, FROM, WHERE, GROUP BY) individually between a query and a materialized view.
The Oracle optimizer employs four different checks called:
Depending on the type of a materialized view, some or all four checks are made to determine if the materialized view can be used to rewrite a query as illustrated in the table below.
MV with Joins Only |
MV with Joins and Aggregates | MV with Aggregates on a Single Table | |
---|---|---|---|
Join Compatibility |
X |
X |
- |
Data Sufficiency |
X |
X |
X |
Grouping Compatibility |
- |
X |
X |
Aggregate Computability |
- |
X |
X |
To perform these checks, the optimizer uses data relationships on which it can depend. For example, primary key and foreign key relationships tell the optimizer that each row in the foreign key table joins with at most one row in the primary key table. Furthermore, if there is a NOT NULL constraint on the foreign key, it indicates that each row in the foreign key table joins with exactly one row in the primary key table.
Data relationships such as these are very important for query rewrite because they tell what type of result is produced by joins, grouping, or aggregation of data. Therefore, to maximize the rewritability of a large set of queries when such data relationships exist in a database, they should be declared by the user.
In this check, the joins in a query are compared against the joins in a materialized view. In general, this comparison results in the classification of joins into three categories:
The common join pairs between the two must be of same type, or the join in the query must be derivable from the join in the materialized view. For example, if a materialized view contains an outer join of table A with table B, and a query contains an inner join of table A with table B, the result of the inner join can be derived by filtering the anti-join rows from the result of the outer join.
For example, consider this query:
SELECT s.store_name, t.time_day, SUM(f.dollar_sales) FROM fact f, store s, time t WHERE f.time_key = t.time_key AND f.store_key = s.store_key AND t.time_day BETWEEN '01-DEC-1997' AND '31-DEC-1997' GROUP BY s.store_name, t.time_day;
The common joins between this query and the materialized view join_fact_store_time are:
f.time_key = t.time_key AND f.store_key = s.store_key
They match exactly and the query can be rewritten as:
SELECT store_name, time_day, SUM(dollar_sales) FROM join_fact_store_time WHERE time_day BETWEEN '01-DEC-1997' AND '31-DEC-1997' GROUP BY store_name, time_day;
The query could also be answered using the join_fact_store_time_oj materialized view where inner joins in the query can be derived from outer joins in the materialized view. The rewritten version will (transparently to the user) filter out the anti-join rows. The rewritten query will have the structure:
SELECT store_name, time_day, SUM(f.dollar_sales) FROM join_fact_store_time_oj WHERE time_key IS NOT NULL AND store_key IS NOT NULL AND time_day BETWEEN '01-DEC-1997' AND '31-DEC-1997' GROUP BY store_name, time_day;
In general, if you use an outer join in a materialized view containing only joins, you should put in the materialized view either the primary key or the rowid on the right side of the outer join. For example, in the previous example, join_fact_store_time_oj there is a primary key on both store and time.
Another example of when a materialized view containing only joins is used is the case of a semi-join rewrites. That is, a query contains either an EXISTS or an IN subquery with a single table.
Consider this query, which reports the stores that had sales greater than $10,000 during the 1997 Christmas season.
SELECT DISTINCT store_name FROM store s WHERE EXISTS (SELECT * FROM fact f WHERE f.store_key = s.store_key AND f.dollar_sales > 10000 and f.time_key between '01-DEC-1997' and '31-DEC-1997');
This query could also be seen as:
SELECT DISTINCT store_name FROM store s WHERE s.store_key in (SELECT f.store_key FROM fact f WHERE f.dollar_sales > 10000);
This query contains a semi-join 'f.store_key = s.store_key' between the store and the fact table. This query can be rewritten to use either the join_fact_store_time materialized view, if foreign key constraints are active or join_fact_store_time_oj materialized view, if primary keys are active. Observe that both materialized views contain 'f.store_key = s.store_key' which can be used to derive the semi-join in the query.
The query is rewritten with join_fact_store_time as follows:
SELECT store_name FROM (SELECT DISTINCT store_name, store_key FROM join_fact_store_time WHERE dollar_sales > 10000 AND f.time_key BETWEEN '01-DEC-1997' and '31-DEC-1997');
If the materialized view join_fact_store_time is partitioned by time_key, then this query is likely to be more efficient than the original query because the original join between store and fact has been avoided.
The query could be rewritten using join_fact_store_time_oj as follows.
SELECT store_name FROM (SELECT DISTINCT store_name, store_key FROM join_fact_store_time_oj WHERE dollar_sales > 10000 AND store_key IS NOT NULL AND time_key BETWEEN '01-DEC-1997' and '31-DEC-1997');
Rewrites with semi-joins are currently restricted to materialized views with joins only and are not available for materialized views with joins and aggregates.
A query delta join is a join that appears in the query but not in the materialized view. Any number and type of delta joins in a query are allowed and they are simply retained when the query is rewritten with a materialized view. Upon rewrite, the materialized view is joined to the appropriate tables in the delta joins.
For example, consider this query:
SELECT store_name, prod_name, SUM(f.dollar_sales) FROM fact f, store s, time t, product p WHERE f.time_key = t.time_key AND f.store_key = s.store_key AND f.prod_key = p.prod_key AND t.time_day BETWEEN '01-DEC-1997' AND '31-DEC-1997' GROUP BY store_name, prod_name;
Using the materialized view join_fact_store_time, common joins are: f.time_key = t.time_key AND f.store_key = s.store_key. The delta join in the query is f.prod_key = p.prod_key.
The rewritten form will then join the join_fact_store_time materialized view with the product table:
SELECT store_name, prod_name, SUM(f.dollar_sales) FROM join_fact_store_time mv, product p WHERE mv.prod_key = p.prod_key AND mv.time_day BETWEEN '01-DEC-1997' AND '31-DEC-1997' GROUP BY store_name, prod_name;
All delta joins in a materialized view are required to be lossless with respect to the result of common joins. A lossless join guarantees that the result of common joins is not restricted. A lossless join is one where, if two tables called A and B are joined together, rows in table A will always match with rows in table B and no data will be lost, hence the term lossless join. For example, every row with the foreign key matches a row with a primary key provided no nulls are allowed in the foreign key. Therefore, to guarantee a lossless join, it is necessary to have FOREIGN KEY, PRIMARY KEY, and NOT NULL constraints on appropriate join keys. Alternatively, if the join between tables A and B is an outer join (A being the outer table), it is lossless as it preserves all rows of table A.
All delta joins in a materialized view are required to be non-duplicating with respect to the result of common joins. A non-duplicating join guarantees that the result of common joins is not duplicated. For example, a non-duplicating join is one where, if table A and table B are joined together, rows in table A will match with at most one row in table B and no duplication occurs. To guarantee a non-duplicating join, the key in table B must be constrained to unique values by using a primary key or unique constraint.
Consider this query which joins FACT and TIME:
SELECT t.time_day, sum(f.dollar_sales) FROM fact f, time t WHERE f.time_key = t.time_key AND t.time_day BETWEEN '01-DEC-1997' AND '31-DEC-1997' GROUP t.time_day;
The materialized view join_fact_store_time has an additional join between FACT and STORE: 'f.store_key = s.store_key'. This is the delta join in join_fact_store_time.
We can rewrite the query if this join is lossless and non-duplicating. This is the case if f.store_key is a foreign key to p.store_key and is not null. The query is therefore rewritten as:
SELECT time_day, SUM(f.dollar_sales) FROM join_fact_store_time WHERE time_day BETWEEN '01-DEC-1997' AND '31-DEC-1997' GROUP BY time_day;
The query could also be rewritten with the materialized view join_fact_store_time_oj where foreign key constraints are not needed. This view contains an outer join between fact and store: 'f.store_key = s.store_key(+)' which makes the join lossless. If s.store_key is a primary key, then the non-duplicating condition is satisfied as well and optimizer will rewrite the query as:
SELECT time_day, SUM(f.dollar_sales) FROM join_fact_store_time_oj WHERE time_key IS NOT NULL AND time_day BETWEEN '01-DEC-1997' AND '31-DEC-1997' GROUP BY time_day;
The current limitations restrict most rewrites with outer joins to materialized views with joins only. There is very limited support for rewrites with materialized aggregate views with outer joins. Those views should rely on foreign key constraints to assure losslessness of delta materialized view joins.
In this check, the optimizer determines if the necessary column data requested by a query can be obtained from a materialized view. For this, the equivalence of one column with another is used. For example, if an inner join between table A and table B is based on a join predicate A.X = B.X, then the data in column A.X will equal the data in column B.X in the result of the join. This data property is used to match column A.X in a query with column B.X in a materialized view or vice versa.
For example, consider this query:
SELECT s.store_name, f.time_key, SUM(f.dollar_sales) FROM fact f, store s, time t WHERE f.time_key = t.time_key AND f.store_key = s.store_key GROUP BY s.store_name, f.time_key;
This query can be answered with join_fact_store_time even though the materialized view doesn't have f.time_key. Instead, it has t.time_key which, through a join condition 'f.time_key = t.time_key', is equivalent to f.time_key.
Thus, the optimizer may select this rewrite:
SELECT store_name, time_day, SUM(dollar_sales) FROM join_fact_store_time GROUP BY store_name, time_key;
If some column data requested by a query cannot be obtained from a materialized view, the optimizer further determines if it can be obtained based on a data relationship called functional dependency. When the data in a column can determine data in another column, such a relationship is called functional dependency or functional determinance. For example, if a table contains a primary key column called prod_key and another column called prod_name, then, given a prod_key value, it is possible to look up the corresponding prod_name. The opposite is not true, which means a prod_name value need not relate to a unique prod_key.
When the column data required by a query is not available from a materialized view, such column data can still be obtained by joining the materialized view back to the table that contains required column data provided the materialized view contains a key that functionally determines the required column data.
For example, consider this query:
SELECT s.store_name, t.time_week, p.prod_name, SUM(f.dollar_sales) AS sum_sales, FROM fact f, store s, time t, product p WHERE f.time_key = t.time_key AND f.store_key = s.store_key AND f.prod_key = p.prod_key AND p.prod_brand = 'KELLOGG' GROUP BY s.store_name, t.time_week, p.prod_name;
The materialized view sum_fact_store_time_prod contains p.prod_key, but not p.prod_brand. However, we can join sum_fact_store_time_prod back to PRODUCT to retrieve prod_brand because prod_key functionally determines prod_brand. The optimizer rewrites this query using sum_fact_store_time_prod as:
SELECT mv.store_name, mv.time_week, p.product_key, mv.sum_sales, FROM sum_fact_store_time_prod mv, product p WHERE mv.prod_key = p.prod_key AND p.prod_brand = 'KELLOGG' GROUP BY mv.store_name, mv.time_week, p.prod_key;
Here the PRODUCT table is called a joinback table because it was originally joined in the materialized view but joined back again in the rewritten query.
There are two ways to declare functional dependency:
The DETERMINES clause of a dimension definition may be the only way you could declare functional dependency when the column that determines another column cannot be a primary key. For example, the STORE table is a denormalized dimension table which has columns store_key, store_name, store_city, city_name, and store_state. Store_key functionally determines store_name and store_city functionally determines store_state.
The first functional dependency can be established by declaring store_key as the primary key, but not the second functional dependency because the store_city column contains duplicate values. In this situation, you can use the DETERMINES clause of a dimension to declare the second functional dependency.
The following dimension definition illustrates how the functional dependencies are declared.
CREATE DIMENSION store_dim LEVEL store_key IS store.store_key LEVEL city IS store.store_city LEVEL state IS store.store_state LEVEL country IS store.store_country HIERARCHY geographical_rollup ( store_key CHILD OF city CHILD OF state CHILD OF country ) ATTRIBUTE store_key DETERMINES store.store_name; ATTRIBUTE store_city DETERMINES store.city_name;
The hierarchy geographic_rollup declares hierarchical relationships which are also 1:n functional dependencies. The 1:1 functional dependencies are declared using the DETERMINES clause, such as store_city functionally determines city_name.
The following query:
SELECT s.store_city, p.prod_name SUM(f.dollar_sales) AS sum_sales, FROM fact f, store s, product p WHERE f.store_key = s.store_key AND f.prod_key = p.prod_key AND s.city_name = 'BELMONT' GROUP BY s.store_city, p.prod_name;
can be rewritten by joining sum_fact_store_prod to the STORE table so that city_name is available to evaluate the predicate. But the join will be based on the store_city column, which is not a primary key in the STORE table; therefore, it allows duplicates. This is accomplished by using an inline view which selects distinct values and this view is joined to the materialized view as shown in the rewritten query below.
SELECT iv.store_city, mv.prod_name, mv.sum_sales FROM sum_fact_store_prod mv, (SELECT DISTINCT store_city, city_name FROM store) iv WHERE mv.store_city = iv.store_city AND iv.store_name = 'BELMONT' GROUP BY iv.store_city, mv.prod_name;
This type of rewrite was possible because the fact that store_city functionally determines city_name as declared in the dimension.
This check is required only if both the materialized view and the query contain a GROUP BY clause. The optimizer first determines if the grouping of data requested by a query is exactly the same as the grouping of data stored in a materialized view. That means, the level of grouping is the same in both the query and the materialized view. For example, a query requests data grouped by store_city and a materialized view stores data grouped by store_city and store_state. The grouping is the same in both provided store_city functionally determines store_state, such as the functional dependency shown in the dimension example above.
If the grouping of data requested by a query is at a coarser level compared to the grouping of data stored in a materialized view, the optimizer can still use the materialized view to rewrite the query. For example, the materialized view sum_fact_store_time_prod groups by store_name, time_week, and prod_key. This query groups by store_name, a coarser grouping granularity:
SELECT s.store_name, SUM(f.dollar_sales) AS sum_sales, FROM fact f, store s WHERE f.store_key = s.store_key GROUP BY s.store_name;
Therefore, the optimizer will rewrite this query as:
SELECT store_name, SUM(sum_dollar_sales) AS sum_sales, FROM sum_fact_store_time_prod GROUP BY s.store_name;
In another example, a query requests data grouped by store_state whereas a materialized view stores data grouped by store_city. If store_city is a CHILD OF store_state (see the dimension example above), the grouped data stored in the materialized view can be further grouped by store_state when the query is rewritten. In other words, aggregates at store_city level (finer granularity) stored in a materialized view can be rolled up into aggregates at store_state level (coarser granularity).
For example, consider the following query:
SELECT store_state, prod_name, SUM(f.dollar_sales) AS sum_sales FROM fact f, store s, product p WHERE f.store_key = s.store_key AND f.prod_key = p.prod_key GROUP BY store_state, prod_name;
Because store_city functionally determines store_state, sum_fact_store_prod can be used with a joinback to store table to retrieve store_state column data, and then aggregates can be rolled up to store_state level, as shown below:
SELECT store_state, prod_name, sum(mv.sum_sales) AS sum_sales FROM sum_fact_store_prod mv, (SELECT DISTINCT store_city, store_state FROM store) iv WHERE mv.store_city = iv.store_city GROUP BY store_state, prod_name;
Note that for this rewrite, the data sufficiency check determines that a joinback to the STORE table is necessary, and the grouping compatibility check determines that aggregate rollup is necessary.
This check is required only if both the query and the materialized view contain aggregates. Here the optimizer determines if the aggregates requested by a query can be derived or computed from one or more aggregates stored in a materialized view. For example, if a query requests AVG(X) and a materialized view contains SUM(X) and COUNT(X), then AVG(X) can be computed as SUM(X) / COUNT(X).
If the grouping compatibility check determined that the rollup of aggregates stored in a materialized view is required, then aggregate computability check determines if it is possible to roll up each aggregate requested by the query using aggregates in the materialized view.
For example, SUM(sales) at the city level can be rolled up to SUM(sales) at the state level by summing all SUM(sales) aggregates in a group with the same state value. However, AVG(sales) cannot be rolled up to a coarser level unless COUNT(sales) is also available in the materialized view. Similarly, VARIANCE(sales) or STDDEV(sales) cannot be rolled up unless COUNT(sales) and SUM(sales) are also available in the materialized view. For example, given the query:
SELECT p.prod_name, AVG(f.dollar_sales) AS avg_sales FROM fact f, product p WHERE f.prod_key = p.prod_key GROUP BY p.prod_name;
The materialized view sum_fact_store_prod can be used to rewrite it provided the join between FACT and STORE is lossless and non-duplicating. Further, the query groups by prod_name whereas the materialized view groups by store_city, prod_name, which means the aggregates stored in the materialized view will have to be rolled up. The optimizer will rewrite the query as:
SELECT mv.prod_name, SUM(mv.sum_sales)/SUM(mv.count_sales) AS avg_sales FROM sum_fact_store_prod mv GROUP BY mv.prod_name;
The argument of an aggregate such as SUM can be an arithmetic expression like A+B. The optimizer will try to match an aggregate SUM(A+B) in a query with an aggregate SUM(A+B) or SUM(B+A) stored in a materialized view. In other words, expression equivalence is used when matching the argument of an aggregate in a query with the argument of a similar aggregate in a materialized view. To accomplish this, Oracle converts the aggregate argument expression into a canonical form such that two different but equivalent expressions convert into the same canonical form. For example, A*(B-C), A*B-C*A, (B-C)*A, and -A*C+A*B all convert into the same canonical form and, therefore, they are successfully matched.
To clarify when dimensions and constraints are required for the different types of query rewrite, refer to Table 31-2.
Query rewrite offers three levels of rewrite integrity that are controlled by the initialization parameter QUERY_REWRITE_INTEGRITY, which can either be set in your parameter file or controlled using the ALTER SYSTEM or ALTER SESSION command. The three values it can take are:
This is the default mode. The optimizer will only use materialized views which it knows contain fresh data and only use those relationships that are based on enforced constraints.
In TRUSTED mode, the optimizer trusts that the data in the materialized views based on prebuilt tables is correct, and the relationships declared in dimensions and RELY constraints are correct. In this mode, the optimizer uses prebuilt materialized views, and uses relationships that are not enforced as well as those that are enforced. In this mode, the optimizer also 'trusts' declared but not enforced constraints and data relationships specified using dimensions.
In STALE_TOLERATED mode, the optimizer uses materialized views that are valid but contain stale data as well as those that contain fresh data. This mode offers the maximum rewrite capability but creates the risk of generating wrong results.
If rewrite integrity is set to the safest level, ENFORCED, the optimizer uses only enforced primary key constraints and referential integrity constraints to ensure that the results of the query are the same as the results when accessing the detail tables directly.
If the rewrite integrity is set to levels other than ENFORCED, then there are several situations where the output with rewrite may be different from that without it.
PREBUILT
materialized view table may be incorrect.
Since query rewrite occurs transparently, special steps have to be taken to verify that a query has been rewritten. Of course, if the query runs faster, this should indicate that rewrite has occurred but this isn't proof. Therefore, to confirm that query rewrite does occur, use the EXPLAIN PLAN statement.
The EXPLAIN PLAN facility is used as described in Oracle8i SQL Reference. For query rewrite, all you need to check is that the object_name column in PLAN_TABLE contains the materialized view name. If it does, then query rewrite will occur when this query is executed.
In this example, the materialized view store_mv has been created.
CREATE MATERIALIZED VIEW store_mv ENABLE QUERY REWRITE AS SELECT s.region, SUM(grocery_sq_ft) AS sum_floor_plan FROM store s GROUP BY s.region;
If EXPLAIN PLAN is used on this SQL statement, the results are placed in the default table PLAN_TABLE.
EXPLAIN PLAN FOR SELECT s.region, SUM(grocery_sq_ft) FROM store s GROUP BY s.region;
For the purposes of query rewrite, the only information of interest from PLAN_TABLE is the OBJECT_NAME, which identifies the objects that will be used to execute this query. Therefore, you would expect to see the object name STORE_MV in the output as illustrated below.
SELECT object_name FROM plan_table; OBJECT_NAME ------------------------------ STORE_MV 2 rows selected.
A materialized view is only eligible for query rewrite if the ENABLE QUERY REWRITE clause has been specified, either initially when the materialized view was first created or subsequently via an ALTER MATERIALIZED VIEW command.
The initialization parameters described above can be set using the ALTER SYSTEM SET command. For a given user's session, ALTER SESSION can be used to disable or enable query rewrite for that session only. For example:
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;
The correctness of query rewrite can be set for a session, thus allowing different users to work at different integrity levels.
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = STALE_TOLERATED; ALTER SESSION SET QUERY_REWRITE_INTEGRITY = TRUSTED; ALTER SESSION SET QUERY_REWRITE_INTEGRITY = ENFORCED;
Hints may be included in SQL statements to control whether query rewrite occurs. Using the NOREWRITE hint with no argument in a query prevents the optimizer from rewriting it.
The REWRITE hint with no argument in a query forces the optimizer to use a materialized view (if any) to rewrite it regardless of the cost.
The REWRITE (mv1, mv2, ...) hint with argument(s) forces rewrite to select the most suitable materialized view from the list of names specified.
For example, to prevent a rewrite, you can use:
SELECT /*+ NOREWRITE */ s.city, SUM(s.grocery_sq_ft) FROM store s GROUP BY s.city;
and to force a rewrite using mv1, you can use:
SELECT /*+ REWRITE (mv1) */ s.city, SUM(s.grocery_sq_ft) FROM store s GROUP BY s.city;
The following guidelines will help in getting the maximum benefit from query rewrite. They are not mandatory for using query rewrite and rewrite is not guaranteed if you follow them. They are general rules of thumb.
Make sure all inner joins referred to in a materialized view have referential integrity (foreign key - primary key constraints) with additional NOT NULL constraints on the foreign key columns. Since constraints tend to impose a large overhead, you could make them NONVALIDATE and RELY and set the parameter QUERY_REWRITE_INTEGRITY to STALE_TOLERATED or TRUSTED. However, if you set QUERY_REWRITE_INTEGRITY to ENFORCED, all constraints must be enforced to get maximum rewritability.
You can express the hierarchical relationships and functional dependencies in normalized or denormalized dimension tables using the HIERARCHY clause of a dimension. Dimensions can express intra-table relationships which cannot be expressed by any constraints. Set the parameter QUERY_REWRITE_INTEGRITY to TRUSTED or STALE_TOLERATED for query rewrite to take advantage of the relationships declared in dimensions.
Another way of avoiding constraints is to use outer joins in the materialized view. Query rewrite will be able to derive an inner join in the query, such as (A.a = B.b), from an outer join in the materialized view (A.a = B.b(+)), as long as the rowid of B or column B.b is available in the materialized view. Most of the support for rewrites with outer joins is provided for materialized views with joins only. To exploit it, a materialized view with outer joins should store the rowid or primary key of the inner table of an outer join. For example, the materialized view join_fact_store_time_oj stores the primary keys store_key and time_key of the inner tables of outer joins.
If you need to speed up an extremely complex, long-running query, you could create a materialized view with the exact text of the query.
In order to get the maximum benefit from query rewrite, make sure that all aggregates which are needed to compute ones in the targeted query are present in the materialized view. The conditions on aggregates are quite similar to those for incremental refresh. For instance, if AVG(x) is in the query, then you should store COUNT(x) and AVG(x) or store SUM(x) and COUNT(x) in the materialized view. Refer to Table 29-1, "Requirements for Fast Refresh of Materialized Views" .
Aggregating data at lower levels in the hierarchy is better than aggregating at higher levels because lower levels can be used to rewrite more queries. Note, however, that doing so will also take up more space. For example, instead of grouping on state, group on city (unless space constraints prohibit it).
Instead of creating multiple materialized views with overlapping or hierarchically related GROUP BY columns, create a single materialized view with all those GROUP BY columns. For example, instead of using a materialized view that groups by city and another materialized view that groups by month, use a materialized view that groups by city and month.
Use GROUP BY on columns which correspond to levels in a dimension but not on columns that are functionally dependent, because query rewrite will be able to use the functional dependencies automatically based on the DETERMINES clause in a dimension. For example, instead of grouping on city_name, group on city_id (as long as there is a dimension which indicates that the attribute city_id determines city_name, you will enable the rewrite of a query involving city_name).
Optimization with materialized views is based on cost and the optimizer needs statistics of both the materialized view and the tables in the query to make a cost-based choice. Materialized views should thus have statistics collected using either the ANALYZE TABLE statement or the DBMS_STATISTICS package.
See Also:
For more information about collecting statistics, please refer to "Generating Statistics". |