Oracle8i Application Developer's Guide - Fundamentals Release 8.1.5 A68003-01 |
|
This chapter discusses the procedures necessary to create and manage the different types of objects contained in a user's schema. The topics include:
Specific information is described in the following locations:
See Also:
Indexes are used in Oracle to provide quick access to rows in a table. Indexes provide faster access to data for operations that return a small portion of a table's rows.
Oracle does not limit the number of indexes you can create on a table. However, you should consider the performance benefits of indexes and the needs of your database applications to determine which columns to index.
The following sections explain how to create, alter, and drop indexes using SQL commands. Some simple guidelines to follow when managing indexes are included.
With one notable exception, you should usually create indexes after you have inserted or loaded (using SQL*Loader or Import) data into a table. It is more efficient to insert rows of data into a table that has no indexes and then to create the indexes for subsequent queries, etc. If you create indexes before table data is loaded, then every index must be updated every time you insert a row into the table. The exception to this rule is that you must create an index for a cluster before you insert any data into the cluster.
When you create an index on a table that already has data, Oracle must use sort space to create the index. Oracle uses the sort space in memory allocated for the creator of the index (the amount per user is determined by the initialization parameter SORT_AREA_SIZE
), but must also swap sort information to and from temporary segments allocated on behalf of the index creation. If the index is extremely large, it might be beneficial to complete the following steps:
CREATE TABLESPACE
command.
TEMPORARY TABLESPACE
option of the ALTER USER
command to make this your new temporary tablespace.
CREATE INDEX
command.
DROP TABLESPACE
command. Then use the ALTER USER
command to reset your temporary tablespace to your original temporary tablespace.
Under certain conditions, you can load data into a table with the SQL*Loader "direct path load", and an index can be created as data is loaded.
Use the following guidelines for determining when to create an index:
Primary and unique keys automatically have indexes, but you might want to create an index on a foreign key; see Chapter 5, "Maintaining Data Integrity" for more information.
See Also:
Some columns are strong candidates for indexing. Columns with one or more of the following characteristics are candidates for indexing:
WHERE COL_X > -9.99 *power(10,125)
is preferable to
WHERE COL_X IS NOT NULL
This is because the first uses an index on COL_X
(assuming that COL_X
is a numeric column).
Columns with the following characteristics are less suitable for indexing:
LONG
and LONG
RAW
columns cannot be indexed.
The size of a single index entry cannot exceed roughly one-half (minus some overhead) of the available space in the data block. Consult with the database administrator for assistance in determining the space required by an index.
A table can have any number of indexes. However, the more indexes, the more overhead is incurred as the table is altered. When rows are inserted or deleted, all indexes on the table must be updated. When a column is updated, all indexes on the column must be updated.
Thus, there is a trade-off between speed of retrieval for queries on a table and speed of accomplishing updates on the table. For example, if a table is primarily read-only, then more indexes might be useful; but, if a table is heavily updated, then fewer indexes might be preferable.
The order in which columns are named in the CREATE
INDEX
command does not need to correspond to the order in which they appear in the table. However, the order of columns in the CREATE
INDEX
statement is significant because query performance can be affected by the order chosen. In general, you should put the column expected to be used most often first in the index.
For example, assume the columns of the VENDOR_PARTS
table are as shown in Figure 6-1.
Assume that there are five vendors, and each vendor has about 1000 parts.
Suppose that the VENDOR_PARTS
table is commonly queried by SQL statements such as the following:
SELECT * FROM vendor_partsWHERE part_no = 457 AND vendor_id = 1012;
To increase the performance of such queries, you might create a composite index putting the most selective column first; that is, the column with the most values:
CREATE INDEX ind_vendor_idON vendor_parts (part_no, vendor_id);
Indexes speed retrieval on any query using the leading portion of the index. So in the above example, queries with WHERE
clauses using only the PART_NO
column also note a performance gain. Because there are only five distinct values, placing a separate index on VENDOR_ID
would serve no purpose.
You can create an index for a table to improve the performance of queries issued against the corresponding table. You can also create an index for a cluster. You can create a composite index on multiple columns up to a maximum of 16 columns. A composite index key cannot exceed roughly one-half (minus some overhead) of the available space in the data block.
Oracle automatically creates an index to enforce a UNIQUE
or PRIMARY
KEY
integrity constraint. In general, it is better to create such constraints to enforce uniqueness and not explicitly use the obsolete CREATE
UNIQUE
INDEX
syntax.
Use the SQL commandCREATE
INDEX
to create an index. The following statement CREATE INDEX emp_ename ON Emp_tab(ename)TABLESPACE users STORAGE (INITIAL 20K NEXT 20kPCTINCREASE 75)PCTFREE 0;
Notice that several storage settings are explicitly specified for the index.
To create a new index, you must own, or have the INDEX
object privilege for, the corresponding table. The schema that contains the index must also have a quota for the tablespace intended to contain the index, or the UNLIMITED
TABLESPACE
system privilege. To create an index in another user's schema, you must have the CREATE
ANY
INDEX
system privilege.
You might drop an index for the following reasons:
When you drop an index, all extents of the index's segment are returned to the containing tablespace and become available for other objects in the tablespace.
Use the SQL command DROP
INDEX
to drop an index. For example, to drop the EMP_ENAME
index, enter the following statement:
DROP INDEX Emp_ename;
If you drop a table, then all associated indexes are dropped.
To drop an index, the index must be contained in your schema or you must have the DROP
ANY
INDEX
system privilege.
A function-based index is an index built on an expression. It extends your indexing capabilities beyond indexing on a column. A function-based index increases the variety of ways in which you can access data.
The expression used in a function-based index can be an arithmetic expression or an expression that contains a PL/SQL function, package function, C callout, or SQL function. Function-based indexes also support linguistic sorts based on linguistic sort keys (collation), efficient linguistic collation of SQL statements, and case-insensitive sorts.
Like other indexes, function-based indexes improve query performance. For example, if you need to access a computationally complex expression often, then you can store it in an index. Then when you need to access the expression, it is already computed. You can find a detailed description of the advantages of function-based indexes in "Using Function-Based Indexes".
Function-based indexes have all of the same properties as indexes on columns. However, unlike indexes on columns which can be used by both cost-based and rule-based optimization, function-based indexes can be used by only by cost-based optimization. Other restrictions on function-based indexes are described in "Restrictions on Function-Based Indexes".
See Also:
For more information on function-based indexes, see Oracle8i Concepts. For information on creating function-based indexes, see Oracle8i Administrator's Guide. |
The following list describes the advantages of function-based indexes in greater detail:
WHERE
clause below:
CREATE INDEX Idx ON Example_tab(Column_a + Column_b); SELECT * FROM Example_tab WHERE Column_a + Column_b < 10;
In the CREATE
INDEX
statement, idx
is the name of the index, Example_tab
is the name of the table, and column_a
and column_b
represent columns. The optimizer can use a range scan for this query because the index is built on (column_a
+ column_b
). Range scans typically produce fast response times if the predicate has low selectivity (that is, if the predicate selects less than 15% of the rows of a large table). In addition, the optimizer can estimate selectivity of predicates involving expressions more accurately if the expressions are materialized in a function-based index (expressions of function-based indexes are represented as virtual columns and ANALYZE
can build histograms on such columns).
MAP
method to build indexes on an object type column.
UPPER
and LOWER
functions, descending order sorts with the DESC
function, and linguistic-based sorts with the NLSSORT
function.
See Also:
For examples of how to use function-based indexes, see the Oracle8i Administrator's Guide. |
As an example, consider a weather research institute that maintains tables of weather data for various cities. Some of their projects include tracking daily temperature fluctuations throughout the year. Other projects include tracking fluctuations as a function of the city's distance from the equator. By building indexes on the complex functions that they want to calculate, the institute can optimize the execution of the queries they submit. The following section contains examples of indexes that could be created and the queries that could use them.
The table, Weatherdata_tab
, contains columns for the minimum daily temperature (Mintemp
), maximum daily temperature (Maxtemp
), the day the temperature was recorded (Day
), and the Region (Region_Obj
). Region_Obj
is an object column that contains columns for country (Country
) and city (Cityname
). Figure 6-2 illustrates the Weatherdata_tab
schema.
An index is created that calculates the difference in temperature for the cities in the tables. A query that could use the delta_index
index returns the contents of the table for temperature differences less than 20:
CREATE INDEX Delta_index ON Weatherdata_tab (Maxtemp - Mintemp); SELECT * FROM Weatherdata_tab WHERE (Maxtemp - Mintemp) < '20';
An index is created that calls the object method distance_from_equator
to calculate the distance from the equator for each city in the table. The method is applied to the object column Region_Obj
. A query that could use the distance_index
index returns the names of the cities that are at a distance greater than 1000 miles from the equator:
CREATE INDEX Distance_index ON Weatherdata_tab (Distance_from_equator (Reg_obj)); SELECT * FROM Weatherdata_tab WHERE (Distance_from_equator (Reg_Obj)) > '1000';
An index is created that satisfies the queries of German-speaking users that sorts temperature data by city name. A query that could use the City_index
index returns the contents of the table, ordered by city name. The German sort order for city name is used. Note that in the SELECT
statement, a WHERE
clause is not needed. This is because in a German session, NLSSORT
is set to German
.
CREATE INDEX City_index ON Weatherdata_tab (NLSSORT(Cityname, 'NLS_SORT=German')); SELECT * FROM Weatherdata_tab ORDER BY Cityname;
An index is created on the difference between the maximum and minimum temperatures, and on the maximum temperature. The result of the difference is sorted in descending order. A query that could use the compare_index
index returns the contents of the table that satisfy the condition where the difference is less than 20 and the maximum temperature is greater than 75.
CREATE INDEX compare_index ON Weatherdata_tab ((Maxtemp - Mintemp) DESC, Maxtemp); SELECT * FROM Weatherdata_tab WHERE ((Maxtemp - Mintemp) < '20' AND Maxtemp > '75');
The following command creates a function-based index IDX
on table EMP_TAB
.
CREATE INDEX Idx ON Emp_tab (UPPER(Ename));
The SELECT
command uses the function-based index on UPPER
(e_name
) to return all of the employees with name like :KEYCOL
.
SELECT * FROM Emp_tab WHERE UPPER(Ename) like :KEYCOL;
The following command creates a function-based index IDX
on table Fbi_tab where A, B, and C represent columns.
CREATE INDEX Idx On Fbi_tab (A + B * (C - 1), A, B);
The SELECT
statement can either use index range scan (notice that the expression is a prefix of index IDX
) or index fast full scan (which may be preferable if the index has specified a high parallel degree).
SELECT a FROM Fbi_tab Where A + B * (C - 1) < 100;
This example demonstrates how a function-based index can be used to support an NLS Sort Index. Given a string, the NLSSORT
function returns a sort key. The following CREATE
INDEX
statement creates an NLS_SORT
sort on table NLS_TAB
with collation sequence GERMAN
.
CREATE INDEX Nls_index ON Nls_tab (NLSSORT(Name, 'NLS_SORT = German'));
The SELECT
statement selects all of the contents of the table and orders it by NAME
. The rows are ordered using the German collation sequence.
SELECT * FROM Nls_tab ORDER BY Name;
This example demonstrates a case-insensitive search. The UPPER
function converts the ENAMEs
to all uppercase letters:
CREATE INDEX Case_insensitive_idx ON Emp_tab (UPPER(Ename));
An example query which would use this index is:
SELECT * FROM Emp_tab WHERE UPPER(Ename) = 'JOE';
Note the following restrictions on function-based indexes:
DETERMINISTIC
. There is no error checking whether or not a subprogram is qualified as DETERMINISTIC
. It is up to you to ensure that a subprogram is DETERMINISTIC
.
The following semantic rules demonstrate how to use the keyword DETERMINISTIC
:
PACKAGE
level subprogram can be declared as DETERMINISTIC in the PACKAGE
specification but not in the PACKAGE
BODY
. Errors will be raised if DETERMINISTIC is used inside a PACKAGE
BODY
.
PACKAGE
BODY
) cannot be declared as DETERMINISTIC.
DETERMINISTIC
subprogram can call another subprogram whether the called program is declared as DETERMINISTIC
or not.
LOB
columns.
VARCHAR2
or RAW
data types from a PL/SQL function are not permitted due to length restrictions. A possible work around is to use substrings to limit the size of the function's output. For example:
SUBSTR (F(X), 1, 100)
Where F(X)
represents the PL/SQL function. The SUBSTR
command would need to be used for the function when creating the index and when referencing the function in queries.
Because clusters store related rows of different tables together in the same data blocks, two primary benefits are achieved when clusters are properly used:
Some guidelines for creating clusters are outlined below.
Use clusters to store one or more tables that are primarily queried (not predominantly inserted into or updated), and for which queries often join data of multiple tables in the cluster or retrieve related data from a single table.
Choose cluster key columns carefully. If multiple columns are used in queries that join the tables, then make the cluster key a composite key. In general, the same column characteristics that make a good index apply for cluster indexes.
A good cluster key has enough unique values so that the group of rows corresponding to each key value fills approximately one data block. Too few rows per cluster key value can waste space and result in negligible performance gains. Cluster keys that are so specific that only a few rows share a common value can cause wasted space in blocks, unless a small SIZE
was specified at cluster creation time.
Too many rows per cluster key value can cause extra searching to find rows for that key. Cluster keys on values that are too general (for example, MALE
and FEMALE
) result in excessive searching and can result in worse performance than with no clustering.
A cluster index cannot be unique or include a column defined as LONG
.
Also note that clusters can reduce the performance of DML statements (INSERT
s, UPDATE
s, and DELETE
s) as compared to storing a table separately with its own index. These disadvantages relate to the use of space and the number of blocks that must be visited to scan a table. Because multiple tables share each block, more blocks must be used to store a clustered table than if that same table were stored non-clustered. You should decide about using clusters with these trade-offs in mind.
To identify data that would be better stored in clustered form than in non-clustered form, look for tables that are related via referential integrity constraints, and tables that are frequently accessed together using SELECT
statements that join data from two or more tables. If you cluster tables on the columns used to join table data, then you reduce the number of data blocks that must be accessed to process the query; all the rows needed for a join on a cluster key are in the same block. Therefore, query performance for joins is improved.
Similarly, it may be useful to cluster an individual table. For example, the EMP_TAB
table could be clustered on the DEPTNO
column to cluster the rows for employees in the same department. This would be advantageous if applications commonly process rows, department by department.
Like indexes, clusters do not affect application design. The existence of a cluster is transparent to users and to applications. Data stored in a clustered table is accessed via SQL just like data stored in a non-clustered table.
Use a cluster to store one or more tables that are frequently joined in queries. Do not use a cluster to cluster tables that are frequently accessed individually.
Once you create a cluster, tables can be created in the cluster. However, before you can insert any rows into the clustered tables, you must create a cluster index. The use of clusters does not affect the creation of additional indexes on the clustered tables; you can create and drop them as usual.
Use the SQL command CREATE
CLUSTER
to create a cluster. The following statement creates a cluster named EMP_DEPT
, which stores the EMP_TAB
and DEPT_TAB
tables, clustered by the DEPTNO
column:
CREATE CLUSTER Emp_dept (Deptno NUMBER(3))PCTUSED 80 PCTFREE 5;
Create a table in a cluster using the SQL command CREATE
TABLE
with the CLUSTER
option. For example, the EMP_TAB
and DEPT_TAB
tables can be created in the EMP_DEPT
cluster using the following statements:
CREATE TABLE Dept_tab (Deptno NUMBER(3) PRIMARY KEY, . . . ) CLUSTER Emp_dept (Deptno);CREATE TABLE Emp_tab (Empno NUMBER(5) PRIMARY KEY, Ename VARCHAR2(15) NOT NULL, . . . Deptno NUMBER(3) REFERENCES Dept_tab) CLUSTER Emp_dept (Deptno);
A table created in a cluster is contained in the schema specified in the CREATE
TABLE
statement; a clustered table might not be in the same schema that contains the cluster.
You must create a cluster index before any rows can be inserted into any clustered table. For example, the following statement creates a cluster index for the EMP_DEPT
cluster:
CREATE INDEX Emp_dept_indexON CLUSTER Emp_dept INITRANS 2 MAXTRANS 5 PCTFREE 5;
The cluster key establishes the relationship of the tables in the cluster.
To create a cluster in your schema, you must have the CREATE
CLUSTER
system privilege and a quota for the tablespace intended to contain the cluster or the UNLIMITED
TABLESPACE
system privilege. To create a cluster in another user's schema, you must have the CREATE
ANY
CLUSTER
system privilege, and the owner must have a quota for the tablespace intended to contain the cluster or the UNLIMITED
TABLESPACE
system privilege.
To create a table in a cluster, you must have either the CREATE
TABLE
or CREATE
ANY
TABLE
system privilege. You do not need a tablespace quota or the UNLIMITED
TABLESPACE
system privilege to create a table in a cluster.
To create a cluster index, your schema must contain the cluster, and you must have the following privileges:
CREATE
ANY
INDEX
system privilege or, if you own the cluster, the CREATE
INDEX
privilege
UNLIMITED
TABLESPACE
system privilege
Oracle dynamically allocates additional extents for the data segment of a cluster, as required. In some circumstances, you might want to explicitly allocate an additional extent for a cluster. For example, when using the Oracle Parallel Server, an extent of a cluster can be allocated explicitly for a specific instance.
You can allocate a new extent for a cluster using the SQL command ALTER
CLUSTER
with the ALLOCATE
EXTENT
option.
Drop a cluster if the tables currently within the cluster are no longer necessary. When you drop a cluster, the tables within the cluster and the corresponding cluster index are dropped; all extents belonging to both the cluster's data segment and the index segment of the cluster index are returned to the containing tablespace and become available for other segments within the tablespace.
You can individually drop clustered tables without affecting the table's cluster, other clustered tables, or the cluster index. Drop a clustered table in the same manner as a non-clustered table--use the SQL command DROP
TABLE
.
See "Dropping Tables" for more information about individually dropping tables.
You can drop a cluster index without affecting the cluster or its clustered tables. However, you cannot use a clustered table if it does not have a cluster index. Cluster indexes are sometimes dropped as part of the procedure to rebuild a fragmented cluster index.
To drop a cluster that contains no tables, as well as its cluster index, if present, use the SQL command DROP
CLUSTER
. For example, the following statement drops the empty cluster named EMP_DEPT
:
DROP CLUSTER Emp_dept;
If the cluster contains one or more clustered tables, and if you intend to drop the tables as well, then add the INCLUDING
TABLES
option of the DROP
CLUSTER
command. For example:
DROP CLUSTER Emp_dept INCLUDING TABLES;
If you do not include the INCLUDING
TABLES
option, and if the cluster contains tables, then an error is returned.
If one or more tables in a cluster contain primary or unique keys that are referenced by FOREIGN
KEY
constraints of tables outside the cluster, then you cannot drop the cluster unless you also drop the dependent FOREIGN KEY
constraints. Use the CASCADE
CONSTRAINTS
option of the DROP
CLUSTER
command, as in
DROP CLUSTER Emp_dept INCLUDING TABLES CASCADE CONSTRAINTS;
An error is returned if the above option is not used in the appropriate situation.
To drop a cluster, your schema must contain the cluster, or you must have the DROP
ANY
CLUSTER
system privilege. You do not have to have any special privileges to drop a cluster that contains tables, even if the clustered tables are not owned by the owner of the cluster.
The following sections explain how to create, alter, and drop hash clusters and clustered tables using SQL commands.
A hash cluster is used to store individual tables or a group of clustered tables that are static and often queried by equality queries. Once you create a hash cluster, you can create tables. To create a hash cluster, use the SQL command CREATE
CLUSTER
. The following statement creates a cluster named TRIAL_CLUSTER
that is used to store the TRIAL_TAB
table, clustered by the TRIALNO
column:
CREATE CLUSTER Trial_cluster ( Trialno NUMBER(5,0))PCTUSED 80 PCTFREE 5 SIZE 2K HASH IS Trialno HASHKEYS 100000;CREATE TABLE Trial_tab (Trialno NUMBER(5) PRIMARY KEY, ...) CLUSTER Trial_cluster (Trialno);Controlling Space Usage Within a Hash Cluster
When you create a hash cluster, it is important that you correctly choose the cluster key and set the
HASH
IS,
SIZE
, andHASHKEYS
parameters to achieve the desired performance and space usage for the cluster. The following sections provide guidance, as well as examples of setting these parameters.Choosing the Key
Choosing the correct cluster key is dependent on the most common types of queries issued against the clustered tables. For example, consider the
EMP_TAB
table in a hash cluster. If queries often select rows by employee number, then theEMPNO
column should be the cluster key; if queries often select rows by department number, then theDEPTNO
column should be the cluster key. For hash clusters that contain a single table, the cluster key is typically the entire primary key of the contained table. A hash cluster with a composite key must use Oracle's internal hash function.Setting HASH IS
Only specify the
HASH
IS
parameter if the cluster key is a single column of theNUMBER
datatype and contains uniformly distributed integers. If the above conditions apply, then you can distribute rows in the cluster such that each unique cluster key value hashes to a unique hash value (with no collisions). If the above conditions do not apply, you should use the internal hash function.Dropping Hash Clusters
Drop a hash cluster using the SQL command
DROP
CLUSTER
:DROP CLUSTER Emp_dept;Drop a table in a hash cluster using the SQL command
DROP
TABLE
. The implications of dropping hash clusters and tables in hash clusters are the same as for index clusters.When to Use Hashing
Storing a table in a hash cluster is an alternative to storing the same table with an index. Hashing is useful in the following situations:
SELECT . . . WHERE Cluster_key = . . . ;
In such cases, the cluster key in the equality condition is hashed, and the corresponding hash key is usually found with a single read. With an indexed table, the key value must first be found in the index (usually several reads), and then the row is read from the table (another read).
HASH
IS
col, HASHKEYS
n, and SIZE
m clauses is an ideal representation for an array (table) of n items (rows) where each item consists of m bytes of data. For example:
ARRAY X[100] OF NUMBER(8)
This could be represented as the following:
CREATE CLUSTER C(Subscript INTEGER)HASH IS Subscript HASHKEYS 100 SIZE 100;CREATE TABLE X(Subscript NUMBER(2), Value NUMBER(8))CLUSTER C(Subscript);
Alternatively, hashing is not advantageous in the following situations:
SELECT . . . WHERE Cluster_key < . . . ;
A hash function cannot be used to determine the location of specific hash keys; instead, the equivalent of a full table scan must be done to fetch the rows for the query. With an index, key values are ordered in the index, so cluster key values that satisfy the WHERE
clause of a query can be found with relatively few I/Os.
In most cases, you should decide (based on the above information) whether to use hashing or indexing. If you use indexing, consider whether it is best to store a table individually or as part of a cluster.
If you decide to use hashing, then a table can still have separate indexes on any columns, including the cluster key.
See Also:
For additional guidelines on the performance characteristics of hash clusters, see Oracle8i Tuning. |