4
Distributed Database System Application Development

This chapter describes the special considerations that are necessary if you are designing an application to run in a distributed database system. Oracle8i Concepts describes how Oracle eliminates much of the need to design applications specifically to work in a distributed environment.

The topics covered include:

The Oracle8i Administrator's Guide provides a complete discussion of implementing Oracle8i applications. This chapter provides information specific to development for an Oracle8i distributed database environment. See also Oracle8i Application Developer's Guide - Fundamentals for more information about application development in an Oracle environment.

Factors Affecting the Distribution of an Application's Data

In a distributed database environment, you should coordinate with the database administrator to determine the best location for the data. Some issues to consider are:

number of transactions posted from each location
amount of data (portion of table) used by each node
performance characteristics and reliability of the network
speed of various nodes, capacities of disks
importance of a node or link when it is unavailable
need for referential integrity among tables

Controlling Connections Established by Database Links

When a global object name is referenced in a SQL statement or remote procedure call, database links establish a connection to a session in the remote database on behalf of the local user. The remote connection and session are only created if the connection has not already been established previously for the local user session.

The connections and sessions established to remote databases persist for the duration of the local user's session, unless the application (or user) explicitly terminates them. Terminating remote connections established using database links is useful for disconnecting high cost connections (such as long distance phone connections) that are no longer required by the application.

The application developer or user can close (terminate) a remote connection and session using the ALTER SESSION command with the CLOSE DATABASE LINK parameter. For example, assume you issue the following query:

SELECT * FROM emp@sales;
COMMIT;

The following statement terminates the session in the remote database pointed to by the SALES database link:

ALTER SESSION CLOSE DATABASE LINK sales;

To close a database link connection in your user session, you must have the ALTER SESSION system privilege.

Note:
Before closing a database link, you must first close all cursors that use the link and then end your current transaction if it uses the link.

Referential Integrity in a Distributed System

Oracle does not permit declarative referential integrity constraints to be defined across nodes of a distributed system (that is, a declarative referential integrity constraint on one table cannot specify a foreign key that references a primary or unique key of a remote table). However, parent/child table relationships across nodes can be maintained using triggers. For more information about triggers to enforce referential integrity, see Oracle8i Concepts.

Note:
If you decide to define referential integrity across the nodes of a distributed database using triggers, be aware that network failures can limit the accessibility of not only the parent table, but also the child table.

For example, assume that the child table is in the SALES database and the parent table is in the HQ database. If the network connection between the two databases fails, some DML statements against the child table (those that insert rows into the child table or update a foreign key value in the child table) cannot proceed because the referential integrity triggers must have access to the parent table in the HQ database.

Distributed Queries

A distributed query is decomposed by the local Oracle into a corresponding number of remote queries, which are sent to the remote nodes for execution. The remote nodes execute the queries and send the results back to the local node. The local node then performs any necessary post-processing and returns the results to the user or application.

If a portion of a distributed statement fails, for example, due to an integrity constraint violation, Oracle returns error number ORA-02055. Subsequent statements or procedure calls return error number ORA-02067 until a rollback or rollback to savepoint is issued.

You should design your application to check for any returned error messages that indicate that a portion of the distributed update has failed. If you detect a failure, you should rollback the entire transaction (or rollback to a savepoint) before allowing the application to proceed.

Tuning Distributed Queries

The most effective way of optimizing your distributed queries is to access the remote database(s) as little as possible and to retrieve only the required data. Specifically, if you reference 5 remote tables from two different remote databases in a distributed query and have a complex filter (e.g. WHERE r1.salary + r2.salary > 50000), you can improve the performance of the query by rewriting the query to access the remote databases once and to apply the filter at the remote site (causing less data to be transferred to the query execution site). Rewriting your query to access the remote database once is achieved by using collocated inline views.

With the above information in mind, the following terms need to be defined:

Collocated: Two or more tables located in the same database.
Inline View: A SELECT statement that is substituted for a table in a parent SELECT statement. The embedded SELECT statement (in bold) is an example of an inline view:
```
SELECT e.empno, e.ename, d.deptno, d.dname 
   FROM (SELECT empno, ename from emp@orc1.world) e, dept d;
```
Collocated Inline View: An inline view that selects data from multiple tables from a single database only (reduces the amount of times that the remote database is accessed, improving the performance of a distributed query).

Though you can write a distributed query in any fashion that you like, it is highly recommended that you form your distributed query using collocated inline views to increase the performance of your distributed query if possible.

Oracle's cost based optimization can transparently rewrite many of your distributed queries to take advantage of the performance gains offered by collocated inline views.

Cost Based Optimization

In addition to rewriting your queries with collocated inline views, the cost based optimization method will optimize your distributed queries according to the gathered statistics of the referenced tables and the computations performed by the optimizer. For example, cost based optimization will analyze the following query (notice that it analyzes the query inside a CREATE TABLE statement):

CREATE TABLE AS (SELECT l.a, l.b, r1.c, r1.d, r1.e, r2.b, r2.c
   FROM local l, remote1 r1, remote2 r2 
      WHERE l.c = r.c AND r1.c = r2.c AND r.e > 300);

and rewrite it as:

CREATE TABLE AS (SELECT l.a, l.b, v.c, v.d, v.e

   FROM (SELECT r1.c, r1.d, r1.e, r2.b, r2.c FROM remote1 r1, remote2 r2

      WHERE r1.c = r2.c AND r1.e > 300) v, local l

   WHERE l.c = r1.c);

The alias V is assigned to the inline view which can then be referenced as a table in the above SELECT statement. Creating a collocated inline view reduces the amount of queries performed at a remote site, thereby reducing costly network traffic.

Setup Cost Based Optimization

After you have set up your system to use cost based optimization to improve the performance of your distributed queries (as well as other types of queries - see the Oracle8i Tuning manual for more information), the operation will be transparent to the user, that is the optimization will occur automatically when the query is issued.

You need to complete the following tasks to set up your system to take advantage of Oracle's optimizer:

Set Up Database Environment
Analyze Tables (to generate table statistics)

Set Up Environment

To enable cost based optimization, the OPTIMIZER_MODE parameter must be set to CHOOSE or COST. This parameter can be persistently set by modifying the OPTIMZER_MODE parameter in the parameter file (INIT.ORA) or set on a session-level by issuing an ALTER SESSION command.

See the Oracle8i Tuning manual for information on setting the OPTIMZER_MODE parameter in the parameter file (INIT.ORA) file.

Issue the following ALTER SESSION statement to set the OPTIMIZER_MODE at the session level (this setting will be valid for the current session only):

ALTER SESSION OPTIMIZER_MODE = CHOOSE;

ALTER SESSION OPTIMIZER_MODE = COST;

See the Oracle8i Tuning manual for more information on configuring your system to use a cost based optimization method.

Analyze Tables

In order for cost based optimization to select the most efficient path for your query, you must provide accurate statistics for the tables involved in the distributed query.

The easiest way to generate statistics for a table is to execute the ANALYZE command. For example, if you reference the EMP and DEPT tables in your distributed query, you would execute the following to generate the necessary statistics:

ANALYZE TABLE emp COMPUTE STATISTICS;
ANALYZE TABLE dept COMPUTE STATISTICS;






Note:



You must connect locally with respect to the tables to execute the ANALYZE statement. You cannot execute the following:     
ANALYZE TABLE remote@remote.com COMPUTE STATISTICS;


     


You must first connect to the remote site and then execute the above ANALYZE statement.

See the Oracle8i SQL Reference book for additional information on using the ANALYZE statement.

To generate statistics for more than one object at a time, see the "Generating Statistics" section in the Oracle8i Tuning manual. Additionally, see the "Automated Statistics Gathering" section in the Oracle8i Tuning manual to learn how to automate the process of keeping your statistics current, thus improving the performance and accuracy of cost based optimization.

How Does Cost Based Optimization Work?

As illustrated in the introduction to "Tuning Distributed Queries", the optimizer's main task is to rewrite a distributed query to use collocated inline views. This optimization is performed in three steps:

All Mergeable Views are Merged
Optimizer Performs Collocated Query Block Test
Optimizer Rewrites Query Using Collocated Inline Views

After the query is rewritten, it is executed and the data set is returned to the user.

Cost Based Optimization Restrictions

While cost based optimization is performed transparently to the user, there are several distributed query scenarios that cost based optimization is not able improve the performance upon. Specifically, if your distributed query contains any of the following, cost based optimization will not be effective:

Aggregates
Subqueries
Complex SQL

If your distributed query contains one of the above, make sure you read the "Extend Cost Based Optimization with Hints" section to learn how you can modify your query and use hints to improve the performance of your distributed query.

Extend Cost Based Optimization with Hints

If you have a distributed query that the optimizer cannot handle (see "Cost Based Optimization Restrictions"), you can use hints to extend the capability of cost based optimization. Specifically, if you write your own query that utilizes collocated inline views, you will want to instruct the CBO to not rewrite your distributed query.

Additionally, if you have special knowledge about the database environment (i.e. statistics, load, network and CPU limitations, distributed queries, etc.), you can specify a hint to guide cost based optimization.

For the purposes of optimizing distributed queries, you will provide hints based on your knowledge of the distributed query. Specifically, if you have written your own optimized query using collocated inline views that are based on your knowledge of the database environment, specify the NO_MERGE hint to prevent the optimizer from rewriting your query.

This technique is especially helpful if your distributed query contains an aggregate, subquery, or complex SQL. Since this type of distributed query cannot be rewritten by the optimizer, specifying NO_MERGE will cause the optimizer to skip the steps described in the "How Does Cost Based Optimization Work?" section.

The DRIVING_SITE hint allows you to define a site that is remote to you to act as the query execution site. This is especially helpful when the remote site contains the majority of the data and the query will perform better if executed from that remote site and the resultant data set returned to the local site.

NO_MERGE

The NO_MERGE hint prevents Oracle from merging an inline view into a potentially non-collocated SQL statement (see step 1 in the "How Does Cost Based Optimization Work?" section). This hint is embedded in your SELECT statement and can appear either at the beginning of the SELECT statement with the inline view as an argument or in the query block that defines the inline view.

With Argument:

SELECT /*+NO_MERGE(v)*/ t1.x, v.avg_y 

   FROM t1, (SELECT x, AVG(y) AS avg_y FROM t2 GROUP BY x) v,

   WHERE t1.x = v.x AND t1.y = 1;

In Query Block

SELECT t1.x, v.avg_y 

   FROM t1, (SELECT /*+NO_MERGE*/ x, AVG(y) AS avg_y FROM t2 GROUP BY x) v,

   WHERE t1.x = v.x AND t1.y = 1;

You will most likely use this hint if you have developed an optimized query based on your knowledge of your database environment. For more information, see the NO_MERGE hint in the Oracle8i Tuning manual.

DRIVING_SITE

The DRIVING_SITE hint allows you to specify the site where the query execution is performed. It is highly recommended that you let the cost based optimization determine where the execution should be performed, but if you want to override the optimizer (either because your statistics are stale or performance on a particular machine has been severely degraded), you can specify the execution site with the DRIVING_SITE hint. A SELECT statement with a DRIVING_SITE hint might look like:

SELECT /*+DRIVING_SITE(dept)*/ * FROM emp, dept@remote.com

   WHERE emp.deptno = dept.deptno;

For more information, see DRIVING_SITE in the Oracle8i Tuning manual. For more information about tuning distributed queries, see "Tuning Distributed Queries" and Oracle8i Tuning.

Verifying Optimization

An important aspect to tuning your distributed queries is to analyze the execution plan for a query. The feedback that you receive from your analysis is an important element to testing and verifying your database. This verification is increasingly important when you want to compare the execution plan for a distributed query that is optimized by cost based optimization versus the execution plan for a distributed query that you manually optimize (using hints, defining collocated inline views, etc.). See the Oracle8i Tuning manual for detailed information about execution plans, the EXPLAIN PLAN command, and how to interpret the results.

Prepare Database

Before you can view the execution plan for you distributed query, you must prepare you database to store the execution plan. This preparation is easily performed by executing a script; complete the following to prepare your database to store an execution plan:

@utlxplan.sql

Note:
The location of the UTLXPLAN.SQL file depends on your operating system.

After you execute the UTLXPLAN.SQL file, a PLAN_TABLE will be created in the current schema to temporarily store the execution plan.

Generate Execution Plan

Once you have prepared your database to store the execution plan, you are ready to view the execution plan for a specified query. Instead of directly executing the SQL statement, you append the statement with the EXPLAIN PLAN FOR clause. For example, you might execute the following:

EXPLAIN PLAN FOR
   SELECT d.dname FROM dept d
      WHERE d.deptno IN
         (SELECT deptno FROM emp@orc2.world
            GROUP BY deptno
               HAVING COUNT (deptno) >3);

View Execution Plan

After you have executed the above SQL statement, the execution plan will be stored temporarily in the PLAN_TABLE that you created earlier. To view the results of the execution plan, execute the following script:

@utlxpls.sql

Note:
The location of the UTLXPLS.SQL file depends on your operating system.

Executing the UTLXPLS.SQL file will display the execution plan for the SELECT statement that you specified. Your results will be formatted like the following:

Plan Table
--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |       |      |        |       |       |
|  NESTED LOOPS             |          |       |      |        |       |       |
|   VIEW                    |          |       |      |        |       |       |
|    REMOTE                 |          |       |      |        |       |       |
|   TABLE ACCESS BY INDEX RO|DEPT      |       |      |        |       |       |
|    INDEX UNIQUE SCAN      |PK_DEPT   |       |      |        |       |       |
--------------------------------------------------------------------------------

If you are manually optimizing your distributed queries by writing your own collocated inline views and/or using hints, you are advised to generate an execution plan before and after your manual optimization. With both execution plans, you can compare the effectiveness of your manual optimization and make changes to your optimization as necessary to improve the performance of your distributed query.

If you want to view the SQL statement that will be executed at the remote site, execute the following select statement:

SELECT other FROM plan_table WHERE operation = 'REMOTE';

Your output might look like the following:

SELECT DISTINCT "A1"."DEPTNO" FROM "EMP" "A1" 
   GROUP BY "A1"."DEPTNO" HAVING COUNT("A1"."DEPTNO")>3

Note:
If you are having difficulty viewing the entire contents of the OTHER column, you may need to execute the following:
SET LONG 9999999

Handling Errors in Remote Procedures

When a procedure is executed locally or at a remote location, four types of exceptions can occur:

PL/SQL user-defined exceptions, which must be declared using the keyword EXCEPTION.
PL/SQL predefined exceptions, such as NO_DATA_FOUND keyword.
SQL errors, such as ORA-00900 and ORA-02015.
Application exceptions, which are generated using the RAISE_APPLICATION_ERROR() procedure.

When using local procedures, all of these messages can be trapped by writing an exception handler, such as shown in the following example:

BEGIN
  ...
EXCEPTION
   WHEN ZERO_DIVIDE THEN
   /* ...handle the exception */
END;

Notice that the WHEN clause requires an exception name. If the exception that is raised does not have a name, such as those generated with RAISE_APPLICATION_ERROR, one can be assigned using PRAGMA_EXCEPTION_INIT, as shown in the following example:

DECLARE
  null_salary EXCEPTION;
  PRAGMA EXCEPTION_INIT(null_salary, -20101);
BEGIN
  ...
  RAISE_APPLICATION_ERROR(-20101, 'salary is missing');
...
EXCEPTION
  WHEN null_salary THEN
  ...
END;

When calling a remote procedure, exceptions can be handled by an exception handler in the local procedure. The remote procedure must return an error number to the local, calling procedure, which then handles the exception as shown in the previous example. Note that PL/SQL user-defined exceptions always return ORA-06510 to the local procedure.

Therefore, it is not possible to distinguish between two different user-defined exceptions based on the error number. All other remote exceptions can be handled in the same manner as local exceptions.

4Distributed Database System Application Development