10
Design Considerations

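This chapter describes design considerations for the types, methods, callouts, indexes, and operators of a data cartridge, along with guidance on working with the optimizer and on maintenance, installation, and portability.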

Designing the types

Structured and Unstructured Data

Structured data is data whose type can be expressed to Oracle as an object type. Unstructured data is data that Oracle cannot interpret, that is, data whose type is RAW or BLOB. Whether to model cartridge data as structured or unstructured depends on the following considerations:

Structured data can be shared by different applications since the structure is published in Oracle.
Structured types provide strong type checking whereas unstructured data does not.
Structured data is easily queried whereas unstructured data is not. You must publish user-defined functions to make unstructured data queryable.
Constraints are easily supported on structured data but not on unstructured data.
Indexes are easily supported on structured data, whereas on unstructured data you would need to create indexes on user-defined functions or define extensible indexes.
Structured data must be marshalled by Oracle before it can be returned to the client as a value, whereas unstructured data is easily retrieved as a value.

Using Nested Tables or VARRAYs

To decide whether to model a collection as a nested table or as a VARRAY, it is important to understand how each is implemented in Oracle. Logically, nested tables differ from VARRAYs in one fundamental way: VARRAYs represent an ordered set of items whereas nested tables do not. Physically, nested tables can only be represented as tables, whereas VARRAYs can only be represented as raw columns or LOB columns. The implications of their physical representation are as follows:

Nested Tables

Nested tables are better suited for querying since elements are represented as rows.
Indices may be created on columns of nested tables for faster searches.
Constraints may be specified for nested tables.
Clustering of nested elements belonging to a common parent row is possible when the storage table is specified as an Index Organized Table. Furthermore, specifying key compression reduces the overhead of the system-assigned NESTED_TABLE_ID values.
When stored as a Heap Organized Table, creating an index on the NESTED_TABLE_ID column enhances retrieval of nested tables.
Retrieving the nested table as a value for a given parent incurs the overhead of selecting and marshalling the individual rows to form the collection value.
Even though parent tables may be partitioned, storage tables corresponding to their nested tables cannot be partitioned.

VARRAYs

A VARRAY is better suited for retrieval as a value since that is how it is stored.
Indexing and constraints are not supported on VARRAYs.
Querying of VARRAYs is sub-optimal since rows have to be materialized from the collection value.
Partitioning of VARRAYs stored as LOBs is permitted when the parent table is partitioned.

Based on the above implications, if the ability to query or update individual collection elements is important, then nested tables are a better choice to model your collection data. On the other hand, if your application fetches the entire collection as a whole and then operates on it, modeling the collection data as a VARRAY will yield better retrieval performance.

Working Around Inheritance

Inheritance is a technique used in object-oriented development to create objects that contain generalized attributes and behavior for groups of related objects. The more general objects created using inheritance are referred to as super-types. The objects that "inherit" from the super-types (that is, the more specific cases of the super-type) are called subtypes.

A common case of inheritance is that of Person and Employee. Some instances of Person are employees. The more general case, Person, is the super-type and the special case, Employee, is the subtype. Another example could involve Vehicle as the super-type and Car and Truck as its subtypes.

Figure 10-1 Class Diagram: Vehicle as Super-type, Car and Truck as Subtypes

Inheritance Implementation Consequences

Inheritance can imply various levels of encapsulation for super-types. In cases where the super-type should not be exposed to other objects, a subtype should contain the methods and attributes necessary to make the super-type invisible. To understand the implementation consequences of inheritance, it is also important to remember that Oracle8i is a strongly-typed system. A strongly-typed system requires that the type of an attribute be declared when the attribute is declared, and only values of the declared type may be stored in the attribute. For example, Oracle8i collections are strongly typed: Oracle8i does not allow heterogeneous collections (collections of multiple types).

See Also:

Working with User-Defined Types in Oracle8i Application Developer's Guide - Fundamentals

Simulating Inheritance

Inheritance can be implemented in Oracle8i using one of the following three techniques:

Subtype Contains Super-type
Super-type Contains or References All Subtypes
Dual Subtype / Super-type Reference

Subtype Contains Super-type

Figure 10-2 Object-Relational Schema -- Subtype Contains Super-type

The Subtype Contains Super-type technique hides the implementation of the abstractions/generalizations for a subtype. Each of the subtypes is exposed to other types in the object model. The super-types are not exposed to other types. To simulate inheritance, the super-type in the design object model is created as an object type. The subtype is also created as an object type. The super-type is defined as an embedded attribute in the subtype. All of the methods that can be executed for the subtype and its super-type must be defined in the subtype.

The Subtype Contains Super-type technique is used when each subtype has specific relationships to other objects in the object model. For example, a super-type of Customer may have subtypes of Private Customer and Corporate Customer. Private Customers have relationships with the Personal Banking objects, while Corporate Customers have relationships with the Commercial Banking objects. In this environment, the Customer super-type is not visible to the rest of the object model.

In the Vehicle-Car/Truck example, the Vehicle (super-type) is embedded in the sub-types Car and Truck.
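
The shape of this technique can be sketched outside the database. The following Python fragment is only an analogy, with dataclasses standing in for Oracle object types; the attribute names (`make`, `weight_kg`, `seats`) and the `describe` method are hypothetical and do not come from the schema in the figure.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Super-type: holds the generalized attributes (hypothetical fields)."""
    make: str
    weight_kg: float

    def describe(self) -> str:
        return f"{self.make} ({self.weight_kg:.0f} kg)"


@dataclass
class Car:
    """Subtype: embeds the super-type as an attribute instead of inheriting."""
    vehicle: Vehicle
    seats: int

    def describe(self) -> str:
        # Super-type behavior is not inherited, so the subtype must
        # re-expose it explicitly by delegating to the embedded value.
        return f"Car with {self.seats} seats: {self.vehicle.describe()}"
```

Callers work only with `Car`; the embedded `Vehicle` is reached through the methods that `Car` chooses to define.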

Super-type Contains All Subtypes

Figure 10-3 Object-Relational Schema -- Super-type Contains All Subtypes

The Super-type Contains All Subtypes technique hides the implementation of the subtypes and only exposes the super-type. To simulate inheritance, all of the subtypes for a given super-type in the design object model are created as object types. The super-type is created as an object type as well. The super-type declares an attribute for each subtype. The super-type also declares the constraints to enforce the one-and-only-one rules for the subtype attributes. All of the methods that can be executed for the subtype must be defined in the super-type.

The Super-type Contains All Subtypes technique is used when objects have relationships with other objects that are predominantly one-to-many in multiplicity. For example, a Customer can have many Accounts and a Bank can have many Accounts. If the Subtype Contains Super-type technique is used, each of these many-valued relationships requires a collection for each subtype. If Account is a super-type and Checking and Savings are subtypes, both Bank and Customer must implement a collection of Checking and a collection of Savings (4 collections). Adding a new account subtype requires that both Customer and Bank add a collection to support it (2 collections per addition). Using the Super-type Contains All Subtypes technique means that Customer and Bank each have a single collection of Account. Adding a subtype to Account means that only Account changes.

In the case of the Vehicle-Car/Truck, the Vehicle is created with Car and Truck as embedded attributes of Vehicle.
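
The one-and-only-one rule described above can be illustrated with a small sketch. The following Python fragment is an analogy only: dataclasses stand in for Oracle object types, the check in `__post_init__` stands in for the constraint the super-type would declare, and all attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Car:
    seats: int


@dataclass
class Truck:
    payload_kg: float


@dataclass
class Vehicle:
    """Super-type with one attribute per subtype (hypothetical fields)."""
    make: str
    car: Optional[Car] = None
    truck: Optional[Truck] = None

    def __post_init__(self) -> None:
        # One-and-only-one rule: exactly one subtype attribute is populated.
        populated = [a for a in (self.car, self.truck) if a is not None]
        if len(populated) != 1:
            raise ValueError("exactly one subtype attribute must be set")

    def kind(self) -> str:
        # Subtype-specific behavior is defined on the super-type.
        return "car" if self.car is not None else "truck"
```

Other objects hold collections of `Vehicle` only, so adding a new subtype changes `Vehicle` and nothing else.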

Dual Subtype / Super-type Reference

Figure 10-4 Object-Relational Schema -- Dual Subtype / Super-type Reference

In cases where the super-type is involved in multiple object relationships with a multiplicity of many, and the subtypes have specific relationships in the object model, inheritance is implemented as a combination of the two preceding techniques. The super-type is implemented as an object type. Each subtype is implemented as an object type. The super-type implements a referenced attribute for each subtype (zero referenced relationship). The super-type also implements an or-association for the group of subtype attributes. Each subtype implements a referenced attribute for the super-type (one referenced relationship). In this way, both the super-type and subtype are visible to the rest of the object model.

In the case of the Vehicle-Car/Truck, the Vehicle is created as a type. The Car and Truck are created as types. The Vehicle type implements a reference to both Car and Truck, with the or-constraint on the Car and Truck attributes. The Car implements an attribute that references Vehicle. The Truck implements an attribute that references Vehicle.

Writing Methods: PL/SQL, C or Java?

When writing methods for object types, you have several implementation choices: PL/SQL, C/C++, and Java. Of these, PL/SQL and Java methods run within the address space of the server. C/C++ methods are dispatched as external procedures and run outside the address space of the server.

The best implementation choice varies from situation to situation. The following rules of thumb might be of help.

A callout involving C or C++ is, in general, the fastest if the nature of processing is substantially CPU-bound. However, callouts incur the cost of dispatch, and if the amount of processing in C/C++ is not large then the cost of dispatch does not amortize very well.
PL/SQL tends to offer the best price-performance for methods that are not computation-intensive. The other implementation options are typically favored over PL/SQL if you already have a large body of code implemented in another language that you want to use as part of the data cartridge.
Java is a relatively open implementation choice. The interpreted nature of Java implies that for high performance applications, some sort of compilation of methods written in Java will be needed.
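
The dispatch trade-off in the first rule of thumb can be illustrated with a little arithmetic. The following Python sketch is not Oracle code; it assumes a simple hypothetical model in which a callout pays a fixed dispatch cost and then runs the work `speedup` times faster than the in-server implementation. The function names and the figures are illustrative only.

```python
def callout_time(work_ms: float, dispatch_ms: float, speedup: float) -> float:
    """Elapsed time of a callout: fixed dispatch cost plus accelerated work."""
    return dispatch_ms + work_ms / speedup


def break_even_work_ms(dispatch_ms: float, speedup: float) -> float:
    """In-server work (ms) above which the callout becomes the faster choice.

    Solves work = dispatch + work / speedup for work.
    """
    if speedup <= 1.0:
        raise ValueError("the callout must be faster than the in-server code")
    return dispatch_ms * speedup / (speedup - 1.0)
```

Under this model, with a dispatch cost of 1 ms and a fivefold speedup, the callout only pays off for methods that would take more than 1.25 ms in the server. Real figures must come from measurement.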

Invokers Rights -- Why, When, How

Until release 8.1.5, stored procedures and SQL methods could only execute with the privileges of the definer. Such definer-rights routines are bound to the schema in which they reside, and this remains the default. Under this condition, a routine executes with the rights of the definer of the function, not the user invoking it. However, this is a limitation if the function statically or dynamically issues SQL statements.

For example, if the function had a static cursor that performs a SELECT from USER_TABLES, the USER_TABLES it would retrieve would be that of the definer irrespective of which user was using the function. For the function to be used against data not owned by the definer, explicit GRANTs had to be issued from the owner to the definer, or the function needed to be defined in the same schema where the data resided. The former course creates security and administration problems; the latter forces the function to be redefined in each schema that needs to use it.

The invoker-rights mechanism, introduced in Oracle8i release 8.1.5, permits a function to execute with the privileges of the invoker. This permits cartridges to live within a schema dedicated to the cartridge and to be used by other schemas without requiring that privileges be granted to operate on objects in the schema where the cartridge resides.

Callouts

When to Callout

You should consider utilizing callouts in the following circumstances:

When it would be impractical or impossible to code the algorithm you require in SQL.
When the performance gains of a compiled language (e.g., C) outweigh the extproc callout overhead.
When you wish to leverage existing 3GL code.

When to Callback

You should consider utilizing callbacks in the following circumstances:

When you need data that was not passed as an argument to the callout.
When it isn't practical to pass the data to the callout (e.g., the number and size of the parameters exceed what is allowed or what performs well).

Consider making a single callout that does multiple callbacks rather than multiple callouts. For example, instead of a factorial callout that takes a single number and computes its factorial, consider a callout that takes a VARRAY and repeatedly calls back to get the next number whose factorial is to be computed. Always do performance testing to find the point at which the multiple-callback approach outperforms the multiple-callout approach.
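
The comparison between the two approaches can be sketched with a simple cost model. The Python fragment below is not Oracle code and its parameters are hypothetical: it assumes every callout pays a fixed dispatch cost, every callback pays a fixed (usually smaller) cost, and the per-item computation is the same in both designs.

```python
def multi_callout_cost(n: int, dispatch: float, compute: float) -> float:
    """n separate callouts, each paying the dispatch cost once."""
    return n * (dispatch + compute)


def single_callout_cost(n: int, dispatch: float, callback: float,
                        compute: float) -> float:
    """One callout that calls back n times to fetch the next item."""
    return dispatch + n * (callback + compute)
```

In this model the single callout wins whenever a callback is cheaper than a dispatch and there are enough items to absorb the one dispatch it still pays. The model ignores the cost of passing the VARRAY, which is one reason the crossover point has to be found by testing.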

Callouts and LOB

It may be to your advantage to code your callout so that it is independent of LOB type (BFILE/BLOB).
The PL/SQL layer of your cartridge can "open" your BFILE so that no BFILE-specific logic is required in your callout (other than error recovery from OCILob calls that do not operate on BFILEs).
With the advent of temporary LOBs in Oracle8i release 8.1.5, you need to be aware of the deep copy that can occur when assignments and calls are done with temporary LOBs. Use "NOCOPY" (by REF) on BLOB parameters as appropriate.

Saving and Passing State

External procedures under Oracle 8.0 have a "state-less" model. All Statement handles opened during the invocation of an external procedure are closed implicitly at the end of the call.

In Oracle 8.1, "state" (OCI statement handles and the like, plus the associated state in the DBMS) can be saved and used across invocations of external procedures in a session. By default cartridges are still stateless; however, OCIMemory services and OCIContext services can be used with OCI_DURATION_SESSION or another appropriate duration to save state. Statement handles created in one external procedure invocation can be reused in another. The data cartridge developer must explicitly free these handles, and should do so as soon as a statement handle is no longer needed. Freeing the handle releases all state maintained for the statement in the OCI handles and in the DBMS, which should help improve the scalability of the data cartridge.

Designing Indexes

Influencing Index Performance

It is wrong to assume that creating a domain index is always the best course. If, after careful consideration, you determine that you need to create a domain index, keep the following factors in mind. For one, if the domain index is complex, the functional implementation will work better:

When the data size is small.
When the result is a large percentage of the total data size.

Judicious use of the extensible optimizer can lead to good performance.

Influencing Index Performance

Naming of internal components can be an issue. Names of internal data objects for a domain index implementation are typically based on the names you provide for tables and indexes. The problem is that the derived names for the internal objects must not conflict with any other user-defined object or system object. You may have to develop a policy that restricts names, or implement a metadata management scheme, to avoid errors during DROP, CREATE, and similar operations.
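
One possible naming policy is sketched below in Python, purely as an illustration. The `CART$` prefix, the `$`-separated suffix, and the use of a CRC as a disambiguating digest are all hypothetical choices, and the 30-character limit is the identifier length assumed here for Oracle8i. The sketch does not check the data dictionary for existing objects, which a real scheme would also need to do.

```python
import zlib

MAX_IDENTIFIER_LEN = 30  # identifier length limit assumed for Oracle8i


def internal_name(index_name: str, suffix: str, prefix: str = "CART$") -> str:
    """Derive the name of an internal object from a user-supplied index name.

    The reserved prefix keeps derived names apart from ordinary user objects;
    long index names are truncated and tagged with a digest so that two
    different index names do not collapse into the same derived name.
    """
    base = index_name.upper()
    name = f"{prefix}{base}${suffix}"
    if len(name) <= MAX_IDENTIFIER_LEN:
        return name
    digest = format(zlib.crc32(base.encode("utf-8")), "08X")
    # Two extra characters are reserved for the "_" and "$" separators.
    room = MAX_IDENTIFIER_LEN - len(prefix) - len(suffix) - len(digest) - 2
    if room < 1:
        raise ValueError("prefix and suffix leave no room for the index name")
    return f"{prefix}{base[:room]}_{digest}${suffix}"
```

For example, `internal_name("img_idx", "I")` yields `CART$IMG_IDX$I`, while an over-long index name is shortened to exactly 30 characters.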

When to Use IOTs

You can create only one index on an IOT in 8.0.x releases. However, if most of your data is in the index, an IOT is more efficient than storing your data in both a table and an additional index.

In Oracle8i release 8.1.5, you can create secondary indexes on IOTs, which is a big advantage if you access the data in different ways.

Can Index Structures Be Stored in LOBs

Index structures can be stored in LOBs, but take care to tune the LOB for best performance. If you access a particular LOB frequently, create your table with the CACHE option and place the LOB index in a separate tablespace. If you update a LOB frequently, turn off logging and read and write in multiples of the CHUNK size. If you access a particular portion of a LOB frequently, buffer your reads and writes using LOB buffering or your own buffering scheme.
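
Reading and writing in multiples of the CHUNK size amounts to widening each requested byte range to chunk boundaries. The Python sketch below shows only that arithmetic; it is not an Oracle API. It assumes zero-based byte offsets (the LOB interfaces you call may count from 1), and the function name is hypothetical.

```python
def chunk_aligned_range(offset: int, length: int, chunk_size: int) -> tuple:
    """Widen [offset, offset + length) to whole chunks.

    Returns (aligned_offset, aligned_length), both multiples of chunk_size.
    """
    if offset < 0 or length <= 0 or chunk_size <= 0:
        raise ValueError("offset must be >= 0; length and chunk_size > 0")
    start = (offset // chunk_size) * chunk_size
    # Round the end of the range up to the next chunk boundary.
    end = -(-(offset + length) // chunk_size) * chunk_size
    return start, end - start
```

With a chunk size of 8192, a request for 5000 bytes at offset 10000 becomes a single aligned read of 8192 bytes starting at offset 8192. The aligned range can extend past the end of the LOB, so the caller still has to clamp it to the actual length.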

External Index Structures

With the extensible indexing framework, the meaning and representation of a user-defined index is left to the cartridge developer. We do provide basic index implementations such as IOTs. In certain cases, binary or character LOBs can also be used to store complex index structures. IOTs, BLOBs and CLOBs all live within the database. In addition to them, you may also store a user-defined index as a structure external to the database, say in a BFILE.

The external index structure gives you the most flexibility in terms of how your index is represented. It is useful if you have already invested in the development of in-memory indexing structures. For example, an operating system file may store index data, which is read into a memory mapped file at run time. Such cases may be handled as BFILEs in the external index routines.

External index structures may provide superior performance. However, this comes at some cost. Index structures external to the database do not participate in the transaction semantics of the database which, in the case of index structures inside the database, make data and concomitant index updates atomic. This means that if an update to the data causes an update of the external index to be invoked via the extensible indexing interface, a failure may cause the data updates to be rolled back but not the index updates. The database can only roll back what is internal to it -- external index structures cannot be rolled back in synchronization with a database rollback.

External index structures can be very useful for read-only access. Their semantics become complex if updates to data are involved.

Multi-Row Fetch

ODCIIndexFetch(self IN [OUT] <impltype>, nrows IN NUMBER, rids OUT ODCIRidList)  
RETURN NUMBER

When the ODCIIndexFetch routine is called, it returns the ROWIDs of the rows that satisfy the operator predicate. The maximum number of rows that a single call can return is nrows, an argument to the routine whose value Oracle decides based on internal factors. If you have a better idea of the number of rows that ought to be returned to achieve optimal query performance, you can return that number of rows in the ODCIRidList VARRAY instead of nrows. Note that the number of values in the ODCIRidList must be less than or equal to nrows.

You, as cartridge designer, are in the best position to judge the number of rows to return. For example, if the index stores rowids together in groups of, say, 1500 and nrows = 2000, it may be optimal to return 1500 rows instead of 2000. Otherwise you would have to retrieve 3000 rowids, return 2000 of them, and note which 1000 rowids were not returned.

If you do not have any specific optimization in mind, you can use the value of nrows to determine the number of rows to be returned. Currently the value of nrows is set to 2000.
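
The choice described above can be written as a small rule. The following Python sketch is not part of the ODCI interface; it assumes a hypothetical index that stores rowids in fixed-size groups and picks the largest whole number of groups that fits within nrows, falling back to nrows when even one group is too large.

```python
def rows_to_return(nrows: int, group_size: int) -> int:
    """Number of rowids to return from one fetch call.

    Returns the largest multiple of group_size that does not exceed nrows,
    so that no stored group has to be split across two calls. If a single
    group is larger than nrows, the limit imposed by nrows still applies.
    """
    if nrows <= 0 or group_size <= 0:
        raise ValueError("nrows and group_size must be positive")
    if group_size > nrows:
        return nrows
    return (nrows // group_size) * group_size
```

With groups of 1500 rowids and nrows = 2000 this returns 1500, matching the example in the text; with groups of 500 it returns the full 2000.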

Anyone implementing indexes which use callouts should use multirow fetch to fetch the largest number of rows back to the server. This offsets the cost of making the callout. You can set this using _DOMAIN_INDEX_BATCH_SIZE in your init.ora.

Designing Operators

Functional and Index Implementations

All indexes should be accompanied by both an indexed and a functional implementation of the operator, in case the optimizer chooses not to use the indexed implementation. You can, however, use the indexing structures to produce the functional result.

Talking to the Optimizer

Weighing Cost and Selectivity

Estimating Cost

In Oracle8i release 8.1.5, only the CPU and I/O costs are considered.

Cost for Functions

The cost of executing a C function can be determined using common profilers or tools. For SQL queries, an explain plan of the query would give a rough estimate of the cost of the query. In addition the tkprof utility can be used to gather information about the CPU and the I/O cost involved in the operation. The cost of executing a callout could also be determined by using it in a SQL query which "selects from dual" and then estimating its cost from the tkprof utility.

Cost for Indexes

The cost of the index is a function of the selectivity of the predicate (which is passed as an argument to the cost function) multiplied by the total number of data blocks in the index structures. Hence the index cost function should increase as the selectivity of the predicate increases. With a selectivity of 100%, the cost of accessing the index should be the cost of accessing all the data in all the structures that comprise the domain index.

The total cost of accessing the index is the cost of performing the ODCIIndexStart, N * ODCIIndexFetch and ODCIIndexClose operations, where N is the number of times the ODCIIndexFetch routine will be called based on the selectivity of the predicate. The cost of the ODCIIndexStart, ODCIIndexFetch and ODCIIndexClose functions can be determined as discussed above.
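
The two relationships above can be put into a short worked sketch. The Python below is only an illustration of the arithmetic, not an implementation of the optimizer interface: the function names are hypothetical, selectivity is taken as a percentage, and N is approximated as the number of batches of nrows needed to return the matching rows.

```python
import math


def index_access_cost(selectivity_pct: float, total_blocks: int) -> float:
    """Block-access cost: selectivity (as a fraction) times index blocks."""
    return (selectivity_pct / 100.0) * total_blocks


def index_scan_cost(start_cost: float, fetch_cost: float, close_cost: float,
                    matching_rows: int, nrows: int) -> float:
    """Total cost: one start, N fetch calls, one close.

    N is approximated as ceil(matching_rows / nrows), with at least one
    fetch call even when no rows match.
    """
    fetch_calls = max(1, math.ceil(matching_rows / nrows))
    return start_cost + fetch_calls * fetch_cost + close_cost
```

For an index occupying 400 blocks, a selectivity of 25% gives a block-access cost of 100 and a selectivity of 100% gives 400, the cost of reading every structure in the index. With 4500 matching rows and nrows = 2000, the fetch routine is charged three times.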

Estimating Selectivity

Selectivity for Functions

The selectivity of a predicate is the number of rows returned by the predicate divided by the total number of rows in the table(s), expressed as a percentage.

The selectivity function should use the statistics collected for the table to determine what percentage of rows of the table will be returned by the predicate with the given list of arguments. For example, to compute the selectivity of a predicate IMAGE_GREATER_THAN (Image SelectedImage), which determines the images that are greater than the Image SelectedImage, a histogram of the sizes of the images in the database can be a useful statistic.
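
A histogram-based estimate for a predicate such as IMAGE_GREATER_THAN might look like the following Python sketch. It is an illustration under stated assumptions rather than the cartridge's actual selectivity function: the histogram is assumed to be a list of bucket edges with one row count per bucket, sizes are assumed to be spread evenly within a bucket, and the function name is hypothetical.

```python
def selectivity_greater_than(edges, counts, threshold):
    """Estimated percentage of rows whose size exceeds threshold.

    edges  -- ascending bucket boundaries, len(edges) == len(counts) + 1
    counts -- number of rows whose size falls in each bucket
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than buckets")
    total = sum(counts)
    if total == 0:
        return 0.0
    selected = 0.0
    for lo, hi, count in zip(edges, edges[1:], counts):
        if threshold <= lo:
            # The whole bucket lies above the threshold.
            selected += count
        elif threshold < hi:
            # The threshold splits the bucket: take the upper share.
            selected += count * (hi - threshold) / (hi - lo)
    return 100.0 * selected / total
```

With edges `[0, 100, 200, 300, 400]` and counts `[10, 20, 30, 40]`, a threshold of 250 selects half of the third bucket and all of the fourth, giving an estimated selectivity of 55%.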

Collecting Statistics

Statistics can affect the calculation of selectivity for predicates and also the cost of domain indexes.

Statistics for Tables

The statistics collected for a table can affect the computation of selectivity of a predicate. So statistics that can help the user make a better judgement about the selectivity of a predicate should be collected for a table/column. Knowing the predicates that would operate on the data will be helpful to determine what statistics would be good to collect.

Examples of statistics that can be useful in the spatial domain are the average, minimum, and maximum number of elements in a VARRAY that contains the nodes of the spatial objects.

Note that standard statistics are collected in addition to the user defined statistics when the ANALYZE command is invoked.

Statistics for Indexes

When a domain index is analyzed, statistics should be collected for the underlying objects that constitute the domain index. For example, if the domain index is made up of tables, the statistics collection function should ANALYZE those tables when the domain index is analyzed. The cost of accessing the domain index can be influenced by the statistics that have been collected for the index. For example, the cost of accessing a domain index could be approximated as the selectivity multiplied by the total number of data blocks (in the various tables) accessed when the domain index is accessed.

To accurately define cost, selectivity and statistics functions, a good understanding of the domain is required. The above guidelines are meant to help you understand some of the issues you need to take into account while working on the cost, selectivity and statistics functions. In general it may be a good idea to start off by using the default cost and selectivity and observe how the queries of interest behave.

Design for maintenance

Carefully design your object types and methods. Object types are difficult to upgrade once they are in use by applications.
Use OIDs in all of your object types so users can import/export data easily across databases.
It is easy to add a method to a type; it's hard to remove it.
You are likely to get more use out of the cartridge and the existing tool stack if you support functions against a traditional relational model in addition to an object model.
Expose significant and frequently used data from your complex objects in object types as attributes so that you can build an index on them.
If your cartridge maintains a large number of objects, views, tables, etc., consider making a metadata table to maintain the relationships among the objects for the user. This will ease the complexity of developing and maintaining the cartridge when it is in use.

How to Make Your Cartridge Extensible

Keep your interface simple, and document it thoroughly.
Use OO concepts appropriately.
Ensure that your methods do not have side effects.

How to Make Your Cartridge Installable

Include a README with your cartridge to tell users how to install the cartridge.
Make the cartridge installable in one step in the database, if possible:
```
sqlplus @imginst 
```
Tell users how to start the listener if you are using callouts.
Tell users how to set up extproc. Most users have never heard of extproc and many users have never set up a listener. This is the primary problem when deploying cartridges.
In Oracle8i release 8.1.5, the software packager lets you easily create custom SQL install scripts by using the 'instantiate_file' action. This feature substitutes variables in your files when they are installed, leaving your users with scripts and files that are customized for their machine.

Miscellaneous

How to Write Portable Cartridge Code

You should:

Use the datatypes in oratypes.h
Use OCI calls wherever possible.
Use the switches which enforce ANSI C conformance when possible
Use ANSI C function prototypes
Build and test on your target platforms as early in your development cycle as possible (flush out platform specific code and allow as much time to redesign as possible).

You should avoid:

Storing endian (big/little) specific data
Storing floating point data (IEEE/VAX/other)
Operating system-specific calls (if they cannot be avoided, isolate them in an OS-specific layer; however, if the calls you require are not in the OCI and are also not in POSIX, you are likely to encounter intractable problems)
int <-> size_t implicit casts on a 64 bit platform
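
The advice on endian-specific data can be illustrated with a short sketch. The example below uses Python rather than C purely to show the idea, and the record layout (an image id followed by a width and a height) is hypothetical. It writes every field in one explicitly chosen byte order and with fixed field sizes, so the stored bytes are the same on every platform.

```python
import struct

# ">" selects big-endian byte order with standard sizes and no padding:
# Q = 8-byte unsigned integer, I = 4-byte unsigned integer.
RECORD_FORMAT = ">QII"


def encode_record(image_id: int, width: int, height: int) -> bytes:
    """Serialize a record in a fixed, platform-independent layout."""
    return struct.pack(RECORD_FORMAT, image_id, width, height)


def decode_record(data: bytes) -> tuple:
    """Read back a record written by encode_record on any platform."""
    return struct.unpack(RECORD_FORMAT, data)
```

In C the same effect is obtained by writing each integer one byte at a time with shifts, rather than copying the in-memory representation of the value into storage.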
