Oracle8(TM) ConText(R) Cartridge Application Developer's Guide Release 2.0 A54630-01 |
|
This chapter describes the different query methods you can use in your ConText application. You can use these methods with text queries and theme queries. The following topics are covered:
Each of the query methods (two-step, one-step, and in-memory) provide advantages and disadvantages that you must consider when developing an application. The following table briefly describes each method and illustrates the various advantages and disadvantages to using each:
To perform a two-step query, do the following:
The document textkeys and scores are stored in the specified result table.
Note:
You must create the result table before you execute the CONTAINS procedure. |
The following example shows a simple two-step query. The query uses a policy named ARTICLES_POL to search the text column in a table named TEXTTAB for any articles that contain the word petroleum. Note that before the two-step query example is executed, the result table, CTX_TEMP, is created:
create table CTX_TEMP( textkey varchar2(64), score number, conid number); execute ctx_query.contains('ARTICLE_POLICY', 'petroleum','CTX_TEMP') SELECT SCORE, title
FROM CTX_TEMP, TEXTTAB
WHERE texttab.PK=ctx_temp.textkey
ORDER BY SCORE DESC;
In this example, the articles with the highest scores appear first in the hitlist because the results are sorted by score in descending order.
In a two-step query, the score results generated by the CONTAINS procedure are physically stored in a result table that has been allocated (either by the application developer or dynamically within the application).
If you want to include scores in the hitlist returned by a two-step query, the scores must be selected from the result table in the second step of the query.
In two-step queries, ConText uses result tables called hitlist tables to store intermediate results. Intermediate results can be merged into the standard SQL query through a join operation or a sub-query operation. The result tables must be created before the query is performed. A hitlist table can be created manually or allocated through the CTX_QUERY.GETTAB procedure.
Hitlist tables can be named anything; however, they must have the following structure:
See Also:
For more information about the structure of the hitlist result tables, see "Hitlist Table Structure" in Appendix A. |
For applications that support multiple concurrent users, ConText allows for sharing a single result table among all the users rather than allocating a separate table for each user.
You control sharing of result tables with the sharelevel and the query_id parameters of the CTX_QUERY.CONTAINS procedure. If the result table is shared, the CONTAINS procedure must specify that sharelevel is equal to one and include a unique query_id so that each result can be distinguished from others in the result table.
When sharelevel is equal to 0:
When sharelevel is equal to 1 then:
Attention:
ConText does not verify that these rules are observed. You must control multiple concurrent usage by passing a different query_id to the requestor if the result table is shared. |
When you execute a two-step query on a table with a composite textkey, the number of textkey columns in the result table must match the composite keys count in the document table. For example, if you want to execute a query on a document table that has a two-column textkey, create a result table with the following schema: TEXTKEY, TEXTKEY2, SCORE, CONID.
The following SQL*Plus examples show two different ways in which to create a result table with a two-column composite textkey:
/* create composite textkey result table manually */ create table ctx_temp( textkey varchar(64), textkey2 varchar(64), score number, conid number); /* allocate composite textkey result table with CTX_QUERY.GETTAB() */ exec ctx_query.gettab(CTX_QUERY.HITTAB, :hit_tab, 2)
See Also:
For more information on the structure of composite textkey result tables, see "Composite Textkey Hitlist Tables" in Appendix A. |
There is an alternative to step 2 of a two-step query. Rather than joining the result table and text table in a SELECT statement, create a view to perform the join. Then use a SELECT statement to select the appropriate rows from that view. Use this approach when the development tool does not allow tables to be joined in a SELECT statement (e.g. Oracle Forms).
For example:
CREATE VIEW SURVEY AS SELECT * FROM TEXTTAB, CTX_TEMP WHERE PK = TEXTKEY;
SELECT SCORE, AUTHOR FROM SURVEY ORDER BY SCORE DESC;
In this example:
To execute a two-step query on a table with a composite textkey, you first specify the multiple textkey columns when you create the policy for the text column
See Also:
For more information about creating policies for composite textkey tables, see Oracle8 ConText Cartridge Administrator's Guide. |
In addition, before the two-step query, create a result table in which the number of TEXTKEY columns match the number of columns in the composite textkey in the document table. You can create the result table manually or using the CTX_QUERY.GETTAB procedure.
See Also:
For more information on the structure of composite textkey result tables, see "Composite Textkey Hitlist Tables" in Appendix A. |
For example, to create a result table manually with a composite textkey consisting of two columns, issue the following SQL statement:
create table CTX_TEMP2( textkey varchar2(64), textkey2 varchar2(64), score number, conid number);
In the two-step query, use the AND operator in the WHERE condition when you join the result and text tables. For example:
exec ctx_query.contains('ARTICLE2_POLICY',\ 'petroleum',\ 'CTX_TEMP2') SELECT SCORE, title FROM CTX_TEMP2, TEXTTAB2 WHERE texttab2.PK=ctx_temp2.textkey AND texttab2.PK2=ctx_temp2.textkey2 ORDER BY SCORE DESC;
A structured query is a query based on a text column and a structured data column. The structured data column is usually in the same table as the text column. For example, you might use a structured query to retrieve documents on a certain subject that were written after a certain date, where the document content is in a text column and date information is in a structured data column.
The CTX_QUERY.CONTAINS procedure provides an additional parameter, struct_query, for specifying the WHERE condition in a structured query. For example, to select all news articles that contain the word Oracle that were written on or after October 1st, 1996, you might use:
exec ctx_query.contains('news_text','Oracle','res_tab',\ struct_query => 'issue_date >= (''1-OCT-1996'')')
Note:
Because the struct_query parameter expects a WHERE condition, you can specify a subquery. This is useful when the structured data column is in another table. |
Executing a structured query with the struct_query parameter improves performance over processing a query on a text column and then refining the hitlist by applying a where condition against a structured column. This is especially so when the selectivity of the where condition is high, because when you use the structured query parameter, the ConText server executes the entire query without first writing out a potentially large hitlist to be refined later by the Oracle server.
Note:
If the user who includes a structured query in a two-step query is not the owner of the table containing the structured and text columns, the user must have SELECT privilege with GRANT OPTION on the table. In addition, if the object being queried is a view, the user must have SELECT privilege with GRANT OPTION on the base table for the view. SELECT privilege with GRANT OPTION can be granted to a user using the GRANT command in SQL. For more information, see Oracle8 Server SQL Reference. |
If a database link has been created for a remote database, two-step queries support querying text columns in the remote database.
Note:
Database links are created using the CREATE DATABASE LINK command in SQL. For more information about creating database links, see Oracle8 Server SQL Reference. |
To perform a two-step query for a text column in a remote database, specify the database link for the remote database in the CONTAINS procedure as part of the policy for the column in the remote database.
In addition, the result table specified in CONTAINS must exist in the remote database, and you, the user performing the query, must have the appropriate privileges on the result table.
For example:
exec ctx_query.contains('MY_POL@DB1', 'petroleum','CTX_TEMP')
In this example, MY_POL exists in a remote database identified by the database link DB1. The CTX_TEMP result table exists in the same remote database.
See Also:
For more information about remote queries and distributed databases, see Oracle8 Server Concepts. |
The CONTAINS procedure provides an argument for processing two-step queries in parallel. Processing queries in parallel helps balance the load between ConText servers and might improve query performance.
When the CONTAINS procedure is called in a two-step query, the PARALLEL argument can be used to specify the number of ConText servers, up to the total number of ConText servers running with the Query personality, that are used to process two-step queries and write the results to the result table.
For example:
execute ctx_query.contains('ARTICLE_POLICY',\ 'petroleum', 'CTX_TEMP', parallel=>2)
In this example, the text column in the ARTICLE_POLICY policy is queried for documents that contain the term petroleum. The query is processed in parallel by any two available ConText servers with the Query personality and the results are written to CTX_TEMP.
The one-step query uses the CONTAINS and SCORE functions in a SQL statement to execute a user's request for documents. Rows and columns containing the text and structured data for relevant documents are returned to the application program as a record set like any other query in SQL.
Note:
Before one-step queries can be executed, the database in which the text resides must be text enabled by setting the ConText initialization parameter TEXT_ENABLE = TRUE. This can be done by either setting it in the initsid.ora system initialization file, or by using the ALTER SESSION command. For more information about initialization parameters and the initsid.ora file, see Oracle8 Server Administrator's Guide. For more information about using the ALTER SESSION command, see Oracle8 Server SQL Reference. |
After a user has submitted a one-step query, ConText performs the following tasks to return the results to the user:
The following SELECT statement shows a simple one-step query. This query searches a text table called TEXTTAB for any articles that contain the word petroleum.
SELECT *
FROM texttab
WHERE CONTAINS (text, 'petroleum') > 0;
Because ConText functions execute within normal SQL statements, all of the capabilities for selecting and querying normal structured data fields, as well as text, are available. For instance, in the example, if the text table had a column listing the date the article was published, the user could select articles based on that date as well as the content of the text column.
Note:
The asterisk wildcard character ( * ) in specifies that the record set returned by the query includes all the columns of the text table for the selected documents, as well as the scores generated for each document. If a query has more than one CONTAINS function, the asterisk wildcard does not return scores for the multiple CONTAINS and the SCORE function must be called explicitly. See "Scoring" in this chapter for an example. |
One-step queries support calling more than one CONTAINS functions in the WHERE clause of a SELEC statement. Multiple CONTAINS can be used in a one-step query to perform queries on multiple text columns located either in the same table or in separate tables.
If multiple ConText servers with the Query personality are running and a one-step query with multiple CONTAINS is executed, the query is processed in parallel. Each CONTAINS function is evaluated by one of the available ConText servers and the results from the servers are combined before they are returned to the user.
In a one-step query, the document scores are generated by the CONTAINS function and returned by the SCORE function.
Each CONTAINS function in a query produces a separate score. When there are multiple CONTAINS functions, each CONTAINS function must have a label (a number) so the SCORE value can be identified in other clauses of the SELECT statement.
The SCORE function may be used in a SELECT list, an ORDER BY clause or a GROUP BY clause.
For example:
SELECT SCORE (10), SCORE(20), title FROM DOCUMENTS WHERE CONTAINS (TEXT, 'holmes,' 10)
OR CONTAINS (TEXT, 'moriarty', 20)
OR CONTAINS (TEXT, 'baker street', 30) ORDER BY SCORE(10) GROUP BY SCORE(30)
The CONTAINS function can only appear in the WHERE clause of a SELECT statement.
You cannot issue the CONTAINS function in the WHERE clause of an UPDATE, INSERT or DELETE statement.
You can perform one-step queries on text tables with composite textkeys. The syntax for the query is the same as the syntax for a query on a table with a single-column textkey.
If a database link has been created for a remote database, one-step queries support querying text columns in the remote database.
To perform a one-step query for a text column in a remote database, the database link for the remote database is specified as part of the table name in the SELECT clause.
For example:
SELECT *
FROM texttab@db1
WHERE CONTAINS (text, 'petroleum') > 0;
In this example, texttab exists in a remote database identified by the database link DB1.
Note:
One-step queries do not support querying LONG and LONG RAW columns in remote database tables. For more information about creating database links, see Oracle8 Server SQL Reference. For more information about remote queries and distributed databases, see Oracle8 Server Concepts. |
In-memory queries use a buffer and a cursor to return query results. Returning query results to a buffer in memory improves performance over writing and reading query results to and from database result tables, which is typical of one- and two-step queries.
To perform an in-memory query, do the following:
In addition, you can specify that OPEN_CON return additional columns (up to five) for the selected documents from the text table.
The following example shows a simple in-memory query. This query uses a policy named ARTICLES_POL to search the text column in a table named TEXTTAB for any articles that contain the word petroleum.
declare
score char(5);
pk char(5);
curid number;
title char(256);
begin
dbms_output.enable(100000);
curid := ctx_query.open_con(
policy_name => 'ARTICLES_POL',
text_query => 'petroleum',
score_sorted => true,
other_cols => 'title');
while (ctx_query.fetch_hit(curid, pk, score, title)>0)
loop
dbms_output.put_line(score||pk||substr(title,1,50));
end loop;
ctx_query.close_con(curid);
end;
In this example, the TITLE column from the table is also returned by OPEN_CON, so a variable must be declared for TITLE.
DBMS_OUTPUT.ENABLE sets the buffer size to the maximum of 100000 bytes (1 Mb) to ensure that the buffer is large enough to hold the results of the query.
The SCORE_SORTED argument in OPEN_CON is set to true which causes OPEN_CON to store the hits in the query buffer in descending order by score.
FETCH_HIT is called in a loop to fetch SCORE, PK, and TITLE for each hit until a value less than zero is returned, indicating that the buffer is empty.
DBMS_OUTPUT.PUT_LINE prints the results to the standard output.
See Also:
For more information about the DBMS_OUTPUT PL/SQL package, see Oracle8 Server Application Developer's Guide. |
You can perform in-memory queries on text tables that have multiple column textkeys. When you use CTX_QUERY.FETCH_HIT to retrieve each hit from the buffer, the PK argument is returned as an encoded string. To access an individual textkey, you must use CTX_QUERY.PKDECODE.
In-memory queries have the following limitations:
Because the OPEN_CON procedure does not support an additional struct_query parameter, you cannot query for structured data in an in-memory query.
You cannot use the max and first/next operators with in-memory queries.
If a database link has been created for a remote database, in-memory queries support querying text columns in the remote database.
Note:
Database links are created using the CREATE DATABASE LINK command in SQL. For more information about creating database links, see Oracle8 Server SQL Reference. |
To perform an in-memory query for a text column in a remote database, the database link for the remote database is specified in the CTX_QUERY.OPEN_CON procedure as part of the policy for the column in the remote database.
In addition, the result table specified in CTX_QUERY.CONTAINS must exist in the remote database and the user performing the query must have the appropriate privileges on the result table.
See Also:
For more information about remote queries and distributed databases, see Oracle8 Server Concepts. |
In addition to two-step, one-step, and in-memory queries, you can use the CTX_QUERY.COUNT_HITS function to return the number of hits for a query without generating scores for the hits or returning the textkeys for the documents. The documents can be stored in a local or remote database. Counting query hits is generally much faster than performing a full query and can be used to audit queries to ensure large and unmanageable hitlists are not returned.
Counting query hits can be performed in two modes: estimate and exact. The modes are based on the method ConText uses to record deleted documents in a text index.
In exact mode, hits are returned only for those documents that satisfy the conditions of the query expression and are currently in the text column of the table.
In estimate mode, hits may be included for documents that satisfy the query condition, but have been deleted from the text column or have been updated so that they no longer satisfy the query expression. This can occur when the text index for the column has not been optimized and the internal document IDs are still present in the index.
In general, the inaccuracy of the results returned by COUNT_HITS in estimate mode is proportional to the amount of DML that has been performed on a text column.
Note:
If the index being queried has been optimized and no further DML has been performed on the text column, estimate mode will return accurate results. |
See Also:
For more information about text indexing, DML, and optimization, see Oracle8 ConText Cartridge Administrator's Guide. |