Oracle8(TM) ConText(R) Cartridge Application Developer's Guide Release 2.0 A54630-01
This chapter describes how to perform theme queries. The following topics are covered:
Theme queries are issued against a set of documents, typically stored in a text column. Before you can execute a theme query on a set of documents, you must first create a theme index. To do so, specify the THEME_LEXER as the lexer preference when you create the policy for the text column. For example:
execute ctx_ddl.create_policy('THEME_POLICY', -
'table1.text', lexer_pref => 'CTXSYS.THEME_LEXER');
See Also:
For more information about creating theme indexes, see Oracle8 ConText Cartridge Administrator's Guide.
When you create a theme index for a set of documents in a text column, ConText creates a document signature for each document. A document signature is a collection of the main concepts or themes in the document. ConText can store up to 16 themes per document.
Each theme in the document signature has an associated theme vector that places the theme in a hierarchy, from a broad category down to the specific theme. For example, if two themes in a document are computer software and telephones, ConText might generate the corresponding theme vectors with the following theme tokens and weights:
Theme Vector 1                    Weight
  science and technology          40
  computer industry               40
  computer software               40

Theme Vector 2                    Weight
  science and technology          30
  communications                  30
  telecommunications industry     30
  telephones                      30
When ConText interprets a document to create the theme index, theme token names are derived from the standard names and categories in the knowledge catalog. Theme tokens in the index represent concepts in the document that might appear exactly like the token, as alternate forms of the word, or as a semantically related concept. For example, the canonical form Oracle Corporation might represent Oracle and Oracle Corp in the document.
See Also:
For more information about the knowledge catalog, see "Knowledge Catalog" in Chapter 7.
The theme weight is a measure of the strength of a theme relative to the other themes in a document. Weights are associated with theme vectors, and thus theme tokens within the same theme vector have the same weight.
For example, the tokens telephones and communications in Theme Vector 2 have the same weight of 30. When you issue a theme query, ConText uses theme weights to score hits.
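The relationship between signatures, theme vectors, tokens, and weights can be pictured with a small data model. The following Python sketch is only an illustration of the description above, not ConText's internal storage format; the class names DocumentSignature and ThemeVector and the method weights_for are hypothetical.

```python
from dataclasses import dataclass, field

MAX_THEMES = 16  # the text states ConText stores up to 16 themes per document


@dataclass
class ThemeVector:
    """One theme, as a path of tokens from broad category to specific theme."""
    tokens: list
    weight: int  # one weight, shared by every token in this vector


@dataclass
class DocumentSignature:
    """Hypothetical container for the theme vectors of a single document."""
    vectors: list = field(default_factory=list)

    def add(self, vector):
        if len(self.vectors) >= MAX_THEMES:
            raise ValueError("a signature holds at most 16 themes")
        self.vectors.append(vector)

    def weights_for(self, token):
        """Weights of every vector that contains the token (exact match)."""
        return [v.weight for v in self.vectors if token in v.tokens]


signature = DocumentSignature()
signature.add(ThemeVector(
    ["science and technology", "computer industry", "computer software"], 40))
signature.add(ThemeVector(
    ["science and technology", "communications",
     "telecommunications industry", "telephones"], 30))

# Tokens in the same vector share that vector's weight.
print(signature.weights_for("telephones"))
print(signature.weights_for("communications"))
# A broad token can appear in more than one vector, each with its own weight.
print(signature.weights_for("science and technology"))
```

Keeping the weight on the vector rather than on each token mirrors the statement that all tokens in one theme vector carry the same weight.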
To execute a theme query, you specify a query string, which can be a sentence or a phrase with or without operators. ConText interprets your query, creating a normalized form that it can match against document signatures. ConText returns a list of documents that satisfy the query, based on certain rules, along with a score that indicates how relevant each document is to the query.
To execute a theme query with the CTX_QUERY.CONTAINS procedure, you must specify a policy that has a theme lexer associated with it.
For example, you specify a theme query on computer software as follows:
execute ctx_query.contains('THEME_POL', 'computer software', 'CTX_TEMP');
In the above example, ConText generates theme vectors for the query computer software, which ConText attempts to match with document signatures in the theme index.
When a match is found, ConText uses the weight of the matched theme to compute a score that reflects how relevant the match is to the query; the higher the score, the more relevant the hit. ConText returns the matched document as part of the hitlist.
For example, if you issue a theme query with a token of computer software, ConText might return a match on a document that has a theme vector as follows:
Theme Token                 Weight
  Science and Technology    40
  Computer Industry         40
  Computer Software         40
Likewise, if you issue a query for the token science and technology, ConText returns the above document. However, a query on a broad term such as science and technology is likely to return a larger, less focused hitlist.
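The matching behavior described above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not ConText's algorithm: the theme index is a hypothetical in-memory dictionary, matching is an exact string comparison, and the score is simply the highest weight among the matching theme vectors (the guide says only that the weight of the matched theme is used to compute the score).

```python
# Hypothetical stand-in for a theme index:
# document id -> list of (theme tokens, weight) theme vectors.
THEME_INDEX = {
    1: [(["science and technology", "computer industry",
          "computer software"], 40)],
    2: [(["science and technology", "communications",
          "telecommunications industry", "telephones"], 30)],
    3: [(["science and technology", "computer industry",
          "computer software"], 15),
        (["science and technology", "communications",
          "telecommunications industry", "telephones"], 30)],
}


def theme_hitlist(index, query_token):
    """Return (doc_id, score) pairs, highest score first."""
    hits = []
    for doc_id, vectors in index.items():
        matched = [weight for tokens, weight in vectors
                   if query_token in tokens]
        if matched:
            # Assumption: score a document by its strongest matching theme.
            hits.append((doc_id, max(matched)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# A specific token matches only documents that carry that theme.
print(theme_hitlist(THEME_INDEX, "computer software"))
# A broad token sits at the top of many theme vectors, so more documents hit.
print(theme_hitlist(THEME_INDEX, "science and technology"))
```

In this toy index the specific query hits two documents while the broad query hits all three, which illustrates why a broad term tends to produce a longer hitlist.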
You can execute theme queries using the one-step method in SQL*Plus. The way in which ConText matches theme signatures, scores hits, and returns documents is the same as in a two-step query.
For example, to execute a theme query on computer software:
SELECT * FROM TEXTAB
WHERE CONTAINS (text, 'computer software') > 0;
For a text column that has more than one policy associated with it, you must specify which policy to use in the CONTAINS clause using the pol_hint parameter. You might create two policies for a column when you want to perform both theme and text queries on the column.
For example, if the column text had a regular text policy and a theme policy THEMEPOL associated with it, you would do a theme query as follows:
SELECT ID, SCORE(0) FROM TEXTAB
WHERE CONTAINS (text, 'computer software', 0, 'THEMEPOL') > 0;
When you need to specify a policy in the CONTAINS function, as in this example, you must also specify a placeholder (in this case, 0) for the LABEL parameter.
See Also:
For more information about using the pol_hint parameter in the CONTAINS function, see the specification for CONTAINS in Chapter 9.
Unlike regular text queries, theme queries are case-sensitive. For example, doing a query on the common noun turkey, which describes a type of bird, will not produce a hit on the proper noun Turkey, which describes a country.
An ambiguous word or phrase is one that is vague or contains very little information. If your query contains an ambiguous term, ConText returns an error. An example of an ambiguous query term is the word images or the phrase good times.
In theme queries, the following operators have the same semantics as with regular text queries:
Some valid query strings using operators are as follows:
contains(text, 'telephones & {computer industry}') > 0
contains(text, 'telephones*3 & {computer software}*.5 > 50') > 0
In a theme query, the thesaurus operators (synonym, broader term, narrower term, and so on) work the same way as in a regular text query, provided a thesaurus has been created or loaded.
In theme query expressions, the grouping characters (), [] have the same semantics as with a regular text query.
In theme query expressions, the wildcard characters (%, _) work the same way as in regular text queries.
Note:
There is a risk of ambiguity when using the wildcard character. For example, doing a theme query on %court% might return documents that have a theme of 'court of law' or 'tennis court'.
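The ambiguity is easy to reproduce outside ConText. The following Python sketch assumes the usual SQL-style meaning of the wildcards (% matches any run of characters, _ matches exactly one character) and applies a pattern to a hypothetical list of theme names; the helper names like_to_regex and matching_themes are illustrative only and are not part of any ConText interface.

```python
import re


def like_to_regex(pattern):
    """Translate SQL-style wildcards into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matching_themes(pattern, themes):
    """Return the themes that the whole pattern matches."""
    regex = like_to_regex(pattern)
    return [theme for theme in themes if regex.fullmatch(theme)]


themes = ["court of law", "tennis court", "telephones"]

# Both unrelated "court" themes satisfy the same wildcard pattern.
print(matching_themes("%court%", themes))
```

Because the pattern is applied to the theme name as a string, it cannot distinguish a legal theme from a sports theme; a more specific query term avoids the problem.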
ConText does not support the following query expression operators with theme queries: