Oracle8i interMedia Text Reference Release 8.1.5 A67843-01
This chapter provides reference information for using the CTX_DDL PL/SQL package to create and manage the objects required for Text indexes.
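CTX_DDL contains the following stored procedures:
ADD_FIELD_SECTION: Creates a field section and adds it to a section group.
ADD_SPECIAL_SECTION: Adds a SENTENCE or PARAGRAPH special section to a section group.
ADD_STOPCLASS: Adds a stopclass to a stoplist.
ADD_STOPTHEME: Adds a stoptheme to a stoplist.
ADD_STOPWORD: Adds a stopword to a stoplist.
ADD_ZONE_SECTION: Creates a zone section and adds it to a section group.
CREATE_PREFERENCE: Creates a preference in the Text data dictionary.
CREATE_SECTION_GROUP: Creates a section group.
CREATE_STOPLIST: Creates an empty stoplist.
DROP_PREFERENCE: Deletes a preference from the Text data dictionary.
DROP_SECTION_GROUP: Deletes a section group and all of its sections.
DROP_STOPLIST: Drops a stoplist.
REMOVE_SECTION: Removes a section from a section group.
REMOVE_STOPCLASS: Removes a stopclass from a stoplist.
REMOVE_STOPTHEME: Removes a stoptheme from a stoplist.
REMOVE_STOPWORD: Removes a stopword from a stoplist.
SET_ATTRIBUTE: Sets a preference attribute.
UNSET_ATTRIBUTE: Removes a set attribute from a preference.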
ADD_FIELD_SECTION
Creates a field section and adds the section to an existing section group. This enables field section searching with the WITHIN operator.
Field sections are delimited by start and end tags. By default, the text within a field section is indexed as a sub-document separate from the rest of the document.
Unlike zone sections, field sections cannot nest or overlap. As such, field sections are best suited for non-repeating, non-overlapping sections such as TITLE and AUTHOR markup in email- or news-type documents.
Because of how field sections are indexed, WITHIN queries on field sections are usually faster than WITHIN queries on zone sections.
CTX_DDL.ADD_FIELD_SECTION( group_name in varchar2, section_name in varchar2, tag in varchar2, visible in boolean default FALSE );
group_name
Specify the name of the section group to which section_name is added. You can add up to 64 field sections to a single section group.
section_name
Specify the name of the section to add to group_name. You use this name to identify the section in queries. Avoid using names that contain non-alphanumeric characters such as _, since these characters must be escaped in queries.
tag
Specify the tag that marks the start of a section, without the angle brackets. For example, for the HTML <H1> tag, specify H1.
visible
Specify TRUE to make the text within the section visible within the rest of the document.
By default the visible flag is FALSE. This means that Oracle indexes the text within field sections as a sub-document separate from the rest of the document. However, you can set the visible flag to TRUE if you want text within the field section to be indexed as part of the enclosing document.
The following code defines a section group basicgroup of the BASIC_SECTION_GROUP type. It then creates a field section in basicgroup called Author for the <A> tag. It also sets the visible flag to FALSE:
begin
  ctx_ddl.create_section_group('basicgroup', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_field_section('basicgroup', 'Author', 'A', FALSE);
end;
Because the Author field section is not visible, to find text within the Author section, you must use the WITHIN operator as follows:
'(Martin Luther King) WITHIN Author'
A query on Martin Luther King without the WITHIN operator does not return instances of this term in field sections. If you want to query text within field sections without specifying WITHIN, you must set the visible flag to TRUE when you create the section, as follows:
begin ctx_ddl.add_field_section('basicgroup', 'Author', 'A', TRUE); end;
Oracle knows what the end tags look like from the group_type parameter you specify when you create the section group. The start tag you specify must be unique within a section group.
Section names need not be unique across tags. You can assign the same section name to more than one tag, making details transparent to searches.
Within the same group, zone section names and field section names cannot be the same. The terms Paragraph and Sentence are reserved for special sections.
Field sections cannot be nested. For example, if you define a field section to start with <TITLE> and define another field section to start with <FOO>, the two sections cannot be nested as follows:
<TITLE> dog <FOO> cat </FOO> </TITLE>
Repeated field sections are allowed, but are treated as a single section. The following is an example of repeated field section in a document:
<TITLE> cat </TITLE> <TITLE> dog </TITLE>
To work with sections that are nested or that repeat, define them as zone sections.
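The following minimal sketch applies this advice. It defines TITLE as a zone section instead of a field section, so nested or repeated <TITLE> tags like those shown above are supported. The group name titlegroup is hypothetical. The trailing slash runs the block in SQL*Plus.

```sql
-- Hypothetical group name: titlegroup.
-- A zone section, unlike a field section, may nest, overlap,
-- and occur more than once in a document.
begin
  ctx_ddl.create_section_group('titlegroup', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_zone_section('titlegroup', 'Title', 'TITLE');
end;
/
```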
"Section Group Types" in Chapter 3.
ADD_SPECIAL_SECTION
Adds a special section, either SENTENCE or PARAGRAPH, to a section group. This enables searching within sentences or paragraphs in documents with the WITHIN operator.
A special section is a section that, unlike zone and field sections, is not explicitly tagged in the document. The start and end of special sections are detected when the Text index is created. Oracle supports two such sections: paragraph and sentence.
CTX_DDL.ADD_SPECIAL_SECTION( group_name IN VARCHAR2, section_name IN VARCHAR2);
group_name
Specify the name of the section group.
section_name
Specify SENTENCE or PARAGRAPH.
The following code enables searching within sentences within HTML documents:
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_special_section('htmgroup', 'SENTENCE');
end;
You can also add zone sections to the group to enable zone searching in addition to sentence searching. The following example adds the zone section Headline to the htmgroup:
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_special_section('htmgroup', 'SENTENCE');
  ctx_ddl.add_zone_section('htmgroup', 'Headline', 'H1');
end;
If you are only interested in sentence or paragraph searching within documents and not interested in defining zone or field sections, you can use the NULL_SECTION_GROUP as follows:
begin
  ctx_ddl.create_section_group('nullgroup', 'NULL_SECTION_GROUP');
  ctx_ddl.add_special_section('nullgroup', 'SENTENCE');
end;
The sentence and paragraph boundaries are determined by the lexer. Therefore, if the lexer cannot recognize the boundaries, no sentence or paragraph sections are indexed.
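The following sketch shows how such a group might be used. It assumes a hypothetical table articles with columns id and text, and the nullgroup section group from the preceding example. The group is named in the index parameter string, and the query then restricts matches to a single sentence.

```sql
-- Hypothetical table: articles(id, text).
-- nullgroup is the NULL_SECTION_GROUP with the SENTENCE special section.
create index articles_idx on articles(text)
  indextype is ctxsys.context
  parameters('section group nullgroup');

-- Documents in which dog and cat occur within the same sentence.
select id
  from articles
 where contains(text, '(dog and cat) within SENTENCE') > 0;
```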
"Section Group Types" in Chapter 3.
ADD_STOPCLASS
Adds a stopclass to a stoplist. A stopclass is a class of tokens that is not to be indexed.
CTX_DDL.ADD_STOPCLASS( stoplist_name in varchar2, stopclass in varchar2 );
stoplist_name
Specify the name of the stoplist.
stopclass
Specify the stopclass to be added to stoplist_name. Currently, only the NUMBERS class is supported.
The following code adds a stopclass of NUMBERS to the stoplist mystop:
begin ctx_ddl.add_stopclass('mystop', 'NUMBERS'); end;
The maximum number of stopwords, stopthemes, and stopclasses you can add to a stoplist is 4095.
ADD_STOPTHEME
Adds a single stoptheme to a stoplist. A stoptheme is a theme that is not to be indexed.
In English, you query on indexed themes using the ABOUT operator.
CTX_DDL.ADD_STOPTHEME( stoplist_name in varchar2, stoptheme in varchar2 );
stoplist_name
Specify the name of the stoplist.
stoptheme
Specify the stoptheme to be added to stoplist_name.
The following example adds the stoptheme banking to the stoplist mystop:
begin ctx_ddl.add_stoptheme('mystop', 'banking'); end;
The maximum number of stopwords, stopthemes, and stopclasses you can add to a stoplist is 4095.
ADD_STOPWORD
Adds a single stopword to a stoplist. To create a list of stopwords, you must call this procedure once for each word.
CTX_DDL.ADD_STOPWORD( stoplist_name in varchar2, stopword in varchar2 );
stoplist_name
Specify the name of the stoplist.
stopword
Specify the stopword to be added.
The following example adds the stopwords because, notwithstanding, nonetheless, and therefore to the stoplist mystop:
begin
  ctx_ddl.add_stopword('mystop', 'because');
  ctx_ddl.add_stopword('mystop', 'notwithstanding');
  ctx_ddl.add_stopword('mystop', 'nonetheless');
  ctx_ddl.add_stopword('mystop', 'therefore');
end;
The maximum number of stopwords, stopthemes, and stopclasses you can add to a stoplist is 4095.
ALTER INDEX in Chapter 2.
Appendix E, "Supplied Stoplists"
ADD_ZONE_SECTION
Creates a zone section and adds the section to an existing section group. This enables zone section searching with the WITHIN operator.
Zone sections are sections delimited by start and end tags. The <B> and </B> tags in HTML, for instance, mark a range of words that are to be rendered in boldface.
Zone sections can be nested within one another, can overlap, and can occur more than once in a document.
CTX_DDL.ADD_ZONE_SECTION( group_name in varchar2, section_name in varchar2, tag in varchar2 );
group_name
Specify the name of the section group to which section_name is added.
section_name
Specify the name of the section to add to group_name. You use this name to identify the section in queries. Avoid using names that contain non-alphanumeric characters such as _, since most of these characters are special and must be escaped in queries.
tag
Specify the tag that marks the start of a section, without the angle brackets. For example, for the HTML <H1> tag, specify H1.
The following code defines a section group called htmgroup with a type of HTML_SECTION_GROUP. It then creates a zone section in htmgroup called Headline:
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('htmgroup', 'Headline', 'H1');
end;
Oracle knows what the end tags look like from the group_type parameter you specify when you create the section group. The start tag you specify must be unique within a section group.
Section names need not be unique across tags. You can assign the same section name to more than one tag, making details transparent to searches.
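For example, the following sketch maps both the <H1> and <H2> tags to one zone section named Heading. A query such as 'oracle WITHIN Heading' then searches text in either tag. The group name headgroup is hypothetical.

```sql
-- Hypothetical group name: headgroup.
-- Two tags share the section name Heading, so WITHIN Heading
-- covers text marked by either <H1> or <H2>.
begin
  ctx_ddl.create_section_group('headgroup', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('headgroup', 'Heading', 'H1');
  ctx_ddl.add_zone_section('headgroup', 'Heading', 'H2');
end;
/
```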
Within the same group, zone section names and field section names cannot be the same. The terms Paragraph and Sentence are reserved for special sections.
Zone sections can overlap each other. For example, if <B> and <I> denote two different zone sections, they can overlap in a document as follows:
plain <B> bold <I> bold and italic </B> only italic </I> plain
Zone sections can also nest, including within themselves, as follows:
<TD> <TABLE><TD>nested cell</TD></TABLE></TD>
"Section Group Types" in Chapter 3.
CREATE_PREFERENCE
Creates a preference in the Text data dictionary. You specify preferences in the parameter string of CREATE INDEX or ALTER INDEX.
CTX_DDL.CREATE_PREFERENCE(preference_name in varchar2, object_name in varchar2);
preference_name
Specify the name of the preference to be created.
object_name
Specify the name of the preference object.
See Also:
For a complete list of preference objects and their associated attributes, see Chapter 3, "Indexing".
The following example creates a lexer preference that specifies a text-only index. It does so by creating a BASIC_LEXER preference called my_lexer with CTX_DDL.CREATE_PREFERENCE. It then calls CTX_DDL.SET_ATTRIBUTE twice, first specifying YES for the INDEX_TEXT attribute, then specifying NO for the INDEX_THEMES attribute.
begin
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('my_lexer', 'INDEX_TEXT', 'YES');
  ctx_ddl.set_attribute('my_lexer', 'INDEX_THEMES', 'NO');
end;
The following example creates a data storage preference called mypref that tells the system that the files to be indexed are stored in the operating system. The example then uses CTX_DDL.SET_ATTRIBUTE to set the PATH attribute to the directory /docs.
begin ctx_ddl.create_preference('mypref', 'FILE_DATASTORE'); ctx_ddl.set_attribute('mypref', 'PATH', '/docs'); end;
To use the DETAIL_DATASTORE, create a preference with CTX_DDL.CREATE_PREFERENCE and then set its attributes with CTX_DDL.SET_ATTRIBUTE, as in the following example:
begin
  ctx_ddl.create_preference('my_detail_pref', 'DETAIL_DATASTORE');
  ctx_ddl.set_attribute('my_detail_pref', 'binary', 'true');
  ctx_ddl.set_attribute('my_detail_pref', 'detail_table', 'my_detail');
  ctx_ddl.set_attribute('my_detail_pref', 'detail_key', 'article_id');
  ctx_ddl.set_attribute('my_detail_pref', 'detail_lineno', 'seq');
  ctx_ddl.set_attribute('my_detail_pref', 'detail_text', 'text');
end;
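The attributes above name a detail table and its columns. The following minimal sketch shows tables that would match them. It assumes the following:

- The master table my_master, its columns, and the index name are hypothetical.
- The detail_key column holds the master row's primary key value.
- The one-character body column only gives CREATE INDEX a column to index, because the text itself is read from my_detail.

```sql
-- Hypothetical master table: one row per article.
create table my_master (
  article_id number primary key,
  title      varchar2(100),
  body       char(1)          -- placeholder column named in CREATE INDEX
);

-- Detail table matching the attributes set above:
-- detail_key = article_id, detail_lineno = seq, detail_text = text.
create table my_detail (
  article_id number,          -- assumed to hold my_master.article_id values
  seq        number,          -- order of the text lines within one article
  text       varchar2(80)
);

create index my_master_idx on my_master(body)
  indextype is ctxsys.context
  parameters('datastore my_detail_pref');
```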
The following example specifies that the index tables are to be created in the foo tablespace with an initial extent of 1K:
begin
  ctx_ddl.create_preference('mystore', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('mystore', 'I_TABLE_CLAUSE', 'tablespace foo storage (initial 1K)');
  ctx_ddl.set_attribute('mystore', 'K_TABLE_CLAUSE', 'tablespace foo storage (initial 1K)');
  ctx_ddl.set_attribute('mystore', 'R_TABLE_CLAUSE', 'tablespace foo storage (initial 1K)');
  ctx_ddl.set_attribute('mystore', 'N_TABLE_CLAUSE', 'tablespace foo storage (initial 1K)');
  ctx_ddl.set_attribute('mystore', 'I_INDEX_CLAUSE', 'tablespace foo storage (initial 1K)');
end;
When you create preferences with objects that have no attributes, you need only create the preference, as in the following example which sets the filter to the NULL_FILTER:
begin ctx_ddl.create_preference('my_null_filter', 'NULL_FILTER'); end;
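Preferences take effect only when they are named in the parameter string of CREATE INDEX or ALTER INDEX. The sketch below combines several preferences from the examples above in one CREATE INDEX statement: mypref, my_null_filter, my_lexer, and mystore. The table file_docs and its columns are hypothetical. The fname column is assumed to hold file names that FILE_DATASTORE resolves relative to the PATH attribute (/docs).

```sql
-- Hypothetical table whose fname column holds file names under /docs.
create table file_docs (
  id    number primary key,
  fname varchar2(255)
);

-- Each keyword in the parameter string names one preference created earlier.
create index file_docs_idx on file_docs(fname)
  indextype is ctxsys.context
  parameters('datastore mypref filter my_null_filter lexer my_lexer storage mystore');
```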
ALTER INDEX in Chapter 2.
CREATE_SECTION_GROUP
Creates a section group for defining sections in a text column.
After you create a section group, you can add zone, field, or special sections to it with ADD_ZONE_SECTION, ADD_FIELD_SECTION, or ADD_SPECIAL_SECTION.
When you index, you name the section group in the parameter string of CREATE INDEX or ALTER INDEX.
After indexing, you can query within your defined sections with the WITHIN operator.
CTX_DDL.CREATE_SECTION_GROUP( group_name in varchar2, group_type in varchar2 );
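group_name
Specify the name of the section group to create, as [user.]section_group_name. This name must be unique within an owner.
group_type
Specify the section group type. The group_type parameter can be one of the types described in "Section Group Types" in Chapter 3, for example NULL_SECTION_GROUP, BASIC_SECTION_GROUP, or HTML_SECTION_GROUP.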
The following command creates a section group called htmgroup with the HTML group type.
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
end;
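The steps described at the start of this section are to create the group, add sections, name the group when indexing, and query with WITHIN. The following sketch shows them end to end. It assumes a hypothetical table docs and the basicgroup section group with the Author field section, created as in the ADD_FIELD_SECTION example.

```sql
-- Hypothetical table: docs(id, text).
create table docs (
  id   number primary key,
  text varchar2(2000)
);

insert into docs values (1, '<A>Martin Luther King</A> Letter from Birmingham Jail');
commit;

-- Name the section group in the index parameter string.
create index docs_idx on docs(text)
  indextype is ctxsys.context
  parameters('section group basicgroup');

-- Search only within the Author field section.
select id
  from docs
 where contains(text, '(Martin Luther King) within Author') > 0;
```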
"Section Group Types" in Chapter 3.
CREATE_STOPLIST
Creates a new, empty stoplist. Stoplists can contain words, classes of tokens, or themes that are not to be indexed.
You can add stopwords, stopclasses, or stopthemes to a stoplist using ADD_STOPWORD, ADD_STOPCLASS, or ADD_STOPTHEME.
You can specify a stoplist in the parameter string of CREATE INDEX or ALTER INDEX.
CTX_DDL.CREATE_STOPLIST(stoplist_name in varchar2);
stoplist_name
Specify the name of the stoplist to be created.
The following code creates a stoplist called mystop:
begin ctx_ddl.create_stoplist('mystop'); end;
The maximum number of stopwords, stopthemes, and stopclasses you can add to a stoplist is 4095.
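The following sketch builds a stoplist with one entry of each kind, using values from the examples in this chapter. It then names the stoplist in the index parameter string. The table docs(id, text) and the index name docs_idx are hypothetical.

```sql
-- Build mystop with a stopword, a stopclass, and a stoptheme.
begin
  ctx_ddl.create_stoplist('mystop');
  ctx_ddl.add_stopword('mystop', 'because');
  ctx_ddl.add_stopclass('mystop', 'NUMBERS');
  ctx_ddl.add_stoptheme('mystop', 'banking');
end;
/

-- Hypothetical table docs(id, text); the stoplist applies to this index.
create index docs_idx on docs(text)
  indextype is ctxsys.context
  parameters('stoplist mystop');
```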
ALTER INDEX in Chapter 2.
Appendix E, "Supplied Stoplists"
DROP_PREFERENCE
The DROP_PREFERENCE procedure deletes the specified preference from the Text data dictionary.
CTX_DDL.DROP_PREFERENCE(preference_name IN VARCHAR2);
preference_name
Specify the name of the preference to be dropped.
The following code drops the preference my_lexer:
begin ctx_ddl.drop_preference('my_lexer'); end;
Dropping a preference does not affect indexes that have been created using that preference.
DROP_SECTION_GROUP
The DROP_SECTION_GROUP procedure deletes the specified section group, as well as all the sections in the group, from the Text data dictionary.
CTX_DDL.DROP_SECTION_GROUP(group_name IN VARCHAR2);
group_name
Specify the name of the section group to delete.
The following code drops the section group htmgroup and all its sections:
begin ctx_ddl.drop_section_group('htmgroup'); end;
DROP_STOPLIST
Drops a stoplist from the Text data dictionary.
CTX_DDL.DROP_STOPLIST(stoplist_name in varchar2);
stoplist_name
Specify the name of the stoplist.
The following code drops the stoplist mystop:
begin ctx_ddl.drop_stoplist('mystop'); end;
When you drop a stoplist, you must recreate or rebuild the index for the change to take effect.
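As a sketch of the recreate path, assume a hypothetical index docs_idx on docs(text) that was built with mystop. The script drops the index, replaces the stoplist, and builds the index again so that the new stoplist takes effect.

```sql
-- Hypothetical names: docs_idx on docs(text), built earlier with mystop.
drop index docs_idx;

-- Replace the stoplist with a new one of the same name.
begin
  ctx_ddl.drop_stoplist('mystop');
  ctx_ddl.create_stoplist('mystop');
  ctx_ddl.add_stopword('mystop', 'therefore');
end;
/

-- Rebuilding the index picks up the new stoplist contents.
create index docs_idx on docs(text)
  indextype is ctxsys.context
  parameters('stoplist mystop');
```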
REMOVE_SECTION
The REMOVE_SECTION procedure removes the specified section from the specified section group. You can specify the section by name or by ID. You can view section IDs in the CTX_USER_SECTIONS view.
Use the following syntax to remove a section by section name:
CTX_DDL.REMOVE_SECTION( group_name in varchar2, section_name in varchar2 );
group_name
Specify the name of the section group from which to delete section_name.
section_name
Specify the name of the section to delete from group_name.
Use the following syntax to remove a section by section id:
CTX_DDL.REMOVE_SECTION( group_name in varchar2, section_id in number );
group_name
Specify the name of the section group from which to delete section_id.
section_id
Specify the section ID of the section to delete from group_name.
The following code drops a section called Title from the htmgroup:
begin ctx_ddl.remove_section('htmgroup', 'Title'); end;
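The following sketch removes a section by ID. It looks the ID up in CTX_USER_SECTIONS and then calls the section_id form of REMOVE_SECTION.

- It assumes the view exposes columns named SEC_SECTION_GROUP, SEC_NAME, and SEC_ID. Check the view definition in your release before relying on these names.
- If more than one row matches, SELECT INTO raises TOO_MANY_ROWS.

```sql
-- Assumed CTX_USER_SECTIONS columns: sec_section_group, sec_name, sec_id.
declare
  v_id number;
begin
  select sec_id
    into v_id
    from ctx_user_sections
   where upper(sec_section_group) = 'HTMGROUP'
     and upper(sec_name) = 'TITLE';

  -- v_id is a NUMBER, so this call resolves to the section_id form.
  ctx_ddl.remove_section('htmgroup', v_id);
end;
/
```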
REMOVE_STOPCLASS
Removes a stopclass from a stoplist.
CTX_DDL.REMOVE_STOPCLASS( stoplist_name in varchar2, stopclass in varchar2 );
stoplist_name
Specify the name of the stoplist.
stopclass
Specify the name of the stopclass to be removed.
The following code removes the stopclass NUMBERS from the stoplist mystop:
begin ctx_ddl.remove_stopclass('mystop', 'NUMBERS'); end;
REMOVE_STOPTHEME
Removes a stoptheme from a stoplist.
CTX_DDL.REMOVE_STOPTHEME( stoplist_name in varchar2, stoptheme in varchar2 );
stoplist_name
Specify the name of the stoplist.
stoptheme
Specify the stoptheme to be removed from stoplist_name.
The following code removes the stoptheme banking from the stoplist mystop:
begin ctx_ddl.remove_stoptheme('mystop', 'banking'); end;
REMOVE_STOPWORD
Removes a stopword from a stoplist. For the removal to be reflected in the index, you must rebuild the index.
CTX_DDL.REMOVE_STOPWORD( stoplist_name in varchar2, stopword in varchar2 );
stoplist_name
Specify the name of the stoplist.
stopword
Specify the stopword to be removed from stoplist_name.
The following code removes the stopword because from the stoplist mystop:
begin ctx_ddl.remove_stopword('mystop','because'); end;
SET_ATTRIBUTE
Sets a preference attribute. You use this procedure after you have created a preference with CTX_DDL.CREATE_PREFERENCE.
CTX_DDL.SET_ATTRIBUTE(preference_name in varchar2, attribute_name in varchar2, attribute_value in varchar2);
preference_name
Specify the name of the preference.
attribute_name
Specify the name of the attribute.
attribute_value
Specify the attribute value. You can specify boolean values as TRUE or FALSE, T or F, YES or NO, Y or N, or 1 or 0.
The following example creates a data storage preference called filepref that tells the system that the files to be indexed are stored in the operating system. The example then uses CTX_DDL.SET_ATTRIBUTE to set the PATH attribute to the directory /docs.
begin ctx_ddl.create_preference('filepref', 'FILE_DATASTORE'); ctx_ddl.set_attribute('filepref', 'PATH', '/docs'); end;
See Also:
For more information about data storage, see "Datastore Objects" in Chapter 3. For more examples of using SET_ATTRIBUTE, see CREATE_PREFERENCE.
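UNSET_ATTRIBUTE
Removes a set attribute from a preference.
CTX_DDL.UNSET_ATTRIBUTE(preference_name in varchar2, attribute_name in varchar2);
preference_name
Specify the name of the preference.
attribute_name
Specify the name of the attribute to unset.
The following example enables alternate spelling for German by setting the ALTERNATE_SPELLING attribute of a BASIC_LEXER preference called GERMAN_LEX: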
begin ctx_ddl.create_preference('GERMAN_LEX', 'BASIC_LEXER'); ctx_ddl.set_attribute('GERMAN_LEX', 'ALTERNATE_SPELLING', 'GERMAN'); end;
To disable alternate spelling, use the CTX_DDL.UNSET_ATTRIBUTE procedure as follows:
begin ctx_ddl.unset_attribute('GERMAN_LEX', 'ALTERNATE_SPELLING'); end;