Oracle8i interMedia Text Reference Release 8.1.5 A67843-01
This chapter introduces the main features of Oracle8i interMedia Text (iMT). It is provided to help you get started with indexing, querying, and document presentation.
The following topics are covered:
This chapter introduces the main features of interMedia Text as they pertain to designing a query application. The sections that follow mainly describe the default, out-of-the-box behavior.
The general steps for enabling Text queries in a query application are the following:
The sections that follow describe how Oracle8i interMedia Text enables you to achieve these steps.
Oracle8i interMedia Text provides the following two roles for system administrators and application developers:
The CTXSYS role enables users to do the following:
The CTXAPP role enables users to do the following:
The default indexing behavior expects documents loaded in a text column.
Note: Even though the system expects documents to be loaded in a text column, you can also store your documents in other ways, including the file system and as a URL. For more information about data storage, see "Datastore Objects" in Chapter 3.
The text column can be of type VARCHAR2, CLOB, BLOB, CHAR, or BFILE.
Note: Storing data in the deprecated column types LONG and LONG RAW is supported only for migrating Oracle7 systems to Oracle8. Columns of type NCLOB, DATE, and NUMBER cannot be indexed.
Because the system can index most document formats including HTML, PDF, Microsoft Word, and plain text, you can load any of these document types into the text column.
See Also:
For more information about the supported document formats, see Appendix C, "Supported Filter Formats".
Oracle enables you to load data using various methods, including SQL*Loader, the ctxload utility, the DBMS_LOB package, and the Oracle Call Interface (OCI).
See Also:
For loading examples, including how to use SQL*Loader, see Appendix D, "Loading Examples". To learn more about ctxload, see "ctxload" in Chapter 11. For more information about the DBMS_LOB package, see Oracle8i Supplied Packages Reference. For more information about working with LOBs, see the Oracle8i Application Developer's Guide - Large Objects (LOBs). For more information about Oracle Call Interface, see Oracle Call Interface Programmer's Guide.
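As a minimal sketch of the simplest case, the statements below load two short documents into a text column with ordinary INSERT statements. The docs table, its id column, and the column sizes are assumptions made for illustration. The primary key is included on the assumption that the Text index needs a key to identify each document.

```sql
-- Hypothetical table for illustration; names and sizes are assumptions.
-- The primary key gives each document a unique identifier.
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  text VARCHAR2(2000)
);

-- A plain text document and an HTML document loaded into the same column.
INSERT INTO docs (id, text)
  VALUES (1, 'Oracle8i interMedia Text indexes documents stored in a text column.');
INSERT INTO docs (id, text)
  VALUES (2, '<html><body>HTML documents can be loaded the same way.</body></html>');

COMMIT;
```

For large documents or bulk loads, the loading methods referenced above are likely a better fit than single-row inserts.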
Once your text is loaded in a text column, you can run the CREATE INDEX command to create a Text index.
For example, the following command creates a Text index called myindex on the text column in the docs table:
create index myindex on docs(text) indextype is ctxsys.context;
When you use CREATE INDEX without explicitly specifying parameters, the system does the following for all languages by default:
Note:
For document filtering to work correctly in your system, you must ensure that your environment is set up correctly to support the Inso filter. To learn more about configuring your environment to use the Inso filter, see "About Inso Filtering Technology" in Appendix C.
You can change the default indexing behavior by creating your own preferences and specifying these custom preferences in the parameter string of CREATE INDEX.
See Also:
To learn more about creating your own custom preferences, see Chapter 3, "Indexing". See also CTX_DDL.CREATE_PREFERENCE in Chapter 7. To learn more about using CREATE INDEX, see its specification in Chapter 2.
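The following sketch shows one way a custom preference might be created and then named in the parameter string of CREATE INDEX. The preference name mylex is hypothetical, and the choice of the printjoins attribute is only an example; check the lexer attributes in Chapter 3 before relying on it.

```sql
-- 'mylex' is a hypothetical preference name chosen for this example.
BEGIN
  CTX_DDL.CREATE_PREFERENCE('mylex', 'BASIC_LEXER');
  -- Index hyphens and underscores as part of a token
  -- instead of treating them as word separators.
  CTX_DDL.SET_ATTRIBUTE('mylex', 'printjoins', '_-');
END;
/

-- Name the custom lexer preference in the parameter string.
CREATE INDEX myindex ON docs(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('lexer mylex');
```

Any preference class not named in the parameter string keeps its default behavior.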
In addition to the general defaults, the system enables the following option for English language text:
By default, the following features are enabled:
Index maintenance is necessary after your application inserts, updates, or deletes documents in your base table.
If your base table is static, that is, you do not insert, update, or delete documents after the initial indexing, you do not need to maintain your index.
However, if you perform DML (inserts, updates, or deletes) on your base table, you must update your index. You can synchronize your index manually with ALTER INDEX. You can also run the ctxsrv server in the background, which synchronizes the index automatically at regular intervals.
See Also:
For more information about synchronizing the index, see ALTER INDEX in Chapter 2. For more information about ctxsrv, see "ctxsrv" in Chapter 11.
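A manual synchronization might look like the sketch below. The parameter string 'sync' and the use of the ONLINE keyword reflect the ALTER INDEX syntax as understood for this release, so treat both as assumptions and confirm them against ALTER INDEX in Chapter 2.

```sql
-- Process the pending inserts, updates, and deletes for myindex.
-- The 'sync' parameter string is an assumption to verify in Chapter 2.
ALTER INDEX myindex REBUILD ONLINE PARAMETERS ('sync');
```

An application with frequent DML would typically schedule this statement or rely on ctxsrv instead of issuing it by hand.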
You issue Text queries using the CONTAINS operator in a SELECT statement. With CONTAINS, you can issue two types of queries:
A word query is a query on the exact word or phrase you enter between the single quotes in the CONTAINS operator.
The following example finds all the documents in the text column that contain the word oracle. The score for each row is selected with the SCORE operator using a label of 1:
SELECT SCORE(1), title FROM news WHERE CONTAINS(text, 'oracle', 1) > 0;
In your query expression, you can use text operators such as AND and OR to achieve different results. You can also add structured predicates to the WHERE clause.
See Also:
For more information about the different operators you can use in queries, see Chapter 4, "Query Operators".
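As an illustration of combining a text operator with a structured predicate, the query below looks for documents that contain both words and were published on or after a given date. The pub_date column is hypothetical and is not part of the news table described in this chapter.

```sql
-- pub_date is a hypothetical DATE column used to show a structured predicate.
SELECT SCORE(1), title
  FROM news
 WHERE CONTAINS(text, 'oracle AND database', 1) > 0
   AND pub_date >= TO_DATE('1999-01-01', 'YYYY-MM-DD')
 ORDER BY SCORE(1) DESC;
```

Ordering by SCORE(1) lists the most relevant documents first.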
You can count the hits to a query using count(*), CTX_QUERY.COUNT_HITS, or CTX_QUERY.EXPLAIN.
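The first two approaches can be sketched as follows. The index name newsindex is an assumption (a Text index on the text column of the news table), and the call to CTX_QUERY.COUNT_HITS passes only the index name and the query, leaving any further parameters at their defaults.

```sql
-- Count hits with an ordinary aggregate.
SELECT COUNT(*) FROM news WHERE CONTAINS(text, 'oracle') > 0;

-- Count hits with CTX_QUERY.COUNT_HITS.
-- 'newsindex' is a hypothetical index name.
SET SERVEROUTPUT ON
DECLARE
  hits NUMBER;
BEGIN
  hits := CTX_QUERY.COUNT_HITS('newsindex', 'oracle');
  DBMS_OUTPUT.PUT_LINE('Number of hits: ' || hits);
END;
/
```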
In all languages, ABOUT queries increase the number of relevant documents returned by a query.
In English, ABOUT queries can use the theme component of the index, which is created by default. As such, this operator returns documents based on the concepts of your query, not only the exact word or phrase you specify.
For example, the following query finds all the documents in the text column that are about the subject politics, not just the documents that contain the word politics:
SELECT SCORE(1), title FROM news WHERE CONTAINS(text, 'about(politics)', 1) > 0;
In your query application, you can use other query features. The following table lists some of these features and shows where to look in this book for more information.
Feature | Where to Find More Information |
---|---|
Section Searching | Chapter 7, "CTX_DDL Package" for defining sections. |
Proximity Searching | |
Stem and Fuzzy Searching | |
Thesaural Queries | Chapter 4, "Query Operators" for using thesaurus operators in queries. Chapter 10, "CTX_THES Package" for browsing a thesaurus. "ctxload" in Chapter 11 for loading thesauri. |
Word Decompounding (German and Dutch); Alternate Spelling (German, Dutch, and Swedish) | "Lexer Objects" in Chapter 3 for enabling these features. |
Optimizing Queries for Response Time | |
Query Explain Plan | |
Hierarchical Query Feedback | |

Typically, a Text query application allows the user to view the documents returned by a query. The user selects a document from the hitlist and then your application presents the document in some form.
With interMedia Text, you can render a document in different ways. For example, you can present documents with query terms highlighted. Highlighted query terms can be either the words of a word query or the themes of an ABOUT query in English.
Table 1-1 describes the different output you can obtain and which procedure to use to obtain each type:
Output | Procedure |
---|---|
Highlighted document, plain text version | CTX_DOC.MARKUP |
Highlighted document, HTML version | CTX_DOC.MARKUP |
Highlight offset information for plain text version | CTX_DOC.HIGHLIGHT |
Highlight offset information for HTML version | CTX_DOC.HIGHLIGHT |
Plain text version, no highlights | CTX_DOC.FILTER |
HTML version of document, no highlights | CTX_DOC.FILTER |
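
As a rough sketch of the first row of the table, the block below asks CTX_DOC.MARKUP for a highlighted plain text version of one document and stores it in a result table. The index name newsindex, the result table markup_results, and the document key '1' are all hypothetical, and the parameter names are given as understood for this release; confirm them against the CTX_DOC package specification before use.

```sql
-- Hypothetical result table that receives the marked-up document.
CREATE TABLE markup_results (query_id NUMBER, document CLOB);

BEGIN
  CTX_DOC.MARKUP(
    index_name => 'newsindex',       -- hypothetical Text index on news(text)
    textkey    => '1',               -- key of the document chosen from the hitlist
    text_query => 'oracle',          -- query whose terms are highlighted
    restab     => 'markup_results',  -- table that receives the output
    query_id   => 1,                 -- identifies this request in the result table
    plaintext  => TRUE);             -- plain text version, not HTML
END;
/

-- Retrieve the highlighted document for presentation.
SELECT document FROM markup_results WHERE query_id = 1;
```

The query_id value lets several requests share one result table; the application reads back only the rows for its own request.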