Oracle8(TM) ConText(R) Cartridge Application Developer's Guide Release 2.0 A54630-01
This chapter provides an overview of the Oracle8 ConText Cartridge.
The following topics are covered in this chapter:
Most of today's business data is not stored as structured data; it is stored as non-structured text in thousands of formats: letters, memos, manuals, reports, news articles, electronic mail, notes, messages, etc.
For many businesses, this huge volume of text is a vast, valuable and unmanageable information resource. Relevant documents are usually difficult to locate, hard to retrieve, and often impossible to digest. Oracle solves the text management problem with ConText.
ConText is built on the power and scalability of Oracle Universal Server. It uses advanced text analysis and retrieval technology to give users the exact information they need when they need it. With ConText, Oracle Universal Server is a complete solution for managing any data resource (relational, text, spatial, image, video, or audio) in any application, at any scale.
ConText manages unstructured text as quickly and as easily as structured data. It is an online text management system that uses SQL or PL/SQL to search through large volumes of text stored in either structured databases or system files.
Using ConText, developers can quickly and efficiently build mission-critical applications that provide hundreds or even thousands of concurrent users with fast, efficient access to text-based information. And, because text is now a supported datatype in the Oracle Universal Server, new applications and extensions to existing Oracle applications are quick and easy to build with standard tools.
The advantages of ConText include:
Using ConText's advanced indexing, retrieval, reduction, and classification features, users pinpoint and access required textual information quickly and easily from large volumes of text data.
ConText's extensible framework easily integrates new languages, formats, specialized search engines and text processing services. This adaptability to new requirements preserves an enterprise's investment in its text storage and retrieval applications and provides a healthy environment for long-term application development.
ConText currently recognizes, indexes, and retrieves text for most of the NLS-compliant, single-byte languages (7-bit and 8-bit character sets). All of these languages can be processed by the basic lexer provided with ConText.
ConText also supports query expansion, in the form of stemming, soundex, and fuzzy matching, for English and the following Western European languages: French, Spanish, Italian, German, and Dutch.
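As a rough illustration of what soundex-style query expansion means, the Python sketch below uses the classic American Soundex code to expand a query term into similar-sounding words from a small vocabulary. It is not ConText's implementation, which this chapter does not describe; the names `soundex`, `expand_by_sound`, and the sample vocabulary are hypothetical.

```python
# Classic American Soundex, used here only to illustrate sound-based
# query expansion. This is NOT ConText's own implementation.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code for a word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        digit = CODES.get(ch, "")  # vowels (and Y) have no code
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (code + "000")[:4]


def expand_by_sound(term, vocabulary):
    """Return the vocabulary words that share the query term's Soundex code."""
    target = soundex(term)
    return sorted(w for w in vocabulary if soundex(w) == target)


# Hypothetical vocabulary of indexed words.
vocabulary = ["Smith", "Smyth", "Smythe", "Schmidt", "Jones"]
print(expand_by_sound("Smith", vocabulary))
# prints ['Schmidt', 'Smith', 'Smyth', 'Smythe']
```

A query for one spelling then also matches documents containing the other spellings, which is the effect query expansion is meant to achieve.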
For multi-byte languages, ConText provides the following lexers: Japanese, Korean (BETA), and Chinese (BETA). The Japanese lexer recognizes three of the Japanese writing systems: Kanji, Hiragana, and Katakana.
Because ConText is fully integrated with Oracle8, users can manage text with the same reliability, scalability, security, integrity, fault tolerance, and administrative ease they expect from an enterprise-caliber relational database system.
ConText takes full advantage of Oracle's standard interfaces and third-party tools such as Power Builder, SQL*Windows, OLE Automation tools, and Visual Basic. Once ConText is installed on one or more servers, client tools like SQL*Plus, Oracle Forms, and Pro*C can be used to access and manipulate text just as easily and efficiently as structured data.
While standalone text-retrieval products often burden developers with separate development environments, ConText treats text and relational data as peers and uses standard SQL to locate and retrieve relevant text information.
ConText features that facilitate text management and retrieval include:
ConText provides a sophisticated natural language parser that can analyze English-language text and return detailed thematic information about the text. This theme information can be used in two distinct ways to manipulate text:
Note:
The Linguistic Services are only available for English-language documents.
Theme queries provide a powerful alternative or extension to text queries. In a text query, the occurrence of a word in a document is sufficient for the document to be returned in the results of the query. However, this type of query may generate more hits than the user wants.
Theme queries let the user search for documents based on the main ideas or concepts in the documents. In a theme query, only those documents in which a particular topic was sufficiently developed to be classified as a document-level theme are returned.
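The difference between the two query types can be sketched with a toy model in Python. This is not the ConText query interface: the corpus, the hand-assigned theme sets, and the functions `text_query` and `theme_query` are all hypothetical, and in ConText the document-level themes come from linguistic analysis rather than from manual labels.

```python
import re

# Toy corpus. The "themes" sets are assigned by hand for illustration only;
# they stand in for the document-level themes that ConText would derive.
DOCS = {
    1: {"text": "The bank raised interest rates again this quarter.",
        "themes": {"banking", "interest rates"}},
    2: {"text": "We had a picnic on the river bank before the game.",
        "themes": {"recreation"}},
    3: {"text": "Loans and deposits grew at every bank in the region.",
        "themes": {"banking"}},
}


def text_query(word):
    """Return ids of documents in which the word occurs at least once."""
    word = word.lower()
    return sorted(
        doc_id for doc_id, doc in DOCS.items()
        if word in re.findall(r"[a-z]+", doc["text"].lower())
    )


def theme_query(theme):
    """Return ids of documents that carry the given document-level theme."""
    return sorted(
        doc_id for doc_id, doc in DOCS.items() if theme in doc["themes"]
    )


print(text_query("bank"))      # prints [1, 2, 3]
print(theme_query("banking"))  # prints [1, 3]
```

The text query returns document 2 because the word occurs in it, even though the document is about a picnic; the theme query returns only the documents in which banking is a main topic.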
Themes and thematic content (Gists) can be generated on a per document basis through the Linguistic Services. This information can then be used to view documents by their themes, as well as their thematically-relevant paragraphs.
The application developer uses the Linguistic Services to create various levels of shorter abstracts that the user can use to quickly review the essential content of documents and determine their relevance.
The individuals involved in developing, supporting, maintaining and using ConText facilities are:
An end user is the individual or organization that uses an application to locate, retrieve, and read text. The end user defines the data or information requirements that must be satisfied by the application. The end user also defines the document environment from which text will be selected.
The application developer designs the application, defines the environment required to support the application, works with the system administrator to create the environment, and writes the programs and procedures that satisfy user requirements. This guide is intended for this audience.
The database administrator maintains the Oracle system facilities, the databases, and the system environment that supports a ConText application.
The ConText administrator maintains the ConText environment that supports text applications, for example, the policies and preferences that define text columns and indexes. The way in which your database administrator creates policies and preferences affects the way you, the application developer, execute your queries.
The collection of text to be managed must be stored in an environment that is accessible to Oracle and ConText either as columns in an Oracle database or as pointers to system files outside the database.
Documents must be properly loaded into the database (or identified by external pointers) and indexed before text/theme queries can be executed.
In addition, linguistic output must be generated for each document before the linguistic information can be viewed for the documents.
To index a document or generate linguistic output for the document, the column storing the document must be defined as a text column. ConText recognizes a text column in a table if the column has one or more policies attached to it.
A table can contain more than one text column, but each text column requires a separate policy.
The process of loading documents, defining text columns, and creating ConText indexes for the columns is documented in the Oracle8 ConText Cartridge Administrator's Guide.
In particular, the Oracle8 ConText Cartridge Administrator's Guide explains how to:
See Also:
For more information about how to generate linguistic output, see Chapter 8, "Using ConText Linguistics".