Oracle8i interMedia Text Reference Release 8.1.5 A67843-01
This chapter discusses the executables provided with interMedia Text: ctxsrv, ctxload, and ctxkbtc.
You use the ctxsrv server daemon for background DML processing. You can start it from the command line or with the interMedia Text Manager administration tool.
This server synchronizes the index with ALTER INDEX at regular intervals.
ctxsrv [-user ctxsys/passwd[@sqlnet_address]] [-personality M] [-logfile log_name] [-sqltrace]
-user
Specify the username and password for the Oracle user CTXSYS.
The username and password can be immediately followed by @sqlnet_address to permit logon to remote databases. The value for sqlnet_address is a database connect string. If the TWO_TASK environment variable is set to a remote database, you need not specify a value for sqlnet_address to connect to the database.
-personality
Specify the personality mask for the server started by ctxsrv. The only valid value is M, which is also the default.
-logfile
Specify the name of a log file to which the server writes all session information and errors.
-sqltrace
Enables the server to write to a trace file in the directory specified by the USER_DUMP_DEST initialization parameter.
See Also:
For more information about SQL trace and the USER_DUMP_DEST initialization parameter, see Oracle8 Administrator's Guide.
The following example starts a server and writes all server messages to a file named ctx.log:
ctxsrv -user ctxsys/ctxsys -personality M -log ctx.log &
The following example starts a server and writes all server messages to a file named ctx.log. Because -user is not specified, the server prompts you to enter a user:
ctxsrv -log ctx.log
...
Copyright (c) Oracle Corporation 1979, 1998. All rights reserved.
...
Enter user:
At the prompt, enter 'CTXSYS/password', where password is the password assigned to the CTXSYS user.
Pending index updates are stored in the DML queue. To view this queue, you can use the CTX_PENDING or CTX_USER_PENDING views.
You can also use the interMedia Text Manager administration tool, which is part of the Oracle Enterprise Manager.
You can view DML errors with the CTX_INDEX_ERRORS or CTX_USER_INDEX_ERRORS views.
With background DML, ctxsrv polls the DML queue constantly, so new additions are indexed automatically and quickly. However, background DML tends to process documents in smaller batches, which increases index fragmentation.
When you synchronize the index manually with ALTER INDEX, the batches are usually larger, so there is less index fragmentation.
You can shut down ctxsrv with CTX_ADM.SHUTDOWN, described in Chapter 6.
For the views mentioned above (CTX_PENDING, CTX_USER_PENDING, CTX_INDEX_ERRORS, and CTX_USER_INDEX_ERRORS), see Appendix H, "Views".
For more information on starting servers with the administration tool, see the online help for the interMedia Text Manager. This administration tool is a Java application integrated with the Oracle Enterprise Manager.
You use ctxload to import and export thesauri, to load text into a table, and to update or export individual documents.
Use ctxload to load a thesaurus from an import file into the interMedia Text thesaurus tables.
An import file is an ASCII flat file that contains entries for synonyms, broader terms, narrower terms, or related terms, which can be used to expand queries.
ctxload can also be used to export a thesaurus by dumping the contents of the thesaurus into a user-specified operating-system file.
See Also:
For examples of import files for thesaurus importing, see "Structure of ctxload Thesaurus Import File" in Appendix D.
You can use ctxload to load text from a load file into a LONG or LONG RAW column in a table.
Suggestion: If the target table does not contain a LONG or LONG RAW column or you do not want to load text into a LONG or LONG RAW column, you can use SQL*Loader to populate the table with text. For more information on loading with SQL*Loader, see "SQL*Loader Example" in Appendix D.
A load file is an ASCII flat file that contains the plain text, as well as any structured data (title, author, date, and so on), for documents to be stored in a text table. In place of the text for each document, the load file can instead store a pointer to a separate file that holds the actual text (formatted or plain) of the document.
The ctxload utility creates one row in the table for each document identified by a header in the load file.
See Also:
For examples of load files for text loading, see "Structure of ctxload Text Load File" in Appendix D.
The ctxload utility supports updating database columns from operating system files and exporting database columns to files, specifically LONG RAW, LONG, BLOB and CLOB columns.
ctxload -user username[/password][@sqlnet_address] -name object_name -file file_name [-pk primary_key] [-export] [-update] [-thes] [-thescase y|n] [-thesdump] [-separate] [-longsize n] [-date date_mask] [-log file_name] [-trace] [-commitafter n]
-user
Specify the username and password of the user running ctxload.
The username and password can be followed immediately by @sqlnet_address to permit logon to remote databases. The value for sqlnet_address is a database connect string. If the TWO_TASK environment variable is set to a remote database, you do not have to specify a value for sqlnet_address to connect to the database.
-name
When you use ctxload to export or import a thesaurus, use object_name to specify the name of the thesaurus to be exported or imported.
You use object_name to identify the thesaurus in queries that use thesaurus operators.
When you use ctxload to update or export a text field, use object_name to specify the index associated with the text column.
-file
When ctxload is used to import a thesaurus, use file_name to specify the name of the import file that contains the thesaurus entries.
When ctxload is used to export a thesaurus, use file_name to specify the name of the export file created by ctxload.
When ctxload is used to update a single row in a text column, use file_name to specify the file that stores the text to be inserted into the text column. You identify the destination row with -pk.
When ctxload is used to export a single row in a text column, use file_name to specify the file to which the text is exported. You identify the source row with -pk.
See Also:
For more information about the structure of ctxload import files, see Appendix D, "Loading Examples".
-pk
Specify the primary key value of the row to be updated or exported.
When the primary key is compound, you must enclose the values within double quotes and separate the keys with a comma.
-export
Exports the contents of a single cell in a database table into the operating system file specified by -file. ctxload exports the LONG, LONG RAW, CLOB or BLOB column in the row specified by -pk.
When you use -export, you must specify a primary key with -pk.
-update
Updates the contents of a single cell in a database table with the contents of the operating system file specified by -file. ctxload updates the LONG, LONG RAW, CLOB or BLOB column in the row specified by -pk.
When you use -update, you must specify a primary key with -pk.
-thes
Imports a thesaurus. Specify the source file with the -file argument. You specify the name of the thesaurus to be imported with -name.
-thescase
Specify y to create a case-sensitive thesaurus with the name specified by -name and populate the thesaurus with entries from the thesaurus import file specified by -file. If -thescase is 'y' (the thesaurus is case-sensitive), ctxload enters the terms in the thesaurus exactly as they appear in the import file.
The default for -thescase is 'n' (case-insensitive thesaurus).
-thesdump
Exports a thesaurus. Specify the name of the thesaurus to be exported with the -name argument. Specify the destination file with the -file argument.
-separate
For text loading, include this parameter to specify that the text of each document in the load file is a pointer to a separate text file. This instructs ctxload to load the contents of each text file in the LONG or LONG RAW column for the specified row.
-longsize
For text loading, specify the maximum number of kilobytes to load into the LONG or LONG RAW column.
The minimum value is 1 (that is, 1 kilobyte) and the maximum value is machine dependent.
Note: You must enter the value for longsize as a number only. Do not include a 'K' or 'k' to indicate kilobytes.
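The rule in the note can be checked before ctxload is invoked. The following Python sketch is illustrative only: parse_longsize is a hypothetical helper, not part of interMedia Text, and it enforces just the two constraints stated above (a bare number, minimum of 1). It does not model the machine-dependent maximum.

```python
def parse_longsize(value: str) -> int:
    """Validate a -longsize argument: kilobytes, digits only, minimum 1.

    Hypothetical helper for illustration; not part of ctxload itself.
    """
    text = value.strip()
    # Reject suffixes such as 'K' or 'k': the value must be a bare number.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"-longsize must be a number only, got {value!r}")
    kilobytes = int(text)
    if kilobytes < 1:
        raise ValueError("-longsize must be at least 1 (that is, 1 kilobyte)")
    return kilobytes
```

With this helper, parse_longsize("64") yields 64, while "64K", "64k", and "0" are rejected with a ValueError.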
-date
Specify the TO_CHAR date format for any date columns loaded using ctxload.
-log
Specify the name of the log file to which ctxload writes any National Language Support (NLS) messages generated during processing. If you do not specify a log file name, the messages appear on the standard output.
-trace
Enables SQL statement tracing using 'ALTER SESSION SET SQL_TRACE TRUE'. This command captures all processed SQL statements in a trace file, which can be used for debugging. The location of the trace file is operating-system dependent and can be modified using the USER_DUMP_DEST initialization parameter.
See Also:
For more information about SQL trace and the USER_DUMP_DEST initialization parameter, see Oracle8 Administrator's Guide.
-commitafter
Specify the number of rows (documents) that are inserted into the table before a commit is issued to the database. The default is 1.
This section provides examples for some of the operations that ctxload can perform.
The following example imports a thesaurus named tech_doc from an import file named tech_thesaurus.txt:
ctxload -user jsmith/123abc -thes -name tech_doc -file tech_thesaurus.txt
The following example dumps the contents of a thesaurus named tech_doc into a file named tech_thesaurus.out:
ctxload -user jsmith/123abc -thesdump -name tech_doc -file tech_thesaurus.out
The following example exports a single text field identified by the primary key value of 1 to the file myfile. The index myindex identifies the text column.
ctxload -user scott/tiger -export -name myindex -file myfile -pk 1
To export a single text field identified by a compound primary key, you must enclose the primary keys with quotes and separate the values with commas as follows:
ctxload -user scott/tiger -export -name myindex -file myfile -pk "Oracle,1"
The following example updates a single text field identified by primary key value of 1 with the contents of myfile. The index myindex identifies the text column.
ctxload -user scott/tiger -update -name myindex -file myfile -pk 1
To update a single text field identified by a compound primary key, you must enclose the primary key with quotes and separate the values with commas as follows:
ctxload -user scott/tiger -update -name myindex -file myfile -pk "Oracle,1"
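The export and update invocations above can also be assembled by a script. The following Python sketch is a hedged illustration, not an Oracle-supplied interface: ctxload_cell_args is a hypothetical helper that mirrors the argument order of the examples and enforces the documented rule that -export and -update require -pk. It assumes that the double quotes in the shell examples serve only to group the compound key for the shell, so they are omitted when the arguments are passed as a list.

```python
from typing import List, Sequence


def ctxload_cell_args(user: str, index_name: str, file_name: str,
                      pk_values: Sequence[object], mode: str) -> List[str]:
    """Build the argument list for a single-cell -export or -update.

    Hypothetical helper; the flags are the ones shown in the examples.
    """
    if mode not in ("export", "update"):
        raise ValueError("mode must be 'export' or 'update'")
    if not pk_values:
        # -export and -update both require a primary key given with -pk.
        raise ValueError("a primary key is required with -" + mode)
    # A compound key is passed as one comma-separated value. Assumption:
    # the quotes in the shell examples only group that value for the shell.
    pk = ",".join(str(value) for value in pk_values)
    return ["ctxload", "-user", user, "-" + mode,
            "-name", index_name, "-file", file_name, "-pk", pk]
```

For example, ctxload_cell_args("scott/tiger", "myindex", "myfile", ["Oracle", 1], "export") produces a list ending in "-pk", "Oracle,1", which could then be handed to subprocess.run.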
The ctxkbtc compiler takes one or more specified thesauri and compiles them with the interMedia Text knowledge base to create an extended knowledge base. The extended information can be application-specific terms and relationships.
The extended knowledge base overrides any terms and relationships in the knowledge base where there is overlap. The extended knowledge base is accessed during tasks that use the knowledge base, such as theme indexing, processing ABOUT queries in English, and extracting document themes with document services.
See Also:
For more information about the knowledge base packaged with interMedia Text, see Appendix J, "Knowledge Base - Category Hierarchy". For more information about the ABOUT operator, see ABOUT operator in Chapter 4. For more information about document services, see Chapter 8, "CTX_DOC Package".
ctxkbtc -user uname/passwd [-name thesname1 [thesname2 ... thesname16]] [-revert] [-verbose] [-log filename]
-user
Specify the username and password for the administrator creating an extended knowledge base.
-name
Specify the name(s) of the thesauri (up to 16) to be compiled with the knowledge base to create the extended knowledge base. The thesauri you specify must already be loaded with ctxload.
-revert
Reverts the extended knowledge base to the default knowledge base provided by interMedia Text.
-verbose
Displays all warnings and messages, including non-NLS messages, on the standard output.
-log
Specify the log file for storing all messages. When you specify a log file, no messages are reported to the standard output.
Knowledge base extension cannot be performed when theme indexing is being performed.
In addition, any SQL sessions that are using interMedia Text functions must be exited and reopened to make use of the extended knowledge base.
There can be only one user extension per installation. Since a user extension affects all users at the installation, only administrators or terminology managers should extend the knowledge base.
Running ctxkbtc again removes the previous extension.
Before being compiled, each thesaurus must be loaded into interMedia Text as case-sensitive, using the -thescase y option of ctxload.
Terms are case sensitive. If a thesaurus has a term in uppercase, for example, the same term present in lowercase form in a document will not be recognized.
The maximum length of a term is 80 characters.
Disambiguated homographs are not supported.
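The length limit above can be checked before a thesaurus is loaded and compiled. The following Python sketch is illustrative only: overlong_terms is a hypothetical helper, the terms are assumed to be available as plain strings, and the 80-character limit is assumed to count characters rather than bytes, which the text does not specify.

```python
from typing import Iterable, List

MAX_TERM_LENGTH = 80  # maximum term length stated above


def overlong_terms(terms: Iterable[str],
                   limit: int = MAX_TERM_LENGTH) -> List[str]:
    """Return the terms longer than the limit, in input order.

    Assumption: the limit counts characters (len), not bytes.
    """
    return [term for term in terms if len(term) > limit]
```

An empty result means every term fits within the limit; any term returned would need to be shortened before compilation.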
The following constraints apply to thesaurus relations:
Where appropriate, Oracle recommends linking new terms to one of the categories in the knowledge base for best results in theme proving.
For example, if a hierarchy of medical terms is added, the existing category health and medicine can be made a broader term for the new terms. If new terms are kept completely disjoint from existing categories, fewer themes from new terms will be proven. The result is poorer precision and recall with ABOUT queries, as well as poor quality of gists and theme highlighting.
When multiple thesauri are to be compiled, precedence is determined by the order in which thesauri are listed in the arguments to the compiler (most preferred first). A user thesaurus always has precedence over the built-in knowledge base.
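Because precedence follows argument order, a script that invokes the compiler needs to keep the thesaurus list ordered from most to least preferred. The following Python sketch is a minimal illustration under stated assumptions: ctxkbtc_args is a hypothetical helper, it covers only the compile case (not -revert), and the second thesaurus name in the usage note is invented for the example.

```python
from typing import List, Sequence

MAX_THESAURI = 16  # ctxkbtc accepts up to 16 thesaurus names


def ctxkbtc_args(user: str, thesauri: Sequence[str]) -> List[str]:
    """Build a ctxkbtc argument list for compiling thesauri.

    The thesauri are emitted in the order given, so the caller lists
    the most preferred thesaurus first. Hypothetical helper.
    """
    if not thesauri:
        raise ValueError("this helper expects at least one thesaurus name")
    if len(thesauri) > MAX_THESAURI:
        raise ValueError(f"at most {MAX_THESAURI} thesauri can be compiled")
    return ["ctxkbtc", "-user", user, "-name", *thesauri]
```

For example, ctxkbtc_args("jsmith/123abc", ["tech_doc", "med_terms"]) gives tech_doc precedence over the hypothetical med_terms thesaurus wherever the two overlap.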
The following table lists the size limits associated with creating and compiling an extended knowledge base: