Oracle8i interMedia Text Migration Release 8.1.5 A67845-01
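This chapter describes how to migrate document presentation applications to interMedia Text 8.1.5. It covers highlighting, markup, and filtering of documents, followed by the generation of lists of themes, Gists, and theme summaries.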
In interMedia Text query applications, you can present selected documents with query terms highlighted for text queries or with themes highlighted for ABOUT queries.
You can generate three types of output associated with highlighting: a marked-up version of the document, a plain text version of the document (filtered output), and highlight offset information for the document.
In pre-8.1.5, you used the single procedure CTX_QUERY.HIGHLIGHT to generate all three types of output listed above.
In interMedia Text 8.1.5, these three types of output are generated by three different procedures in the CTX_DOC (document services) package. In addition, you can get plain text and HTML versions for each type of output.
The result tables you use to store this output in 8.1.5 are also different from pre-8.1.5 result tables.
In interMedia Text 8.1.5, the output for theme highlighting is different from what it was in pre-8.1.5. In pre-8.1.5, the system highlighted paragraphs in the document that best represented the query. In interMedia Text 8.1.5, individual themes, which can be words or phrases, are highlighted.
In pre-8.1.5, you use CTX_QUERY.HIGHLIGHT to obtain highlight information, marked-up documents, and filtered documents.
For example, to highlight all the occurrences of the term dog in a document identified by textkey 14, issue the following statement:
ctx_query.highlight (
  cspec   => 'text_policy',
  textkey => '14',
  query   => 'dog',
  id      => 14,
  hightab => 'highlight_ascii',
  mutab   => 'mu_ascii' );
This example stores the offset information in the HIGHLIGHT_ASCII table (the HIGHTAB argument) and the highlighted marked-up document in the MU_ASCII table (the MUTAB argument).
For text highlighting, the behavior is the same as in pre-8.1.5. You supply the query, and Oracle highlights the words in the document that satisfy the query. You can obtain plain-text or HTML highlighting.
For theme queries, interMedia Text 8.1.5 procedures highlight and mark up the words or phrases that best represent the theme query. This behavior differs from pre-8.1.5, where paragraphs were highlighted for theme queries.
Highlight offset information is useful when you write your own custom routines for displaying documents.
To obtain highlight offset information, use the CTX_DOC.HIGHLIGHT procedure. This procedure takes a query and a document, and returns highlight offset information for either plaintext or HTML formats.
With offset information, you have the freedom to highlight with different font types or colors rather than using the standard plain text markup obtained from CTX_DOC.MARKUP.
See Also:
For more information about using CTX_DOC.HIGHLIGHT, see its specification in the Oracle8i interMedia Text Reference.
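As a rough 8.1.5 counterpart to the pre-8.1.5 example above, the following sketch creates a highlight table and requests offset information for the term dog in document 14. The index name newsindex is borrowed from the theme examples later in this chapter, the table name HIGHLIGHT_TAB is hypothetical, and the column layout and parameter names are assumptions to verify against the Oracle8i interMedia Text Reference.

```sql
-- Hypothetical result table; the column layout is an assumption.
create table highlight_tab (
  query_id number,
  offset   number,
  length   number
);

begin
  -- Parameter names are assumed; check the CTX_DOC.HIGHLIGHT specification.
  ctx_doc.highlight(
    index_name => 'newsindex',      -- index on the document column
    textkey    => '14',             -- document to process
    text_query => 'dog',            -- query whose hits are located
    restab     => 'HIGHLIGHT_TAB',  -- table that receives the offsets
    query_id   => 1,                -- tags the rows written by this call
    plaintext  => TRUE              -- offsets relative to the plain text version
  );
end;
/
```

Each row written under query_id 1 would then describe one highlighted span, which a custom display routine can render in any font or color.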
The CTX_DOC.MARKUP procedure takes a document reference and a query, and returns a marked-up version of the document. The output can be either marked-up plaintext or marked-up HTML.
In 8.1.5, you can customize the markup sequence for HTML navigation.
See Also:
For more information about CTX_DOC.MARKUP, see its specification in the Oracle8i interMedia Text Reference.
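The following sketch shows what a CTX_DOC.MARKUP call with a custom markup sequence might look like. The table name MARKUP_TAB, its column layout, the index name newsindex, and the parameter names (including starttag and endtag) are assumptions for illustration; confirm them in the Oracle8i interMedia Text Reference before use.

```sql
-- Hypothetical result table; the column layout is an assumption.
create table markup_tab (
  query_id number,
  document clob
);

begin
  -- Parameter names are assumed; check the CTX_DOC.MARKUP specification.
  ctx_doc.markup(
    index_name => 'newsindex',
    textkey    => '14',
    text_query => 'dog',
    restab     => 'MARKUP_TAB',
    query_id   => 1,
    plaintext  => FALSE,    -- request marked-up HTML rather than plain text
    starttag   => '<b>',    -- inserted before each highlighted term
    endtag     => '</b>'    -- inserted after each highlighted term
  );
end;
/
```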
When documents are stored in their native formats such as Microsoft Word, you can use the filter procedure CTX_DOC.FILTER to obtain either a plain text or HTML version of the document.
See Also:
For more information about CTX_DOC.FILTER, see its specification in the Oracle8i interMedia Text Reference.
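A filtering request might be sketched as follows. The table name FILTER_TAB, its column layout, the index name newsindex, and the parameter names are assumptions rather than confirmed details of the package; consult the Oracle8i interMedia Text Reference for the exact specification.

```sql
-- Hypothetical result table; the column layout is an assumption.
create table filter_tab (
  query_id number,
  document clob
);

begin
  -- Parameter names are assumed; check the CTX_DOC.FILTER specification.
  ctx_doc.filter(
    index_name => 'newsindex',
    textkey    => '14',
    restab     => 'FILTER_TAB',
    query_id   => 1,
    plaintext  => TRUE   -- plain text output; FALSE would request HTML
  );
end;
/
```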
The following changes have been made in 8.1.5:
The following table describes list of themes, Gists, and theme summaries. Their definitions have not changed in 8.1.5:
In pre-8.1.5, before you generate lists of themes, theme summaries, or Gists, you must create a result table to store the CTX_LING output.
To create a theme table called CTX_THEMES to store the list of themes from REQUEST_THEMES, issue the following SQL statement:
create table ctx_themes ( cid number, pk varchar2(64), theme varchar2(2000), weight number);
To create a Gist table called CTX_GIST to store the Gist or theme summaries from REQUEST_GIST, issue the following SQL statement:
create table ctx_gist ( cid number, pk varchar2(64), pov varchar2(80), gist long);
Use CTX_LING.REQUEST_THEMES to generate themes.
The following anonymous PL/SQL block generates a list of themes for document 20 by calling CTX_LING.REQUEST_THEMES and then CTX_LING.SUBMIT:
declare
  handle number;
begin
  ctx_ling.request_themes('CTXSYS.DOC_POLICY','20','CTX_THEMES');
  handle := ctx_ling.submit;
end;
Use CTX_LING.REQUEST_GIST to generate theme summaries and Gists.
The following anonymous PL/SQL block generates a theme summary for document 20 about the theme of insects. The theme summary is generated by calling CTX_LING.REQUEST_GIST and then CTX_LING.SUBMIT.
declare
  handle number;
begin
  ctx_ling.request_gist('CTXSYS.DOC_POLICY','20','CTX_GIST', 'PARAGRAPH', 'insects');
  handle := ctx_ling.submit;
end;
You can obtain a list of themes where each element in the list is a hierarchical list of parent themes. To do so, issue the following statements:
SQL> exec ctx_ling.set_full_themes(TRUE)
SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')
SQL> exec ctx_ling.submit(200)
You change the default size of Gists using the ConText Workbench administration tool.
In 8.1.5, the CTX_LING package is no longer supported. The Gist and theme generation procedures are now in the CTX_DOC package. You do not need to submit document services requests explicitly, because requests are synchronous, and no servers need to be running.
A list of themes is a list of the main concepts in a document.
Use the CTX_DOC.THEMES procedure to generate lists of themes.
See Also:
To learn about the command syntax for CTX_DOC.THEMES, see Oracle8i interMedia Text Reference.
To create a theme table:
create table ctx_themes (query_id number, theme varchar2(2000), weight number);
To obtain a list of themes where each element in the list is a single theme, issue:
begin
  ctx_doc.themes('newsindex',34,'CTX_THEMES',1,full_themes => FALSE);
end;
To obtain a list of themes where each element in the list is a hierarchical list of parent themes, issue:
begin
  ctx_doc.themes('newsindex',34,'CTX_THEMES',1,full_themes => TRUE);
end;
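Because requests are synchronous, the result table can be read as soon as the call returns. The query below is a minimal sketch against the CTX_THEMES table defined above; it assumes the fourth argument in the calls (1) is stored in the QUERY_ID column, and the ordering by weight is only an illustrative choice.

```sql
-- Read back the themes written by the call that used query_id 1.
select theme, weight
from   ctx_themes
where  query_id = 1
order  by weight desc;
```

Both example calls above write under the same query_id, so a different value per call (or deleting old rows first) keeps the two result sets apart.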
The definitions of a Gist and a theme summary have not changed for 8.1.5. A Gist is the text of a document that best represents what the document is about as a whole. A theme summary is the text of a document that best represents a single theme in the document.
In 8.1.5, you can specify the size of the Gist or theme summary when you call the procedure.
Use the procedure CTX_DOC.GIST to generate Gists and theme summaries.
See Also:
To learn about the command syntax for CTX_DOC.GIST, see Oracle8i interMedia Text Reference.
To create a Gist table:
create table ctx_gist (query_id number, pov varchar2(80), gist CLOB);
The following example returns a default-sized, paragraph-level Gist for document 34:
begin
  ctx_doc.gist('newsindex',34,'CTX_GIST',1,'PARAGRAPH', pov =>'GENERIC');
end;
The following example generates a non-default size Gist of ten paragraphs:
begin
  ctx_doc.gist('newsindex',34,'CTX_GIST',1,'PARAGRAPH', pov =>'GENERIC', numParagraphs => 10);
end;
The following example generates a Gist whose number of paragraphs is ten percent of the total paragraphs in the document:
begin
  ctx_doc.gist('newsindex',34,'CTX_GIST',1, 'PARAGRAPH', pov =>'GENERIC', maxPercent => 10);
end;
The following example returns a theme summary on the theme of insects for the document with textkey 34. The default Gist size is returned.
begin
  ctx_doc.gist('newsindex',34,'CTX_GIST',1, 'PARAGRAPH', pov => 'insects');
end;
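To inspect the output of the calls above, the CTX_GIST table can be queried directly. This is a minimal sketch that assumes the fourth argument (1) is stored in the QUERY_ID column and that the POV column records the point of view each row belongs to, as the table definition suggests.

```sql
-- Read back every Gist and theme summary written under query_id 1.
select pov, gist
from   ctx_gist
where  query_id = 1;

-- Optionally clear the rows before reusing the same query_id.
delete from ctx_gist where query_id = 1;
```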