Oracle8i National Language Support Guide Release 8.1.5 A67789-01 |
|
This chapter tells how to set up an NLS environment, and includes the following topics:
NLS parameters determine the locale-specific behavior on both the client and the server. There are four ways to specify NLS parameters:
NLS_TERRITORY = "CZECH REPUBLIC"
% setenv NLS_SORT FRENCH
SVRMGR> ALTER SESSION SET NLS_SORT = FRENCH
For a complete description of ALTER SESSION, see Oracle8i SQL Reference.
TO_CHAR(hiredate, 'DD/MON/YYYY', 'nls_date_language = FRENCH')
Table 2-1 shows the precedence order when using NLS parameters. Higher priority settings will override lower priority settings. For example, a default value will have the lowest possible priority, and can be overridden by any other method. And explicitly setting an NLS parameter within a SQL function can override all other settings -- default, INIT.ORA, environment variable, and ALTER SESSION parameters.
:Highest Priority | |
1 |
Explicitly set in SQL functions |
2 |
Set by an ALTER SESSION statement |
3 |
Set as an environment variable |
4 |
Specified in the initialization parameter file |
5 |
Default |
|
Lowest Priority |
Table 2-2 lists the NLS parameters available with the Oracle server.
:A locale is a linguistic and cultural environment in which a system or program is running. Setting the NLS_LANG parameter is the simplest way to specify locale behavior. It sets the language, territory, and character set used by the database for both the server session and the client application. Using this one parameter ensures that the language and territory environment for both the server and client are the same.
The NLS_LANG parameter has three components (language, territory, and charset) in the form:
NLS_LANG = language_territory.charset
Each component controls the operation of a subset of NLS features.
language |
Specifies conventions such as the language used for Oracle messages, day names, and month names. Each supported language has a unique name; for example, American, French, or German. The language argument specifies default values for the territory and character set arguments, so either (or both) territory or charset can be omitted. If language is not specified, the value defaults to American. For a complete list of languages, see "Languages". |
territory |
Specifies conventions such as the default calendar, collation, date, monetary, and numeric formats. Each supported territory has a unique name; for example, America, France, or Canada. If territory is not specified, the value defaults to America. For a complete list of territories, see "Territories". |
charset |
Specifies the character set used by the client application (normally that of the user's terminal). Each supported character set has a unique acronym, for example, US7ASCII, WE8ISO8859P1, WE8DEC, WE8EBCDIC500, or JA16EUC. Each language has a default character set associated with it. Default values for the languages available on your system are listed in the installation or user's guide. For a complete list of character sets, see "Character Sets". |
Note: All components of the NLS_LANG definition are optional; any item left out will default. If you specify territory or charset, you must include the preceding delimiter [underscore ( _ ) for territory, period ( . ) for charset], otherwise the value will be parsed as a language name.
The three arguments of NLS_LANG can be specified in many combinations, as in the following examples:
NLS_LANG = AMERICAN_AMERICA.US7ASCII
or
NLS_LANG = FRENCH_CANADA.WE8DEC
or
NLS_LANG = JAPANESE_JAPAN.JA16EUC
Note that illogical combinations could be set, but would not work properly. For example, the following tries to support Japanese by using a Western European character set:
NLS_LANG = JAPANESE_JAPAN.WE8DEC
Since WE8DEC does not support any Japanese characters, the result is that you will be unable to store Japanese data.
You can set NLS_LANG as an environment variable at the command line. For example, on UNIX, you could specify the value of NLS_LANG by entering the following line at the prompt:
% setenv NLS_LANG FRENCH_FRANCE.WE8DEC
Because NLS_LANG is an environment variable, it is read by the client application at startup time. The client communicates the information defined by NLS_LANG to the server when it connects.
The following examples show how date and number formats are affected by NLS_LANG.
%seenv NLS_LANG American_America.WE8ISO8859P1 SVRMGR> SELECT ename, hiredate, ROUND(sal/12,2) sal FROM emp; ENAME HIREDATE SAL --------- --------- ------ Clark 09-DEC-88 4195.83 Miller 23-MAR-92 4366.67 Strauß 01-APR-95 3795.87
If NLS_LANG is set with the language as French, the territory as France, and the character set as Western European 8-bit ISO 8859-1, the same query returns:
%setenv NLS_LANG French_France.WE8ISO8859P1; SVRMGR> SELECT ename, hiredate, ROUND(sal/12,2) sal FROM emp; ENAME HIREDATE SAL --------- --------- ------- Clark 09/12/88 4195,83 Miller 23/03/92 4366,67 Strauß 01/04/95 3795,87
NLS_LANG sets the NLS language and territory environment used by the database for both the server session and the client application. Using the one parameter ensures that the language environments of both database and client application are automatically the same. But you might want to modify your environment further. To do that, you can use NLS_LANGUAGE or NLS_TERRITORY.
Parameter type: |
string |
Parameter scope: |
Initialization Parameter and ALTER SESSION |
Default value: |
NLS_LANG |
Range of values: |
any valid language name |
NLS_LANGUAGE specifies the default conventions for the following session characteristics:
The value specified for NLS_LANGUAGE in the initialization file is the default for all sessions in that instance.
For example, to specify the default session language as French, the parameter should be set as follows:
NLS_LANGUAGE = FRENCH
In this case, the server message
ORA-00942: table or view does not exist
will appear as
ORA-00942: table ou vue inexistante
Messages used by the server are stored in binary-format files that are placed in the ORA_RDBMS directory, or the equivalent. Multiple versions of these files can exist, one for each supported language, using the filename convention
<product_id><language_abbrev>.MSB
For example, the file containing the server messages in French is called ORAF.MSB, "F" being the language abbreviation for French.
Messages are stored in these files in one specific character set, depending on the particular machine and operating system. If this is different from the database character set, message text is automatically converted to the database character set. If necessary, it will be further converted to the client character set if it is different from the database character set. Hence, messages will be displayed correctly at the user's terminal, subject to the limitations of character set conversion.
The default value of NLS_LANGUAGE may be operating system specific. You can alter the NLS_LANGUAGE parameter by changing the value in the initialization file and then restarting the instance.
For more information on the default value, see your operating system-specific Oracle documentation.
The following examples show behavior before and after setting NLS_LANGUAGE.
SVRMGR> ALTER SESSION SET NLS_LANGUAGE Italian SVRMGR> SELECT ename, hiredate, ROUND(sal/12,2) sal FROM emp; ENAME HIREDATE SAL ----- -------- --- Clark 09-Dic-88 4195.83 Miller 23-Mar-87 4366.67 Strauß 01-Apr-95 3795.87 SVRMGR> ALTER SESSION SET NLS_LANGUAGE German SVRMGR> SELECT ename, hiredate, ROUND(sal/12,2) sal FROM emp; ENAME HIREDATE SAL ----- -------- --- Clark 09-DEZ-88 4195.83 Miller 23-MÄR-87 4366.67 Strauß 01-APR-95 3795.87
Parameter type: |
string |
Parameter scope: |
Initialization Parameter and ALTER SESSION |
Default value: |
NLS_LANG |
Range of values: |
any valid territory name |
NLS_TERRITORY specifies the conventions for the following default date and numeric formatting characteristics:
The value specified for NLS_TERRITORY in the initialization file is the default for the instance. For example, to specify the default as France, the parameter should be set as follows:
NLS_TERRITORY = FRANCE
In this case, numbers would be formatted using a comma as the decimal character.
You can alter the NLS_TERRITORY parameter by changing the value in the initialization file and then restarting the instance. The default value of NLS_TERRITORY can be operating system-specific.
The following examples show behavior before and after setting NLS_TERRITORY.
SQL> describe SalaryTable; Name Null? TYPE --------- ------- ------ SALARY NUMBER SQL> column SALARY format L999,999.99; SQL> SELECT * from SalaryTable; SALARY --------------------- $100,000.00 $150,000.00 SQL> ALTER SESSION SET NLS_TERRITORY = Germany; Session altered. SQL> SELECT * from SalaryTable; SALARY --------------------- DM100,000.00 DM150,000.00 SQL> ALTER SESSION SET NLS_LANGUAGE = German; Sitzung wurde geandert. SQL> SELECT * from SalaryTable; SALARY --------------------- DM100,000.00 DM150,000.00 SQL> ALTER SESSION SET NLS_TERRITORY = France; Sitzung wurde geandert. SQL> SELECT * from SalaryTable; SALARY --------------------- F100,000.00 F150,000.00
Note that the symbol for currency units changed, but no monetary conversion calculations were done.
The default values for language and territory can be overridden during a session by using the ALTER SESSION statement. For example:
% setenv NLS_LANG Italian_Italy.WE8DEC SVRMGR> SELECT ename, hiredate, ROUND(sal/12,2) sal FROM emp; ENAME HIREDATE SAL ----- -------- --- Clark 09-Dic-88 4195,83 Miller 23-Mar-87 4366,67 Strauß 01-Apr-95 3795,87 SVRMGR> ALTER SESSION SET NLS_LANGUAGE = German 2 > NLS_DATE_FORMAT = 'DD.MON.YY' 3 > NLS_NUMERIC_CHARACTERS = '.,'; SVRMGR> SELECT ename, hiredate, ROUND(sal/12,2) sal FROM emp; ENAME HIREDATE SAL ----- -------- --- Clark 09.DEZ.88 4195.83 Miller 23.MÄR.87 4366.67 Strauß 01.APR.95 3795.87
This feature implicitly determines the language environment of the database for each session. An ALTER SESSION statement is automatically executed when a session connects to a database to set the values of the database parameters NLS_LANGUAGE and NLS_TERRITORY to those specified by the language and territory arguments of NLS_LANG. If NLS_LANG is not defined, no implicit ALTER SESSION statement is executed.
When NLS_LANG is defined, the implicit ALTER SESSION is executed for all instances to which the session connects, for both direct and indirect connections. If the values of NLS parameters are changed explicitly with ALTER SESSION during a session, the changes are propagated to all instances to which that user session is connected.
All messages and text should be in the same language. For example, when running a Developer 2000 application, messages and boilerplate text seen by the user originate from three sources:
The application is responsible for meeting the last requirement. NLS takes care of the other two.
Many different time formats are used throughout the world. Some typical ones are:
Country | Description | Example |
---|---|---|
Finland |
hh24:mi:ss |
13:50:23 |
Germany |
hh24:mi:ss |
13:50:23 |
Japan |
hh24:mi:ss |
13:50:23 |
UK |
hh24:mi:ss |
13:50:23 |
US |
hh:mi:ss am |
1.50.23 PM |
Oracle allows you to control how dates appear through the use of date parameters.
Many different date formats are used throughout the world. Some typical ones are:
Country | Description | Example |
---|---|---|
Finland |
dd.mm.yyyy |
28.02.1998 |
Germany |
dd.mm.yy |
28.02.98 |
Japan |
yy-mm-dd |
98-02-28 |
UK |
dd-mon-yy |
28-Feb-98 |
US |
dd-mon-yy |
28-Feb-98 |
This parameter defines the default date format to use with the TO_CHAR and TO_DATE functions. The default value of this parameter is determined by NLS_TERRITORY. The value of this parameter can be any valid date format mask, and the value must be surrounded by quotation marks. For example:
NLS_DATE_FORMAT = "MM/DD/YYYY"
To add string literals to the date format, enclose the string literal with double quotes. Note that every special character (such as the double quote) must be preceded with an escape character. The entire expression must be surrounded with single quotes. For example:
NLS_DATE_FORMAT = '\"Today\'s date\" MM/DD/YYYY'
As another example, to set the default date format to display Roman numerals for months, you would include the following line in the initialization file:
NLS_DATE_FORMAT = "DD RM YYYY"
With such a default date format, the following SELECT statement would return the month using Roman numerals (assuming today's date is February 12, 1997):
SELECT TO_CHAR(SYSDATE) CURRDATE FROM DUAL; CURRDATE --------- 12 II 1997
The value of this parameter is stored in the internal date format. Each format element occupies two bytes, and each string occupies the number of bytes in the string plus a terminator byte. Also, the entire format mask has a two-byte terminator. For example, "MM/DD/YY" occupies 12 bytes internally because there are three format elements, two one-byte strings (the two slashes), and the two-byte terminator for the format mask. The format for the value of this parameter cannot exceed 24 bytes.
Note: The applications you design may need to allow for a variable-length default date format. Also, the parameter value must be surrounded by double quotes: single quotes are interpreted as part of the format mask.
You can alter the default value of NLS_DATE_FORMAT by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_FORMAT command.
Currently, the default date format for most territories specifies the year format as "YY" to indicate the last 2 digits. If your applications are Year 2000 compliant, you can safely specify the NLS_DATE_FORMAT using "YYYY" or "RRRR". If your applications are not yet Year 2000 compliant, you may wish to specify the NLS_DATE_FORMAT as "RR". The "RR" format will have the following effect: Given a year with 2 digits, RR will return a year in the next century if the year is less than 50 and the last 2 digits of the current year are greater than or equal to 50; return a year in the preceding century if the year is less than or equal to 50 and the last 2 digits of the current year are less than 50.
See the Date Format Models section in the Oracle8i SQL Reference for full details on Date Format Elements.
Partition bound expressions for a date column must specify a date using a format which requires that the month, day, and 4-digit year are fully specified. For example, the date format MM-DD-YYYY requires that the month, day, and 4-digit year are fully specified. In contrast, the date format DD-MON-YY (11-jan-97, for example) is invalid because it relies on the current date for the century.
Use TO_DATE() to specify a date format which requires the full specification of month, day, and 4-digit year. For example:
TO_DATE('11-jan-1997', 'dd-mon-yyyy')If the default date format, specified by NLS_DATE_FORMAT, of your session does not support specification of a date independent of current century (that is, if your default date format is MM-DD-YY), you must take one of the following actions:
For more information on using TO_DATE(), see Oracle8i SQL Reference.
Parameter type: |
string |
Parameter scope: |
Initialization Parameter, Environment Variable, and |
Default value: |
default language for dates |
Range of values: |
any valid language name |
This parameter specifies the language for the spelling of day and month names by the functions TO_CHAR and TO_DATE, overriding that specified implicitly by NLS_LANGUAGE. NLS_DATE_LANGUAGE has the same syntax as the NLS_LANGUAGE parameter, and all supported languages are valid values. For example, to specify the date language as French, the parameter should be set as follows:
NLS_DATE_LANGUAGE = FRENCH
In this case, the query
SELECT TO_CHAR(SYSDATE, 'Day:Dd Month yyyy') FROM DUAL;
would return
Mercredi:12 Février 1997
Month and day name abbreviations are also in the language specified, for example:
Me:12 Fév 1997
The default date format also uses the language-specific month name abbreviations. For example, if the default date format is DD-MON-YYYY, the above date would be inserted using:
INSERT INTO tablename VALUES ('12-Fév-1997');
The abbreviations for AM, PM, AD, and BC are also returned in the language specified by NLS_DATE_LANGUAGE. Note that numbers spelled using the TO_CHAR function always use English spellings; for example:
SELECT TO_CHAR(TO_DATE('12-Fév'),'Day: ddspth Month')
FROM DUAL;
would return:
Mercredi: twenty-seventh Février
You can alter the default value of NLS_DATE_LANGUAGE by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_LANGUAGE command.
Oracle allows you to control calendar-related items through the use of parameters.
The type of calendar information stored for each territory is as follows:
Some cultures consider Sunday to be the first day of the week. Others consider Monday to be the first day of the week. A German calendar starts with Monday.
März | 1998 | |||||
---|---|---|---|---|---|---|
Mo | Di | Mi | Do | Fr | Sa | So |
1 | ||||||
2 | 3 | 4 | 5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 | 13 | 14 | 15 |
16 | 17 | 18 | 19 | 20 | 21 | 22 |
23 | 24 | 25 | 26 | 27 | 28 | 29 |
30 | 31 |
Many countries, Germany, for example, use weeks for scheduling, planning, and bookkeeping. Oracle supports this convention.
In the ISO standard, the year relating to an ISO week number can be different from the calendar year. For example, 1st Jan 1988 is in ISO week number 53 of 1987. A week always starts on a Monday and ends on a Sunday.
To support the ISO standard, a format element IW is provided that returns the ISO week number.
A typical example with four or more days in the first week is:
January | 1998 | ||||||
---|---|---|---|---|---|---|---|
Mo | Tu | We | Th | Fr | Sa | Su | |
1 | 2 | 3 | 4 | <= 1st week of 1998 | |||
5 | 6 | 7 | 8 | 9 | 10 | 11 | <= 2nd week of 1998 |
12 | 13 | 14 | 15 | 16 | 17 | 18 | <= 3rd week of 1998 |
19 | 20 | 21 | 22 | 23 | 24 | 25 | <= 4th week of 1998 |
26 | 27 | 28 | 29 | 30 | 31 | <= 5th week of 1998 |
A typical example with three or fewer days in the first week is:
January | 1999 | ||||||
---|---|---|---|---|---|---|---|
Mo | Tu | We | Th | Fr | Sa | Su | |
1 | 2 | 3 | <= 53rd week of 1998 | ||||
4 | 5 | 6 | 7 | 8 | 9 | 10 | <= 1st week of 1999 |
11 | 12 | 13 | 14 | 15 | 16 | 17 | <= 2nd week of 1999 |
18 | 19 | 20 | 21 | 22 | 23 | 24 | <= 3rd week of 1999 |
25 | 26 | 27 | 28 | 29 | 30 | 31 | <= 4th week of 1999 |
Oracle supports six calendar systems, as well as the default Gregorian.
The Islamic calendar starts from the year of the Hegira. The Japanese Imperial calendar starts from the beginning of an Emperor's reign. For example, 1998 is the tenth year of the Heisei era. It should be noted, however, that the Gregorian system is also widely understood, so both 98 and 10 can be used to represent 1998.
Parameter type: |
string |
Parameter scope: |
Initialization Parameter and ALTER SESSION |
Default value: |
Gregorian |
Range of values: |
any valid calendar format name |
Many different calendar systems are in use throughout the world. NLS_CALENDAR specifies which calendar system Oracle uses.
NLS_CALENDAR can have one of the following values:
For example, if NLS_CALENDAR is set to "Japanese Imperial", the date format is "E YY-MM-DD", and the date is May 15, 1997, then the SYSDATE is displayed as follows:
SELECT SYSDATE FROM DUAL; SYSDATE -------- H 09-05-15
Oracle allows you to control how numbers appear.
The database must know the number-formatting convention used in each session to interpret numeric strings correctly. For example, the database needs to know whether numbers are entered with a period or a comma as the decimal character (234.00 or 234,00). Similarly, the application needs to be able to display numeric information in the format expected at the client site.
Some typical formats are:
Country | Example |
---|---|
Finland |
1.234.567,89 |
Germany |
1.234.567,89 |
Japan |
1,234,567.89 |
UK |
1,234,567.89 |
US |
1,234,567.89 |
This parameter specifies the decimal character and grouping separator, overriding those defined implicitly by NLS_TERRITORY. The group separator is the character that separates integer groups (that is, the thousands, millions, billions, and so on). The decimal character separates the integer and decimal parts of a number.
Any character can be the decimal or group separator. The two characters specified must be single-byte, and both characters must be different from each other. The characters cannot be any numeric character or any of the following characters: plus (+), hyphen (-), less than sign (<), greater than sign (>).
The characters are specified in the following format:
NLS_NUMERIC_CHARACTERS = "<decimal_character><group_separator>"
The grouping separator is the character returned by the number format mask G. For example, to set the decimal character to a comma and the grouping separator to a period, the parameter should be set as follows:
NLS_NUMERIC_CHARACTERS = ",."
Both characters are single byte and must be different. Either can be a space.
Note: When the decimal character is not a period (.) or when a group separator is used, numbers appearing in SQL statements must be enclosed in quotes. For example, with the value of NLS_NUMERIC_CHARACTERS above, the following SQL statement requires quotation marks around the numeric literals:
INSERT INTO SIZES (ITEMID, WIDTH, QUANTITY) VALUES (618, '45,5', TO_NUMBER('1.234','9G999'));
You can alter the default value of NLS_NUMERIC_CHARACTERS in either of these ways:
Oracle allows you to control how currency and financial symbols appear.
Many different currency formats are used throughout the world. Some typical ones are:
Country | Example |
---|---|
Finland |
1.234,56 mk |
Germany |
1.234,56 DM |
Japan |
¥1,234.56 |
UK |
£1,234.56 |
US |
$1,234.56 |
Parameter type: |
string |
Parameter scope: |
Initialization Parameter, Environment Variable, and |
Default value: |
local currency symbol |
Range of values: |
any valid format name |
This parameter specifies the character string returned by the number format mask L, the local currency symbol, overriding that defined implicitly by NLS_TERRITORY. For example, to set the local currency symbol to "Dfl" (including a space), the parameter should be set as follows:
NLS_CURRENCY = "Dfl "
In this case, the query
SELECT TO_CHAR(TOTAL, 'L099G999D99') "TOTAL" FROM ORDERS WHERE CUSTNO = 586
would return
TOTAL ------------- Dfl 12.673,49
You can alter the default value of NLS_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_CURRENCY command.
Parameter type: |
string |
Parameter scope: |
Initialization Parameter, Environment Variable, and |
Default value: |
ISO international currency symbol |
Range of values: |
any valid territory name |
This parameter specifies the character string returned by the number format mask C, the ISO currency symbol, overriding that defined implicitly by NLS_TERRITORY.
Local currency symbols can be ambiguous; for example, a dollar sign ($) can refer to US dollars or Australian dollars. ISO Specification 4217 1987-07-15 defines unique "international" currency symbols for the currencies of specific territories (or countries).
For example, the ISO currency symbol for the US Dollar is USD, for the Australian Dollar AUD. To specify the ISO currency symbol, the corresponding territory name is used.
NLS_ISO_CURRENCY has the same syntax as the NLS_TERRITORY parameter, and all supported territories are valid values. For example, to specify the ISO currency symbol for France, the parameter should be set as follows:
NLS_ISO_CURRENCY = FRANCE
In this case, the query
SELECT TO_CHAR(TOTAL, 'C099G999D99') "TOTAL"
FROM ORDERS WHERE CUSTNO = 586
would return
TOTAL ------------- FRF12.673,49
You can alter the default value of NLS_ISO_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_ISO_CURRENCY command.
Typical ISO currency symbols are:
Country | Example |
---|---|
Finland |
1.234.567,89 FIM |
Germany |
1.234.567,89 DEM |
Japan |
1,234,567.89 JPY |
UK |
1,234,567.89 GBP |
US |
1,234,567.89 USD |
Parameter type: |
string |
Parameter scope: |
Initialization Parameter, Environment Variable, and |
Default value: |
Dual currency symbol |
Range of values: |
any valid name |
This parameter can be used to override the default dual currency symbol defined in the territory. When starting a new session without setting NLS_DUAL_CURRENCY, you will use the default dual currency symbol defined in the territory of your current language environment. When you set NLS_DUAL_CURRENCY, you will start up a session with its value as the dual currency symbol.
NLS_DUAL_CURRENCY was introduced to help support the Euro. The following character sets support the Euro symbol:
:
Parameter type: |
string |
Parameter scope: |
Initialization Parameter, Environment Variable, and |
Default value: |
derived from NLS_TERRITORY |
Range of values: |
any valid name |
NLS_MONETARY_CHARACTERS specifies the characters that indicate monetary units, such as the dollar sign ($) for U.S. Dollars, and the cent symbol (¢) for cents.
The two characters specified must be single-byte and cannot be the same as each other. They also cannot be any numeric character or any of the following characters: plus (+), hyphen (-), less than sign (<), greater than sign (>).
NLS_CREDIT sets the symbol that displays a credit in financial reports. The default value of this parameter is determined by NLS_TERRITORY.
NLS_DEBIT sets the symbol that displays a debit in financial reports. The default value of this parameter is determined by NLS_TERRITORY.
Oracle allows you to choose how data is sorted through the use of collation parameters.
Different languages have different sort orders. What's more, different cultures or countries using the same alphabets may sort words differently. For example, the German language sharp s (ß) is sorted differently in Germany and Austria. The linguistic sort sequence German sorts this sequence as the two characters SS, while the linguistic sort sequence Austrian sorts it as SZ. Another example is the treatment of ö, o, and oe. They are sorted differently throughout the various Germanic languages.
Oracle provides many different types of sort, but achieving a linguistically correct sort frequently harms performance. This is a trade-off the database administrator needs to make on a case-by-case basis. A typical case would be when sorting Spanish. In traditional Spanish, ch and ll are distinct characters, which means that the correct order would be: cerveza, colorado, cheremoya, lago, luna, llama. But a true linguistic sort will cause some performance degradation.
Sorting East Asian languages is difficult and complex. At present, Oracle typically relies on the binary order of the particular character set for sorting East Asian Languages. As an example, the Shift-JIS character set table is ordered by kanji radicals, therefore, Oracle uses that binary order for its sorts in a Shift-JIS environment.
Conventionally, when character data is sorted, the sort sequence is based on the numeric values of the characters defined by the character encoding scheme. Such a sort is called a binary sort. Such a sort produces reasonable results for the English alphabet because the ASCII and EBCDIC standards define the letters A to Z in ascending numeric value.
Note, however, that in the ASCII standard, all uppercase letters appear before any lowercase letters. In the EBCDIC standard, the opposite is true: all lowercase letters appear before any uppercase letters.
When characters used in other languages are present, a binary sort generally does not produce reasonable results. For example, an ascending ORDER BY query would return the character strings ABC, ABZ, BCD, ÄBC, in that sequence, when the Ä has a higher numeric value than B in the character encoding scheme.
To produce a sort sequence that matches the alphabetic sequence of characters for a particular language, another sort technique must be used that sorts characters independently of their numeric values in the character encoding scheme. This technique is called a linguistic sort. A linguistic sort operates by replacing characters with other binary values that reflect the character's proper linguistic order so that a binary sort returns the desired result.
The Oracle server provides both sort mechanisms. Linguistic sort sequences are defined as part of language-dependent data. Each linguistic sort sequence has a unique name. NLS parameters define the sort mechanism for ORDER BY queries. A default value can be specified, and this value can be overridden for each session with the NLS_SORT parameter. A complete list of linguistic definitions is provided in "Linguistic Definitions".
Warning: Linguistic sorting is not supported on Asian multi-byte character sets. If the database character set is multi-byte, you will get binary sorting, which makes the sort sequence dependent on the character set specification. There are two exceptions to this rule: Japanese Hiragana/Katakana and the UTF8 character set. This means that the Japanese Yomi sort is only possible by creating an extra column using the Hiragana or Katakana reading for the kanji and sorting on that column.
You can create a function-based index which uses languages other than English. A simple example is:
SVRMGR> CREATE INDEX nls_index ON my_table (NLSSORT(name, 'NLS_SORT = German'));
So, after
SVRMGR> SELECT * FROM my_table ORDER BY name;
rows will be returned using a German collation sequence.
For more information, see the description of function-based indexes in Oracle8i Concepts.
You can create a function-based index which allows case-insensitive searches. For example:
SVRMGR> CREATE INDEX case_insensitive_ind ON my_table(NLS_UPPER(empname)); SVRMGR> SELECT * FROM my_table WHERE NLS_UPPER(empname) = 'KARL';
For more information, see the description of function-based indexes in Oracle8i Application Developer's Guide - Fundamentals.
Linguistic special cases are character sequences that need to be treated as a single character when sorting. Such special cases are handled automatically when using a linguistic sort. For example, one of the linguistic sort sequences for Spanish specifies that the double characters ch and ll are sorted as single characters appearing between c and d and between l and m respectively.
Another example is the German language sharp s (ß). The linguistic sort sequence German can sort this sequence as the two characters SS, while the linguistic sort sequence Austrian sorts it as SZ.
Special cases like these are also handled when converting uppercase characters to lowercase, and vice versa. For example, in German the uppercase of the sharp s (ß) is the two characters SS. Such case-conversion issues are handled by the NLS_UPPER, NLS_LOWER, and NLS_INITCAP functions, according to the conventions established by the linguistic sort sequence. (The standard functions UPPER, LOWER, and INITCAP do not handle these special cases.)
This parameter specifies the type of sort for character data, overriding that defined implicitly by NLS_LANGUAGE.
The syntax of NLS_SORT is:
NLS_SORT = { BINARY | name }
BINARY specifies a binary sort and name specifies a particular linguistic sort sequence. For example, to specify the linguistic sort sequence called German, the parameter should be set as follows:
NLS_SORT = German
The name given to a linguistic sort sequence has no direct connection to language names. Usually, however, each supported language will have an appropriate linguistic sort sequence defined that uses the same name.
Note: When the NLS_SORT parameter is set to BINARY, the optimizer can in some cases satisfy the ORDER BY clause without doing a sort (by choosing an index scan). But when NLS_SORT is set to a linguistic sort, a sort is always needed to satisfy the ORDER BY clause.
You can alter the default value of NLS_SORT by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_SORT command.
A complete list of linguistic definitions is provided in Table A-8, "Linguistic Definitions".
Parameter type: |
string |
Parameter scope: |
Environment Variable and ALTER SESSION |
Default value: |
binary |
Range of values: |
BINARY or ANSI |
This parameter lets you avoid the cumbersome process of using NLS_SORT in SQL statements. Normally, comparison in the WHERE clause is binary. To use linguistic comparison, the NLSSORT function must be used. Sometimes this can be tedious, especially when the linguistic sort needed has already been specified in the NLS_SORT session parameter. NLS_COMP can be used in such cases to indicate that the comparisons must be linguistic according to the NLS_SORT session parameter. This is done by altering the session:
SVRMGR> ALTER SESSION SET NLS_COMP = ANSI;
To specify that comparison in the WHERE clause is always binary, do
SVRMGR> ALTER SESSION SET NLS_COMP = BINARY;
As a final note, when NLS_COMP is set to ANSI, a linguistic index must exist on the column where the linguistic order is desired.
To enable a linguistic index, use the syntax:
SVRMGR> CREATE INDEX i ON t(NLSSORT(col, 'NLSSORT=FRENCH'));
Parameter type: |
string |
Parameter scope: |
Initialization Parameter and ALTER SESSION |
Default value: |
derived from NLS_TERRITORY |
Range of values: |
any valid character |
NLS_LIST_SEPARATOR specifies the character to use to separate values in a list of values.
The character specified must be single-byte and cannot be the same as either the numeric or monetary decimal character, any numeric character, or any of the following characters: plus (+), hyphen (-), less than sign (<), greater than sign (>), period (.).
Oracle allows you to specify the character set used for the client.
Parameter type: |
string |
Parameter scope: |
Environment Variable |
Default value: |
derived from NLS_LANG |
Range of values: |
any valid character set name |
NLS_NCHAR specifies the character set used by the client application for national character set data. If it is not specified, the client application uses the same character set which it uses for the database character set data.