Oracle8i National Language Support Guide Release 8.1.5 A67789-01 |
|
American Standard Code for Information Interchange. A common encoded 7-bit character set for English. ASCII includes the letters A-Z and a-z, as well as digits, punctuation symbols, and control characters. The Oracle character set name for this is US7ASCII.
Sorting of strings based on their binary coded value representations.
Case conversion refers to changing a character from its uppercase to lowercase form, or vice versa.
An independent unit used to represent data, such as a letter, a letter with a diacritical mark, a digit, ideograph, punctuation, or symbol.
Character classification information provides details about the type of character associated with each legal character code; that is, whether it is an alphabetic, uppercase, lowercase, punctuation, control, or space character, etc.
The type of mapping used in defining an encoded character set. Oracle supports many character set encodings including single-byte, multiple-byte, shift-sensitive multi-byte and fixed-width character set encoding.
Conversion from one encoded character set to another.
The encoded character set which the client uses. A client character set can differ from the database server character set, in which case, character set conversion must occur.
Ordering of all character strings from an alphabet into a linear sequence. Collation may be used on a linguistic sort order or a binary sort order.
A character that graphically combines with a preceding base character. These characters are not used in isolation. They include such characters as accents, diacritics, Hebrew points, Arabic vowel signs, and Indic matras.
A single character which can be represented by a composite character sequence. This type of character is found in the scripts of Thai, Lao, Vietnamese, and Korean Hangul, as well as many Latin characters used in European languages.
A character sequence consisting of a base character followed by one or more combining characters. This is also referred to as a combining character sequence.
The encoded character set.
A mark added to a letter that usually provides information about pronunciation or stress.
Extended Binary Coded Decimal Interchange Code. EBCDIC is a family of encoded character sets used mostly on IBM systems.
A character set encoding is a set of unambiguous rules that establishes a character set and the one-to-one relationship between each character of the set and its bit representation.
See "Character Encoding Schemes".
Extended UNIX Codes. A common encoding method used on Asian UNIX systems. It combines up to four different encoded character sets in a single data stream.
The new monetary currency used by participating member states of the European Union.
To write data to files for the purpose of archiving, or moving data between operating systems or Oracle databases.
An ordered collection of character glyphs which provides a graphical representation of characters within a character set.
The graphic representation of a character on a display device or paper. For example, H, H, or H are different glyphs, but represent the same character.
A symbol representing an idea. Chinese is an example of an ideographic system.
To read a module from the file system or database, and incorporate it into a display.
The process of making software flexible enough to be used in many different linguistic and cultural environments. Internationalization should not be confused with localization, which is the process of preparing software for use in one specific locale.
International Standards Organization.
A universal character set standard defining the characters of most major scripts used in the modern world. In 1993, ISO adopted Unicode version 1.1 as ISO/IEC 10646-1:1993. ISO/IEC 10646 has two formats: UCS2 is a 2-byte fixed-width format and UCS4 is a 4-byte fixed-width format. There are three levels of implementation, all relating to support for composite characters. Level 1 requires no composite character support, level 2 requires support for specific scripts (including most of the Unicode scripts such as Arabic, Thai, etc.), and level 3 requires unrestricted support for composite characters in all languages.
The 3-letter abbreviation used to denote a local currency, which is based on the ISO 4217 standard. For example, "USD" represents the United States Dollar.
A family of 8-bit encoded character sets. The most common one is ISO 8859-1 (also known as Latin-1), and is used for Western European languages.
Formally known as the ISO 8859-1 character set standard. An 8-bit extension to ASCII which adds 128 characters covering the most common Latin characters used in Western Europe. The Oracle character set name for this is WE8ISO8859P1. See also "ISO 8859".
An index built on a linguistic collation order.
Sorting of strings based on requirements from a locale instead of based on the binary representation of the strings.
The currency symbol used in a country or region. For example, "$" represents the United States Dollar.
A collection of information regarding the linguistic and cultural preferences from a particular region. Typically, a locale consists of language territory, character set, linguistic, and calendar information defined in NLS data files.
The process of providing language- or culture-specific information for software systems. Translation of an application's user interface would be an example of localization. Localization should not be confused with internationalization, which is the process of generalizing software so it can handle many different linguistic and cultural conventions.
Support for only one language.
A coded character that can be represented in one or more bytes. Multibyte data streams can include characters with varying widths, and can therefore make extensive text processing of individual characters a challenge. See "Wide Characters".
An alternate character set from the database character set that can be specified for NCHAR, NVARCHAR2, and NCLOB columns. NCHAR character sets, unlike the database character set, can support fixed-width multibyte character sets. Care must be taken when selecting an NCHAR character set, since its character repertoire must be included in the database character set as well.
Net8 enables two or more computers that run the Oracle server to exchange data through a third-party network. It is independent of the communications protocol.
National Language Support. NLS allows users to interact with the database in their native languages. It also allows applications to run in different linguistic and cultural environments.
A general phrase referring to the contents in many files with .nlb suffixes. These files contain data that the NLSRTL library uses to provide specific NLS support.
National Language Support Run-Time Library. This library is responsible for providing locale-independent algorithms for internationalization. The locale-specific information (i.e., NLSDATA) is read by the NLSRTL library during run-time.
A character used during character conversion when the desired character is not available in the target character set. For example, "?" is often used as Oracle's default replacement character.
Multilingual support which is restricted to a group of related languages. Support for related languages, but not all languages. Similar language families, such as Western European languages can be represented with, for example, ISO 8859/1. In this case, however, Thai could not be added.
Now called Net8. Net8 enables two or more computers that run the Oracle server to exchange data through a third-party network. It is independent of the communications protocol.
A collection of related graphic symbols used in a writing system. Some scripts are used to represent multiple languages, and some languages use multiple scripts. Example of scripts include Latin, Arabic, and Han.
The character set used by the database server.
UCS stands for "Universal Multiple-Octet Coded Character Set". It is a 1993 ISO and IEC standard character set.
Unicode is a type of universal character set, a collection of 64K characters encoded in a 16-bit space. It encodes nearly every character in just about every existing character set standard, covering most written scripts used in the world. It is owned and defined by Unicode Inc. Unicode is canonical encoding which means its value can be passed around in different locales. But it does not guarantee a round-trip conversion between it and every Oracle character set without information loss.
A 16-bit binary value that can represent a unit of encoded text for processing and interchange. Every point between U+0000 and U+FFFF is a code point. The term is interchangeable with code element, code position, and code value.
The following shows how different Unicode-related character sets relate to one another in terms of character code value ranges:
Fixed-width 16-bit Unicode. Each character occupies 16 bits of storage. The Latin-1 characters are the first 256 code points in this standard, so it can be viewed as a 16-bit extension of Latin-1. Oracle does not yet support this character set in the NLS run-time library.
Fixed-width 32-bit Unicode. Each character occupies 32 bits of storage. The UCS2 characters are the first 65,536 code points in this standard, so it can be viewed as a 32-bit extension of UCS2. This is also sometimes referred to as ISO-10646. ISO-10646 is a standard that specifies up to 2,147,483,648 characters in 32768 planes, of which the first plane is the UCS2 set. The ISO standard also specifies transformations between different encodings.
Being able to use as many languages as desired. A universal character set, such as Unicode, helps to provide unrestricted multilingual support because it supports a very large character repertoire, encompassing most modern languages of the world.
A variable-width encoding of UCS2 which uses sequences of 1, 2, or 3 bytes per character. Characters from 0-127 (the 7-bit ASCII characters) are encoded with one byte, characters from 128-2047 require two bytes, and characters from 2048-65535 require three bytes. The Oracle character set name for this is UTF8 (for the Unicode 2.0 standard). The standard has left room for expansion to support the UCS4 characters with sequences of 4, 5, and 6 bytes per character.
An extension to UCS2 that allows for pairs of UCS2 code points to represent extended characters from the UCS4 set. UCS2 has ranges of code points allocated for high (leading) and low (trailing) surrogates that support UTF16 encodings.
A fixed-width character format that is well-suited for extensive text processing because it allows for data to be processed in consistent fixed-width chunks. Wide characters are intended for supporting internal character processing, and are therefore implementation-dependent.