Oracle8i Parallel Server Concepts and Administration Release 8.1.5 A67778-01
This chapter explains how Cache Fusion resolves reader/writer conflicts in Oracle Parallel Server. It describes Cache Fusion and its benefits in general terms that apply to most types of systems and applications. The chapter also describes OPS- and Cache Fusion-related statistics and provides many procedures that explain how to use these statistics to monitor and tune performance.
The topics in this chapter are:
Inter-instance contention for data blocks and the resulting cache coherency issues are the main performance problems of OPS. In most cases, proper partitioning resolves most contention problems.
In reality, however, most packaged applications are not effectively partitioned, or are partitioned only to a limited extent. Such applications experience three types of inter-instance contention:
Reader/writer contention occurs when one instance needs to read a data block in consistent mode and the correct version of the block does not exist in the instance's cache. OPS easily resolves this type of contention because multiple instances can share the same blocks for read access without cache coherency conflicts. The other types of contention, however, are more complex from a cache coherency point-of-view.
In the case of inserts into tables, for example, writer/writer conflicts are partially addressed by free list groups. In other cases, however, the only alternative is to address writer/writer cache coherency issues by isolating hot blocks using locking, by implementing deferred pinging, or by application partitioning. Reader/writer conflicts, on the other hand, are more prevalent and easier to resolve.
Reader/writer contention is the most common type of contention in OLTP and hybrid applications. The ability to combine DSS and OLTP processing in a typical application depends on OPS' efficiency in resolving such conflicts.
For the "reader" part of reader/writer conflicts there are two subcategories: the contention caused by current readers and contention caused by consistent read readers. Of these two, consistent read readers are typically more prevalent and Cache Fusion directly addresses these.
If a data block requested by one instance is in the memory cache of a remote instance, Cache Fusion resolves the conflict using remote memory access, not disk access. The requesting instance sends a request for a consistent-read copy of the block to the holding instance. The BSP (Block Server Process) on the holding instance transmits the consistent-read image of the requested block directly from the holding instance's buffer cache to the requesting instance's buffer cache across a high speed interconnect.
As Figure 21-1 illustrates, Cache Fusion enables the buffer cache of one node to ship data blocks directly to the buffer cache of another node by way of low latency, high bandwidth interconnects. This reduces the need for expensive disk I/O in parallel cache management.
Cache Fusion also leverages new interconnect technologies for low latency, user-space based, interprocessor communication. This drastically lowers CPU usage by reducing operating system context switches for inter-node messages.
Oracle manages writer/writer contention using conventional disk-based PCM (Parallel Cache Management). A later version of Oracle will use Cache Fusion to provide faster writer/writer contention resolution.
Cache Fusion only solves part of the block conflict resolution issue by providing improved scalability for applications that experience high levels of reader/writer contention. For applications with high writer/writer contention levels, you also need to accurately partition your application's tables to reduce the potential for writer/writer conflicts.
See Also:
For more information on partitioning, please refer to Oracle8i Tuning and Oracle8i Concepts.
Cache Fusion improves application transaction throughput and scalability by providing:
Applications demonstrating high reader/writer conflict rates under disk-based PCM benefit the most from Cache Fusion.
Packaged applications also scale more effectively as a result of Cache Fusion. Applications in which OLTP and reporting functions execute on separate nodes may also benefit from Cache Fusion. Reporting functions that access data from tables modified by OLTP functions receive their versions of data blocks by way of high-speed interconnects. This reduces the 'pinging' of data blocks to disk.
Performance gains are derived primarily from reduced X-to-S lock conversions and the corresponding reduction in disk I/O.
The direct mapping of data buffers across instances during inter-processor communication also avoids having to copy memory from one address space to another. This shortened execution path reduces CPU requirements and increases the communications bandwidth as explained under the following headings.
Cache Fusion dramatically reduces operating system context switches. This results in reduced CPU utilization and frees CPU cycles for applications processing.
Cache Fusion reduces CPU utilization by taking advantage of user-mode IPCs, also known as "memory-mapped IPCs", for both Unix- and NT-based platforms. If the appropriate hardware support is available, operating system context switches are minimized beyond the basic reductions achieved with Cache Fusion alone. This also eliminates costly data copying and system calls.
User-mode IPCs reduce CPU utilization because user processes can communicate without using the operating system kernel. In other words, there is no need to switch from user execution mode to kernel execution mode.
Cache Fusion reduces expensive lock operations and disk I/O for data and undo segment blocks by transmitting consistent-read blocks directly from one instance's buffer cache to another. This can reduce the latency required to resolve reader/writer conflicts by as much as 90%.
Disk-based PCM may require as much as 80ms (milliseconds) to resolve reader/writer conflicts. This involves disk I/O for the requested block as well as I/O to write rollback segment blocks to disk.
Cache Fusion resolves reader/writer conflicts with approximately 1/10th the processing effort required by disk-based PCM, using little or no disk I/O. Cache Fusion incurs overhead only for processing the consistent-read request, constructing a consistent-read copy of the requested block in memory, and transferring it to the requesting instance. On some platforms this can take less than 1ms.
Because Cache Fusion exploits high speed IPCs, OPS benefits from the performance gains of the latest technologies for low latency communication across cluster interconnects. Further performance gains can be expected with even more efficient protocols, such as VIA and user-mode IPCs.
The primary components affecting Cache Fusion performance are the interconnect and the protocols that process inter-node communication. The interconnect bandwidth, its latency, and the efficiency of the IPC protocol determine the speed with which Cache Fusion processes consistent-read block requests.
Once your interconnect is operative, you cannot significantly influence its performance. However, you can influence a protocol's efficiency by adjusting the IPC buffer sizes.
Interconnects supporting OPS and Cache Fusion use one of these protocols:
OPS can use any interconnect product that supports these protocols. The interconnect product must also be certified for OPS hardware cluster platforms.
Cache Fusion performance levels may vary in terms of latency and throughput from application to application. Performance is further influenced by the type and mixture of transactions your system processes.
The performance gains from Cache Fusion also vary with each workload. The hardware, the interconnect protocol specifications, and the operating system resource usage also affect performance.
As mentioned earlier, if your application did not demonstrate a significant amount of consistent-read contention prior to Cache Fusion, your performance with Cache Fusion will likely remain unchanged. However, if your application experienced numerous lock conversions and heavy disk I/O as a result of consistent-read conflicts, your performance with Cache Fusion should improve dramatically.
As an example, query the statistics in V$SYSSTAT; you should observe that with Cache Fusion your system performs fewer X-to-S lock converts. The fewer X-to-S lock conversions your application generates, the less disk I/O your system requires. The following section, "Monitoring Cache Fusion and Inter-instance Performance", describes how to evaluate Cache Fusion performance in more detail.
Block request latencies in Cache Fusion can vary according to the protocol in use. With TCP/UDP, the latency can range from 5 to 50ms. The performance statistics achieved with TCP/UDP protocols, however, vary from system to system.
With VIA, which is only available on NT, the latency can be less than 1ms. Performance statistics for the VIA protocol do not vary significantly because VIA uses fewer context switches. Fewer context switches mean reduced overhead.
This section describes how to obtain and analyze OPS and Cache Fusion statistics to monitor inter-instance performance. Topics in this section include:
The main goal of monitoring Cache Fusion and OPS performance is to examine the latency and fine-tune your system's processing by observing trends over time. Do this by analyzing the performance statistics from several views as described in the following sections. Use these monitoring procedures on an on-going basis to observe processing trends and to maintain processing at optimal levels.
The procedures in the following sections describe performance issues in terms of latency. The procedures also describe how to analyze other performance-related issues such as Integrated Distributed Lock Manager (IDLM) resource use and the status of general system events.
Latency, the amount of time required to complete a request for a consistent-read block, is the most important aspect of OPS and Cache Fusion performance. It is influenced by the types of requests and responses involved in consistent-read operations. Each type of request may have a different outcome, as described in the following:
Oracle collects Cache Fusion-related performance statistics from the buffer cache and IDLM layers. Oracle also collects general OPS statistics for lock requests and lock waits. You can use several views to examine these statistics.
Maintaining an adequate history of system performance helps you more easily identify trends as these statistics change. This is especially important for identifying increasing latencies and adverse workload changes.
Procedures in this section use statistics that are grouped according to the following topics:
In many cases, resolving performance issues requires that you first identify a problem using the specific procedures in each statistics group. You then use the V$SYSTEM_EVENT view to pinpoint the cause as described under the heading, "Events in V$SYSTEM_EVENTS Specifically Related to OPS".
You must set the parameter TIMED_STATISTICS to TRUE for Oracle to collect statistics for most views discussed in the procedures in this section. The timed statistics from views discussed in this chapter are displayed in units of 1/100ths of a second.
The SQL script CATPARR.SQL creates parallel server data dictionary views. To run this script, you must have SYSDBA privileges.
CATALOG.SQL creates the standard V$ dynamic views, as described in the Oracle8i Reference, as well as:
You can rerun CATPARR.SQL if you want the EXT_TO_OBJ table to contain the latest information after you add extents. If you drop objects without rerunning CATPARR.SQL, EXT_TO_OBJ may display misleading information.
Tuning and performance information for the Oracle database is stored in a set of dynamic performance tables known as the "V$ fixed views". Each active instance has its own set of fixed views. In OPS, you can query a global dynamic performance (GV$) view to retrieve the V$ view information from all qualified instances. A global fixed view is available for all of the existing dynamic performance views except for V$ROLLNAME, V$CACHE_LOCK, V$LOCK_ACTIVITY, and V$LOCKS_WITH_COLLISIONS.
The global view contains all the columns from the local view, with an additional column, INST_ID (datatype INTEGER). This column displays the instance number from which the associated V$ information was obtained. You can use the INST_ID column as a filter to retrieve V$ information from a subset of available instances. For example, the query:
SELECT * FROM GV$LOCK WHERE INST_ID = 2 OR INST_ID = 5;

retrieves information from the V$ views on instances 2 and 5.
Each global view contains a GLOBAL hint that creates a parallel query to retrieve the contents of the local view on each instance.
If you have reached the limit of PARALLEL_MAX_SERVERS on an instance and you attempt to query a GV$ view, one additional parallel server process will be spawned for this purpose. The extra process is not available for parallel operations other than GV$ queries.
If you have reached the limit of PARALLEL_MAX_SERVERS on an instance and issue multiple GV$ queries, all but the first query will fail. In most parallel queries, if a server process could not be allocated this would result in either an error or a sequential execution of the query by the query coordinator.
For global views, it may be acceptable to continue running the query in parallel and return the data from the instances that could allocate servers for the query. If it is acceptable to retrieve results only from instances where server allocation succeeded, set the value to TRUE.
See Also:
"Specifying Instance Groups" and Oracle8i Reference for restrictions on GV$ views and complete descriptions of all related parameters and V$ dynamic performance views. |
Oracle collects global cache statistics at the buffer cache layer within an instance. These statistics include counts and timings of requests for global resources.
Requests for global locks on data blocks originate in the buffer cache of the requesting instance. Before a request enters the IDLM, Oracle allocates data structures in the SGA to track the state of the request. These structures are called "lock elements".
To monitor global cache statistics, query the V$SYSSTAT view and analyze its output as described in the following procedures.
Complete the following steps to analyze global cache statistics.
Oracle responds with output similar to:
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
global cache gets                                                     12480
global cache get time                                                   996
global cache converts                                                    21
global cache convert time                                                48
global cache cr blocks received                                           1
global cache cr block receive time                                        1
global cache cr read from disk                                            0
global cache freelist waits                                               0
global cache defers                                                       0
global cache convert timeouts                                             0
global cache cr timeouts                                                  0
global cache fairness down converts                                       0
Use your V$SYSSTAT output to perform the calculations and analyses described in the remaining procedures for this statistics group.
Procedures 2 and 3 use the following Cache Fusion statistics from V$SYSSTAT:
The result, which typically varies from 5 to 40ms depending on your system configuration and volume, is the average latency of a consistent-read request round trip from requesting instance, to holding instance, and back to the requesting instance. If your CPU has limited idle time and your system typically processes long-running queries, the latency may be higher. However, it is possible to have an average latency of less than 1ms if your interconnect protocol is user-mode IPC.
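The calculation behind this result can be sketched as follows. It is an illustrative sketch, not an Oracle tool: it assumes the hundredths-of-a-second units for timed statistics noted earlier in this chapter, and the statistic names and sample values follow the V$SYSSTAT output shown above.

```python
# Sketch: average consistent-read request latency from V$SYSSTAT counters.
# Timed statistics are in hundredths of a second, so the ratio is
# multiplied by 10 to express the result in milliseconds.

def avg_cr_latency_ms(receive_time_cs, blocks_received):
    """'global cache cr block receive time' divided by
    'global cache cr blocks received', converted to milliseconds."""
    if blocks_received == 0:
        return 0.0
    return receive_time_cs / blocks_received * 10

# Sample values from the V$SYSSTAT output shown earlier in this section.
print(avg_cr_latency_ms(1, 1))  # prints 10.0
```

With larger counters, for example a receive time of 250 (in hundredths of a second) over 100 blocks, the same formula yields a 25ms average, within the 5 to 40ms range described above.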
Consistent-read server request latency can also be influenced by a high value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter. This is because a requesting process may issue more than one request for a block depending on the setting of this parameter. Correspondingly, the requesting process may wait longer.
A high proportion of read permissions from disk indicates that blocks are rapidly aging out of the buffer cache of the holding instance. To resolve this, increase the size of your buffer cache.
Procedures 4 and 5 require that you take snapshots of your statistics, for example, by using UTLBSTAT and UTLESTAT.
Procedure 4 uses the following global cache statistics from V$SYSSTAT:
If the average convert or get time is high, there is excessive contention. Another cause may be that latencies for lock operations are high due to overall system workload or system problems. A reasonable time for a cache get is 20 to 30ms, while converts should take 10 to 20ms on average.
Oracle increments global cache gets when a new lock on a resource is opened. A convert is counted when there is already an open lock and Oracle converts it to another mode.
The elapsed time for a get thus includes the allocation and initialization of new locks. If the average cache get or average convert times are excessive, your system may be experiencing timeouts.
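As an illustration, the averages can be derived from the counters in the sample V$SYSSTAT output shown earlier in this section. This is a hedged sketch that again assumes the hundredths-of-a-second units noted above.

```python
# Sketch: average global cache get and convert times in milliseconds,
# computed from the sample V$SYSSTAT counters in this section.

def avg_ms(total_time_cs, operations):
    """Convert a cumulative time in hundredths of a second to an
    average per-operation time in milliseconds."""
    return total_time_cs / operations * 10 if operations else 0.0

avg_get = avg_ms(996, 12480)  # global cache get time / global cache gets
avg_convert = avg_ms(48, 21)  # global cache convert time / converts

print(f"avg get {avg_get:.2f} ms, avg convert {avg_convert:.2f} ms")
```

Values well above the reasonable ranges quoted above would point to contention or system-level latency problems.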
If the global cache convert times or global cache get times are high, refer to statistics in the V$SYSTEM_EVENT view to identify events with a high value for TIME_WAITED statistics.
Global lock statistics provide counts and timings for both PCM and non-PCM lock activity. Oracle collects global lock statistics from the IDLM API layer. All Oracle clients to the IDLM, of which the buffer cache is only one, make their requests to the IDLM through this layer. Thus, global lock statistics include lock requests originating from all layers of the kernel, while global cache statistics relate only to buffer cache OPS activity.
Use procedures in this section to monitor data from V$SYSSTAT to derive averages, latencies, and counts. This establishes a rough indicator of the OPS workload generated by an instance.
Use the following procedures to view and analyze statistics from V$SYSSTAT for global lock processing.
Oracle responds with output similar to:
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
global lock sync gets                                                   703
global lock async gets                                                12748
global lock get time                                                   1071
global lock sync converts                                               303
global lock async converts                                               41
global lock convert time                                                 93
global lock releases                                                    573
Use your V$SYSSTAT output to perform the calculations and analyses described in the remaining procedures in this statistics group.
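For instance, a rough average lock get time can be computed from the sample output above by combining synchronous and asynchronous gets. This is an illustrative sketch assuming the hundredths-of-a-second units noted earlier.

```python
# Sketch: average global lock get time in milliseconds from the sample
# V$SYSSTAT output above (sync and async gets combined).

def avg_lock_get_ms(get_time_cs, sync_gets, async_gets):
    """'global lock get time' / ('global lock sync gets' +
    'global lock async gets'), converted to milliseconds."""
    total = sync_gets + async_gets
    return get_time_cs / total * 10 if total else 0.0

print(avg_lock_get_ms(1071, 703, 12748))
```

An analogous calculation with 'global lock convert time' over the sync and async convert counts gives the average convert latency.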
If the result is more than 20 or 30ms, query the TIME_WAITED column in V$SYSTEM_EVENT in descending order to identify which lock events are waited for most frequently using this query:

SELECT EVENT, TIME_WAITED, AVERAGE_WAIT FROM V$SYSTEM_EVENT ORDER BY TIME_WAITED DESC;
Oracle increments global lock gets when a new lock on a resource is opened. A convert is counted when there is already an open lock and Oracle converts it to another mode.
The elapsed time for a get thus includes the allocation and initialization of new locks. If the average lock get or average lock convert times are excessive, your system may be experiencing timeouts.
If the global lock convert times or global lock get times are high, refer to statistics in the V$SYSTEM_EVENT view to identify events with a high value for TIME_WAITED statistics.
If the result is more than 20ms, query the TIME_WAITED column in V$SYSTEM_EVENT in descending order to identify the event causing the delay.
Oracle collects IDLM resource, lock, and message statistics at the IDLM level. Use these statistics to monitor IDLM latency and workloads. These statistics appear in the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views.
These views record average convert times, count information, and timed statistics for each type of lock request. V$DLM_CONVERT_LOCAL shows statistics for local lock operations. V$DLM_CONVERT_REMOTE shows values for remote conversions. The average convert times in these views are in 100ths of a second.
The IDLM workload is an important aspect of OPS and Cache Fusion performance because each consistent-read request results in a lock request. High IDLM workloads as a result of heavy request rates can adversely affect performance.
The IDLM performs local lock operations entirely within the local node, or in other words, without sending messages. Remote lock operations require sending messages to and waiting for responses from other nodes. Most down-converts, however, are local operations for the IDLM.
The following procedures for analyzing IDLM resource, locks, and message statistics appear in two groups. The first group of procedures explains how to monitor IDLM resources and locks. The second group explains how to monitor message statistics.
Use the following procedures to obtain and analyze statistics from V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE for DLM resource processing.
You must enable event 29700 to populate the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views. Do this by entering this syntax:
EVENT="29700 TRACE NAME CONTEXT FOREVER"
SELECT CONVERT_TYPE, AVERAGE_CONVERT_TIME, CONVERT_COUNT FROM V$DLM_CONVERT_LOCAL;
Oracle responds with output similar to:
CONVERT_TYPE                           AVERAGE_CONVERT_TIME CONVERT_COUNT
-------------------------------------- -------------------- -------------
NULL -> SS                                                0             0
NULL -> SX                                                0             0
NULL -> S                                                 1           146
NULL -> SSX                                               0             0
NULL -> X                                                 1            92
SS -> SX                                                  0             0
SS -> S                                                   0             0
SS -> SSX                                                 0             0
SS -> X                                                   0             0
SX -> S                                                   0             0
SX -> SSX                                                 0             0
SX -> X                                                   0             0
S -> SX                                                   0             0
S -> SSX                                                  0             0
S -> X                                                    3            46
SSX -> X                                                  0             0

16 rows selected.
SELECT * FROM V$DLM_CONVERT_REMOTE;
Oracle responds with output identical in format to the output for V$DLM_CONVERT_LOCAL.
Use your output from V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE to perform the calculation described in the following procedure.
SELECT r.CONVERT_TYPE, r.AVERAGE_CONVERT_TIME, l.AVERAGE_CONVERT_TIME, r.CONVERT_COUNT, l.CONVERT_COUNT FROM V$DLM_CONVERT_LOCAL l, V$DLM_CONVERT_REMOTE r WHERE l.CONVERT_TYPE = r.CONVERT_TYPE;
The IDLM sends messages either directly or by using flow control. For both methods, the IDLM attaches markers known as "tickets" to each message. The allotment of tickets for each IDLM is limited. However, the IDLM can re-use tickets indefinitely.
IDLMs send messages directly until no more tickets are available. When an IDLM runs out of tickets, messages must wait in a flow control queue until outstanding messages finish processing and more tickets are available. Flow-controlled messaging is managed by the LMD process.
The rationing of tickets prevents one node from sending an excessive amount of messages to another node during periods of heavy inter-instance communication. This also prevents one node with heavy remote consistent-read block requirements from assuming control of messaging resources throughout a cluster at the expense of other, less-busy nodes.
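The ticketing scheme described above can be modeled with a small sketch. This is purely illustrative of the idea, messages are sent directly while tickets remain and queue for flow control otherwise, and is not Oracle's actual LMD implementation.

```python
from collections import deque

# Illustrative model of ticket-based flow control: a limited, reusable
# allotment of tickets gates direct sends; overflow messages wait in a
# flow-control queue until an outstanding message returns its ticket.

class TicketedSender:
    def __init__(self, tickets):
        self.tickets = tickets            # limited, reusable allotment
        self.flow_control_queue = deque()
        self.sent_directly = 0
        self.flow_controlled = 0

    def send(self, msg):
        if self.tickets > 0:
            self.tickets -= 1             # attach a ticket and send now
            self.sent_directly += 1
        else:
            self.flow_control_queue.append(msg)
            self.flow_controlled += 1

    def complete_one(self):
        """An outstanding message finished; its ticket becomes reusable,
        so the next queued message (if any) can be sent."""
        self.tickets += 1
        if self.flow_control_queue:
            self.send(self.flow_control_queue.popleft())
```

With an allotment of two tickets, a third send is flow controlled and is dispatched only after one of the first two messages completes.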
The V$DLM_MISC view contains the following statistics about message activity:
Use the following procedures to obtain and analyze message statistics in V$DLM_MISC.
Oracle responds with output similar to:
STATISTIC# NAME                                         VALUE
---------- ----------------------------------------- --------
         0 dlm messages sent directly                   29520
         1 dlm messages flow controlled                  1851
         2 dlm messages received                        29668
         3 dlm total incoming msg queue length            297

4 rows selected.
Use your output from V$DLM_MISC to perform the following procedure.
Oracle increments the value for 'total incoming message queue length' whenever a new request enters the LMD process' message queue. When messages leave the LMD queue to begin processing, Oracle increments the value for 'messages received'.
The size of the queue may increase if a large number of requests arrives at the LMD simultaneously. This can occur when the volume of locking activity is high or when the LMD processes a large quantity of consistent-read requests. Typically, the average receive queue length is less than 10.
In addition to the global cache and global lock statistics that were previously discussed, you can also use statistics in V$SYSSTAT to measure the I/O workload related to global conflict resolution. There are three important sets of statistics in V$SYSSTAT for this purpose:
DBWR forced writes occur when Oracle resolves inter-instance data block contention by writing the requested block to disk before the requesting node can use it.
If a consistent-read request requires information from another instance's cache to roll back a block and make it read consistent, Oracle must also write rollback segment headers and rollback segment blocks to disk. One instance must write the undo blocks and undo headers to disk while another instance reads them.
Cache Fusion minimizes the disk I/O for consistent reads. This can lead to a substantial reduction in physical writes performed by each instance. Before Cache Fusion, a consistent read involving data from a remote instance could require as many as three writes and three reads (involving the requested block, the rollback segment header, and an undo segment block), as well as multiple lock converts, for one requested block.
Use the following procedures to obtain and analyze message statistics in V$SYSSTAT.
SELECT NAME, VALUE FROM V$SYSSTAT WHERE NAME IN ('DBWR forced writes', 'remote instance undo block writes', 'remote instance undo header writes', 'physical writes');
Oracle responds with output similar to:
NAME                                                          VALUE
--------------------------------------------------------- ----------
physical writes                                                 41802
DBWR forced writes                                               5403
remote instance undo block writes                                   0
remote instance undo header writes                                  2

4 rows selected.
Use your V$SYSSTAT output to perform the following calculations.
You should see a noticeable decrease in this ratio between this calculation and pre-Cache Fusion statistics.
The ratio shows how much disk I/O is related to writes to rollback segments. With Cache Fusion, this ratio should be very low.
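Both ratios can be sketched from the sample V$SYSSTAT values shown above. This is an illustrative calculation; the figures come from the example output, not a real system.

```python
# Sketch: I/O ratios for global conflict resolution from the sample
# V$SYSSTAT output above. Both ratios should drop with Cache Fusion.

def ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0

physical_writes = 41802
forced_write_ratio = ratio(5403, physical_writes)  # DBWR forced writes
undo_write_ratio = ratio(0 + 2, physical_writes)   # undo block + header writes

print(f"forced: {forced_write_ratio:.3f}, undo: {undo_write_ratio:.5f}")
```

Here roughly 13% of physical writes are forced writes, while writes to rollback segments are negligible, consistent with the expectation that the undo ratio should be very low under Cache Fusion.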
This section describes how to analyze output from three views to quantify lock conversions by type. The tasks and the views discussed in this section are:
V$LOCK_ACTIVITY summarizes how many lock up- and down-converts have occurred during an instance's lifetime. X-to-N down-converts denote the number of times a lock was down-converted because another instance wanted to modify a resource.
The other major type of down-convert is X-to-S. This type of down-convert occurs when another instance reads a resource that the local instance last modified. Both types of lock conversions involve I/O. However, Cache Fusion should reduce X-to-S down-converts because they are not needed for buffer locks.
V$CLASS_PING summarizes lock conversion activity by showing whether disk I/O is occurring on the following classes of blocks:
All X_2_NULL_FORCED_WRITE and X_2_S_FORCED_WRITE conversions involve write I/O. In other words, values in the columns for each block class provide an indicator of the cause of the disk I/O.
V$PING helps identify "hot" blocks and "hot" objects. The sum of the columns FORCED_READS and FORCED_WRITES indicates the actual pinging activity on a particular block or object.
All three views provide different levels of detail. If you suspect that pinging or OPS itself is the cause of a performance problem, monitor V$LOCK_ACTIVITY to generate an overall OPS workload profile. Use information from V$LOCK_ACTIVITY to record the rate at which lock conversions occur.
For more details, use V$CLASS_PING to identify the type of block on which lock conversions and pinging are occurring. Once you have identified the class, use V$PING to obtain details about a particular table or index and the file and block numbers on which there is significant lock conversion activity.
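The final drill-down step above, ranking blocks by FORCED_READS plus FORCED_WRITES, can be sketched as follows. The rows below are invented sample data in the shape of V$PING output, not real query results.

```python
# Illustrative sketch: ranking "hot" blocks from V$PING-style rows by
# FORCED_READS + FORCED_WRITES, as described above. Sample data only.

rows = [
    {"name": "EMP",    "file#": 4, "block#": 120, "forced_reads": 12, "forced_writes": 30},
    {"name": "DEPT",   "file#": 4, "block#": 77,  "forced_reads": 2,  "forced_writes": 1},
    {"name": "EMP_PK", "file#": 5, "block#": 210, "forced_reads": 25, "forced_writes": 40},
]

def ping_activity(row):
    """Actual pinging activity on a block: forced reads plus writes."""
    return row["forced_reads"] + row["forced_writes"]

hot = sorted(rows, key=ping_activity, reverse=True)
for r in hot:
    print(r["name"], r["file#"], r["block#"], ping_activity(r))
```

The block with the highest combined count is the strongest candidate for partitioning or locking adjustments.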
Latches are low level locking mechanisms that protect SGA data structures. Excessive contention for latches degrades performance.
Use V$DLM_LATCH and V$LATCH_MISSES to monitor latch contention within the IDLM. These views show information about a particular latch, its statistics, and the location in the code from where the latch is acquired.
Use the following procedures to analyze latch, OPS, and IDLM-related statistics.
Oracle responds with output similar to:
SELECT PARENT_NAME, "WHERE", SLEEP_COUNT FROM V$LATCH_MISSES ORDER BY SLEEP_COUNT DESC;
Oracle responds with output similar to:
Use your V$LATCH and V$LATCH_MISSES output to perform the following procedures.
High numbers for misses usually indicate contention for the same resources and locks. Acceptable hit ratios range from 90 to 95%.
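One way to express this as a ratio is sketched below. The column names follow the GETS and MISSES columns of V$LATCH, and the sample figures are invented for illustration.

```python
# Sketch: latch hit ratio from V$LATCH-style GETS and MISSES counters.
# Ratios much below 90-95% suggest latch contention.

def latch_hit_ratio(gets, misses):
    """Fraction of latch gets that did not miss."""
    return (gets - misses) / gets if gets else 1.0

print(f"{latch_hit_ratio(10000, 250):.1%}")  # prints 97.5%
```

A ratio in this range would be acceptable; a latch showing, say, 80% warrants a look at V$LATCH_MISSES to find where in the code the sleeps occur.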
The following section describes how to use V$SYSTEM_EVENTS in more detail.
Data about Cache Fusion and OPS events appears in the V$SYSTEM_EVENT view. To identify events for which processes have waited the longest, query V$SYSTEM_EVENT ordered by the TIME_WAITED column in descending order. The TIME_WAITED column shows the total wait time for each system event listed. For an example of how to query V$SYSTEM_EVENT, refer to Step 2.
By generating an ordered list of event waits, you can easily locate performance bottlenecks. Each COUNT represents a voluntary context switch. The TIME_WAITED value is the cumulative time that processes waited for particular system events. The values in the TOTAL_TIMEOUTS and AVERAGE_WAIT columns provide additional information about system efficiency.
The following events appearing in V$SYSTEM_EVENT output represent waits for OPS events:
You can monitor other events in addition to those listed under the previous heading because performance problems may be related to OPS. These events are:
If the time waited for global cache events is high relative to other waits, look for increased latencies, contention, or excessive system workloads using V$SYSSTAT statistics and operating system performance monitors. A high number of global cache busy or buffer busy waits indicates increased contention in the buffer cache.
In OLTP systems with data block address locking and a high degree of contention, it is not unusual for global cache wait events to represent a high proportion of the total time waited.
If significant wait time is consumed by waits for non-buffer cache resources, as indicated by the statistics in the 'row cache lock', 'enqueue', and 'library cache pin' rows, monitor the V$ROWCACHE and V$LIBRARYCACHE views for OPS-related issues. Specifically, observe the values in the IDLM columns of each of these views.
Common OPS problems arise from poorly managed space parameters or sequences that are not cached. In such cases, processes wait for row cache locks and enqueues and V$ROWCACHE will show a high number of conflicts for certain dictionary caches.