Oracle8(TM) Server Tuning Release 8.0 A54638-01
Contention occurs when multiple processes try to access the same resource simultaneously. Some processes must then wait for access to various database structures. Topics discussed in this chapter include:
Symptoms of resource contention problems can be found in V$SYSTEM_EVENT. This view reveals various system problems that may be impacting performance, such as latch contention, buffer contention, and I/O contention. It is important to remember that these are only symptoms of problems, not the actual causes.
For example, by looking at V$SYSTEM_EVENT you might notice lots of buffer busy waits. It may be that many processes are inserting into the same block, but must wait for each other before they can insert. The solution might be to introduce free lists for the object in question.
Buffer busy waits may also cause some latch free waits. If most of these waits are due to misses on the cache buffers chains latch, they are a side effect of many processes trying to insert into the same block. Rather than increasing SPIN_COUNT to reduce the latch free waits (a symptom), you should change the object to allow multiple processes to insert into free blocks. This approach effectively reduces contention.
See Also: Oracle8 Server Administrator's Guide to understand which resources are used by certain Oracle8 features.
The V$RESOURCE_LIMIT view provides information about current and maximum global resource utilization for some of the system resources. This information enables you to make better decisions when choosing values for resource limit-controlling parameters.
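For example, a query along these lines shows how close each resource is to its configured limit:

```sql
-- Compare current and high-water utilization against the configured limits.
SELECT resource_name,
       current_utilization,
       max_utilization,
       limit_value
  FROM v$resource_limit
 ORDER BY resource_name;
```

A MAX_UTILIZATION value close to LIMIT_VALUE suggests that the corresponding parameter may need to be raised.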
If the system has idle time, start your investigation by checking V$SYSTEM_EVENT. Examine the events with the highest average wait time, then take appropriate action on each. For example, if there is a high number of latch free waits, look in V$LATCH to see which latch is the problem.
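A query such as the following ranks events by average wait time so that the most expensive waits stand out:

```sql
-- Rank wait events by average wait time (times are in hundredths of a second).
SELECT event,
       total_waits,
       time_waited,
       average_wait
  FROM v$system_event
 ORDER BY average_wait DESC;
```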
For excessive buffer busy waits, look in V$WAITSTAT to see which block type has the highest wait count and the highest wait time. Look in V$SESSION_WAIT for cache buffer waits so you can decode the file and block number of an object.
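For buffer busy waits, the file and block numbers are recorded in the P1 and P2 columns of V$SESSION_WAIT, so a query such as the following sketch identifies the blocks currently being waited on:

```sql
-- For 'buffer busy waits', P1 is the file number and P2 is the block number.
SELECT sid, p1 file_id, p2 block_id
  FROM v$session_wait
 WHERE event = 'buffer busy waits';
```

The file and block numbers returned can then be mapped to a segment through DBA_EXTENTS.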
The rest of this chapter describes common problems. Remember that the different forms of contention are symptoms which can be fixed by making changes in one of two places:
Sometimes you have no alternative but to change the application in order to overcome performance constraints.
The rest of the sections in this chapter provide strategies for solving contention problems. These include:
In this section, you will learn how to reduce contention for rollback segments. The following issues are discussed:
Contention for rollback segments is reflected by contention for buffers that contain rollback segment blocks. You can determine whether contention for rollback segments is reducing performance by checking the dynamic performance table V$WAITSTAT.
V$WAITSTAT contains statistics that reflect block contention. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM. These statistics reflect contention for different classes of block:
Use the following query to monitor these statistics over a period of time while your application is running:
SELECT class, count
FROM v$waitstat
WHERE class IN ('system undo header', 'system undo block',
'undo header', 'undo block');
The result of this query might look like this:
CLASS COUNT
------------------ ----------
system undo header 2089
system undo block 633
undo header 1235
undo block 942
Compare the number of waits for each class of block with the total number of requests for data over the same period of time. You can monitor the total number of requests for data over a period of time with this query:
SELECT SUM(value)
FROM v$sysstat
WHERE name IN ('db block gets', 'consistent gets');
The output of this query might look like this:
SUM(VALUE)
----------
929530
The information in V$SYSSTAT can also be obtained through SNMP.
If the number of waits for any class is greater than 1% of the total number of requests, you should consider creating more rollback segments to reduce contention.
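The 1% comparison can be made directly in SQL; the following sketch combines the two queries shown above:

```sql
-- Ratio of undo block waits to total data requests; values above 0.01 (1%)
-- suggest adding rollback segments.
SELECT w.class,
       w.count,
       ROUND(w.count / t.requests, 4) wait_ratio
  FROM v$waitstat w,
       (SELECT SUM(value) requests
          FROM v$sysstat
         WHERE name IN ('db block gets', 'consistent gets')) t
 WHERE w.class IN ('system undo header', 'system undo block',
                   'undo header', 'undo block');
```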
To reduce contention for buffers containing rollback segment blocks, create more rollback segments. Table 18-1 shows some general guidelines for choosing how many rollback segments to allocate based on the number of concurrent transactions on your database. These guidelines are appropriate for most application mixes.
Number of Concurrent Transactions (n)   Recommended Number of Rollback Segments
-------------------------------------   ---------------------------------------
n < 16                                  4
16 <= n < 32                            8
32 <= n                                 n/4
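For example, adding a rollback segment might look like the following sketch; the tablespace name rbs and the storage values are assumptions for illustration:

```sql
-- Create an additional rollback segment and bring it online.
CREATE ROLLBACK SEGMENT rbs_05
  TABLESPACE rbs
  STORAGE (INITIAL 100K NEXT 100K OPTIMAL 500K);

ALTER ROLLBACK SEGMENT rbs_05 ONLINE;
```

To have the new segment brought online automatically at instance startup, also add its name to the ROLLBACK_SEGMENTS initialization parameter.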
In this section, you will learn how to reduce contention for certain processes used by Oracle's multi-threaded server architecture:
This section discusses how to identify contention for dispatcher processes, how to add dispatcher processes, and how to enable connection pooling.
Contention for dispatcher processes can be reflected by either of these symptoms:
V$DISPATCHER contains statistics reflecting the activity of dispatcher processes. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM. These columns reflect busy rates for dispatcher processes:
IDLE   The idle time for the dispatcher process, in hundredths of a second.
BUSY   The busy time for the dispatcher process, in hundredths of a second.
Use the following query to monitor these statistics over a period of time while your application is running:
SELECT network "Protocol",
SUM(busy) / ( SUM(busy) + SUM(idle) ) "Total Busy Rate"
FROM v$dispatcher
GROUP BY network;
This query returns the total busy rate for the dispatcher processes of each protocol; that is, the percentage of time the dispatcher processes of each protocol are busy. The result of this query might look like this:
Protocol Total Busy Rate
-------- ---------------
decnet .004589828
tcp .029111042
From this result, you can make these observations:
If the database is only in use 8 hours per day, statistics need to be normalized by the effective work times. You cannot simply look at statistics from the time the instance started; rather, you must check statistics relevant to the workload you are applying. Thus, if the dispatcher processes for a specific protocol are busy more than 50% of the effective work time, then by adding dispatcher processes you may be able to improve performance for users connected to Oracle using that protocol.
V$QUEUE contains statistics reflecting the response queue activity for dispatcher processes. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM. These columns show wait times for responses in the queue:
Use the following query to monitor these statistics occasionally while your application is running:
SELECT network "Protocol",
DECODE( SUM(totalq), 0, 'No Responses',
SUM(wait)/SUM(totalq) || ' hundredths of seconds')
"Average Wait Time per Response"
FROM v$queue q, v$dispatcher d
WHERE q.type = 'DISPATCHER'
AND q.paddr = d.paddr
GROUP BY network;
This query returns the average time, in hundredths of a second, that a response waits in each response queue for a dispatcher process to route it to a user process. This query uses the V$DISPATCHER table to group the rows of the V$QUEUE table by network protocol. The query also uses the DECODE syntax to recognize those protocols for which there have been no responses in the queue. The result of this query might look like this:
Protocol Average Wait Time per Response
-------- ------------------------------
decnet .1739130 hundredths of seconds
tcp No Responses
From this result, you can tell that a response in the queue for DECNET dispatcher processes waits an average of 0.17 hundredths of a second and that there have been no responses in the queue for TCP dispatcher processes.
If the average wait time for a specific network protocol continues to increase steadily as your application runs, then by adding dispatcher processes you may be able to improve performance of those user processes connected to Oracle using that protocol.
To add dispatcher processes while Oracle is running, use the MTS_DISPATCHERS parameter of the ALTER SYSTEM command.
The total number of dispatcher processes across all protocols is limited by the value of the initialization parameter MTS_MAX_DISPATCHERS. You may need to increase this value before adding dispatcher processes. The default value of this parameter is 5 and the maximum value varies depending on your operating system.
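For example, to raise the number of TCP dispatcher processes to 4 while the instance is running, assuming MTS_MAX_DISPATCHERS permits it:

```sql
ALTER SYSTEM SET MTS_DISPATCHERS = 'TCP, 4';
```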
See Also: Oracle8 Server Administrator's Guide for more information on adding dispatcher processes.
MTS_DISPATCHERS lets you enable various attributes for each dispatcher. Previously you could specify a protocol and an initial number of dispatchers. These attributes are specified in a position-dependent, comma-separated string assigned to MTS_DISPATCHERS. For example:
MTS_DISPATCHERS = "TCP, 3"
While remaining backwardly compatible with this format, Oracle8 supports a name-value syntax to let you specify existing and additional attributes in a position-independent case-insensitive manner. For example:
MTS_DISPATCHERS = "(PROTOCOL=TCP)(DISPATCHERS=3)"
One and only one of the following attributes is required: PROTOCOL, ADDRESS, or DESCRIPTION.
The optional attribute POOL (or POO) is used to enable the Net8 connection pooling feature.
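For example, connection pooling might be enabled in the parameter file as in the following sketch; the DISPATCHERS and TICKS values are assumptions for illustration:

```
MTS_DISPATCHERS = "(PROTOCOL=TCP)(DISPATCHERS=2)(POOL=ON)(TICKS=1)"
```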
See Also: Oracle8 Server SQL Reference and the Net8 Administrator's Guide for more information about connection pooling.
This section discusses how to identify contention for shared server processes and how to increase the maximum number of shared server processes.
Contention for shared server processes can be reflected by a steady increase in waiting time for requests in the request queue. The dynamic performance table V$QUEUE contains statistics reflecting the request queue activity for shared server processes. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM. These columns show wait times for requests in the queue:
Monitor these statistics occasionally while your application is running:
SELECT DECODE( totalq, 0, 'No Requests',
wait/totalq || ' hundredths of seconds')
"Average Wait Time Per Requests"
FROM v$queue
WHERE type = 'COMMON';
This query returns the average wait time per request, computed from the total wait time for all requests and the total number of requests for the request queue. The result of this query might look like this:
Average Wait Time per Request
-----------------------------
.090909 hundredths of seconds
From the result, you can tell that a request waits an average of 0.09 hundredths of a second in the queue before it is processed.
You can also determine how many shared server processes are currently running by issuing this query:
SELECT COUNT(*) "Shared Server Processes"
FROM v$shared_servers
WHERE status != 'QUIT';
The result of this query might look like this:
Shared Server Processes
-----------------------
10
Since Oracle automatically adds shared server processes if the load on existing ones increases drastically, you are unlikely to improve performance simply by explicitly adding more shared server processes. However, if the number of shared server processes has reached the limit established by the initialization parameter MTS_MAX_SERVERS and the average wait time in the request queue is still increasing, you may be able to improve performance by increasing the MTS_MAX_SERVERS value. The default value of this parameter is 20 and the maximum value varies depending on your operating system. You can then either allow Oracle to add shared server processes automatically, or explicitly add shared server processes by changing the MTS_SERVERS initialization parameter.
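For example, if MTS_MAX_SERVERS permits it, the number of shared server processes can be raised while the instance is running; the value 25 here is an assumption for illustration:

```sql
ALTER SYSTEM SET MTS_SERVERS = 25;
```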
See Also: Oracle8 Server Administrator's Guide for more information on adding shared server processes.
This section describes how to detect and alleviate contention for parallel server processes when using parallel execution:
Statistics in the V$PQ_SYSSTAT view are useful for determining the appropriate number of parallel server processes for an instance. The statistics that are particularly useful are "Servers Busy", "Servers Idle", "Servers Started", and "Servers Shutdown".
Frequently, you will not be able to increase the maximum number of parallel server processes for an instance because the maximum number is heavily dependent upon the capacity of your CPUs and your I/O bandwidth. However, if servers are continuously starting and shutting down, you should consider increasing the value of the parameter PARALLEL_MIN_SERVERS.
For example, if you have determined that the maximum number of concurrent parallel server processes that your machine can manage is 100, set PARALLEL_MAX_SERVERS to 100. Next, determine how many parallel server processes the average parallel operation needs, and how many parallel operations are likely to be executed concurrently. Remember that an operation with a degree of parallelism of n can require as many as 2n parallel server processes (one set of producers and one set of consumers). For this example, assume you will have two concurrent operations with 20 as the average degree of parallelism: at any given time there could be 2 x 2 x 20 = 80 parallel server processes busy on an instance, so you should set the PARALLEL_MIN_SERVERS parameter to 80.
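The resulting parameter file entries for this example would be:

```
PARALLEL_MAX_SERVERS = 100
PARALLEL_MIN_SERVERS = 80
```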
Periodically examine V$PQ_SYSSTAT to determine if the 80 parallel server processes for the instance are actually busy. To determine if the instance's parallel server processes are active, issue the following query:
SELECT * FROM V$PQ_SYSSTAT
WHERE statistic = 'Servers Busy';
STATISTIC VALUE
--------------------- -----------
Servers Busy 70
If you find that typically there are fewer than PARALLEL_MIN_SERVERS busy at any given time, your idle parallel server processes are additional system overhead that is not being used. Consider decreasing the value of the parameter PARALLEL_MIN_SERVERS. If you find that there are typically more parallel server processes active than the value of PARALLEL_MIN_SERVERS and the "Servers Started" statistic is continuously growing, consider increasing the value of the parameter PARALLEL_MIN_SERVERS.
Contention for redo log buffer access rarely inhibits database performance. However, Oracle provides you with methods to monitor and reduce any latch contention that does occur. This section covers:
When LGWR writes redo entries from the redo log buffer to a redo log file, user processes can then copy new entries over the entries that have been written to disk. LGWR normally writes fast enough to ensure that space is always available in the buffer for new entries, even when access to the redo log is heavy.
The statistic redo buffer allocation retries reflects the number of times a user process waits for space in the redo log buffer. This statistic is available through the dynamic performance table V$SYSSTAT. By default, this table is only available to the user SYS and to users granted SELECT ANY TABLE system privilege, such as SYSTEM. Use the following query to monitor these statistics over a period of time while your application is running:
SELECT name, value
FROM v$sysstat
WHERE name = 'redo buffer allocation retries';
The information in V$SYSSTAT can also be obtained through SNMP.
The value of redo buffer allocation retries should be near 0. If this value increments consistently, processes have had to wait for space in the buffer. This may be caused by the log buffer being too small, or it could be caused by checkpointing or log switching. Increase the size of the redo log buffer, if necessary, by changing the value of the initialization parameter LOG_BUFFER. The value of this parameter, expressed in bytes, must be a multiple of DB_BLOCK_SIZE. Alternatively, improve the checkpointing or archiving process.
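For example, with an 8 KB DB_BLOCK_SIZE, the log buffer could be enlarged as in the following parameter file sketch; the size shown is an assumption for illustration and should be chosen from observed retry counts:

```
LOG_BUFFER = 524288    # 512 KB; a multiple of an 8192-byte DB_BLOCK_SIZE
```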
Note: Multiple archiver processes are not recommended. A single automatic ARCH process can archive redo logs, keeping pace with the LGWR process.
Access to the redo log buffer is regulated by two types of latch: the redo allocation latch and redo copy latches.
The redo allocation latch controls the allocation of space for redo entries in the redo log buffer. To allocate space in the buffer, an Oracle user process must obtain the redo allocation latch. Since there is only one redo allocation latch, only one user process can allocate space in the buffer at a time. The single redo allocation latch enforces the sequential nature of the entries in the buffer.
After allocating space for a redo entry, the user process may copy the entry into the buffer. This is called "copying on the redo allocation latch". A process may only copy on the redo allocation latch if the redo entry is smaller than a threshold size.
The maximum size of a redo entry that can be copied on the redo allocation latch is specified by the initialization parameter LOG_SMALL_ENTRY_MAX_SIZE. The value of this parameter is expressed in bytes. The minimum, maximum, and default values vary depending on your operating system.
The copy latch is obtained first. Then the allocation latch is obtained, the allocation is performed, and the allocation latch is released. Next the copy is performed under the copy latch, and the copy latch is released. The allocation latch is thus held for only a very short period of time, and the system does not try to obtain the copy latch while holding the allocation latch.
If the redo entry is too large to copy on the redo allocation latch, the user process must obtain a redo copy latch before copying the entry into the buffer. While holding a redo copy latch, the user process copies the redo entry into its allocated space in the buffer and then releases the redo copy latch.
If your computer has multiple CPUs, your redo log buffer can have multiple redo copy latches. These allow multiple processes to copy entries to the redo log buffer concurrently. The number of redo copy latches is determined by the parameter LOG_SIMULTANEOUS_COPIES; its default value is the number of CPUs available to your Oracle instance.
On single-CPU computers, there are no redo copy latches, since only one process can be active at once. In this case, all redo entries are copied on the redo allocation latch, regardless of size.
Heavy access to the redo log buffer can result in contention for redo log buffer latches. Latch contention can reduce performance. Oracle collects statistics for the activity of all latches and stores them in the dynamic performance table V$LATCH. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM.
Each row in the V$LATCH table contains statistics for a different type of latch. The columns of the table reflect activity for different types of latch requests. The distinction between these types of requests is whether the requesting process continues to request a latch if it is unavailable:
These columns of the V$LATCH table reflect willing-to-wait requests:
For example, consider the case in which a process makes a willing-to-wait request for a latch that is unavailable. The process waits and requests the latch again and the latch is still unavailable. The process waits and requests the latch a third time and acquires the latch. This activity increments the statistics as follows:
These columns of the V$LATCH table reflect immediate requests:
IMMEDIATE GETS     The number of successful immediate requests for each latch.
IMMEDIATE MISSES   The number of unsuccessful immediate requests for each latch.
Use the following query to monitor the statistics for the redo allocation latch and the redo copy latches over a period of time:
SELECT ln.name, gets, misses, immediate_gets, immediate_misses
FROM v$latch l, v$latchname ln
WHERE ln.name IN ('redo allocation', 'redo copy')
AND ln.latch# = l.latch#;
The output of this query might look like this:
NAME GETS MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
------------ ---------- ---------- -------------- ----------------
redo allo... 252867 83 0 0
redo copy 0 0 22830 0
From the output of the query, calculate the wait ratio for each type of request.
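The wait ratios can be computed directly in SQL, for example:

```sql
-- Miss ratios for the redo latches: MISSES/GETS for willing-to-wait requests,
-- IMMEDIATE_MISSES/(IMMEDIATE_GETS + IMMEDIATE_MISSES) for immediate requests.
SELECT ln.name,
       ROUND(misses / DECODE(gets, 0, 1, gets), 4) wait_ratio,
       ROUND(immediate_misses /
             DECODE(immediate_gets + immediate_misses, 0, 1,
                    immediate_gets + immediate_misses), 4) immediate_ratio
  FROM v$latch l, v$latchname ln
 WHERE ln.name IN ('redo allocation', 'redo copy')
   AND ln.latch# = l.latch#;
```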
Contention for a latch may be affecting performance if either of these conditions is true:
If either of these conditions is true for a latch, try to reduce contention for that latch. These contention thresholds are appropriate for most operating systems, though some computers with many CPUs may be able to tolerate more contention without performance reduction.
Most cases of latch contention occur when two or more Oracle processes concurrently attempt to obtain the same latch. Latch contention rarely occurs on single-CPU computers where only a single process can be active at once.
To reduce contention for the redo allocation latch, you should minimize the time that any single process holds the latch. To reduce this time, reduce copying on the redo allocation latch. Decreasing the value of the LOG_SMALL_ENTRY_MAX_SIZE initialization parameter reduces the number and size of redo entries copied on the redo allocation latch.
On multiple-CPU computers, multiple redo copy latches allow multiple processes to copy entries to the redo log buffer concurrently. The default value of LOG_SIMULTANEOUS_COPIES is the number of CPUs available to your Oracle instance.
If you observe contention for redo copy latches, add more latches. To increase the number of redo copy latches, increase the value of LOG_SIMULTANEOUS_COPIES. It can help to have up to twice as many redo copy latches as CPUs available to your Oracle instance.
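For example, on a four-CPU machine the parameter file might specify:

```
LOG_SIMULTANEOUS_COPIES = 8    # up to twice the number of available CPUs
```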
The LRU (least recently used) latch controls the replacement of buffers in the buffer cache. For SMP systems, Oracle automatically sets the number of LRU latches to be one half the number of CPUs on the system. For non-SMP systems, one LRU latch is sufficient.
Contention for the LRU latch can impede performance on symmetric multiprocessor (SMP) machines with a large number of CPUs. You can detect LRU latch contention by querying V$LATCH, V$SESSION_EVENT, and V$SYSTEM_EVENT. To avoid contention, consider bypassing the buffer cache or redesigning the application.
You can specify the number of LRU latches on your system with the initialization parameter DB_BLOCK_LRU_LATCHES. This parameter sets the maximum value for the desired number of LRU latches. Each LRU latch controls a set of buffers; Oracle balances allocation of replacement buffers among the sets.
To select the appropriate value for DB_BLOCK_LRU_LATCHES, consider the following:
Note: You cannot dynamically change the number of sets during the lifetime of the instance.
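For example, on a busy 16-CPU SMP system you might allow more latch sets than the default of one half the CPU count; the value here is an illustration, not a recommendation:

```
DB_BLOCK_LRU_LATCHES = 16
```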
Free list contention can reduce the performance of some applications. This section covers:
Contention for free lists is reflected by contention for free data blocks in the buffer cache. You can determine whether contention for free lists is reducing performance by querying the dynamic performance table V$WAITSTAT.
The V$WAITSTAT table contains block contention statistics. By default, this table is only available to the user SYS and to other users who have SELECT ANY TABLE system privilege, such as SYSTEM.
Use the following procedure to find the segment names and free lists which have contention:
High numbers indicate that there is some contention.
SELECT SEGMENT_NAME, SEGMENT_TYPE
FROM DBA_EXTENTS
WHERE FILE_ID = file
AND block BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
In this query, file and block are the file and block numbers you identified; the query returns the segment name (segment) and type (type).
SELECT SEGMENT_NAME, FREELISTS
FROM DBA_SEGMENTS
WHERE SEGMENT_NAME = segment
AND SEGMENT_TYPE = type;
To reduce contention for the free lists of a table, re-create the table with a larger value for the FREELISTS storage parameter. Increasing the value of this parameter to the number of Oracle processes that concurrently insert data into the table may benefit performance for the INSERT statements.
Re-creating the table may simply involve dropping and creating it again. However, you may want to use one of these means instead:
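For example, re-creating a table with more free lists might look like the following sketch; the table name orders is an assumption for illustration:

```sql
-- Re-create the table with enough free lists for the concurrent inserters.
CREATE TABLE new_orders
  STORAGE (FREELISTS 8)
  AS SELECT * FROM orders;

DROP TABLE orders;
RENAME new_orders TO orders;
```

Remember to re-create any indexes, constraints, and grants that the original table carried.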