22
Recovering the Database

This chapter describes Oracle recovery features on a parallel server.

Overview

This chapter discusses three types of recovery:

Table 22-1 Types of Recovery

Type of Recovery	Definition
Instance failure.	Occurs when a software or hardware problem prevents an instance from continuing work.
Media failure.	Occurs when the storage medium for Oracle files is damaged. This usually prevents Oracle from reading or writing data.
Parallel recovery.	With Recovery Manager, the restore and the application of incremental backups are parallelized through channel allocation. The degree of parallelism for the application of redo (whether it is done by Recovery Manager or by Server Manager) is determined by the RECOVERY_PARALLELISM parameter.

Recovery from Instance Failure

The following sections describe the recovery performed after failure of instances accessing the database in shared mode.

After instance failure, Oracle uses the online redo log files to perform automatic recovery of the database. For a single instance running in exclusive mode, instance recovery occurs as soon as the instance starts up again after it has failed or shut down abnormally.

When instances accessing the database in shared mode fail, online instance recovery is performed automatically. Instances that continue running on other nodes are not affected as long as they are reading from the buffer cache. If a surviving instance attempts to write, the transaction stops. All operations on the database are suspended until cache recovery of the failed instance is complete.

Single-node Failure

Oracle Parallel Server (OPS) performs instance recovery by coordinating recovery operations through the SMON processes of the other running instances. If one instance fails, the SMON process of another instance notices the failure and automatically performs instance recovery for the failed instance.

Instance recovery does not include restarting the failed instance or any applications that were running on that instance. Applications that were running may continue by failover, as described in "Recovery from Instance Failure".

When one instance performs recovery for another failed instance, the surviving instance reads redo log entries generated by the failed instance and uses that information to ensure all committed transactions are reflected in the database. Data from committed transactions is not lost. The instance performing recovery rolls back any transactions that were active at the time of the failure and releases resources being used by those transactions.
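
The two steps described above, reapplying the failed instance's redo and then rolling back the transactions that were still active, can be sketched with a toy model in Python. This is an illustration only, not Oracle's implementation: the record layout `(txn, key, old_value, new_value)` and the function name `recover_failed_instance` are assumptions made for this example.

```python
def recover_failed_instance(data, redo, committed):
    """Return the recovered contents of a toy database.

    data      -- dict of key -> value as found on disk after the failure
    redo      -- list of (txn, key, old_value, new_value) in log order
    committed -- set of transaction names that committed before the failure
    """
    recovered = dict(data)

    # Roll forward: reapply every change recorded in the failed thread's redo.
    for txn, key, old_value, new_value in redo:
        recovered[key] = new_value

    # Roll back: undo, newest first, the changes of uncommitted transactions.
    for txn, key, old_value, new_value in reversed(redo):
        if txn not in committed:
            recovered[key] = old_value

    return recovered


on_disk = {"x": 1, "y": 1}
redo_log = [("T1", "x", 1, 2), ("T2", "y", 1, 5)]
result = recover_failed_instance(on_disk, redo_log, committed={"T1"})
```

In this example T1 committed, so its change to `x` survives, while the change made by the still-active T2 is undone.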

Multiple-node Failure

As long as one instance continues running, its SMON process performs instance recovery for any other instances that fail in a parallel server.

If all instances of a parallel server fail, instance recovery is performed automatically the next time an instance opens the database. The instance does not have to be one of the instances that failed, and it can mount the database in either shared or exclusive mode from any node of the parallel server. This recovery procedure is the same for Oracle running in shared mode as it is for Oracle in exclusive mode, except that one instance performs instance recovery for all failed instances.

Fast-Start Checkpointing

Fast-start checkpointing is the basis for Fast-start fault recovery in Oracle8i. Fast-start checkpointing occurs continuously, advancing the checkpoint as Oracle writes blocks to disk. Fast-start checkpointing always writes the oldest modified block first, ensuring that every write allows the checkpoint time to be advanced. This eliminates bulk writes and the resulting I/O spikes that occur with conventional checkpointing, yielding smooth and efficient ongoing performance.
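
The "oldest modified block first" rule can be illustrated with a small Python model. It is a sketch under stated assumptions, not a description of Oracle internals: the class `CheckpointQueue`, its method names, and the integer "redo positions" are all hypothetical. The point it shows is that when dirty blocks are written in the order of their first modification, each write moves the checkpoint forward.

```python
from collections import OrderedDict


class CheckpointQueue:
    """Dirty buffers kept in the order in which they were first modified."""

    def __init__(self):
        # block id -> redo position of the first change since the last write
        self._first_change = OrderedDict()

    def modify(self, block_id, redo_position):
        # Re-dirtying a block keeps its original (oldest) place in the queue.
        self._first_change.setdefault(block_id, redo_position)

    def write_oldest(self):
        # Write the block whose first modification is oldest; drop it from the queue.
        block_id, _ = self._first_change.popitem(last=False)
        return block_id

    def checkpoint(self):
        # Redo before this position is no longer needed for roll forward.
        # None means that no dirty buffers remain.
        return next(iter(self._first_change.values()), None)


queue = CheckpointQueue()
for block, position in [("A", 10), ("B", 20), ("A", 30), ("C", 40)]:
    queue.modify(block, position)

positions = [queue.checkpoint()]
while queue.checkpoint() is not None:
    queue.write_oldest()
    positions.append(queue.checkpoint())
```

After the loop, `positions` holds the checkpoint before any write and after each of the three writes; it never moves backward.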

You can specify a limit on how long the roll forward phase of Fast-start checkpointing takes. Oracle automatically adjusts the checkpoint write rate to meet the specified roll-forward limit while issuing the minimum number of writes. For details on how to do this, please refer to Oracle8i Tuning.

Fast-Start Roll Back

The rollback phase of system fault recovery in Oracle8i uses "non-blocking" rollback technology. This means new transactions can begin immediately after roll forward completes. When a new transaction accesses a row locked by a dead transaction, the new transaction rolls back only the changes that prevent the transaction's progress. New transactions do not have to wait for Oracle to roll back the entire dead transaction, so long-running transactions no longer affect recovery time. The Fast-start technology maximizes data availability and ensures predictable recovery time.

In addition, the database server can roll back dead transactions in parallel. This technique is used against rows not blocking new transactions, and only when the cost of performing dead transaction roll back in parallel is less than performing it serially.

See Also:
Oracle8i Concepts.

Access to Datafiles for Instance Recovery

An instance performing recovery for another instance must have access to all online datafiles that the failed instance was accessing. When instance recovery fails because a datafile fails verification, the instance that attempted to perform recovery does not fail, but a message is written to the ALERT file.

After you correct the problem that prevented access to the database files, use the SQL statement ALTER SYSTEM CHECK DATAFILES to make the files available to the instance.

See Also:
"Datafiles".

Freezing the Database for Instance Recovery

With OPS, you can use the dynamic parameter FREEZE_DB_FOR_FAST_INSTANCE_RECOVERY to control freezing of the database during instance recovery. All instances must have the same value for this parameter.

When this parameter is set to TRUE, Oracle freezes the entire database during instance recovery. The advantage of freezing the entire database is to stop other disk activities except those for instance recovery. Instance recovery may thus complete sooner. The drawback of freezing the entire database is that it becomes unavailable during instance recovery.

When this parameter is set to FALSE, Oracle does not freeze the database, thus part of the unaffected database is accessible during instance recovery.

Oracle attempts to select an appropriate default:

If all online datafiles use hash locks, the default value of this parameter is FALSE. When hash locks are used, most parts of the database can be accessed by users during instance recovery.
If datafiles use fine grain locks, the default is TRUE. When fine grain locks are used, the death of an instance may affect a larger portion of the database, and the affected data is accessible only after instance recovery. In this case, setting this parameter to TRUE can potentially make those parts of the database available sooner.
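
The default selection can be written as a small Python function. This is a hedged sketch: the function name and the strings `"hash"` and `"fine_grain"` are invented for the example, and treating a mix of lock types as TRUE is an assumption, since the text covers only the "all hash locks" and "fine grain locks" cases.

```python
def default_freeze_db(lock_types):
    """Return the assumed default of FREEZE_DB_FOR_FAST_INSTANCE_RECOVERY.

    lock_types -- one entry per online datafile: "hash" or "fine_grain"
    """
    lock_types = list(lock_types)
    for lock_type in lock_types:
        if lock_type not in ("hash", "fine_grain"):
            raise ValueError(f"unknown lock type: {lock_type!r}")

    # FALSE only when every online datafile uses hash locks.
    # Assumption: any datafile with fine grain locks makes the default TRUE.
    return any(lock_type == "fine_grain" for lock_type in lock_types)


all_hash = default_freeze_db(["hash", "hash", "hash"])
with_fine_grain = default_freeze_db(["hash", "fine_grain"])
```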

To see the number of times the entire database is frozen for instance recovery after this instance has started up, check the "instance recovery database freeze count" statistic in V$SYSSTAT.

See Also:
The Oracle8i Reference.

Phases of Oracle Instance Recovery

Figure 22-1 illustrates the degree of database availability during each phase of Oracle instance recovery.

Figure 22-1 Phases of Oracle Instance Recovery

Phases of recovery are these:

1. OPS is running on multiple nodes.
2. Node failure is detected.
3. The LM is reconfigured; resource and lock management is redistributed onto the set of surviving nodes. One call gets persistent resources. Lock value block is marked as dubious for locks held in exclusive or protected write mode. Lock requests are queued.
4. LCKn processes build a list of all invalid lock elements.
5. Roll forward. Redo logs of the dead thread(s) are applied to the database.
6. LCKn processes make all invalid lock elements valid.
7. Roll back. Rollback segments are applied to the database for all uncommitted transactions.
8. Instance recovery is complete, and all data is accessible.

During phase 5, forward application of the redo log, database access is limited by the transitional state of the buffer cache. The following data access restrictions exist for all user data in all datafiles, regardless of whether you are using hashed or fine grain locking, or any particular features:

No writes to surviving buffer caches can succeed while the access is limited.
No disk I/O of any sort by way of the buffer cache and direct path can be done from any of the surviving instances.
No lock requests are made to the IDLM for user data.

Reads of buffers already in the cache with the correct global lock can be done, since they do not involve any I/O or lock operations.
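
The access rules that apply while the redo log is being applied can be summarized in a few lines of Python. The enumeration `Op` and the function name are hypothetical; the sketch only restates the restrictions listed above and is not an Oracle interface.

```python
from enum import Enum, auto


class Op(Enum):
    CACHED_READ = auto()   # read of a buffer that is already in the cache
    CACHE_WRITE = auto()   # write to a surviving buffer cache
    DISK_IO = auto()       # disk I/O through the buffer cache or direct path
    LOCK_REQUEST = auto()  # lock request to the IDLM for user data


def allowed_during_cache_recovery(op, has_correct_global_lock=False):
    """Return True if a surviving instance may perform op during roll forward."""
    if op is Op.CACHED_READ:
        # Needs neither I/O nor a lock operation, so it can proceed,
        # provided the buffer is held with the correct global lock.
        return has_correct_global_lock
    # Writes, disk I/O, and lock requests must wait until cache recovery ends.
    return False


blocked = [op.name for op in Op if not allowed_during_cache_recovery(op, True)]
```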

The transitional state of the buffer cache begins at the conclusion of the initial lock scan phase when instance recovery is first started by scanning for dead redo threads. Subsequent lock scans are made if new "dead" threads are discovered. This state lasts while the redo log is applied (cache recovery) and ends when the redo logs have been applied and the file headers have been updated. Cache recovery operations conclude with validation of the invalid locks, which occurs after the buffer cache state is normalized.

Recovery from Media Failure

After a media failure resulting in the loss of one or more database files, use backups of the datafiles to recover the database.

If you are using Recovery Manager, you might also need to apply incremental backups, archived redo log files and a backup of the control file.

If you are using operating system utilities, you might need to apply archived redo log files to the database and use a backup of the control file.

Complete Media Recovery

You can perform complete media recovery in either exclusive or shared mode. Table 22-2 shows the status of the database that is required to recover particular database objects.

Table 22-2 Database Status for Media Recovery

To Recover	Database Status
An entire database or the SYSTEM tablespace.	The database must be mounted but not opened by any instance.
A tablespace other than the SYSTEM tablespace.	The database must be opened by the instance performing the recovery and the tablespace must be offline.
A datafile.	The database can be open with the datafile offline, or the database can be mounted but not opened by any instance. (For a datafile in the SYSTEM tablespace, the database must be mounted but not open.)

You can recover multiple datafiles or tablespaces on multiple instances simultaneously.
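
The rules in Table 22-2 can be expressed as a small check. The following Python sketch is illustrative only: the function, its parameters, and the target names are invented for the example, and it assumes that the database is at least mounted.

```python
def recovery_allowed(target, db_open, target_offline=False, in_system=False):
    """Apply the rules of Table 22-2 to a mounted database.

    target         -- "database", "system_tablespace", "tablespace", or "datafile"
    db_open        -- True if the database is open; False if no instance has opened it
    target_offline -- True if the tablespace or datafile is offline
    in_system      -- True if a datafile belongs to the SYSTEM tablespace
    """
    if target in ("database", "system_tablespace"):
        return not db_open
    if target == "tablespace":
        # Must be open on the recovering instance, with the tablespace offline.
        return db_open and target_offline
    if target == "datafile":
        if in_system:
            return not db_open
        return (not db_open) or target_offline
    raise ValueError(f"unknown recovery target: {target!r}")


open_tablespace_ok = recovery_allowed("tablespace", db_open=True, target_offline=True)
open_system_file_ok = recovery_allowed("datafile", db_open=True, target_offline=True, in_system=True)
```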

Complete Media Recovery Using Operating System Utilities

With operating system utilities you can perform open database recovery of tablespaces or datafiles in shared mode. Do this using the Server Manager command RECOVER TABLESPACE or RECOVER DATAFILE.

You can use the Server Manager RECOVER DATABASE command to recover a database that is mounted in shared mode, but not open. Only one instance can issue this command in OPS.

Note:
The recommended method of recovering a database is to use Server Manager. We do not recommend direct use of the SQL command ALTER DATABASE RECOVER.

Complete Media Recovery Using Recovery Manager

With Recovery Manager you can issue the following statements to restore and recover the files:

RESTORE DATABASE
RESTORE TABLESPACE
RESTORE DATAFILE
RECOVER DATABASE
RECOVER TABLESPACE
RECOVER DATAFILE

The commands you use in Recovery Manager for OPS are the same as those you use to recover single instance environments.

See Also:
For more information refer to the Oracle8i Backup and Recovery Guide.

Incomplete Media Recovery

Incomplete media recovery can be performed while the database is mounted in shared or exclusive mode but not opened by any instance. Do this using the following database recovery options:

With Recovery Manager, use one of the following options with the SET command prior to restoring and recovering:

UNTIL CHANGE integer
UNTIL TIME date
UNTIL LOGSEQ integer THREAD integer

With operating system utilities, restore the appropriate backups and then use one of the following options with the RECOVER DATABASE command:

UNTIL CANCEL
UNTIL CHANGE integer
UNTIL TIME date

See Also:
The Oracle8i Backup and Recovery Guide.

Restoring and Recovering Redo Log Files

Media recovery of a database accessed by OPS may require multiple archived log files to be open at the same time. Because each instance writes redo log data to a separate redo thread, recovery may require as many as one archived log file per thread.

However, if a thread's online redo log contains enough recovery information, restoring archived log files for that thread is unnecessary.
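
To see why several log files may need to be open at once, consider the following Python sketch. It assumes that redo from all threads is applied in SCN order, so the next change to apply can come from any thread. The record layout `(scn, change)` and the function name are hypothetical.

```python
import heapq


def merge_redo_threads(threads):
    """Merge per-thread redo into one stream ordered by SCN.

    threads -- dict of thread number -> list of (scn, change), sorted by scn
    Returns a list of (scn, thread, change).
    """
    streams = (
        [(scn, thread, change) for scn, change in records]
        for thread, records in sorted(threads.items())
    )
    # heapq.merge reads from every stream at once, which is why one log
    # file per thread may have to be open at the same time.
    return list(heapq.merge(*streams))


redo_by_thread = {
    1: [(100, "a"), (130, "c")],
    2: [(110, "b"), (140, "d")],
}
merged = merge_redo_threads(redo_by_thread)
```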

Recovery Using Recovery Manager

Recovery Manager automatically restores and applies the archive logs required. By default, Recovery Manager restores archive logs to the LOG_ARCHIVE_DEST directory of the instances to which it connects. If you are using multiple nodes to restore and recover, this means that the archive logs may be restored to any of the nodes performing the restore/recover.

The node that actually reads the restored logs and performs the roll forward is the target node to which the connection was initially made. You must ensure that the logs are readable from that node.

Recovery Using Operating System Utilities

During recovery, Oracle prompts you for the archived log files as they are needed. Messages supply information about the required files and Oracle prompts you for the filenames.

For example, if the log history is enabled and the filename format is LOG_T%t_SEQ%s, where %t is the thread and %s is the log sequence number, then you might receive these messages to begin recovery with SCN 9523 in thread 8:

ORA-00279: Change 9523 generated at 27/09/91 11:42:54 needed for thread 8 
ORA-00289: Suggestion : LOG_T8_SEQ438 
ORA-00280: Change 9523 for thread 8 is in sequence 438 
Specify log: {<RET> = suggested | filename | AUTO | FROM | CANCEL}
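
The suggested name in ORA-00289 follows from the filename format. As a minimal illustration, the Python function below expands only the two placeholders described above, %t and %s; the function name is hypothetical, and any other format elements are left untouched.

```python
def expand_log_format(fmt, thread, sequence):
    """Expand %t (thread) and %s (log sequence number) in a filename format."""
    return fmt.replace("%t", str(thread)).replace("%s", str(sequence))


suggested = expand_log_format("LOG_T%t_SEQ%s", thread=8, sequence=438)
```

With the values from the messages above (thread 8, sequence 438), this yields the suggested name LOG_T8_SEQ438.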

If you use the ALTER DATABASE statement with the RECOVER clause instead of Server Manager, you receive these messages but not the prompt. Redo log files may be required for each enabled thread in OPS. Oracle issues a message when a log file is no longer needed. The next log file for that thread is then requested, unless the thread was disabled or recovery is finished.

If recovery reaches a time when an additional thread was enabled, Oracle simply requests the archived log file for that thread. Whenever an instance enables a thread, it writes a redo entry that records the change; therefore, all necessary information about threads is available from the redo log files during recovery.

If recovery reaches a time when a thread was disabled, Oracle informs you that the log file for that thread is no longer needed and does not request further log files for the thread.

Note:
If Oracle reconstructs the names of archived redo log files, the format that LOG_ARCHIVE_FORMAT specifies for the instance doing recovery must be the same as the format specified for the instances that archived the files. All instances should use the same value of LOG_ARCHIVE_FORMAT in OPS, and the instance performing recovery should also use that value. You can specify a different value of LOG_ARCHIVE_DEST during recovery if the archived redo log files are not at their original archive destinations.

Disaster Recovery

This section describes disaster recovery using Recovery Manager and operating system utilities. Disaster recovery is used when a failure makes an entire site unavailable. In this case, you can recover at an alternate site using open or closed database backups.

Note:
To recover up to the latest point in time, all logs must be available at a remote site; otherwise some committed transactions may be lost.

Disaster Recovery Using Recovery Manager

The following scenario assumes:

You have lost the entire database, all control files, and the online redo log.
You will distribute your restore over two nodes.
There are four tape drives (two on each node).
You are using a recovery catalog.

Note:
It is highly advisable to back up the database immediately after opening it with the RESETLOGS option, since all previous backups are invalidated. This step is not shown in this example.

The SET UNTIL command is used in case the database structure has changed in the most recent backups and you wish to recover to that point in time. In this way, Recovery Manager restores the database to the same structure the database had at the specified time.

Before You Begin: Before beginning the database restore, you must:

Restore your initialization file and your recovery catalog from your most recent backup

Catalog archive logs, datafile copies, or backup sets that are on disk but are not registered in the recovery catalog

The archive logs up to the logseq number being restored must be cataloged in the recovery catalog, or Recovery Manager will not know where to find them.

If you resynchronize the recovery catalog frequently, and have an up-to-date copy from which you have restored, there should not be many archive logs that need cataloging.

Note:
You only have to perform this step if you lose your recovery catalog and have already restored and performed point-in-time recovery on it. This is not necessary if the recovery catalog is still intact. You might, however, need to catalog a few archived logs even with an intact catalog, but you only need to catalog the ones that were created since the last "catalog resync". A "catalog resync" is the process by which Recovery Manager copies information about backups, copies, and archived logs from the target database control file to the recovery catalog.

What the Sample Script Does: The following script restores and recovers the database to the most recently available archived log, which is log 124 thread 1. It does the following:

Starts the database NOMOUNT and restricts connections to DBA-only users.
Restores the control file to the location specified.
Copies (or replicates) this control file to all the other locations specified by the CONTROL_FILES initialization parameter.
Mounts the control file.
Catalogs any archive logs not in the recovery catalog.
Restores the database files (to the original locations). If volume names have changed, you must use the statement SET NEWNAME FOR... before the restore, then perform a switch after the restore. This updates the control file with the datafiles' new locations.
Recovers the datafiles by using either a combination of incremental backups and redo, or just redo. Recovery Manager completes the recovery when it reaches the log sequence number specified.
Opens the database with the RESETLOGS option.

Note:
Open the database with the RESETLOGS option only if you are certain there are no other archived logs to apply.
Oracle recommends that you back up your database after the resetlogs. This is not shown in the example.

Restore/Recover Sample Script:

The DBA starts Server Manager as follows:

   CONNECT SCOTT/TIGER AS SYSDBA

Oracle responds with:

   Connected.

Then enter the following STARTUP syntax:

   STARTUP NOMOUNT RESTRICT

The DBA starts Recovery Manager and runs the script.

Note:
The user specified in the target parameter must have SYSDBA privilege.

   RMAN TARGET SCOTT/TIGER@NODE1 RCVCAT RMAN/RMAN@RCAT
   RUN { 
     SET UNTIL LOGSEQ 124 THREAD 1; 
     ALLOCATE CHANNEL T1 TYPE 'SBT_TAPE' CONNECT 'INTERNAL/KNL@NODE1'; 
     ALLOCATE CHANNEL T2 TYPE 'SBT_TAPE' CONNECT 'INTERNAL/KNL@NODE1'; 
     ALLOCATE CHANNEL T3 TYPE 'SBT_TAPE' CONNECT 'INTERNAL/KNL@NODE2'; 
     ALLOCATE CHANNEL T4 TYPE 'SBT_TAPE' CONNECT 'INTERNAL/KNL@NODE2'; 
     ALLOCATE CHANNEL D1 TYPE DISK; 
     RESTORE CONTROLFILE; 
     ALTER DATABASE MOUNT; 
     CATALOG ARCHIVELOG '/ORACLE/DB_FILES/NODE1/ARCH/ARCH_1_123.RDO'; 
     CATALOG ARCHIVELOG '/ORACLE/DB_FILES/NODE1/ARCH/ARCH_1_124.RDO'; 
     RESTORE DATABASE; 
     RECOVER DATABASE; 
     SQL 'ALTER DATABASE OPEN RESETLOGS'; 
     }
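
The four tape channels in the script follow a regular pattern: two channels per node, numbered consecutively. If you maintain such scripts for a varying number of nodes or drives, the ALLOCATE CHANNEL lines could be generated rather than typed. The Python helper below is a hypothetical convenience, not part of Recovery Manager; it only reproduces the text of the lines shown in the sample script.

```python
def allocate_channels(nodes, drives_per_node, connect_user="INTERNAL/KNL"):
    """Build one ALLOCATE CHANNEL line per tape drive, grouped by node."""
    lines = []
    channel = 1
    for node in nodes:
        for _ in range(drives_per_node):
            lines.append(
                f"ALLOCATE CHANNEL T{channel} TYPE 'SBT_TAPE' "
                f"CONNECT '{connect_user}@{node}';"
            )
            channel += 1
    return lines


channel_lines = allocate_channels(["NODE1", "NODE2"], drives_per_node=2)
```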

Disaster Recovery Using Operating System Utilities

To do this, use the following procedure:

1. Restore the last full backup at the alternate site as described in the Oracle8i Backup and Recovery Guide.
2. Start Server Manager.
3. Connect as SYSDBA.
4. Start and mount the database with the STARTUP MOUNT statement.
5. Initiate an incomplete recovery using the RECOVER command with the appropriate UNTIL option.

The following command is an example:
```
RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL
```
6. When prompted with a suggested redo log file name for a specific thread, use that filename.

If the suggested archive log is not in the archive directory, specify where the file can be found. If redo information is needed for a thread and a file name is not suggested, try using archive log files for the thread in question.
7. Repeat step 6 until all archive log files have been applied.
8. Stop the recovery operation using the CANCEL command.
9. Issue the ALTER DATABASE OPEN RESETLOGS statement.

Note:
If any distributed database actions are used, check to see whether your recovery procedures require coordinated distributed database recovery. Otherwise, you may cause logical corruption to the distributed data.

Parallel Recovery

The goal of the parallel recovery feature is to use compute and I/O parallelism to reduce the elapsed time required to perform crash recovery, single-instance recovery, or media recovery. Parallel recovery is most effective at reducing recovery time when several datafiles on several disks are being recovered concurrently.

Parallel Recovery Using Recovery Manager

With Recovery Manager's RESTORE and RECOVER commands Oracle can automatically parallelize all three stages of recovery.

Restoring Data Files: When restoring data files, the number of channels you allocate in the Recovery Manager recover script effectively sets the parallelism Recovery Manager uses. For example, if you allocate 5 channels, you can have up to 5 parallel streams restoring data files.

Applying Incremental Backups: Similarly, when you are applying incremental backups, the number of channels you allocate determines the potential parallelism.

Applying Redo Logs: Oracle applies the redo logs in parallel as determined by the RECOVERY_PARALLELISM parameter.

The RECOVERY_PARALLELISM initialization parameter specifies the number of redo application server processes participating in instance or media recovery. One process reads the log files sequentially and dispatches redo information to several recovery processes that apply the changes from the log files to the datafiles. A value of 0 or 1 indicates recovery is to be performed serially by one process. The value of this parameter cannot exceed the value of the PARALLEL_MAX_SERVERS parameter.

Parallel Recovery Using Operating System Utilities

You can parallelize instance and media recovery in two ways: by setting the RECOVERY_PARALLELISM parameter or by specifying RECOVER command options.

The Oracle Server can use one process to read the log files sequentially and dispatch redo information to several recovery processes to apply the changes from the log files to the datafiles. Oracle automatically starts the recovery processes, so you do not need to use more than one session to perform recovery.
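
The division of labor between the single reader and the recovery processes can be pictured with the following Python sketch. Assigning a block to a recovery process by block number modulo the process count is an assumption of this example; the text does not say how Oracle divides the work. The sketch shows only that one sequential pass over the log can feed several appliers while changes to the same block stay in log order.

```python
def dispatch_redo(redo_records, recovery_processes):
    """Distribute redo read sequentially from the log to recovery processes.

    redo_records       -- iterable of (block_number, change) in log order
    recovery_processes -- number of recovery processes (at least 1)
    Returns one work queue per recovery process.
    """
    if recovery_processes < 1:
        raise ValueError("at least one recovery process is required")

    queues = [[] for _ in range(recovery_processes)]
    for block_number, change in redo_records:
        # Assumed rule: all changes to a given block go to the same process,
        # so each block's changes are applied in their original order.
        queues[block_number % recovery_processes].append((block_number, change))
    return queues


work = dispatch_redo([(1, "a"), (2, "b"), (1, "c"), (4, "d")], recovery_processes=2)
```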

Setting the RECOVERY_PARALLELISM Parameter

The RECOVERY_PARALLELISM initialization parameter specifies the number of redo application server processes participating in instance or media recovery. One process reads the log files sequentially and dispatches redo information to several recovery processes. The recovery processes then apply the changes from the log files to the datafiles. A value of 0 or 1 indicates that recovery is performed serially by one process. The value of this parameter cannot exceed the value of the PARALLEL_MAX_SERVERS parameter.
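
The two rules in this paragraph (0 or 1 means serial recovery, and the value may not exceed PARALLEL_MAX_SERVERS) can be checked with a short Python function. The function name and its return strings are made up for the illustration; it does not read or set any Oracle parameter.

```python
def recovery_mode(recovery_parallelism, parallel_max_servers):
    """Classify a RECOVERY_PARALLELISM setting as "serial" or "parallel"."""
    if recovery_parallelism < 0:
        raise ValueError("RECOVERY_PARALLELISM cannot be negative")
    if recovery_parallelism > parallel_max_servers:
        raise ValueError(
            "RECOVERY_PARALLELISM cannot exceed PARALLEL_MAX_SERVERS"
        )
    # A value of 0 or 1 means that a single process performs recovery.
    return "serial" if recovery_parallelism in (0, 1) else "parallel"


mode_default = recovery_mode(0, parallel_max_servers=8)
mode_four = recovery_mode(4, parallel_max_servers=8)
```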

Specifying RECOVER Command Options

When you use the RECOVER command to parallelize instance and media recovery, the allocation of recovery processes to instances is operating system specific. The DEGREE keyword of the PARALLEL clause can either signify the number of processes on each instance of a parallel server or the number of processes to spread across all instances.

Fast-start Parallel Rollback in OPS

Setting the INIT.ORA parameter FAST_START_PARALLEL_ROLLBACK to LOW or HIGH enables Fast-start Parallel Rollback. This parameter helps determine the maximum number of server processes that participate in Fast-start parallel rollback. If the value is FALSE, Fast-start parallel rollback is disabled.

If the value for FAST_START_PARALLEL_ROLLBACK is LOW, the number of processes used for Fast-start parallel rollback is 2 times the value of CPU_COUNT. If the value is HIGH, the number of rollback servers used is at most 4 times the value of CPU_COUNT.
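
As a worked example of this arithmetic, the Python function below computes the limit for each setting. The function name is hypothetical, and returning 0 for FALSE simply represents "parallel rollback disabled".

```python
def max_rollback_servers(setting, cpu_count):
    """Upper bound on Fast-start parallel rollback servers for a setting."""
    multipliers = {"FALSE": 0, "LOW": 2, "HIGH": 4}
    try:
        return multipliers[setting.upper()] * cpu_count
    except KeyError:
        raise ValueError(f"unknown setting: {setting!r}") from None


low_limit = max_rollback_servers("LOW", cpu_count=4)
high_limit = max_rollback_servers("HIGH", cpu_count=4)
```

On a node where CPU_COUNT is 4, LOW gives a limit of 8 processes and HIGH a limit of 16.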

In OPS, multiple parallel recovery processes are owned by and operated only within the instance that generated them. To determine an accurate setting for FAST_START_PARALLEL_ROLLBACK, examine the contents of V$FAST_START_SERVERS and V$FAST_START_TRANSACTIONS.

Fast-start Parallel Rollback does not perform cross-instance rollback. However, it can improve the processing of rollback segments for a single database with multiple instances since each instance can spawn its own group of recovery processes.

Managed Standby and Standby Databases

You can protect OPS systems against disasters by using standby databases. To simplify the administration of standby databases, consider using the Managed Standby feature. Please refer to the Oracle8i Backup and Recovery Guide for details about the Managed Standby Database feature.
