15
Operating System Recovery Scenarios

This chapter describes how to recover from common media failures.

Understanding the Types of Media Failures

Media failures fall into two general categories: permanent and temporary. Use different recovery strategies depending on the type of failure.

Permanent media failures are serious hardware problems that cause permanent loss of data on the disk. Lost data cannot be recovered except by repairing or replacing the failed storage device and restoring backups of the files stored on the damaged storage device.

Temporary media failures are hardware problems that make data temporarily inaccessible, but do not corrupt the data. Following are two examples of temporary media failures:

A disk controller fails. Once the disk controller is replaced, Oracle can access the data on the disk.
Power to a storage device is cut off. Once the power is restored, the storage device and all associated data are accessible again.

Recovering After the Loss of Datafiles

If a media failure affects datafiles, the appropriate recovery procedure depends on:

The archiving mode of the database: ARCHIVELOG or NOARCHIVELOG.
The type of media failure.
The files affected by the media failure.

The following sections explain the appropriate recovery strategies for the database mode:

Losing Datafiles in NOARCHIVELOG Mode

If either a permanent or temporary media failure affects any datafiles of a database operating in NOARCHIVELOG mode, Oracle automatically shuts down the database. Depending on the type of media failure, you can use one of two recovery methods:

If the media failure is temporary, correct the hardware problem and restart the database. Usually, instance recovery is possible, and all committed transactions can be recovered using the online redo log.
If the media failure is permanent, follow the procedure "Recovering a Database in NOARCHIVELOG Mode".

Losing Datafiles in ARCHIVELOG Mode

If either a permanent or temporary media failure affects the datafiles of a database operating in ARCHIVELOG mode, the following scenarios can occur.

Damaged Datafiles	Database Status	Solution
Datafiles in the SYSTEM tablespace or datafiles with active rollback segments.	Oracle shuts down.	If the hardware problem is temporary, fix it and restart the database. Usually, instance recovery recovers lost transactions. If the hardware problem is permanent, follow the procedure in "Performing Closed Database Recovery".
Non-SYSTEM datafiles or datafiles that do not contain active rollback segments.	Oracle takes affected datafiles offline, but the database stays open.	If the unaffected portions of the database must remain available, do not shut down the database. Take tablespaces containing problem datafiles offline using the temporary option, then follow the procedure in "Performing Open Database Recovery".
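
The choices described in this section can be condensed into a small decision helper. The following Python sketch is illustrative only: the function name `datafile_loss_action` and its boolean parameters are assumptions made for this example, not part of any Oracle tool, and the returned strings merely paraphrase the text above.

```python
def datafile_loss_action(archivelog: bool, permanent: bool, critical: bool = False) -> str:
    """Paraphrase the recovery step recommended for lost datafiles.

    archivelog: True if the database runs in ARCHIVELOG mode.
    permanent:  True for a permanent media failure, False for a temporary one.
    critical:   True if SYSTEM datafiles or datafiles with active rollback
                segments are affected (only consulted in ARCHIVELOG mode).
    """
    restart = "Fix the hardware problem and restart the database"
    if not archivelog:
        # NOARCHIVELOG: Oracle shuts the database down for any datafile loss.
        if permanent:
            return 'Follow "Recovering a Database in NOARCHIVELOG Mode"'
        return restart
    if critical:
        # ARCHIVELOG with SYSTEM or active rollback datafiles: Oracle shuts down.
        if permanent:
            return 'Follow "Performing Closed Database Recovery"'
        return restart
    # ARCHIVELOG with other datafiles: the database stays open.
    return ("Keep the database open and take the affected tablespaces "
            "offline with the temporary option before recovering them")


print(datafile_loss_action(archivelog=True, permanent=True, critical=True))
```

The helper encodes only the cases listed in this section; it does not inspect a database.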

Recovering Through an ADD DATAFILE Operation

If database recovery rolls forward through an ADD DATAFILE operation, Oracle will stop the recovery when applying the ADD DATAFILE redo data and let you confirm the location of the file.

For example, suppose you create a new tablespace containing two datafiles: /db/db2.f and /db/db3.f. If you later perform media recovery through the CREATE TABLESPACE operation, Oracle may signal the following error when applying the CREATE TABLESPACE redo data:

ORA-00283: recovery session canceled due to errors 
ORA-01244: unnamed datafile(s) added to controlfile by media recovery
ORA-01110: data file 3: '/db/db2.f'
ORA-01110: data file 2: '/db/db3.f'

To recover through an ADD DATAFILE operation:

View the files added by selecting from V$DATAFILE:

SELECT file#, name FROM v$datafile;

FILE#          NAME
-------------------------------------
1              /db/db1.f
2              /db/UNNAMED00002
3              /db/UNNAMED00003

If multiple unnamed files exist, determine which unnamed file corresponds to which datafile using one of these methods:
- Open the alert.log, which contains messages about the original file location for each unnamed file.
- Derive the original file location of each unnamed file from the error message and V$DATAFILE: each unnamed file corresponds to the file in the error message with the same file number.

Issue the ALTER DATABASE RENAME FILE command to rename the datafiles. For example, enter:

ALTER DATABASE RENAME FILE '/db/UNNAMED00002' TO '/db/db3.f';
ALTER DATABASE RENAME FILE '/db/UNNAMED00003' TO '/db/db2.f';

Continue recovery by issuing the previous recovery command. For example, enter:
```
RECOVER DATABASE
```
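
When many unnamed files exist, matching them by hand is error-prone. The following Python sketch shows one way to pair each unnamed file with its original name by file number and to generate the RENAME FILE statements for review. It assumes the ORA-01110 lines have the form shown in the example above; the function name `rename_statements` and the tuple layout of the V$DATAFILE rows are assumptions of this sketch.

```python
import re

# Matches lines such as: ORA-01110: data file 3: '/db/db2.f'
ORA_01110 = re.compile(r"ORA-01110: data file (\d+): '(.+)'")


def rename_statements(error_lines, datafile_rows):
    """Pair each UNNAMED file with its original name by file number.

    error_lines:   the error stack signaled by recovery, one string per line.
    datafile_rows: (file#, name) tuples as selected from V$DATAFILE.
    """
    original = {}
    for line in error_lines:
        match = ORA_01110.match(line.strip())
        if match:
            original[int(match.group(1))] = match.group(2)

    statements = []
    for file_no, name in datafile_rows:
        if "UNNAMED" in name and file_no in original:
            statements.append(
                f"ALTER DATABASE RENAME FILE '{name}' TO '{original[file_no]}';"
            )
    return statements


errors = [
    "ORA-00283: recovery session canceled due to errors",
    "ORA-01244: unnamed datafile(s) added to controlfile by media recovery",
    "ORA-01110: data file 3: '/db/db2.f'",
    "ORA-01110: data file 2: '/db/db3.f'",
]
rows = [(1, "/db/db1.f"), (2, "/db/UNNAMED00002"), (3, "/db/UNNAMED00003")]
statements = rename_statements(errors, rows)
print("\n".join(statements))
```

With the example data, the sketch produces the same two RENAME FILE statements shown in the procedure.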

Recovering Transported Tablespaces

The transportable tablespace feature of Oracle allows a user to transport a set of tablespaces from one database to another. Transporting or "plugging" a tablespace into a database is like creating a tablespace with pre-loaded data. Using this feature is often an advantage because:

It is faster than import/export or load/unload, since it involves only copying datafiles and integrating metadata.
You can use it to move index data, which allows you to avoid having to rebuild indexes.

Like normal tablespaces, plugged-in tablespaces are recoverable. While you can recover normal tablespaces without a backup, you must have a version of the plugged-in datafiles in order to recover a plugged-in tablespace.

To recover a plugged-in tablespace, restore a backup of the plugged-in datafiles and issue normal recovery commands. The backup can be the initial version of the plugged-in datafiles or any backup taken after the tablespace is plugged in. Just as when recovering through a CREATE TABLESPACE operation, Oracle may signal ORA-01244 when recovering through a tablespace plug-in operation. In this case, rename the unnamed files to the correct locations using the procedure in "Recovering Through an ADD DATAFILE Operation".

See Also: For detailed information about using the transportable tablespace feature, see Oracle8i Administrator's Guide.

Recovering After the Loss of Online Redo Log Files

If a media failure has affected the online redo logs of a database, the appropriate recovery procedure depends on:

The configuration of the online redo log: mirrored or non-mirrored.
The type of media failure: temporary or permanent.
The types of online redo log files affected by the media failure: current, active, unarchived, or inactive.

The following sections describe the appropriate recovery strategies for these situations:

Recovering After Losing a Member of a Multiplexed Online Redo Log Group

If the online redo log of a database is multiplexed, and at least one member of each online redo log group is not affected by the media failure, Oracle allows the database to continue functioning as normal. Oracle writes error messages to the LGWR trace file and the alert.log of the database.

Solve the problem by taking one of the following actions:

If the hardware problem is temporary, correct it. LGWR then accesses the previously unavailable online redo log files as if the problem never existed.
If the hardware problem is permanent, drop the damaged member and add a new member using the procedure below.

Note:
The newly added member provides no redundancy until the log group is reused.

To replace a damaged member of a redo log group:

Locate the filename of the damaged member in V$LOGFILE. The status will be INVALID if the file is inaccessible:

SELECT group#, status, member FROM v$logfile;

GROUP#    STATUS       MEMBER
-------   -----------  ---------------------
0001                    /oracle/dbs/log1a.f
0001                    /oracle/dbs/log1b.f
0002                    /oracle/dbs/log2a.f
0002      INVALID       /oracle/dbs/log2b.f
0003                    /oracle/dbs/log3a.f
0003                    /oracle/dbs/log3b.f

Drop the damaged member. For example, to drop member log2b.f from group 2, issue:

ALTER DATABASE DROP LOGFILE MEMBER '/oracle/dbs/log2b.f';

Add a new member to the group. For example, to add log2c.f to group 2, issue:
```
ALTER DATABASE ADD LOGFILE MEMBER '/oracle/dbs/log2c.f' TO GROUP 2;
```
If the file you want to add already exists, it must be the same size as the other group members, and you must specify REUSE:
```
ALTER DATABASE ADD LOGFILE MEMBER '/oracle/dbs/log2b.f' REUSE TO GROUP 2;
```
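
The drop-and-add steps can be scripted as text generation. The following Python sketch builds the two statements for every member reported as INVALID. It only produces statements for a DBA to review and run; the function name `replace_member_statements` and the `replacements` mapping (damaged path to new path) are assumptions of this example.

```python
def replace_member_statements(logfile_rows, replacements):
    """Build DROP and ADD statements for every INVALID redo log member.

    logfile_rows: (group#, status, member) tuples as selected from V$LOGFILE.
    replacements: maps each damaged member path to the path of its
                  replacement (chosen by the DBA; hypothetical here).
    """
    statements = []
    for group, status, member in logfile_rows:
        if status != "INVALID":
            continue
        statements.append(f"ALTER DATABASE DROP LOGFILE MEMBER '{member}';")
        statements.append(
            f"ALTER DATABASE ADD LOGFILE MEMBER '{replacements[member]}' "
            f"TO GROUP {group};"
        )
    return statements


rows = [
    (1, "", "/oracle/dbs/log1a.f"),
    (1, "", "/oracle/dbs/log1b.f"),
    (2, "", "/oracle/dbs/log2a.f"),
    (2, "INVALID", "/oracle/dbs/log2b.f"),
]
plan = replace_member_statements(
    rows, {"/oracle/dbs/log2b.f": "/oracle/dbs/log2c.f"}
)
print("\n".join(plan))
```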

Recovering After the Loss of All Members of an Online Redo Log Group

If a media failure damages all members of an online redo log group, different scenarios can occur, depending on the type of online redo log group affected by the failure and the archiving mode of the database.

If the damaged log group is inactive, then it is not needed for instance recovery; if it is active, it is needed for instance recovery. Your first task is to determine whether the damaged group is active or inactive.

To determine whether the damaged groups are active:

Locate the filename of the lost redo log in V$LOGFILE and then look for the group number corresponding to it. For example, enter:

SELECT group#, status, member FROM v$logfile;

GROUP#    STATUS       MEMBER
-------   -----------  ---------------------
0001                    /oracle/dbs/log1a.f
0001                    /oracle/dbs/log1b.f
0002      INVALID       /oracle/dbs/log2a.f
0002      INVALID       /oracle/dbs/log2b.f
0003                    /oracle/dbs/log3a.f
0003                    /oracle/dbs/log3b.f

Determine which groups are active. For example, enter:

SELECT group#, members, status, archived FROM v$log;

GROUP#  MEMBERS           STATUS     ARCHIVED
------  -------           ---------  -----------
 0001   2                 INACTIVE   YES
 0002   2                 ACTIVE     NO
 0003   2                 CURRENT    NO

If the affected group is inactive, follow the procedure in "Losing an Inactive, Online Redo Log Group". If the affected group is active (as in the above example), follow the procedure in "Losing an Active Online Redo Log Group".
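
The two queries can be combined mechanically. The following Python sketch, which assumes the query results have already been fetched as tuples, reports the V$LOG status of every group whose members are all INVALID. The function name `damaged_group_status` is an assumption of this example.

```python
def damaged_group_status(logfile_rows, log_rows):
    """Return {group#: V$LOG status} for groups whose members are all INVALID.

    logfile_rows: (group#, status, member) tuples from V$LOGFILE.
    log_rows:     (group#, members, status, archived) tuples from V$LOG.
    """
    member_status = {}
    for group, status, _member in logfile_rows:
        member_status.setdefault(group, []).append(status)

    group_status = {group: status for group, _count, status, _archived in log_rows}

    return {
        group: group_status[group]
        for group, statuses in member_status.items()
        if group in group_status and all(s == "INVALID" for s in statuses)
    }


logfile_rows = [
    (1, "", "/oracle/dbs/log1a.f"),
    (1, "", "/oracle/dbs/log1b.f"),
    (2, "INVALID", "/oracle/dbs/log2a.f"),
    (2, "INVALID", "/oracle/dbs/log2b.f"),
    (3, "", "/oracle/dbs/log3a.f"),
    (3, "", "/oracle/dbs/log3b.f"),
]
log_rows = [
    (1, 2, "INACTIVE", "YES"),
    (2, 2, "ACTIVE", "NO"),
    (3, 2, "CURRENT", "NO"),
]
result = damaged_group_status(logfile_rows, log_rows)
print(result)
```

For the example output above, the sketch reports group 2 as damaged and ACTIVE.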

Losing an Inactive, Online Redo Log Group

If all members of an online redo log group with INACTIVE status are damaged, the procedure depends on whether you can fix the media problem that damaged the inactive redo log group.

Type of Media Failure	Procedure
Temporary	Fix the problem. LGWR can reuse the redo log group when required.
Permanent	The damaged inactive online redo log group will eventually halt normal database operation. Clear, i.e., reinitialize, the damaged group manually using ALTER DATABASE CLEAR LOGFILE as described below.

You can clear an inactive redo log group when the database is open or closed. The procedure depends on whether the damaged group has been archived.

To clear an inactive, online redo log group that has been archived:

If the database is shut down, start a new instance and mount the database, but do not open it:
```
STARTUP MOUNT
```
Reinitialize the damaged log group. For example, to clear redo log group 2, issue:
```
ALTER DATABASE CLEAR LOGFILE GROUP 2;
```

To clear an inactive, online redo log group that has not been archived:

Clearing an unarchived log allows it to be reused without archiving it. This action will make backups unusable if they were started before the last change in the log, unless the file was taken offline prior to the first change in the log. Hence, if you need the cleared log file for recovery of a backup, you cannot recover that backup.

If the database is shut down, start a new instance and mount the database, but do not open it:
```
STARTUP MOUNT
```
Clear the log using the UNARCHIVED keyword. For example, to clear log group 2, issue:
```
ALTER DATABASE CLEAR LOGFILE UNARCHIVED GROUP 2;
```
If there is an offline datafile that requires the cleared unarchived log to bring it online, the keywords UNRECOVERABLE DATAFILE are required. The datafile and its entire tablespace will have to be dropped because the redo necessary to bring it online is being cleared, and there is no copy of it. For example, enter:
```
ALTER DATABASE CLEAR LOGFILE UNARCHIVED GROUP 2 UNRECOVERABLE DATAFILE;
```
Immediately back up the database using an O/S utility. Now you can use this backup for complete recovery without relying on the cleared log group. For example, enter:
```
% cp /disk1/oracle/dbs/*.f /disk2/backup
```

Back up the database's control file:

ALTER DATABASE BACKUP CONTROLFILE TO 'filename';
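
The three forms of the CLEAR LOGFILE statement used in these procedures differ only in two keywords. The following Python sketch assembles the appropriate form from two yes/no answers. The function name `clear_logfile_statement` and its parameters are assumptions of this example; the generated text follows the statements shown above.

```python
def clear_logfile_statement(group: int, archived: bool,
                            offline_datafile_needs_log: bool = False) -> str:
    """Assemble the CLEAR LOGFILE statement for an inactive redo log group.

    archived: True if the damaged group has already been archived.
    offline_datafile_needs_log: True if an offline datafile would need the
        cleared, unarchived log to be brought online again.
    """
    parts = ["ALTER DATABASE CLEAR LOGFILE"]
    if not archived:
        parts.append("UNARCHIVED")
    parts.append(f"GROUP {group}")
    if not archived and offline_datafile_needs_log:
        # The datafile and its tablespace must be dropped afterwards.
        parts.append("UNRECOVERABLE DATAFILE")
    return " ".join(parts) + ";"


print(clear_logfile_statement(2, archived=False))
```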

Failure of CLEAR LOGFILE Operation

The ALTER DATABASE CLEAR LOGFILE command can fail with an I/O error due to media failure when it is not possible to:

Relocate the redo log file onto alternative media by re-creating it under the currently configured redo log file name.
Reuse the currently configured log file name to recreate the redo log file because the name itself is invalid or unusable (for example, due to media failure).

In these cases, the CLEAR LOGFILE command (before receiving the I/O error) would have successfully informed the control file that the log was being cleared and did not require archiving. The I/O error occurred at the step in which CLEAR LOGFILE attempts to create the new redo log file and write zeros to it.

Losing an Active Online Redo Log Group

If your database is still running and the lost active log is not the current log, issue the ALTER SYSTEM CHECKPOINT command. If successful, your active log is rendered inactive, and you can follow the procedure in "Losing an Inactive, Online Redo Log Group". If unsuccessful, or if your database has halted, perform one of these procedures, depending on the archiving mode.

To recover from loss of an active online redo log group in NOARCHIVELOG mode:

If the media failure is temporary, correct the problem so that Oracle can reuse the group when required.
Restore the database from a whole database backup using an O/S utility. For example, enter:
```
% cp /disk2/backup/*.f /disk1/oracle/dbs
```
Mount the database:
```
STARTUP MOUNT
```

Open the database using the RESETLOGS option:

ALTER DATABASE OPEN RESETLOGS;

Shut down the database normally:
```
SHUTDOWN IMMEDIATE
```
Make a whole database backup. For example, enter:
```
% cp /disk1/oracle/dbs/*.f /disk2/backup
```

To recover from loss of an active online redo log group in ARCHIVELOG mode:

If the media failure is temporary, correct the problem so that Oracle can reuse the group when required.
Perform incomplete media recovery. Use the procedure given in "Performing Incomplete Media Recovery", recovering up through the log before the damaged log.
Ensure that the current name of the lost redo log can be used for a newly created file. If not, rename the members of the damaged online redo log group to a new location. For example, enter:
```
ALTER DATABASE RENAME FILE '/oracle/dbs/log_1.rdo' TO '/temp/log_1.rdo';
ALTER DATABASE RENAME FILE '/oracle/dbs/log_2.rdo' TO '/temp/log_2.rdo';
```
Open the database using the RESETLOGS option:
```
ALTER DATABASE OPEN RESETLOGS;
```
Note:
All updates executed from the endpoint of the incomplete recovery to the present must be re-executed.

Loss of Multiple Redo Log Groups

If you have lost multiple groups of the online redo log, use the recovery method for the most difficult log to recover. The order of difficulty, from most difficult to least, follows:

The current online redo log
The active online redo log
The unarchived redo log
The inactive online redo log
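
The ordering above amounts to picking the highest-ranked kind among the lost groups. A minimal Python sketch, in which the lowercase labels and the function name `hardest_lost_log` are assumptions of this example:

```python
# Most difficult first, as listed above.
DIFFICULTY_ORDER = ("current", "active", "unarchived", "inactive")


def hardest_lost_log(lost_kinds):
    """Return the kind of lost log whose recovery method should be used."""
    if not lost_kinds:
        raise ValueError("no lost redo log groups given")
    # A smaller index means a more difficult recovery.
    return min(lost_kinds, key=DIFFICULTY_ORDER.index)


print(hardest_lost_log(["inactive", "active"]))
```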

Recovering After the Loss of Archived Redo Log Files

If the database is operating in ARCHIVELOG mode, and the only copy of an archived redo log file is damaged, it does not affect the present operation of the database. The following situations can arise, however, depending on when the redo log was written and when you backed up the datafile.

If you backed up	Then
All datafiles after the filled online redo log group (which is now archived) was written.	The archived version of the filled online redo log group is not required for complete media recovery operation.
A given datafile before the filled online redo log group was written.	If the corresponding datafile is damaged by a permanent media failure, use the most recent backup of the damaged datafile and perform incomplete recovery up to the damaged log.
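
To see which datafiles are exposed by a damaged archived log, compare each datafile's most recent backup time with the time the log was written. The following Python sketch assumes you have collected these times yourself; the function name `datafiles_needing_log`, the file names, and the dates are hypothetical.

```python
from datetime import datetime


def datafiles_needing_log(log_written_at, last_backup_at):
    """Return the datafiles whose most recent backup predates the damaged log.

    An empty result means every datafile was backed up after the log was
    written, so the log is not required for complete media recovery.
    """
    return sorted(
        name for name, taken in last_backup_at.items() if taken < log_written_at
    )


log_written = datetime(2000, 1, 10, 12, 0)
backups = {
    "/oracle/dbs/users01.f": datetime(2000, 1, 11, 2, 0),
    "/oracle/dbs/tools01.f": datetime(2000, 1, 9, 2, 0),
}
exposed = datafiles_needing_log(log_written, backups)
print(exposed)
```

Any datafile in the result should be backed up again as soon as possible.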

WARNING:
If you know that an archived redo log group has been damaged, immediately back up all datafiles so that you will have a whole database backup that does not require the damaged archived redo log.

Recovering After the Loss of Control Files

If a media failure has affected the control files of a database (whether control files are multiplexed or not), the database continues to run until the first time that an Oracle background process needs to access the control files. At this point, the database and instance are automatically shut down.

If the media failure is temporary and the database has not yet shut down, avoid the automatic shutdown of the database by immediately correcting the media failure. If the database shuts down before you correct the temporary media failure, however, then you can restart the database after fixing the problem and restoring access to the control files.

The appropriate recovery procedure for media failures that permanently prevent access to control files of a database depends on whether you have multiplexed the control files. The following sections describe the appropriate procedures:

Losing a Member of a Multiplexed Control File

Use the following procedures to recover a database if all the following conditions are met:

A permanent media failure has damaged one or more control files of a database.
At least one control file has not been damaged by the media failure.

Note:
If all control files of a multiplexed control file configuration have been damaged, follow the procedure in "Losing All Copies of the Current Control File".

To restore a control file to its default location:

If the instance is still running, abort it:
```
SHUTDOWN ABORT
```
Correct the hardware problem that caused the media failure. If you cannot repair the hardware problem quickly, you can proceed with database recovery by restoring damaged control files to an alternative storage device.
Use an intact multiplexed copy of the database's current control file to copy over the damaged control files. For example, to replace bad_cf.f with good_cf.f, you might enter:
```
% cp /oracle/good_cf.f /oracle/dbs/bad_cf.f
```
Start an instance and open the database:
```
STARTUP
```

To restore a control file to a non-default location:

If the instance is still running, abort it:
```
SHUTDOWN ABORT
```
If you cannot correct the hardware problem that caused the media failure, copy the intact control file to alternative locations. For example, to copy good_cf.f to new_cf.f you might issue:
```
% cp /oracle/dbs/good_cf.f /oracle/dbs/new_cf.f
```
Edit the parameter file of the database so that the CONTROL_FILES parameter reflects the current locations of all control files and excludes all control files that were not restored. For example, to add new_cf.f you might enter:
```
CONTROL_FILES = '/oracle/dbs/good_cf.f', '/oracle/dbs/new_cf.f'
```
Start a new instance and open the database:
```
STARTUP
```
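
The edit to the CONTROL_FILES parameter follows a simple rule: keep the intact files, drop the damaged ones, and append the new copies. The following Python sketch builds the parameter line under that rule. The function name `control_files_parameter` and its arguments are assumptions of this example, and the damaged path is hypothetical.

```python
def control_files_parameter(configured, damaged, restored_copies=()):
    """Build a CONTROL_FILES line that omits damaged files and adds new copies.

    configured:      paths currently listed in the parameter file.
    damaged:         paths that were not restored and must be excluded.
    restored_copies: paths of the new copies made from an intact control file.
    """
    excluded = set(damaged)
    current = [path for path in configured if path not in excluded]
    current.extend(restored_copies)
    return "CONTROL_FILES = " + ", ".join(f"'{path}'" for path in current)


line = control_files_parameter(
    configured=["/oracle/dbs/good_cf.f", "/oracle/dbs/bad_cf.f"],
    damaged=["/oracle/dbs/bad_cf.f"],
    restored_copies=["/oracle/dbs/new_cf.f"],
)
print(line)
```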

Losing All Copies of the Current Control File

If all control files of a database have been lost or damaged by a permanent media failure, but all online redo log files remain intact, you can recover the database by creating a new control file.

To create a new control file:

Create the control file using the CREATE CONTROLFILE statement with the NORESETLOGS option. See Table 15-1 for options.
Recover the database as normal:
```
RECOVER DATABASE
```

Depending on the existence and currency of a control file backup, you have the following options for generating the text of the CREATE CONTROLFILE command:

Table 15-1 Options for Creating the Control File

If you	Then
Executed ALTER DATABASE BACKUP CONTROLFILE TO TRACE NORESETLOGS after you made the last structural change to the database, and if you have saved the SQL command output	Use the CREATE CONTROLFILE statement from the output as-is.
Performed your most recent execution of ALTER DATABASE BACKUP CONTROLFILE TO TRACE before you made a structural change to the database	Edit the output of ALTER DATABASE BACKUP CONTROLFILE TO TRACE to reflect that change. For example, if you recently added a datafile to the database, add that datafile to the DATAFILE clause of the CREATE CONTROLFILE statement.
Have not backed up the control file using the TO TRACE option, but used the TO filename option of ALTER DATABASE BACKUP CONTROLFILE	Use the control file copy to obtain SQL command output. Copy the backup control file and execute STARTUP MOUNT before executing ALTER DATABASE BACKUP CONTROLFILE TO TRACE NORESETLOGS. If your control file copy predated a recent structural change, edit the TO TRACE output to reflect that structural change.
Do not have a backup of the control file in either TO TRACE format or TO filename format	Generate the CREATE CONTROLFILE statement manually.
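
The rows of Table 15-1 can be read as a short decision chain. The following Python sketch expresses that chain; the function name `controlfile_statement_source` and its boolean parameters are assumptions of this example, and the returned strings paraphrase the table.

```python
def controlfile_statement_source(trace_output_saved: bool,
                                 trace_is_current: bool = False,
                                 binary_backup_exists: bool = False) -> str:
    """Choose how to obtain the text of the CREATE CONTROLFILE statement.

    trace_output_saved:   TO TRACE output was saved.
    trace_is_current:     that output postdates the last structural change.
    binary_backup_exists: a TO 'filename' control file backup exists.
    """
    if trace_output_saved:
        if trace_is_current:
            return "Use the saved CREATE CONTROLFILE statement as-is"
        return "Edit the saved trace output to reflect later structural changes"
    if binary_backup_exists:
        return ("Mount the control file copy, run BACKUP CONTROLFILE TO TRACE "
                "NORESETLOGS, and edit the output if the copy is stale")
    return "Write the CREATE CONTROLFILE statement manually"


print(controlfile_statement_source(False, binary_backup_exists=True))
```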

Note:
If your character set is not the default US7ASCII, then you must specify the character set as an argument to the CREATE CONTROLFILE statement.

Recovering from User Errors

An accidental or erroneous operational or programmatic change to the database can cause loss or corruption of data. Recovery may require a return to a state prior to the error.

Note:
If you have properly granted powerful privileges (such as DROP ANY TABLE) to only selected, appropriate users, user errors that require database recovery are minimized.

The following scenario describes how to recover a table that has been accidentally dropped.

You can keep the database that experienced the user error online and available for normal use. The database can remain open or be shut down. Back up all datafiles of the existing database in case an error is made during the remaining steps of this procedure.
Create a temporary copy of the database to a past point-in-time using time-based recovery. Be careful not to cause a conflict with the existing control file of the permanent database. Restore a single control file backup to an alternative location (Step 4) and edit the parameter file, as necessary, or create a new control file at the alternative location. Also, restore all datafiles to alternative locations (Step 5) so that you do not affect the permanent copy of the database.
Export the lost data using the Oracle utility Export from the temporary, restored version of the database. In this case, export the accidentally dropped table.

Note:
System audit options are exported.
Import the exported data (Step 3) into the permanent copy of the database using the Oracle Import utility.
Delete the files of the temporary, reconstructed copy of the database to conserve space.

See Also: For more information about the Import and Export utilities, see Oracle8i Utilities.

Performing Media Recovery in a Distributed Environment

The manner in which you perform media recovery depends on whether your database participates in a distributed database system. The Oracle distributed database architecture is autonomous. Therefore, depending on the type of recovery operation selected for a single, damaged database, you may have to coordinate recovery operations globally among all databases in the distributed database system.

Table 15-2 summarizes different types of recovery operations and whether coordination among nodes of a distributed database system is required.

Table 15-2 Recovery Operations in a Distributed Database Environment

If you are	Then
Restoring a whole backup for a database that was never accessed from a remote node	Use non-coordinated, autonomous database recovery.
Restoring a whole backup for a database that was accessed by a remote node	Shut down all databases and restore them using the same coordinated full backup.
Performing complete media recovery of one or more databases in a distributed database	Use non-coordinated, autonomous database recovery.
Performing incomplete media recovery of a database that was never accessed by a remote node	Use non-coordinated, autonomous database recovery.
Performing incomplete media recovery of a database that was accessed by a remote node	Use coordinated, incomplete media recovery to the same global point in time for all databases in the distributed database.
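
Table 15-2 reduces to one rule: coordination is needed only when the operation discards changes (restoring a whole backup or incomplete recovery) and the database was accessed by a remote node. A minimal Python sketch of that rule, in which the operation labels and the function name `requires_coordination` are assumptions of this example:

```python
def requires_coordination(operation: str, accessed_remotely: bool) -> bool:
    """Return True if recovery must be coordinated across the distributed system.

    operation: "restore_whole_backup", "complete_recovery",
               or "incomplete_recovery".
    """
    if operation == "complete_recovery":
        # Complete recovery loses no changes, so it is always autonomous.
        return False
    if operation in ("restore_whole_backup", "incomplete_recovery"):
        return accessed_remotely
    raise ValueError(f"unknown operation: {operation}")


print(requires_coordination("incomplete_recovery", accessed_remotely=True))
```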

Coordinating Time-Based and Change-Based Distributed Database Recovery

In special circumstances, one node in a distributed database may require recovery to a past time. To preserve global data consistency, it is often necessary to recover all other nodes in the system to the same point in time. This operation is called coordinated, time-based, distributed database recovery. The following tasks should be performed with the standard procedures of time-based and change-based recovery described in this chapter.

Recover the database that requires the recovery operation using time-based recovery. For example, if a database needs to be recovered because of a user error (such as an accidental table drop), recover this database first using time-based recovery. Do not recover the other databases at this point.
After you have recovered the database and opened it using the RESETLOGS option, look in the alert.log of the database for the RESETLOGS message.

If the message is, "RESETLOGS after complete recovery through change xxx," you have applied all the changes in the database and performed a complete recovery. Do not recover any of the other databases in the distributed system, or you will unnecessarily remove changes in them. Recovery is complete.

If the message is, "RESETLOGS after incomplete recovery UNTIL CHANGE xxx," you have successfully performed an incomplete recovery. Record the change number from the message and proceed to the next step.
Recover all other databases in the distributed database system using change-based recovery, specifying the change number (SCN) from Step 2.
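
Checking the alert.log message can be automated. The following Python sketch extracts the recovery type and change number from a RESETLOGS message and states the next step. It assumes the message wording quoted above and a decimal change number; the function names and the sample change number 309121 are hypothetical.

```python
import re

RESETLOGS_PATTERN = re.compile(
    r"RESETLOGS after (complete recovery through|incomplete recovery until)"
    r" change (\d+)",
    re.IGNORECASE,
)


def resetlogs_outcome(alert_line):
    """Return ("complete" or "incomplete", change number), or None if no match."""
    match = RESETLOGS_PATTERN.search(alert_line)
    if match is None:
        return None
    phrase = match.group(1).lower()
    kind = "complete" if phrase.startswith("complete") else "incomplete"
    return kind, int(match.group(2))


def next_step(alert_line):
    """Describe what to do with the other databases in the distributed system."""
    outcome = resetlogs_outcome(alert_line)
    if outcome is None:
        return "No RESETLOGS message found; keep searching the alert.log"
    kind, change = outcome
    if kind == "complete":
        return "Recovery is complete; do not recover the other databases"
    return f"Recover the other databases with change-based recovery to change {change}"


print(next_step("RESETLOGS after incomplete recovery UNTIL CHANGE 309121"))
```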

Recovering a Database with Snapshots

If a master database is independently recovered to a past time (that is, coordinated, time-based distributed database recovery is not performed), any dependent remote snapshot that was refreshed in the interval of lost time will be inconsistent with its master table. In this case, the administrator of the master database should instruct the remote administrators to perform a complete refresh of any inconsistent snapshot.
