Fast-Start Failover, Observe-Only Mode and Health Conditions

Oracle Data Guard Broker lets database administrators automate some tasks and gives them an easy way to configure many features and details of a Data Guard environment correctly. Fast-Start Failover (FSFO) allows the broker to fail over automatically to the standby database when the primary fails. Until 19c, however, an enabled FSFO would always trigger the failover. This changed in 19c with a nice new feature that lets us put FSFO in Observe-Only Mode.

In this post I will focus on just two FSFO features: the new Observe-Only Mode and Health Conditions. Lag and other details will not be covered here.

Observe-Only Mode

Observe-Only Mode is a simple change: it puts FSFO into a state where it only observes/monitors the DG environment and, in case of failure, does not switch the roles of the primary and the standby. Simple as that. As the Broker documentation for Observe-Only Mode says:

The observe-only mode enables you to test the impact of using fast-start failover in your configuration, without making any actual changes to the configuration.

More details about the mode can be found in the documentation too. In short, an FSFO setup consists of three components: the primary, the standby, and the observer.

Enable Observe-Only

Enabling it is very simple: just call “ENABLE FAST_START FAILOVER OBSERVE ONLY”:

DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;
Enabled in Observe-Only Mode.
DGMGRL>

In the drc* trace file on the primary side we can see:

2020-06-11T23:45:19.329+02:00
ENABLE FAST_START FAILOVER OBSERVE ONLY
FSFO SetState(st=47 "ENABLE OBONLY", fl=0x0 "", ob=0x2b621d39, tgt=2, v=0)
Setup log_archive_dest_n of GROUP=0 PRIORITY=0 with 'golds19c' as FSFO target
Fast-Start Failover (FSFO) has been enabled under observe-only mode between:
  Primary = "gold19c"
  Standby = "golds19c"
2020-06-11T23:45:20.527+02:00
ENABLE FAST_START FAILOVER OBSERVE ONLY completed successfully

The result is FSFO running in Observe-Only Mode:

DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Observe-Only Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          30 seconds
  Active Target:      golds19c
  Potential Targets:  "golds19c"
    golds19c   valid
  Observer:           goldsn1.oralocal
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL>

If we then force a shutdown of the primary database, we can see that the roles do not change:

[oracle@goldpn1 ~]$ srvctl stop database -d gold19c -o abort
[oracle@goldpn1 ~]$

In the observer log file we can see that the failure of the primary was detected, but nothing is done because FSFO is in observe-only mode:

…
Unable to connect to database using gold19c
[W000 2020-06-12T00:13:22.248+02:00] Primary database cannot be reached.
[W000 2020-06-12T00:13:22.248+02:00] Fast-Start Failover threshold has expired.
[W000 2020-06-12T00:13:22.248+02:00] Try to connect to the standby.
[W000 2020-06-12T00:13:22.248+02:00] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.
[W000 2020-06-12T00:13:22.248+02:00] Check if the standby is ready for failover.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Unable to connect to database using gold19c
[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...
[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode
[W000 2020-06-12T00:13:22.261+02:00] Fast-Start Failover is not possible because observe-only mode.
[W000 2020-06-12T00:13:22.261+02:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Unable to connect to database using gold19c
[W000 2020-06-12T00:13:22.269+02:00] Primary database cannot be reached.
[W000 2020-06-12T00:13:22.269+02:00] Fast-Start Failover observe-only mode enabled.
[W000 2020-06-12T00:13:22.269+02:00] Will not attempt a Fast-Start Failover.
[W000 2020-06-12T00:13:22.269+02:00] Retry connecting to primary.
[W000 2020-06-12T00:13:23.270+02:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Unable to connect to database using gold19c
[W000 2020-06-12T00:13:23.277+02:00] Primary database cannot be reached.
[W000 2020-06-12T00:13:24.278+02:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
…

In the drc* trace file on the standby side we can see:

2020-06-12T00:13:21.103+02:00
Fast-Start Failover cannot proceed because: "observe-only mode"

So far, this means that the failure of the primary was detected and logged, but no action was taken. The roles remain the same. The show output confirms this too:

DGMGRL> show configuration verbose;

Configuration - gold19c

  Protection Mode: MaxAvailability
  Members:
  gold19c  - Primary database
    golds19c - (*) Physical standby database

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '0'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'gold19c_CFG'

Fast-Start Failover: Enabled in Observe-Only Mode
  Lag Limit:          0 seconds
  Threshold:          30 seconds
  Active Target:      golds19c
  Potential Targets:  "golds19c"
    golds19c   valid
  Observer:           goldsn1.oralocal
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configuration Status:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-16625: cannot reach member "gold19c"
DGM-17017: unable to determine configuration status

DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Observe-Only Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          30 seconds
  Active Target:      golds19c
  Potential Targets:  "golds19c"
    golds19c   valid
  Observer:           goldsn1.oralocal
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL>

In some scenarios this is useful because it gives us the chance to fix the problem without going through a failover event (and, in some cases, a manual reinstate and so on). Another use of Observe-Only mode is to do what the name says: just observe. Think of an environment where you want to test some conditions and the health of the environment (network and others) before you really enable FSFO.
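When Observe-Only mode is used for this kind of test, it helps to know how many times a real failover would have happened. The following is a minimal Python sketch, not an official tool, that scans observer log lines for the "would have been initiated" message shown in the excerpt above. The function name would_be_failovers and the assumption that every relevant line has the "[W000 timestamp] message" layout are mine, based only on that excerpt.

```python
import re

# Message text copied from the observer log excerpt in this post.
WOULD_FAILOVER = "A fast-start failover would have been initiated"

# Assumed line layout, based on the excerpt: [W000 2020-06-12T00:13:22.261+02:00] message
LINE_RE = re.compile(r"^\[(?P<worker>\w+) (?P<ts>[^\]]+)\] (?P<msg>.*)$")


def would_be_failovers(lines):
    """Return the timestamps of log lines reporting a failover that was suppressed."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines without the bracketed prefix (ORA- errors, etc.) are skipped.
        if match and WOULD_FAILOVER in match.group("msg"):
            hits.append(match.group("ts"))
    return hits


# A few lines taken from the excerpt above.
sample = [
    "Unable to connect to database using gold19c",
    "[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...",
    "[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode",
    "[W000 2020-06-12T00:13:22.269+02:00] Primary database cannot be reached.",
]
print(would_be_failovers(sample))
```

In practice the lines would come from the observer log file itself. Each timestamp returned is a moment at which FSFO in normal mode would presumably have started a failover.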

When the primary database comes back, FSFO returns to normal:

[oracle@goldpn1 ~]$ srvctl start database -d gold19c
[oracle@goldpn1 ~]$

In the drc* file on the standby:

2020-06-12T00:16:52.837+02:00
Primary connected to this instance.
2020-06-12T00:17:00.186+02:00
FSFO SetState(st=2 "UNSYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=11)
2020-06-12T00:17:06.951+02:00
FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)

In the Broker:

DGMGRL> show configuration;

Configuration - gold19c

  Protection Mode: MaxAvailability
  Members:
  gold19c  - Primary database
    golds19c - (*) Physical standby database

Fast-Start Failover: Enabled in Observe-Only Mode

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Observe-Only Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          30 seconds
  Active Target:      golds19c
  Potential Targets:  "golds19c"
    golds19c   valid
  Observer:           goldsn1.oralocal
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL>
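The FSFO SetState lines in the drc* excerpts above show how the state moved (ENABLE OBONLY, then UNSYNC, then SYNC). As a small sketch, the state number and name can be pulled out of such lines with a regular expression. The function name fsfo_states is hypothetical, and the pattern is based only on the lines shown in this post, so other broker versions may log a different layout.

```python
import re

# Matches the beginning of lines such as:
#   FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)
STATE_RE = re.compile(r'FSFO SetState\(st=(\d+) "([^"]*)"')


def fsfo_states(lines):
    """Return (state number, state name) pairs in the order they were logged."""
    states = []
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            states.append((int(match.group(1)), match.group(2)))
    return states


# Lines copied from the drc* excerpts in this post.
sample = [
    'FSFO SetState(st=47 "ENABLE OBONLY", fl=0x0 "", ob=0x2b621d39, tgt=2, v=0)',
    "Primary connected to this instance.",
    'FSFO SetState(st=2 "UNSYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=11)',
    'FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)',
]
for number, name in fsfo_states(sample):
    print(number, name)
```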

Upgrade and Downgrade modes

If FSFO is operating in Observe-Only mode, it is not possible to “upgrade” it directly to normal mode:

DGMGRL>  ENABLE FAST_START FAILOVER
Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.

Failed.
DGMGRL>

To do that, we need to disable FSFO first and then enable it in normal mode:

DGMGRL> DISABLE FAST_START FAILOVER ;
Disabled.
DGMGRL> ENABLE FAST_START FAILOVER ;
Enabled in Zero Data Loss Mode.
DGMGRL>

Downgrading works the same way: we can’t downgrade directly, so we need to disable FSFO first and then enable it in Observe-Only mode:

DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;
Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.

Failed.
DGMGRL>
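The rule in both directions is the same: a direct change between normal and observe-only fails with ORA-16889, so FSFO has to be disabled in between. The sketch below only encodes that rule and returns the DGMGRL commands, exactly as they appear in the sessions in this post, in the order they would be typed. The function name commands_to_switch and the mode labels "disabled", "normal" and "observe-only" are names I made up for the example, not DGMGRL keywords.

```python
# DGMGRL commands as used in the sessions shown in this post.
ENABLE_CMD = {
    "normal": "ENABLE FAST_START FAILOVER;",
    "observe-only": "ENABLE FAST_START FAILOVER OBSERVE ONLY;",
}
DISABLE_CMD = "DISABLE FAST_START FAILOVER;"
MODES = ("disabled", "normal", "observe-only")


def commands_to_switch(current, target):
    """Return the DGMGRL commands needed to move FSFO from `current` to `target`."""
    if current not in MODES or target not in MODES:
        raise ValueError("mode must be one of: " + ", ".join(MODES))
    if current == target:
        return []
    if target == "disabled":
        return [DISABLE_CMD]
    if current == "disabled":
        return [ENABLE_CMD[target]]
    # normal <-> observe-only: a direct ENABLE raises ORA-16889, so disable first.
    return [DISABLE_CMD, ENABLE_CMD[target]]


for command in commands_to_switch("normal", "observe-only"):
    print(command)
```

For the downgrade case this prints the DISABLE command followed by ENABLE FAST_START FAILOVER OBSERVE ONLY, which is the sequence described above.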

Health Conditions

This is not a new feature in 19c, but it helps to control the scenarios in which FSFO is triggered. It is possible to enable or disable individual Health Conditions, such as a corrupted controlfile or a stuck archiver. All the options are listed in the documentation.

Look below at “Configurable Failover Conditions”; everything listed there can be set:

DGMGRL>  show fast_start failover;

Fast-Start Failover: Enabled in Observe-Only Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          30 seconds
  Active Target:      golds19c
  Potential Targets:  "golds19c"
    golds19c   valid
  Observer:           goldsn1.oralocal
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL>

Some examples:

DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";
Succeeded.
DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Observe-Only Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          30 seconds
  Active Target:      golds19c
  Potential Targets:  "golds19c"
    golds19c   valid
  Observer:           goldsn1.oralocal
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile           YES
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL>

DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Corrupted Dictionary";
Succeeded.
DGMGRL> DISABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";
Succeeded.
DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Observe-Only Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          30 seconds
  Active Target:      golds19c
  Potential Targets:  "golds19c"
    golds19c   valid
  Observer:           goldsn1.oralocal
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL>
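After several ENABLE/DISABLE calls it is easy to lose track of which conditions are active. As a rough sketch, the "Health Conditions" block of the show fast_start failover output can be turned into a dictionary and checked from a script. The function name parse_health_conditions is hypothetical, and the parsing assumes the layout shown above: one condition per line, ending in YES or NO, with a blank line closing the block.

```python
def parse_health_conditions(show_output):
    """Map each health condition name to True (YES) or False (NO)."""
    conditions = {}
    in_block = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line == "Health Conditions:":
            in_block = True
            continue
        if in_block:
            if not line:
                break  # a blank line ends the block
            # Split on the last space: the name may contain spaces, the flag does not.
            name, _, flag = line.rpartition(" ")
            if flag not in ("YES", "NO"):
                break  # unexpected layout, stop instead of guessing
            conditions[name.strip()] = flag == "YES"
    return conditions


# Text copied from the output shown above.
sample = """\
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)
"""
for name, enabled in parse_health_conditions(sample).items():
    print(name, enabled)
```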

Another option is to enable (or disable) failover on a specific Oracle error. For example, the controlfile error ORA-240 can be set as a trigger:

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 240;
Succeeded.
DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Observe-Only Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          30 seconds
  Active Target:      golds19c
  Potential Targets:  "golds19c"
    golds19c   valid
  Observer:           goldsn1.oralocal
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    ORA-240: control file enqueue held for more than %s seconds

DGMGRL> DISABLE FAST_START FAILOVER CONDITION 240;
Succeeded.
DGMGRL>

In this test, however, only ORA-240 was accepted; another error number (600) was rejected:

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 600;
Error: ORA-16524: unsupported command, option, or argument

Failed.
DGMGRL>

Observe-Only and Conditions

The new Observe-Only mode in 19c is a good feature because it gives more control over where and when FSFO is triggered. Before it, the only options were ON or OFF, so testing, or even validating the environment before enabling FSFO for real, was impossible.

And if we combine this with the Health Conditions check, we get powerful control over the DG environment. It allows finer tuning.

 

Disclaimer: The postings on this site are my own and don’t necessarily represent my actual employer’s positions, strategies or opinions. The information here was edited to be useful for general purposes; specific data and identifications were removed to reach a generic audience and to be useful for the community. Post protected by copyright.
