Oracle and MAA – Article VI – Broker Failover

In this article we will see how to perform a failover and a reinstate in an environment with the Broker configured. If you have been following the series up to this point, you will have seen that we have already covered manual failover, reinstate, and switchover (all manual, without the Broker), and in the last article we configured the Broker.

Even so, we have not yet reached an ideal Oracle MAA (Maximum Availability Architecture) environment with Data Guard and Oracle RAC. Fast-Start Failover has not been configured yet, so even with the Broker, a failure in the environment still requires manual intervention.

SIXTH ARTICLE

In this article we will see how to proceed when the environment fails while the Broker is in place: how to perform the failover and the reinstate through the Broker. As I said above, the MAA environment is not yet complete, but that does not prevent a failure from happening and forcing you to step in.

It should already be clear from this MAA series that many steps are not applied in a real environment in the same sequence presented here; you will not perform a failover just to test the Broker, since you configure everything up front without needing to "restore" the environment. Throughout the articles we have covered the various points involved in an MAA environment, and you have to be prepared to act even if your environment is not fully configured. Are you going to tell your manager that you cannot perform a failover because the Broker is not configured?

ENVIRONMENT

In this article we have an environment with two Oracle RAC clusters synchronized through Data Guard. The primary database is "maa" and the standby is "maastb", both synchronized with DG operating in MAXIMUM AVAILABILITY and with real-time apply. Finally, we have the Broker configured and "taking care" of the environment.
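
As a quick sanity check of this starting point (a sketch, not output captured from the original environment), the role and protection mode can be confirmed on each side with:

SQL> SELECT db_unique_name, database_role, protection_mode FROM v$database;

On the primary this should return maa / PRIMARY / MAXIMUM AVAILABILITY, and on the standby maastb / PHYSICAL STANDBY / MAXIMUM AVAILABILITY.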

Before continuing, let us briefly return to the point highlighted in the manual failover article: synchronization and the importance of the environment. On synchronization, in a DG environment it must be one of the key concerns; you operate in MAA and do not keep your environment synchronized? As for importance, if you are operating in MAA with Oracle RAC on both sides, then your primary and standby environments are both important, are they not?

So, do you have a database that stores important data and still do not keep your standby synchronized with the primary? Have you already told your manager that you are going to lose data?

This is not to say that an MAA environment with DG must never lose any data. For design reasons there may be an agreement on how much information can be lost, and some information can be recovered by other means. In any case, know your environment well and know all its details.

CONTROL

In previous articles we used the same trick to demonstrate DG synchronization, and we return to it here. Using the same table created before, we clear its data and then insert rows from each instance of the primary. With this table we show that in the current environment, even without generating archivelogs, the transactions are protected.
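
If you are reproducing this from scratch, the DDL for testedg appeared in an earlier article of the series; based on the columns used below, an equivalent table (assumed structure, shown only for convenience) would be:

SQL> CREATE TABLE testedg (c1 NUMBER, c2 DATE);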

On the primary instance maa1:
	SQL> SELECT database_role FROM v$database;

	DATABASE_ROLE
	----------------
	PRIMARY

	SQL> SELECT instance_name FROM v$instance;

	INSTANCE_NAME
	----------------
	maa1

	SQL> DELETE FROM testedg;

	1 row deleted.

	SQL> INSERT INTO testedg(c1, c2) VALUES(1, sysdate);

	1 row created.

	SQL> COMMIT;

	Commit complete.

	SQL>

On the primary instance maa2:
	SQL> SELECT instance_name FROM v$instance;

	INSTANCE_NAME
	----------------
	maa2

	SQL> INSERT INTO testedg(c1, c2) VALUES(2, sysdate);

	1 row created.

	SQL> COMMIT;

	Commit complete.

	SQL> SELECT c1, TO_CHAR(c2, 'DD/MM/RRRR HH24:MI:SS') AS momento FROM testedg;

			C1 MOMENTO
	---------- -------------------
			 1 11/05/2014 12:01:07
			 2 11/05/2014 12:02:17

	SQL>

FAILOVER

Creating the failure

Here the failure will be a bit more catastrophic than the one in the last failover: the primary Oracle RAC servers were physically powered off. Suppose something similar happens in your environment; say the storage controller behind your database's disks fails and your RAID did not hold up. Imagine your Exadata lost more than two storage cells during the night shift.

So you arrive in the morning (if they have not already called you during the night) and there is a committee "welcoming" you with frightened faces, ready to hand over the bomb and say: Good luck! Well, as we will see, the scenario is not that dark and you can get out of it quite smoothly.

FAILOVER TO

I said above that the first concern is synchronization, but since you have the whole environment configured correctly, you know synchronization will not be a problem. If you want to double-check, connect to the standby and verify its state:

SQL> SELECT database_role FROM v$database;

DATABASE_ROLE
----------------
PHYSICAL STANDBY

SQL> SELECT protection_mode, protection_level FROM v$database;

PROTECTION_MODE      PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY

SQL>
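
Another check you can run on the standby (not shown in the original outputs) is the lag reported by v$dataguard_stats; with real-time apply working, both values should be at or near zero:

SQL> SELECT name, value FROM v$dataguard_stats WHERE name IN ('transport lag', 'apply lag');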

Since we have the Broker configured, let us use it. One detail here: if you try to connect to the primary you will receive an error, so you have to fall back to the standby connection in the Broker:

[oracle@rac11stb01 ~]$ dgmgrl
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> CONNECT sys@maa
Password:
Unable to connect to database
ORA-12543: TNS:destination host unreachable

Failed.
DGMGRL> CONNECT sys@maastb
Password:
Connected.
DGMGRL>

As we saw above, the Broker cannot connect to the primary environment. We know it is down and we need the failover. Of course, before performing the failover you talked to the infrastructure/system team and confirmed that the environment cannot be brought back quickly. Thus, the only option is the failover.

Once connected to the Broker you can check how your environment looks:

DGMGRL> SHOW CONFIGURATION VERBOSE;

Configuration - dgrac

  Protection Mode: MaxAvailability
  Databases:
	maa    - Primary database
	maastb - Physical standby database

  Properties:
	FastStartFailoverThreshold      = '30'
	OperationTimeout                = '30'
	FastStartFailoverLagLimit       = '30'
	CommunicationTimeout            = '180'
	FastStartFailoverAutoReinstate  = 'TRUE'
	FastStartFailoverPmyShutdown    = 'TRUE'
	BystandersFollowRoleChange      = 'ALL'

Fast-Start Failover: DISABLED

Configuration Status:
ORA-12543: TNS:destination host unreachable
ORA-16625: cannot reach database "maa"
DGM-17017: unable to determine configuration status

DGMGRL> SHOW DATABASE maa

Database - maa

  Role:            PRIMARY
  Intended State:  TRANSPORT-ON
  Instance(s):
	maa1
	maa2

Database Status:
DGM-17016: failed to retrieve status for database "maa"
ORA-12543: TNS:destination host unreachable
ORA-16625: cannot reach database "maa"

DGMGRL> SHOW DATABASE maastb;

Database - maastb

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
	maastb1 (apply instance)
	maastb2

Database Status:
SUCCESS

DGMGRL>

As you saw above, the primary database (maa) is down and the Broker is waiting for you to decide. To perform the failover through the Broker, simply issue the command "FAILOVER TO <standby name>", as shown below. To perform the failover, execute:

DGMGRL> FAILOVER TO maastb;
Performing failover NOW, please wait...
Operation requires a connection to instance "maastb1" on database "maastb"
Connecting to instance "maastb1"...
Connected.
Failover succeeded, new primary is "maastb"
DGMGRL>

In the Broker log of instance maastb1:
	05/11/2014 12:52:28
	FAILOVER TO maastb
	Beginning failover to database maastb
	Notifying Oracle Clusterware to teardown database for FAILOVER
	05/11/2014 12:53:18
	Protection mode set to MAXIMUM AVAILABILITY
	05/11/2014 12:53:29
	Deferring associated archivelog destinations of sites permanently disabled due to Failover
	Notifying Oracle Clusterware to buildup primary database after FAILOVER
	Data Guard notifying Oracle Clusterware to start services and other instances change
	05/11/2014 12:53:33
	Posting DB_DOWN alert ...
			... with reason Data Guard Failover
	05/11/2014 12:53:37
	Command FAILOVER TO maastb completed

In the Broker log of instance maastb2:
	05/11/2014 18:52:26
	FAILOVER TO maastb
	Command FAILOVER TO maastb completed with warning ORA-16523
	05/11/2014 18:53:18
	Protection mode set to MAXIMUM AVAILABILITY
	05/11/2014 18:53:33
	Command FAILOVER TO maastb completed

Look carefully at the outputs shown above. In the first we have the execution of the command; note that, to carry it out, the Broker connected specifically to a single instance. In the Broker logs we have the information that Clusterware was notified to tear down and rebuild the database because of the failover (teardown/buildup) and that the archive destinations of the permanently disabled site were deferred.

Let us see what happened in the alert log of instance "maastb1" (correlate it with the timestamps in the previous logs). Apologies in advance: the log below is long, precisely so it can illustrate everything that happened:

In the alert log of the standby maastb (the new primary):
	Sun May 11 12:52:28 2014
	Data Guard Broker: Beginning failover
	Sun May 11 12:52:29 2014
	ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
	Sun May 11 12:52:29 2014
	MRP0: Background Media Recovery cancelled with status 16037
	Errors in file /u01/app/oracle/diag/rdbms/maastb/maastb1/trace/maastb1_pr00_10962.trc:
	ORA-16037: user requested cancel of managed recovery operation
	Sun May 11 12:52:29 2014
	Managed Standby Recovery not using Real Time Apply
	Recovery interrupted!
	Recovered data files to a consistent state at change 8878906
	Sun May 11 12:52:33 2014
	MRP0: Background Media Recovery process shutdown (maastb1)
	Managed Standby Recovery Canceled (maastb1)
	Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
	ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE
	Attempt to do a Terminal Recovery (maastb1)
	Media Recovery Start: Managed Standby Recovery (maastb1)
	 started logmerger process
	Managed Standby Recovery not using Real Time Apply
	Parallel Media Recovery started with 2 slaves
	Sun May 11 12:52:36 2014
	Media Recovery Waiting for thread 2 sequence 99 (in transit)
	Sun May 11 12:52:37 2014
	krsv_proc_kill: Killing 6 processes (all RFS, wait for I/O)
	Begin: Standby Redo Logfile archival
	End: Standby Redo Logfile archival
	Terminal Recovery timestamp is '05/11/2014 12:52:40'
	Terminal Recovery: applying standby redo logs.
	Terminal Recovery: thread 2 seq# 99 redo required
	Terminal Recovery:
	Recovery of Online Redo Log: Thread 2 Group 9 Seq 99 Reading mem 0
	  Mem# 0: +DG01/maastb/onlinelog/group_9.260.844716095
	  Mem# 1: +FRA/maastb/onlinelog/group_9.591.844716097
	Identified End-Of-Redo (failover) for thread 2 sequence 99 at SCN 0xffff.ffffffff
	Terminal Recovery: thread 1 seq# 140 redo required
	Terminal Recovery:
	Recovery of Online Redo Log: Thread 1 Group 6 Seq 140 Reading mem 0
	  Mem# 0: +DG01/maastb/onlinelog/group_6.267.844716079
	  Mem# 1: +FRA/maastb/onlinelog/group_6.604.844716081
	Identified End-Of-Redo (failover) for thread 1 sequence 140 at SCN 0xffff.ffffffff
	Incomplete Recovery applied until change 8878917 time 05/11/2014 18:08:24
	Sun May 11 12:52:41 2014
	Media Recovery Complete (maastb1)
	Terminal Recovery: successful completion
	Sun May 11 12:52:41 2014
	ARC0: Archiving not possible: error count exceeded
	Sun May 11 12:52:41 2014
	ARC3: Archiving not possible: error count exceeded
	Forcing ARSCN to IRSCN for TR 0:8878917
	Attempt to set limbo arscn 0:8878917 irscn 0:8878917
	Resetting standby activation ID 723321957 (0x2b1d0465)
	ARCH: Archival stopped, error occurred. Will continue retrying
	ORACLE Instance maastb1 - Archival Error
	ORA-16014: log 6 sequence# 140 not archived, no available destinations
	ORA-00312: online log 6 thread 1: '+DG01/maastb/onlinelog/group_6.267.844716079'
	ORA-00312: online log 6 thread 1: '+FRA/maastb/onlinelog/group_6.604.844716081'
	Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE
	ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
	ALTER DATABASE SWITCHOVER TO PRIMARY (maastb1)
	Maximum wait for role transition is 15 minutes.
	Backup controlfile written to trace file /u01/app/oracle/diag/rdbms/maastb/maastb1/trace/maastb1_rsm0_18329.trc
	Standby terminal recovery start SCN: 8878906
	RESETLOGS after incomplete recovery UNTIL CHANGE 8878917
	Online log +DG01/maastb/onlinelog/group_1.257.844716051: Thread 1 Group 1 was previously cleared
	Online log +FRA/maastb/onlinelog/group_1.568.844716053: Thread 1 Group 1 was previously cleared
	Online log +DG01/maastb/onlinelog/group_2.268.844716057: Thread 1 Group 2 was previously cleared
	Online log +FRA/maastb/onlinelog/group_2.564.844716059: Thread 1 Group 2 was previously cleared
	Online log +DG01/maastb/onlinelog/group_3.269.844716063: Thread 2 Group 3 was previously cleared
	Online log +FRA/maastb/onlinelog/group_3.562.844716065: Thread 2 Group 3 was previously cleared
	Online log +DG01/maastb/onlinelog/group_4.270.844716067: Thread 2 Group 4 was previously cleared
	Online log +FRA/maastb/onlinelog/group_4.559.844716071: Thread 2 Group 4 was previously cleared
	Standby became primary SCN: 8878905
	Sun May 11 12:52:45 2014
	Setting recovery target incarnation to 3
	Switchover: Complete - Database mounted as primary
	Sun May 11 12:52:51 2014
	Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
	ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY
	Completed: ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY
	ALTER DATABASE OPEN
	Data Guard Broker initializing...
	This instance was first to open
	Picked broadcast on commit scheme to generate SCNs
	Sun May 11 12:52:53 2014
	Assigning activation ID 723379342 (0x2b1de48e)
	LGWR: Primary database is in MAXIMUM AVAILABILITY mode
	Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
	LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
	Thread 1 advanced to log sequence 2 (thread open)
	Sun May 11 12:52:54 2014
	ARC3: Becoming the 'no SRL' ARCH
	Thread 1 opened at log sequence 2
	Sun May 11 12:52:54 2014
	  Current log# 2 seq# 2 mem# 0: +DG01/maastb/onlinelog/group_2.268.844716057
	ARC0: Becoming the 'no SRL' ARCH
	  Current log# 2 seq# 2 mem# 1: +FRA/maastb/onlinelog/group_2.564.844716059
	Successful open of redo thread 1
	MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
	Sun May 11 12:52:55 2014
	SMON: enabling cache recovery
	ARC0: Becoming the 'no SRL' ARCH
	Archived Log entry 323 added for thread 1 sequence 1 ID 0x2b1de48e dest 1:
	Archiver process freed from errors. No longer stopped
	Instance recovery: looking for dead threads
	Instance recovery: lock domain invalid but no dead threads
	Sun May 11 12:53:04 2014


	***********************************************************************

	Fatal NI connect error 12543, connecting to:
	 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac11pri-scan.tjsc.jus.br)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=maa)(CID=(PROGRAM=oracle)(HOST=rac11stb01.tjsc.jus.br)(USER=oracle))))

	  VERSION INFORMATION:
			TNS for Linux: Version 11.2.0.3.0 - Production
			TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
	  Time: 11-MAY-2014 12:53:04
	  Tracing not turned on.
	  Tns error struct:
		ns main err code: 12543

	TNS-12543: TNS:destination host unreachable
		ns secondary err code: 12560
		nt main err code: 513

	TNS-00513: Destination host unreachable
		nt secondary err code: 113
		nt OS err code: 0
	Sun May 11 12:53:06 2014
	[18329] Successfully onlined Undo Tablespace 2.
	Undo initialization finished serial:0 start:3723812044 end:3723813584 diff:1540 (15 seconds)
	Verifying file header compatibility for 11g tablespace encryption..
	Verifying 11g file header compatibility for tablespace encryption completed
	Sun May 11 12:53:06 2014
	SMON: enabling tx recovery
	Database Characterset is WE8MSWIN1252
	Sun May 11 12:53:07 2014
	minact-scn: Inst 1 is a slave inc#:4 mmon proc-id:10424 status:0x2
	minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
	No Resource Manager plan active


	***********************************************************************

	Fatal NI connect error 12543, connecting to:
	 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac11pri-scan.tjsc.jus.br)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=maa)(CID=(PROGRAM=oracle)(HOST=rac11stb01.tjsc.jus.br)(USER=oracle))))

	  VERSION INFORMATION:
			TNS for Linux: Version 11.2.0.3.0 - Production
			TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
	  Time: 11-MAY-2014 12:53:13
	  Tracing not turned on.
	  Tns error struct:
		ns main err code: 12543

	TNS-12543: TNS:destination host unreachable
		ns secondary err code: 12560
		nt main err code: 513

	TNS-00513: Destination host unreachable
		nt secondary err code: 113
		nt OS err code: 0
	Starting background process GTX0
	Sun May 11 12:53:14 2014
	GTX0 started with pid=40, OS id=10940
	Starting background process RCBG
	Sun May 11 12:53:14 2014
	RCBG started with pid=41, OS id=10942
	replication_dependency_tracking turned off (no async multimaster replication found)
	Sun May 11 12:53:16 2014
	Starting background process QMNC
	Sun May 11 12:53:16 2014
	QMNC started with pid=42, OS id=10944
	Completed: ALTER DATABASE OPEN
	ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='maastb1';
	ALTER SYSTEM SET log_archive_format='arch_%t_%s_%r.arc' SCOPE=SPFILE SID='maastb1';
	Sun May 11 12:53:22 2014
	Starting background process SMCO
	Sun May 11 12:53:22 2014
	SMCO started with pid=44, OS id=10957
	Sun May 11 12:53:22 2014


	***********************************************************************

	Fatal NI connect error 12543, connecting to:
	 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac11pri-scan.tjsc.jus.br)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=maa)(CID=(PROGRAM=oracle)(HOST=rac11stb01.tjsc.jus.br)(USER=oracle))))

	  VERSION INFORMATION:
			TNS for Linux: Version 11.2.0.3.0 - Production
			TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
	  Time: 11-MAY-2014 12:53:22
	  Tracing not turned on.
	  Tns error struct:
		ns main err code: 12543

	TNS-12543: TNS:destination host unreachable
		ns secondary err code: 12560
		nt main err code: 513

	TNS-00513: Destination host unreachable
		nt secondary err code: 113
		nt OS err code: 0
	Error 12543 received logging on to the standby
	PING[ARC2]: Heartbeat failed to connect to standby 'maa'. Error is 12543.
	Sun May 11 12:53:22 2014
	ARC1: STARTING ARCH PROCESSES
	Sun May 11 12:53:22 2014
	ARC0: Becoming the 'no SRL' ARCH
	Sun May 11 12:53:22 2014
	ARC4 started with pid=48, OS id=10961
	ARC4: Archival started
	ARC1: STARTING ARCH PROCESSES COMPLETE
	ARC1: Becoming the 'no SRL' ARCH
	Sun May 11 12:53:26 2014
	ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=BOTH SID='*';
	ALTER SYSTEM SET archive_lag_target=0 SCOPE=BOTH SID='*';
	ALTER SYSTEM SET log_archive_max_processes=4 SCOPE=BOTH SID='*';
	ALTER SYSTEM SET log_archive_min_succeed_dest=1 SCOPE=BOTH SID='*';
	ALTER SYSTEM SET db_file_name_convert='+DATA/maa','+DG01/maastb','+FRA/maa','+FRA/maastb' SCOPE=SPFILE;
	ALTER SYSTEM SET log_file_name_convert='+DATA/maa','+DG01/maastb','+FRA/maa','+FRA/maastb' SCOPE=SPFILE;
	Sun May 11 12:53:29 2014
	ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;
	Sun May 11 12:53:31 2014
	Starting background process CJQ0
	Sun May 11 12:53:31 2014
	CJQ0 started with pid=52, OS id=10995
	Failover succeeded. Primary database is now maastb.
	Sun May 11 12:53:33 2014
	Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
	Sun May 11 12:53:33 2014
	Starting background process VKRM
	Sun May 11 12:53:33 2014
	VKRM started with pid=60, OS id=11000
	Setting Resource Manager plan SCHEDULER[0x318F]:DEFAULT_MAINTENANCE_PLAN via scheduler window
	Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
	Sun May 11 12:53:38 2014
	Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 4194304 bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:
	 select total_size,awr_flush_emergency_count from v$ash_info;

Going calmly through everything above, we observe that the first thing the Broker did was cancel the application of redo coming from the old primary. We can also identify that the standby became primary at SCN# 8878905 and marked an End-Of-Redo on both threads. Then all transactions and information were committed so that the database could take over as primary. In addition, the database opened in MAXIMUM AVAILABILITY and immediately detected that log_archive_dest_2 is unsynchronized (hence the TNS failures that appear). I did not show the alert log of the second instance of "maastb" above, but the information there is simple, since everything was done by "maastb1".
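
For reference, these are essentially the statements the Broker issued on our behalf; all of them can be spotted in the alert log above:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE;
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN;
SQL> ALTER DATABASE OPEN;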

If you did not have the Broker, you would have to do all of this manually. Every command, from the recovery cancel to the failover itself, would be yours to run, including the CRS adjustment. To check the resulting environment we can execute in the Broker:

DGMGRL> SHOW CONFIGURATION VERBOSE;

Configuration - dgrac

  Protection Mode: MaxAvailability
  Databases:
	maastb - Primary database
	  Warning: ORA-16629: database reports a different protection level from the protection mode

	maa    - Physical standby database (disabled)
	  ORA-16661: the standby database needs to be reinstated

  Properties:
	FastStartFailoverThreshold      = '30'
	OperationTimeout                = '30'
	FastStartFailoverLagLimit       = '30'
	CommunicationTimeout            = '180'
	FastStartFailoverAutoReinstate  = 'TRUE'
	FastStartFailoverPmyShutdown    = 'TRUE'
	BystandersFollowRoleChange      = 'ALL'

Fast-Start Failover: DISABLED

Configuration Status:
WARNING

DGMGRL> SHOW DATABASE VERBOSE maastb;

Database - maastb

  Role:            PRIMARY
  Intended State:  TRANSPORT-ON
  Instance(s):
	maastb1
	maastb2

  Database Warning(s):
	ORA-16629: database reports a different protection level from the protection mode

  Properties:
	DGConnectIdentifier             = 'maastb'
	ObserverConnectIdentifier       = ''
	LogXptMode                      = 'SYNC'
	DelayMins                       = '0'
	Binding                         = 'OPTIONAL'
	MaxFailure                      = '0'
	MaxConnections                  = '1'
	ReopenSecs                      = '300'
	NetTimeout                      = '30'
	RedoCompression                 = 'DISABLE'
	LogShipping                     = 'ON'
	PreferredApplyInstance          = ''
	ApplyInstanceTimeout            = '0'
	ApplyParallel                   = 'AUTO'
	StandbyFileManagement           = 'AUTO'
	ArchiveLagTarget                = '0'
	LogArchiveMaxProcesses          = '4'
	LogArchiveMinSucceedDest        = '1'
	DbFileNameConvert               = '+DATA/maa, +DG01/maastb, +FRA/maa, +FRA/maastb'
	LogFileNameConvert              = '+DATA/maa, +DG01/maastb, +FRA/maa, +FRA/maastb'
	FastStartFailoverTarget         = ''
	InconsistentProperties          = '(monitor)'
	InconsistentLogXptProps         = '(monitor)'
	SendQEntries                    = '(monitor)'
	LogXptStatus                    = '(monitor)'
	RecvQEntries                    = '(monitor)'
	SidName(*)
	StaticConnectIdentifier(*)
	StandbyArchiveLocation(*)
	AlternateLocation(*)
	LogArchiveTrace(*)
	LogArchiveFormat(*)
	TopWaitEvents(*)
	(*) - Please check specific instance for the property value

Database Status:
WARNING

DGMGRL>

Note above that there are some details to investigate. We get a nice "ORA-16629: database reports a different protection level from the protection mode", which occurs because the new standby is not yet available for archive synchronization (demonstrated below). We also see that "maastb" is the new primary.
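
You can see the mismatch behind the ORA-16629 directly on the new primary. Until redo transport to a standby is re-established, the protection level drops to RESYNCHRONIZATION, the same level reported in the Broker log shown further below (output sketched, not captured from the original environment):

SQL> SELECT protection_mode, protection_level FROM v$database;

PROTECTION_MODE      PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM AVAILABILITY RESYNCHRONIZATION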

If we look at the old primary we will see some important information:

DGMGRL> SHOW DATABASE VERBOSE maa;

Database - maa

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   (unknown)
  Apply Lag:       (unknown)
  Real Time Query: OFF
  Instance(s):
	maa1
	maa2

  Properties:
	DGConnectIdentifier             = 'maa'
	ObserverConnectIdentifier       = ''
	LogXptMode                      = 'SYNC'
	DelayMins                       = '0'
	Binding                         = 'optional'
	MaxFailure                      = '0'
	MaxConnections                  = '1'
	ReopenSecs                      = '300'
	NetTimeout                      = '30'
	RedoCompression                 = 'DISABLE'
	LogShipping                     = 'ON'
	PreferredApplyInstance          = ''
	ApplyInstanceTimeout            = '0'
	ApplyParallel                   = 'AUTO'
	StandbyFileManagement           = 'AUTO'
	ArchiveLagTarget                = '0'
	LogArchiveMaxProcesses          = '4'
	LogArchiveMinSucceedDest        = '1'
	DbFileNameConvert               = '+DG01/maastb, +DATA/maa, +FRA/maastb, +FRA/maa'
	LogFileNameConvert              = '+DG01/maastb, +DATA/maa, +FRA/maastb, +FRA/maa'
	FastStartFailoverTarget         = ''
	InconsistentProperties          = '(monitor)'
	InconsistentLogXptProps         = '(monitor)'
	SendQEntries                    = '(monitor)'
	LogXptStatus                    = '(monitor)'
	RecvQEntries                    = '(monitor)'
	SidName(*)
	StaticConnectIdentifier(*)
	StandbyArchiveLocation(*)
	AlternateLocation(*)
	LogArchiveTrace(*)
	LogArchiveFormat(*)
	TopWaitEvents(*)
	(*) - Please check specific instance for the property value

Database Status:
ORA-16661: the standby database needs to be reinstated

DGMGRL>

The main detail here is the message saying it needs a reinstate: "ORA-16661: the standby database needs to be reinstated". Once you bring your old primary back, you will have to reinstate it.
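
One prerequisite worth remembering: the Broker reinstates the old primary using Flashback Database (you will see the FLASHBACK DATABASE TO SCN command in the alert log further below). If Flashback Database is not enabled on the old primary, the reinstate fails and the database has to be re-created from a backup. A quick check:

SQL> SELECT flashback_on FROM v$database;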

CONTROL 2

So far everything has been nice and neat: we performed the failover through the Broker and it took care of everything for us. Of course, all of this worked because we kept the environment healthy; the synchronization between primary and standby was correct and no information was lost. Let us look at the test table we used at the beginning of the article:

SQL> SELECT instance_name, status FROM gv$instance;

INSTANCE_NAME    STATUS
---------------- ------------
maastb2          OPEN
maastb1          OPEN

SQL> SELECT c1, TO_CHAR(c2, 'DD/MM/RRRR HH24:MI:SS') AS momento FROM testedg;

		C1 MOMENTO
---------- -------------------
		 1 11/05/2014 12:01:07
		 2 11/05/2014 12:02:17

SQL>
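
As an extra check (not part of the original outputs), you can also confirm on the new primary that the failover opened a new incarnation, matching the "Setting recovery target incarnation to 3" message in the alert log:

SQL> SELECT incarnation#, resetlogs_change#, status FROM v$database_incarnation ORDER BY incarnation#;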

REINSTATE

Fair warning: the upcoming logs are long. Even so, they are packed with technical details that deserve attention and a careful look.

If you leave everything as it is now, the database "maastb" is acting as primary and waiting for the database "maa" to take its place as standby. In the Broker log of the primary instances you will see the following information (as well as in the dgmgrl outputs shown above):

05/11/2014 19:40:53
Error: The actual protection level 'RESYNCHRONIZATION' is different from the configured protection mode 'MAXIMUM AVAILABILITY'.
Data Guard Broker Status Summary:
  Type                        Name                             Severity  Status
  Configuration               dgrac                             Warning  ORA-16608
  Primary Database            maastb                            Warning  ORA-16629
  Physical Standby Database   maa                                 Error  ORA-16661

Besides that, you must be asking yourself: what happens if my old primary comes back up and becomes available? Will that cause a problem? Hmm... no.

The simple fact that we are using the Broker already helps tremendously. When the old primary comes back, it will need a reinstate as soon as it returns. Observe below the alert log of the old primary right after it was powered back on (this log was "trimmed", but the interesting parts were kept):

Mon May 12 05:14:46 2014
Starting ORACLE instance (normal)
****************** Large Pages Information *****************

Total Shared Global Region in Large Pages = 0 KB (0%)

Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 4096 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
...
...
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning and Real Application Clusters options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0.3/db_1
System name:    Linux
Node name:      rac11pri01.tjsc.jus.br
Release:        2.6.39-400.17.1.el6uek.x86_64
Version:        #1 SMP Fri Feb 22 18:16:18 PST 2013
Machine:        x86_64
VM name:        VMWare Version: 6
...
...
  fal_server               = "MAASTB"
  log_archive_trace        = 0
  log_archive_config       = "DG_CONFIG=(maa,maastb)"
...
...
DBW0 started with pid=17, OS id=3509
Mon May 12 05:14:51 2014
LGWR started with pid=18, OS id=3516
Mon May 12 05:14:51 2014
CKPT started with pid=19, OS id=3519
Mon May 12 05:14:51 2014
SMON started with pid=20, OS id=3521
Mon May 12 05:14:51 2014
RECO started with pid=21, OS id=3523
...
...
Reconfiguration complete
Mon May 12 05:14:52 2014
LCK0 started with pid=28, OS id=3558
Starting background process RSMN
Mon May 12 05:14:52 2014
RSMN started with pid=29, OS id=3562
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
Mon May 12 05:14:53 2014
DMON started with pid=30, OS id=3565
Mon May 12 05:14:55 2014
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.17.42.14)(PORT=1521))))' SCOPE=MEMORY SID='maa1';
ALTER DATABASE MOUNT /* db agent *//* {1:51382:2} */
This instance was first to mount
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Loaded library: System
...
...
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {1:51382:2} */
ALTER DATABASE OPEN /* db agent *//* {1:51382:2} */
Data Guard Broker initializing...
Data Guard Broker initialization complete
Data Guard: verifying database primary role...
Mon May 12 05:15:06 2014
Starting Data Guard Broker (DMON)
Mon May 12 05:15:06 2014
INSV started with pid=35, OS id=3657
Mon May 12 05:15:10 2014
NSV1 started with pid=36, OS id=3669
Mon May 12 05:15:16 2014
Data Guard: version check completed
Data Guard determines a failover has occurred - this is no longer a primary database
ORA-16649 signalled during: ALTER DATABASE OPEN /* db agent *//* {1:51382:2} */...

Note above that at startup it mounted the database and attempted to open it, but as soon as the Broker process initialized it raised ORA-16649 while "verifying database primary role": it detected that "this is no longer a primary database" and kept the database in mount mode. In the Broker log of the old primary we have:

05/12/2014 05:15:06
>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
	  dg_broker_config_file1 = "+DATA/maa/dr1maa.dat"
	  dg_broker_config_file2 = "+FRA/maa/dr2maa.dat"
05/12/2014 05:15:10
DMON Registering service maa_DGB with listener(s)
Broker Configuration:       "dgrac"
	  Protection Mode:            Maximum Availability
	  Fast-Start Failover (FSFO): Disabled, flags=0x0, version=0
	  Primary Database:           maa (0x01010000)
	  Standby Database:           maastb, Enabled Physical Standby (0x02010000)
05/12/2014 05:15:16
Site maastb returned ORA-16623.
Version Check Results:
	  Database maastb returned ORA-16623
Disable redo transport to all standby databases
RSM0 failed to remove all destinations, error ORA-16718

Note that we also get "ORA-16623" and "ORA-16718". This simply tells us that the current database cannot open and needs a reinstate to change roles. Another interesting detail worth observing is who signaled the ORA-16623: if you look, it was the new primary that did so, and the error tells us that a role change has occurred.

To resolve all of this we need to reinstate the old primary so that it becomes the new standby. To do so, execute the following command in dgmgrl (I am connected through a standby instance):

DGMGRL> REINSTATE DATABASE maa;
Reinstating database "maa", please wait...
Operation requires shutdown of instance "maa1" on database "maa"
Shutting down instance "maa1"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "maa1" on database "maa"
Starting instance "maa1"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "maa" ...
Reinstatement of database "maa" succeeded
DGMGRL>

Note above that when the reinstate was invoked through the Broker, it connected to the old primary (maa), shut it down, and brought the instance back up in mount mode. While the command above was running, the following information appeared in the Broker log of the primary instance:

Broker log of instance maastb1:
	05/11/2014 20:11:54
	REINSTATE DATABASE maa
	Database maa can be reinstated
	05/11/2014 20:13:02
	Command REINSTATE DATABASE maa completed with warning ORA-16570
	05/11/2014 20:13:46
	REINSTATE DATABASE maa
	Database maa can be reinstated
	05/11/2014 20:14:06
	Successfully completed reinstatement of database 0x01001000; removing ReinstateContextArray property
	05/11/2014 20:14:47
	Command REINSTATE DATABASE maa completed
	EDIT DATABASE maa SET PROPERTY ActualApplyInstance = maa1
	Apply Instance for database maa is maa1
	Command EDIT DATABASE maa SET PROPERTY ActualApplyInstance = maa1 completed

The information is practically the same as what appeared in dgmgrl. Things get more interesting in the alert logs of the primary instances; observe below:

Alert log of instance maastb1:
	Sun May 11 20:12:28 2014
	ALTER SYSTEM SET log_archive_dest_2='service="maa"','LGWR SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=1 db_unique_name="maa" net_timeout=30','valid_for=(all_logfiles,primary_role)' SCOPE=BOTH;
	ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;
	Sun May 11 20:12:30 2014
	******************************************************************
	LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
	******************************************************************
	LGWR: Error 16009 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'maa'
	Error 16009 for archive log file 1 to 'maa'
	LGWR: Failed to archive log 1 thread 1 sequence 3 (16009)
	Thread 1 advanced to log sequence 3 (LGWR switch)
	  Current log# 1 seq# 3 mem# 0: +DG01/maastb/onlinelog/group_1.257.844716051
	  Current log# 1 seq# 3 mem# 1: +FRA/maastb/onlinelog/group_1.568.844716053
	Sun May 11 20:12:42 2014
	Archived Log entry 328 added for thread 1 sequence 2 ID 0x2b1de48e dest 1:
	Sun May 11 20:12:51 2014
	ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;
	Sun May 11 20:13:48 2014


	***********************************************************************

	Fatal NI connect error 12514, connecting to:
	 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac11pri-scan.tjsc.jus.br)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=maa_DGB)(CID=(PROGRAM=oracle)(HOST=rac11stb01.tjsc.jus.br)(USER=oracle))))

	  VERSION INFORMATION:
			TNS for Linux: Version 11.2.0.3.0 - Production
			TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
	  Time: 11-MAY-2014 20:13:48
	  Tracing not turned on.
	  Tns error struct:
		ns main err code: 12564

	TNS-12564: TNS:connection refused
		ns secondary err code: 0
		nt main err code: 0
		nt secondary err code: 0
		nt OS err code: 0


	***********************************************************************

	Fatal NI connect error 12514, connecting to:
	 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac11pri-scan.tjsc.jus.br)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=maa_DGB)(CID=(PROGRAM=oracle)(HOST=rac11stb01.tjsc.jus.br)(USER=oracle))))

	  VERSION INFORMATION:
			TNS for Linux: Version 11.2.0.3.0 - Production
			TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
	  Time: 11-MAY-2014 20:13:48
	  Tracing not turned on.
	  Tns error struct:
		ns main err code: 12564

	TNS-12564: TNS:connection refused
		ns secondary err code: 0
		nt main err code: 0
		nt secondary err code: 0
		nt OS err code: 0


	***********************************************************************

	Fatal NI connect error 12514, connecting to:
	 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac11pri-scan.tjsc.jus.br)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=maa_DGB)(CID=(PROGRAM=oracle)(HOST=rac11stb01.tjsc.jus.br)(USER=oracle))))

	  VERSION INFORMATION:
			TNS for Linux: Version 11.2.0.3.0 - Production
			TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
	  Time: 11-MAY-2014 20:13:48
	  Tracing not turned on.
	  Tns error struct:
		ns main err code: 12564

	TNS-12564: TNS:connection refused
		ns secondary err code: 0
		nt main err code: 0
		nt secondary err code: 0
		nt OS err code: 0
	Sun May 11 20:13:53 2014
	ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;
	Sun May 11 20:13:54 2014
	Thread 1 cannot allocate new log, sequence 4
	Checkpoint not complete
	  Current log# 1 seq# 3 mem# 0: +DG01/maastb/onlinelog/group_1.257.844716051
	  Current log# 1 seq# 3 mem# 1: +FRA/maastb/onlinelog/group_1.568.844716053
	LGWR: Standby redo logfile selected to archive thread 1 sequence 4
	LGWR: Standby redo logfile selected for thread 1 sequence 4 for destination LOG_ARCHIVE_DEST_2
	Thread 1 advanced to log sequence 4 (LGWR switch)
	  Current log# 2 seq# 4 mem# 0: +DG01/maastb/onlinelog/group_2.268.844716057
	  Current log# 2 seq# 4 mem# 1: +FRA/maastb/onlinelog/group_2.564.844716059
	Sun May 11 20:13:59 2014
	Archived Log entry 330 added for thread 1 sequence 3 ID 0x2b1de48e dest 1:
	ARC0: Standby redo logfile selected for thread 1 sequence 3 for destination LOG_ARCHIVE_DEST_2
	Sun May 11 20:14:07 2014
	ALTER SYSTEM SET log_archive_dest_2='service="maa"','LGWR SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="maa" net_timeout=30','valid_for=(all_logfiles,primary_role)' SCOPE=BOTH;
	ALTER SYSTEM ARCHIVE LOG
	Sun May 11 20:14:07 2014
	Thread 1 cannot allocate new log, sequence 5
	Checkpoint not complete
	  Current log# 2 seq# 4 mem# 0: +DG01/maastb/onlinelog/group_2.268.844716057
	  Current log# 2 seq# 4 mem# 1: +FRA/maastb/onlinelog/group_2.564.844716059
	******************************************************************
	LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
	******************************************************************
	LGWR: Standby redo logfile selected to archive thread 1 sequence 5
	LGWR: Standby redo logfile selected for thread 1 sequence 5 for destination LOG_ARCHIVE_DEST_2
	Thread 1 advanced to log sequence 5 (LGWR switch)
	  Current log# 1 seq# 5 mem# 0: +DG01/maastb/onlinelog/group_1.257.844716051
	  Current log# 1 seq# 5 mem# 1: +FRA/maastb/onlinelog/group_1.568.844716053
	Archived Log entry 339 added for thread 1 sequence 4 ID 0x2b1de48e dest 1:
	Archived Log entry 342 added for thread 2 sequence 4 ID 0x2b1de48e dest 1:
	Sun May 11 20:15:14 2014
	Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
	LGWR: Standby redo logfile selected to archive thread 1 sequence 6
	LGWR: Standby redo logfile selected for thread 1 sequence 6 for destination LOG_ARCHIVE_DEST_2
	Thread 1 advanced to log sequence 6 (LGWR switch)
	  Current log# 2 seq# 6 mem# 0: +DG01/maastb/onlinelog/group_2.268.844716057
	  Current log# 2 seq# 6 mem# 1: +FRA/maastb/onlinelog/group_2.564.844716059
	Sun May 11 20:15:15 2014
	Archived Log entry 344 added for thread 1 sequence 5 ID 0x2b1de48e dest 1:

Alert log of instance maastb2:
	Mon May 12 02:12:33 2014
	******************************************************************
	LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
	******************************************************************
	LGWR: Error 16009 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'maa'
	Error 16009 for archive log file 3 to 'maa'
	LGWR: Failed to archive log 3 thread 2 sequence 3 (16009)
	Thread 2 advanced to log sequence 3 (LGWR switch)
	  Current log# 3 seq# 3 mem# 0: +DG01/maastb/onlinelog/group_3.269.844716063
	  Current log# 3 seq# 3 mem# 1: +FRA/maastb/onlinelog/group_3.562.844716065
	Mon May 12 02:12:36 2014
	Deleted Oracle managed file +FRA/maastb/flashback/log_10.540.846646525
	Archived Log entry 327 added for thread 2 sequence 2 ID 0x2b1de48e dest 1:
	Mon May 12 02:13:56 2014
	LGWR: Standby redo logfile selected to archive thread 2 sequence 4
	LGWR: Standby redo logfile selected for thread 2 sequence 4 for destination LOG_ARCHIVE_DEST_2
	Thread 2 advanced to log sequence 4 (LGWR switch)
	  Current log# 4 seq# 4 mem# 0: +DG01/maastb/onlinelog/group_4.270.844716067
	  Current log# 4 seq# 4 mem# 1: +FRA/maastb/onlinelog/group_4.559.844716071
	Mon May 12 02:13:59 2014
	Archived Log entry 329 added for thread 2 sequence 3 ID 0x2b1de48e dest 1:
	ARC3: Standby redo logfile selected for thread 2 sequence 3 for destination LOG_ARCHIVE_DEST_2
	Mon May 12 02:14:09 2014
	Thread 2 cannot allocate new log, sequence 5
	Checkpoint not complete
	  Current log# 4 seq# 4 mem# 0: +DG01/maastb/onlinelog/group_4.270.844716067
	  Current log# 4 seq# 4 mem# 1: +FRA/maastb/onlinelog/group_4.559.844716071
	Mon May 12 02:14:10 2014
	FAL[server, ARC0]: FAL archive failed, see trace file.
	Mon May 12 02:14:12 2014
	ARC3: Archive log rejected (thread 1 sequence 1) at host 'maa'
	FAL[server, ARC3]: FAL archive failed, see trace file.
	ARCH: FAL archive failed. Archiver continuing
	ORACLE Instance maastb2 - Archival Error. Archiver continuing.
	ARC0: Archive log rejected (thread 2 sequence 2) at host 'maa'
	FAL[server, ARC0]: FAL archive failed, see trace file.
	ARCH: FAL archive failed. Archiver continuing
	ORACLE Instance maastb2 - Archival Error. Archiver continuing.
	******************************************************************
	LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
	******************************************************************
	LGWR: Standby redo logfile selected to archive thread 2 sequence 5
	LGWR: Standby redo logfile selected for thread 2 sequence 5 for destination LOG_ARCHIVE_DEST_2
	Thread 2 advanced to log sequence 5 (LGWR switch)
	  Current log# 3 seq# 5 mem# 0: +DG01/maastb/onlinelog/group_3.269.844716063
	  Current log# 3 seq# 5 mem# 1: +FRA/maastb/onlinelog/group_3.562.844716065
	Mon May 12 02:15:14 2014
	Thread 2 cannot allocate new log, sequence 6
	Checkpoint not complete
	  Current log# 3 seq# 5 mem# 0: +DG01/maastb/onlinelog/group_3.269.844716063
	  Current log# 3 seq# 5 mem# 1: +FRA/maastb/onlinelog/group_3.562.844716065
	Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
	LGWR: Standby redo logfile selected to archive thread 2 sequence 6
	LGWR: Standby redo logfile selected for thread 2 sequence 6 for destination LOG_ARCHIVE_DEST_2
	Thread 2 advanced to log sequence 6 (LGWR switch)
	  Current log# 4 seq# 6 mem# 0: +DG01/maastb/onlinelog/group_4.270.844716067
	  Current log# 4 seq# 6 mem# 1: +FRA/maastb/onlinelog/group_4.559.844716071
	Mon May 12 02:15:19 2014
	Archived Log entry 346 added for thread 2 sequence 5 ID 0x2b1de48e dest 1:

Looking above, you will see that log_archive_dest_2 was re-enabled and started synchronizing with its destination. Note that it respected the very same settings we had defined since the beginning of the DG configuration.
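
If you prefer to confirm the destination state outside the logs, a query along these lines on the new primary shows whether LOG_ARCHIVE_DEST_2 is synchronized again (the SYNCHRONIZATION_STATUS column is assumed to be available, as in 11.2):

SQL> SELECT dest_id, status, synchronization_status FROM v$archive_dest_status WHERE dest_id = 2;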

The most interesting logs are on the old primary (now the new standby). First, observe the Broker log below:

Broker log on maa1:
	05/12/2014 05:30:07
	Data Guard notifying Oracle Clusterware of database role change
	05/12/2014 05:30:29
	Creating process RSM0
	05/12/2014 05:30:47
	Physical RSM: Reinstatement... Converting old primary control file to physical standby control file
	05/12/2014 05:30:52
	Purging diverged redos on resetlogs branch 844762805, starting SCN 8878906
	Purged 0 archived logs
	05/12/2014 05:31:11
	Data Guard Broker shutting down
	RSM0 successfully terminated
	05/12/2014 05:31:14
	>> DMON Process Shutdown <<
	05/12/2014 05:31:43
	>> Starting Data Guard Broker bootstrap <<
	Broker Configuration File Locations:
		  dg_broker_config_file1 = "+DATA/maa/dr1maa.dat"
		  dg_broker_config_file2 = "+FRA/maa/dr2maa.dat"
	05/12/2014 05:31:47
	Database needs to be reinstated or re-created, Data Guard broker ready
	05/12/2014 05:31:49
	DMON Registering service maa_DGB with listener(s)
	05/12/2014 05:31:54
	Apply Instance for Database maa set to maa1
	Data Guard notifying Oracle Clusterware of database role change
	Creating process RSM0
	05/12/2014 05:32:23
	Notifying Oracle Clusterware to buildup after REINSTATEMENT
	Data Guard notifying Oracle Clusterware to start services and other instances change
	05/12/2014 05:32:26
	Command REINSTATE DATABASE maa completed
	05/12/2014 05:32:50
	Command EDIT DATABASE maa SET PROPERTY ActualApplyInstance = maa1 completed

Analyzing the log above calmly, we see that it determined the SCN required for the reinstate, carried out the reinstate, and also fixed things up in CRS. In the alert log of the old primary instance we have:

Alert log of maa1:
	Mon May 12 05:30:29 2014
	RSM0 started with pid=39, OS id=3988
	Mon May 12 05:30:32 2014
	Using STANDBY_ARCHIVE_DEST parameter default value as USE_DB_RECOVERY_FILE_DEST
	RFS[1]: Assigned to RFS process 3990
	RFS[1]: Database mount ID mismatch [0x2b1de48e:0x2b44d48f] (723379342:725931151)
	FLASHBACK DATABASE TO SCN 8878905
	Mon May 12 05:30:35 2014
	RFS[2]: Assigned to RFS process 3993
	RFS[2]: Database mount ID mismatch [0x2b1de48e:0x2b44d48f] (723379342:725931151)
	Flashback Restore Start
	Flashback Restore Complete
	Flashback Media Recovery Start
	 started logmerger process
	Parallel Media Recovery started with 2 slaves
	Mon May 12 05:30:38 2014
	Recovery of Online Redo Log: Thread 2 Group 3 Seq 99 Reading mem 0
	  Mem# 0: +DATA/maa/onlinelog/group_3.257.843489101
	  Mem# 1: +FRA/maa/onlinelog/group_3.284.843489101
	Recovery of Online Redo Log: Thread 1 Group 1 Seq 139 Reading mem 0
	  Mem# 0: +DATA/maa/onlinelog/group_1.272.843488553
	  Mem# 1: +FRA/maa/onlinelog/group_1.286.843488555
	Recovery of Online Redo Log: Thread 1 Group 2 Seq 140 Reading mem 0
	  Mem# 0: +DATA/maa/onlinelog/group_2.271.843488555
	  Mem# 1: +FRA/maa/onlinelog/group_2.285.843488555
	Incomplete Recovery applied until change 8878906 time 05/11/2014 18:08:14
	Mon May 12 05:30:47 2014
	Flashback Media Recovery Complete
	Completed: FLASHBACK DATABASE TO SCN 8878905
	alter database convert to physical standby
	ALTER DATABASE CONVERT TO PHYSICAL STANDBY (maa1)
	Flush standby redo logfile failed:1649
	Clearing standby activation ID 723321957 (0x2b1d0465)
	The primary database controlfile was created using the
	'MAXLOGFILES 192' clause.
	There is space for up to 188 standby redo logfiles
	Use the following SQL commands on the standby database to create
	standby redo logfiles that match the primary database:
	ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 52428800;
	ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 52428800;
	ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 52428800;
	ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 52428800;
	ALTER DATABASE ADD STANDBY LOGFILE 'srl5.f' SIZE 52428800;
	Waiting for all non-current ORLs to be archived...
	All non-current ORLs have been archived.
	Clearing online redo logfile 1 +DATA/maa/onlinelog/group_1.272.843488553
	Clearing online log 1 of thread 1 sequence number 139
	Clearing online redo logfile 1 complete
	Clearing online redo logfile 2 +DATA/maa/onlinelog/group_2.271.843488555
	Clearing online log 2 of thread 1 sequence number 140
	Clearing online redo logfile 2 complete
	Clearing online redo logfile 3 +DATA/maa/onlinelog/group_3.257.843489101
	Clearing online log 3 of thread 2 sequence number 99
	Clearing online redo logfile 3 complete
	Clearing online redo logfile 4 +DATA/maa/onlinelog/group_4.262.843489103
	Clearing online log 4 of thread 2 sequence number 98
	Clearing online redo logfile 4 complete
	Completed: alter database convert to physical standby
	Mon May 12 05:31:05 2014
	Shutting down instance (immediate)
	Shutting down instance: further logons disabled
	Stopping background process MMNL
	Stopping background process MMON
	License high water mark = 6
	alter database CLOSE NORMAL
	ORA-1109 signalled during: alter database CLOSE NORMAL...
	alter database DISMOUNT
	Shutting down archive processes
	Archiving is disabled
	...
	...
	Shutting down Data Guard Broker processes
	Mon May 12 05:31:14 2014
	Completed: Data Guard Broker shutdown
	NOTE: force a map free for map id 2
	Mon May 12 05:31:15 2014
	ARCH: Archival disabled due to shutdown: 1089
	Shutting down archive processes
	Archiving is disabled
	Mon May 12 05:31:15 2014
	Stopping background process VKTM
	...
	...
	Instance shutdown complete
	Mon May 12 05:31:23 2014
	Starting ORACLE instance (normal)
	****************** Large Pages Information *****************

	Total Shared Global Region in Large Pages = 0 KB (0%)
	...
	...
	Starting up:
	Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
	With the Partitioning and Real Application Clusters options.
	...
	...
	DMON started with pid=30, OS id=4114
	Mon May 12 05:31:31 2014
	alter database  mount
	This instance was first to mount
	Mon May 12 05:31:31 2014
	ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.17.42.14)(PORT=1521))))' SCOPE=MEMORY SID='maa1';
	NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
	NOTE: Loaded library: System
	SUCCESS: diskgroup DATA was mounted
	SUCCESS: diskgroup FRA was mounted
	NOTE: dependency between database maa and diskgroup resource ora.DATA.dg is established
	NOTE: dependency between database maa and diskgroup resource ora.FRA.dg is established
	Mon May 12 05:31:37 2014
	ARCH: STARTING ARCH PROCESSES
	NSS2 started with pid=33, OS id=4147
	Mon May 12 05:31:37 2014
	ARC0 started with pid=34, OS id=4149
	ARC0: Archival started
	...
	...	
	ARC2: Becoming the 'no FAL' ARCH
	ARC2: Becoming the 'no SRL' ARCH
	ARC1: Becoming the heartbeat ARCH
	Physical Standby Database mounted.
	Lost write protection disabled
	ARC1: Becoming the active heartbeat ARCH
	ARC1: Becoming the active heartbeat ARCH
	ARC3: Archival started
	ARC0: STARTING ARCH PROCESSES COMPLETE
	Completed: alter database  mount
	Mon May 12 05:31:43 2014
	Starting Data Guard Broker (DMON)
	Mon May 12 05:31:43 2014
	...
	...
	Using STANDBY_ARCHIVE_DEST parameter default value as USE_DB_RECOVERY_FILE_DEST
	Primary database is in MAXIMUM AVAILABILITY mode
	Changing standby controlfile to RESYNCHRONIZATION level
	Standby controlfile consistent with primary
	RFS[1]: Assigned to RFS process 4190
	RFS[1]: Selected log 8 for thread 2 sequence 4 dbid 722024964 branch 847284763
	Mon May 12 05:31:59 2014
	Primary database is in MAXIMUM AVAILABILITY mode
	Standby controlfile consistent with primary
	Standby controlfile consistent with primary
	RFS[2]: Assigned to RFS process 4192
	RFS[2]: Selected log 5 for thread 1 sequence 4 dbid 722024964 branch 847284763
	ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='maa1';
	ALTER SYSTEM SET log_archive_format='arch_%t_%s_%r.arc' SCOPE=SPFILE SID='maa1';
	ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=BOTH SID='*';
	ALTER SYSTEM SET archive_lag_target=0 SCOPE=BOTH SID='*';
	ALTER SYSTEM SET log_archive_max_processes=4 SCOPE=BOTH SID='*';
	ALTER SYSTEM SET log_archive_min_succeed_dest=1 SCOPE=BOTH SID='*';
	ALTER SYSTEM SET db_file_name_convert='+DG01/maastb','+DATA/maa','+FRA/maastb','+FRA/maa' SCOPE=SPFILE;
	ALTER SYSTEM SET log_file_name_convert='+DG01/maastb','+DATA/maa','+FRA/maastb','+FRA/maa' SCOPE=SPFILE;
	ALTER SYSTEM SET fal_server='maastb' SCOPE=BOTH;
	ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
	Attempt to start background Managed Standby Recovery process (maa1)
	Mon May 12 05:32:01 2014
	MRP0 started with pid=45, OS id=4198
	MRP0: Background Managed Standby Recovery process started (maa1)
	Mon May 12 05:32:01 2014
	RFS[3]: Assigned to RFS process 4194
	RFS[3]: Selected log 9 for thread 2 sequence 3 dbid 722024964 branch 847284763
	Mon May 12 05:32:01 2014
	RFS[4]: Assigned to RFS process 4196
	RFS[4]: Selected log 6 for thread 1 sequence 3 dbid 722024964 branch 847284763
	RFS[4]: New Archival REDO Branch(resetlogs_id): 847284763  Prior: 844762805
	RFS[4]: Archival Activation ID: 0x2b1de48e Current: 0x0
	RFS[4]: Effect of primary database OPEN RESETLOGS
	RFS[4]: Managed Standby Recovery process is active
	RFS[4]: Incarnation entry added for Branch(resetlogs_id): 847284763 (maa1)
	Mon May 12 05:32:01 2014
	Setting recovery target incarnation to 3
	Mon May 12 05:32:02 2014
	Deleted Oracle managed file +FRA/maa/flashback/log_2.339.847184419
	Mon May 12 05:32:02 2014
	Archived Log entry 632 added for thread 1 sequence 3 ID 0x2b1de48e dest 1:
	Archived Log entry 633 added for thread 2 sequence 3 ID 0x2b1de48e dest 1:
	Mon May 12 05:32:04 2014
	RFS[5]: Assigned to RFS process 4212
	RFS[5]: Opened log for thread 2 sequence 99 dbid 722024964 branch 844762805
	RFS[4]: Opened log for thread 1 sequence 140 dbid 722024964 branch 844762805
	Archived Log entry 634 added for thread 2 sequence 99 rlc 844762805 ID 0x2b1d0465 dest 2:
	 started logmerger process
	Managed Standby Recovery starting Real Time Apply
	RFS[3]: Opened log for thread 2 sequence 1 dbid 722024964 branch 847284763
	Parallel Media Recovery started with 2 slaves
	Mon May 12 05:32:07 2014
	Media Recovery start incarnation depth : 1, target inc# : 3, irscn : 8878917
	Archived Log entry 635 added for thread 2 sequence 1 rlc 847284763 ID 0x2b1de48e dest 2:
	RFS[5]: Opened log for thread 2 sequence 2 dbid 722024964 branch 847284763
	Waiting for all non-current ORLs to be archived...
	All non-current ORLs have been archived.
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_2_seq_99.603.847344725
	Identified End-Of-Redo (failover) for thread 2 sequence 99 at SCN 0x0.877b45
	Archived Log entry 636 added for thread 1 sequence 140 rlc 844762805 ID 0x2b1d0465 dest 2:
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_1_seq_140.605.847344725
	Identified End-Of-Redo (failover) for thread 1 sequence 140 at SCN 0x0.877b45
	Mon May 12 05:32:08 2014
	Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
	Resetting standby activation ID 723321957 (0x2b1d0465)
	Media Recovery End-Of-Redo indicator encountered
	Media Recovery Continuing
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_2_seq_1.592.847344727
	Archived Log entry 637 added for thread 1 sequence 4 ID 0x2b1de48e dest 1:
	Media Recovery Waiting for thread 1 sequence 1
	Fetching gap sequence in thread 1, gap sequence 1-2
	Archived Log entry 638 added for thread 2 sequence 2 rlc 847284763 ID 0x2b1de48e dest 2:
	Mon May 12 05:32:10 2014
	Primary database is in MAXIMUM AVAILABILITY mode
	Standby controlfile consistent with primary
	Standby controlfile consistent with primary
	RFS[6]: Assigned to RFS process 4242
	RFS[6]: Selected log 5 for thread 1 sequence 5 dbid 722024964 branch 847284763
	Mon May 12 05:32:11 2014
	RFS[7]: Assigned to RFS process 4234
	RFS[7]: Opened log for thread 1 sequence 1 dbid 722024964 branch 847284763
	Mon May 12 05:32:11 2014
	RFS[8]: Assigned to RFS process 4238
	RFS[8]: Opened log for thread 1 sequence 2 dbid 722024964 branch 847284763
	Archived Log entry 639 added for thread 1 sequence 1 rlc 847284763 ID 0x2b1de48e dest 2:
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_1_seq_1.595.847344731
	Media Recovery Waiting for thread 1 sequence 2 (in transit)
	Mon May 12 05:32:14 2014
	Archived Log entry 640 added for thread 2 sequence 4 ID 0x2b1de48e dest 1:
	Archived Log entry 641 added for thread 1 sequence 2 rlc 847284763 ID 0x2b1de48e dest 2:
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_1_seq_2.593.847344731
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_2_seq_2.584.847344727
	Mon May 12 05:32:15 2014
	Primary database is in MAXIMUM AVAILABILITY mode
	Standby controlfile consistent with primary
	Standby controlfile consistent with primary
	RFS[9]: Assigned to RFS process 4250
	RFS[9]: Selected log 8 for thread 2 sequence 5 dbid 722024964 branch 847284763
	Mon May 12 05:32:25 2014
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_1_seq_3.604.847344723
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_2_seq_3.339.847344723
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_2_seq_4.581.847344735
	Media Recovery Log +FRA/maa/archivelog/2014_05_12/thread_1_seq_4.594.847344731
	Media Recovery Waiting for thread 1 sequence 5 (in transit)
	Recovery of Online Redo Log: Thread 1 Group 5 Seq 5 Reading mem 0
	  Mem# 0: +DATA/maa/onlinelog/group_5.263.843615365
	  Mem# 1: +FRA/maa/onlinelog/group_5.289.843615367
	Media Recovery Waiting for thread 2 sequence 5 (in transit)
	Recovery of Online Redo Log: Thread 2 Group 8 Seq 5 Reading mem 0
	  Mem# 0: +DATA/maa/onlinelog/group_8.259.843615383
	  Mem# 1: +FRA/maa/onlinelog/group_8.703.843615385
	Mon May 12 05:33:16 2014
	Changing standby controlfile to MAXIMUM AVAILABILITY level
	RFS[6]: Selected log 6 for thread 1 sequence 6 dbid 722024964 branch 847284763
	Mon May 12 05:33:16 2014
	Deleted Oracle managed file +FRA/maa/flashback/log_3.259.843502613
	Mon May 12 05:33:16 2014
	Media Recovery Waiting for thread 1 sequence 6 (in transit)
	Recovery of Online Redo Log: Thread 1 Group 6 Seq 6 Reading mem 0
	  Mem# 0: +DATA/maa/onlinelog/group_6.261.843615373
	  Mem# 1: +FRA/maa/onlinelog/group_6.670.843615373
	Archived Log entry 642 added for thread 1 sequence 5 ID 0x2b1de48e dest 1:
	Mon May 12 05:33:19 2014
	Standby controlfile consistent with primary
	RFS[9]: Selected log 9 for thread 2 sequence 6 dbid 722024964 branch 847284763
	Mon May 12 05:33:19 2014
	Archived Log entry 643 added for thread 2 sequence 5 ID 0x2b1de48e dest 1:
	Media Recovery Waiting for thread 2 sequence 6 (in transit)
	Recovery of Online Redo Log: Thread 2 Group 9 Seq 6 Reading mem 0
	  Mem# 0: +DATA/maa/onlinelog/group_9.256.843615389
	  Mem# 1: +FRA/maa/onlinelog/group_9.504.843615389

A detailed analysis shows that the “maa” database underwent a FLASHBACK to SCN 8878905, which completed successfully. The database was then converted to PHYSICAL STANDBY and, as part of the procedure, shut down. It was restarted (automatically) and detected that it needed to synchronize archives with its primary (maastb). It identified the End-Of-Redo and, from that point on, kept applying the archives generated after that event (via RFS – Media Recovery Continuing). Depending on how long your old primary stayed off-line, the number of archivelogs to replicate can be large. You may even need to restore from the backup (of the new primary) archives that were already deleted but are required to close the gap between the environments.
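
While this synchronization runs, it is worth watching the gap and the new incarnation from the reinstated standby. A minimal sketch using standard dictionary views (these queries are mine, not part of the output above):

	SQL> SELECT thread#, low_sequence#, high_sequence# FROM v$archive_gap;
	SQL> SELECT incarnation#, resetlogs_change#, status FROM v$database_incarnation ORDER BY incarnation#;

If v$archive_gap still returns rows after the shipping settles, those are the sequences you may have to restore from the new primary’s backup.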

The database was then configured to act as a standby using real-time apply (USING CURRENT LOGFILE). A detail here: the Broker enables real-time apply automatically when it detects that the standby database has standby redo logs configured. Note that it also detected the presence of a new incarnation, caused by the RESETLOGS from the flashback and by the End-Of-Redo. Finally, after identifying that the primary operates in MAXIMUM AVAILABILITY, it ships the pending archivelogs and leaves everything synchronized.
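
To confirm both points, an illustrative check (assuming you run it on the new standby) is to list the standby redo logs and the recovery mode reported for the archive destinations:

	SQL> SELECT group#, thread#, sequence#, status FROM v$standby_log;
	SQL> SELECT dest_id, recovery_mode FROM v$archive_dest_status WHERE recovery_mode LIKE 'MANAGED%';

With standby redo logs in place, recovery_mode should report MANAGED REAL TIME APPLY.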

DETAILS

If you look at all the logs up to this point (especially the last ones) you will see how they fit together; everything presented serially here actually happens in parallel. The flow is basically a reinstate command issued through dgmgrl, which is communicated to the Broker on the primary and on the standby. During the reinstate process the old primary undergoes a flashback, is restarted, synchronized, and configured as the new standby. Since the primary database is operating in MAXIMUM AVAILABILITY, everything ends up synchronized, primary and standby.
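
As a recap, the whole sequence above is triggered by a single DGMGRL command (the same one issued earlier in this procedure; output omitted here):

	DGMGRL> REINSTATE DATABASE maa;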

All the commands that were manual before (see the manual failover and reinstate articles) have now been executed automatically by the Broker. We did not need to find the SCN at which the role change happened, much less perform the flashback and fix the CRS ourselves.
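
For contrast, a condensed recap of the manual path covered in the earlier articles (the SCN is the one identified in the log above; the full procedure has more steps than this):

	SQL> FLASHBACK DATABASE TO SCN 8878905;
	SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;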

Of course, all of this required some configuration. Do you think the Broker would manage to connect and restart the instances without their manual registration in the listener? Remember the Broker’s StaticConnectIdentifier property that we configured at the end of the previous article? See how some earlier steps (tedious as they may have been) save you time?
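
If you need to review or fix that property, it can be inspected and changed per instance through DGMGRL. A sketch, where the connect descriptor (host rac11pri01, port 1521) is hypothetical:

	DGMGRL> SHOW INSTANCE VERBOSE maa1 ON DATABASE maa;
	DGMGRL> EDIT INSTANCE maa1 ON DATABASE maa SET PROPERTY StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac11pri01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=maa)(INSTANCE_NAME=maa1)))';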

One detail when restarting the other instances of the old primary: on startup they will hit “ORA-01153: an incompatible media recovery is active” and “ORA-10458: standby database requires recovery”. This is “normal” and will force a shutdown abort of the instance. Just restart the instance in mount mode and everything returns to normal. It is tied to a CRS detail that I explain in the next paragraph.
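
In practice the fix is short. A minimal sketch, assuming maa2 is the instance that aborted (the stop just clears its state in CRS before the restart):

	[oracle@rac11pri02 ~]$ srvctl stop instance -d maa -i maa2 -o abort
	[oracle@rac11pri02 ~]$ srvctl start instance -d maa -i maa2 -o mount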

Since not everything is rosy in a DBA’s life, you will need to adjust the CRS of the instances. The Broker adjusts the CRS of the environment, but only partially: it “forgets” to switch the start mode of the databases, so the old primary and new standby will be set to start in open mode while the new primary will be in mount mode (since it was the old standby). Observe this below, along with the fix:

On the old primary and new standby:
	[oracle@rac11pri02 ~]$ srvctl config database -d maa -v
	Database unique name: maa
	Database name: maa
	Oracle home: /u01/app/oracle/product/11.2.0.3/db_1
	Oracle user: oracle
	Spfile: +DATA/maa/spfilemaa.ora
	Domain:
	Start options: open
	Stop options: immediate
	Database role: PHYSICAL_STANDBY
	Management policy: AUTOMATIC
	Server pools: maa
	Database instances: maa1,maa2
	Disk Groups: DATA,FRA
	Mount point paths:
	Services:
	Type: RAC
	Database is administrator managed
	[oracle@rac11pri02 ~]$
	[oracle@rac11pri02 ~]$
	[oracle@rac11pri02 ~]$
	[oracle@rac11pri02 ~]$ srvctl modify database -d maa -s MOUNT
	[oracle@rac11pri02 ~]$
	[oracle@rac11pri02 ~]$
	[oracle@rac11pri02 ~]$
	[oracle@rac11pri02 ~]$ srvctl config database -d maa -v
	Database unique name: maa
	Database name: maa
	Oracle home: /u01/app/oracle/product/11.2.0.3/db_1
	Oracle user: oracle
	Spfile: +DATA/maa/spfilemaa.ora
	Domain:
	Start options: mount
	Stop options: immediate
	Database role: PHYSICAL_STANDBY
	Management policy: AUTOMATIC
	Server pools: maa
	Database instances: maa1,maa2
	Disk Groups: DATA,FRA
	Mount point paths:
	Services:
	Type: RAC
	Database is administrator managed
	[oracle@rac11pri02 ~]$

On the new primary (old standby):

	[oracle@rac11stb02 ~]$ srvctl config database -d maastb -v
	Database unique name: maastb
	Database name:
	Oracle home: /u01/app/oracle/product/11.2.0.3/db_1
	Oracle user: oracle
	Spfile:
	Domain:
	Start options: mount
	Stop options: immediate
	Database role: PRIMARY
	Management policy: AUTOMATIC
	Server pools: maastb
	Database instances: maastb1,maastb2
	Disk Groups: DG01,FRA
	Mount point paths:
	Services:
	Type: RAC
	Database is administrator managed
	[oracle@rac11stb02 ~]$
	[oracle@rac11stb02 ~]$
	[oracle@rac11stb02 ~]$ srvctl modify database -d maastb -s OPEN
	[oracle@rac11stb02 ~]$
	[oracle@rac11stb02 ~]$
	[oracle@rac11stb02 ~]$ srvctl config database -d maastb -v
	Database unique name: maastb
	Database name:
	Oracle home: /u01/app/oracle/product/11.2.0.3/db_1
	Oracle user: oracle
	Spfile:
	Domain:
	Start options: open
	Stop options: immediate
	Database role: PRIMARY
	Management policy: AUTOMATIC
	Server pools: maastb
	Database instances: maastb1,maastb2
	Disk Groups: DG01,FRA
	Mount point paths:
	Services:
	Type: RAC
	Database is administrator managed
	[oracle@rac11stb02 ~]$

If you configure the CRS before starting the other standby instances, you will not hit the ORA- errors listed above.

FINAL ENVIRONMENT

We have reached a DG environment in which the primary database is maastb, which took over that role after the failover of its old primary. The old primary, database maa, went through the failover and a later reinstate to be reintegrated into the environment as a physical standby. Observe the final environment:

DGMGRL> SHOW CONFIGURATION;

Configuration - dgrac

  Protection Mode: MaxAvailability
  Databases:
	maastb - Primary database
	maa    - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL>
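
For a closer look at each side, you can also check the databases individually (output omitted; both should report a status of SUCCESS when transport and apply are healthy):

	DGMGRL> SHOW DATABASE maastb;
	DGMGRL> SHOW DATABASE maa;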

Every step we followed so far is what allowed us to do all of this through the Broker. Compared with the “manual” approach, using it saved us work and gave us fewer worries. We are still not in the ideal environment; we need to configure fast-start failover to reach a comfortable, self-managing setup.

Finally, I recommend starting the other instances of the old primary and taking a backup of the environment.
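
To close, a sketch of those two steps (the RMAN command is generic, not captured from this environment; run the backup against the new primary):

	[oracle@rac11pri02 ~]$ srvctl start instance -d maa -i maa2 -o mount
	[oracle@rac11stb02 ~]$ rman target /
	RMAN> BACKUP DATABASE PLUS ARCHIVELOG;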
