The Oracle Maximum Availability Architecture (MAA) is the correct way to protect your Oracle database environment (and investment). It covers from a simple single instance to Exadata/Engineered Systems RACand a multi-site database with Data Guard protection. But do you know that to reach the MAA (whatever the architecture level that you are protecting) you need to use ZDLRA?
The question is why ZDLRA is needed? The point from ZDLRA is that it can (and needed to be used) to protect and reach zero RPO to all architectures. ZDLRA is more (much more) than just a backup appliance, is the core of every MAA design. You can’t reach zero RPO without using it.
Oracle Data Guard Broker allows the database administrators to automate some tasks and an easy way to configure properly a lot of features and details for data guard environments. The Fast-Start FailOver (FSFO) allows the broker to automatically failover to standby database in case of failure of the primary. But until 19c the only option is always to trigger the failover. This changed at 19c with a nice new feature that allows us to put FSFO in Observe-Only Mode.
In this post, I will focus just on new features for FSFO like Observer-Only Mode and Health Conditions for it. Lag and other details will not be covered here.
The Observe-Only Mode is a simple change that allows putting the FSFO to just observing/monitoring the DG environment, but in case of failure, it does not change the roles between primary and standby. Simple like that. As the Broker documentation for Observe-Only Mode says:
The observe-only mode enables you to test the impact of using fast-start failover in your configuration, without making any actual changes to the configuration.
When you change the parameters for the database is possible to specify the db_unique_name and allow more control where you want to apply/use it. This is very useful to limit the scope, but you need to be aware of some collateral effects. Even not present at the official doc, you can use it. But check here some details that you need to take care of.
When configuring a database with Real-Time Redo at ZDLRA it is important to check the deletion policy for archivelog. This is even more important when the database is protected with dataguard. I already wrote about Real-time Redo in this previous post, and when using with dataguard in another post.
But sometimes (during maintenance as an example) you can face the error RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process if the deletion policy of archivelog is not aligned with your needs.
The Virtual Private Catalog (VPC) user is a key piece for a good ZDLRA architecture design. The detail is not how to create it, but how to correctly integrate it in your design, and this is more important if you have replicated ZDLRA or using Real-Time redo transport.
Here I will show and discuss VPC implications for your architecture design when deploying ZDLRA. Even for a complete and new implementation (together with database) or adding ZDLRA at your already running environment. All points here try to show some perspectives and key points that can help you to correct use and define VPC’s.
ZDLRA can be used from a small single database environment to big environments where you need protection in more than one site at the same time. At every level, you can use different features of ZDLRA to provide desirable protection. Here I will show how to reach zero RPO for both primary and standby databases. All the steps, doc, and tech parts are covered.
You can check the examples the reference for every scenario int these two papers from the Oracle MAA team: MAA Overview On-Premises and Oracle MAA Reference Architectures. They provide good information on how to prepare to reduce RPO and improve RTO. In resume, the focus is the same, reduce the downtime and data loss in case of a catastrophe (zero RPO, and zero RPO).
If you looked both papers before, you saw that to provide good protection is desirable to have an additional site to, at least, send the backups. And if you go higher, for GOLD and PLATINUM environments, you start to have multiple sites synced with data guard. These Critical/Mission-critical environments need to be protected for every kind of catastrophic failure, from disk until complete site outage (some need to follow specific law’s requirements, bank as an example).
And the focus of this post is these big environments. I will show you how to use ZDLRA to protect both sites, reaching zero RPO even for standby databases. And doing that, you can survive for a catastrophic outage (like entire datacenter failure) and still have zero RPO. Going further, you can even have zero RPO if you lose completely on site when using real-time redo for ZDLRA, and this is not written in the docs by the way.
The idea for Real-Time Redo is to reach zero RPO for every kind of database and this includes ones with and without DG. As you can see in my last post, where I showed how to configure Real-Time Redo for one database, some little steps need to be executed and they are pretty similar than a remote destination for archivelog for DG.
Real-time redo transport is the feature that allows you to reduce to zero the RPO (Recovery Point Objective) for your database. Check how to configure real-time redo, the steps, parameters, and other details that need to be modified to enable it.
The idea behind real-time redo transport it is easy, basically the ZDLRA it is a remote destination for your redo log buffers/archivelogs of your database. It is really, really, similar to what occurs for data guard configurations (but here you don’t need to set all datafiles as an example). It is not the same too because ZDLRA can detect if the database stops/crash and will generate the archivelog (at ZDLRA side) with all the received redo and this can be used to restore to, at least zero/sub-seconds, of data loss.
This post starts from one environment that you already enrolled in the database at ZDLRA. I already wrote about how to do that, you can check here in my previous post. This is the first post about real-time redo, here you will see how to configure and verify it is working.
This article closes the series for DG and Fast-Start Failover that I covered with more details the case of isolation can leverage the shutdown of your healthy/running primary database. The “ORA-16830: primary isolated from fast-start failover partners”.
In the first article, I wrote about how one simple detail that impacts dramatically the reliability of your MAA environment. Where you put your Observer in DG environment (when Fast-Start Failover is in use) have a core figure in case of outages, and you can face Primary isolation and shutdown. Besides that, there is no clear documentation to base yourself on “pros and cons” to define the correct place for Observer. You read more in my article here.
In the second article, I wrote about one new feature that can help to have more protected and cover more scenarios for Fast-Start Failover/DG. Using Multiple Observers you can remove the single point of failure and allow you to put one Observer in each side of your environment (primary, standby, and a third one). You can read more in my article here.
In this last article, I discuss how, even using all the features, there is no perfect solution. Another point is discussing here is how (maybe) Oracle can improve that. Below I will show more details that even multiple observers continue to shutdown a healthy primary database. Unfortunately, it is a lot of tech info and is a log thread output. But you can jump directly to the end to see the discussion about how this can be improved.
Recently I made a post about a little issue that I got with Oracle Data Guard. In that scenario, because of outage in the standby datacenter, healthy primary database shutdown with error “ORA-16830: primary isolated…”. Just to remember that the database was running with Maximum Availability, Fail-Start Failover enabled, and (the most important detail) the Observer was running in the standby datacenter too.
The point from my previous post tried to show that does not exists one doc that provides full details about “pros” and “cons” where put your observer. Whatever the place, on the primary datacenter or in standby, it has little details to check. Even the best (ideal) scenario, with a third datacenter, can be tough to sustain.
Here I will try to show one option that can help you and improve the reliability of your MAA/DG environment. At least, you will have more options to decide how to protect your database. Bellow, I show some details about how to configure and use multiple observers, but if you want to jump and see a little concern you can directly to the end of the post.