ZDLRA + MAA, Protection for Platinum Architecture

The Platinum architecture is the last one defined in the MAA reference architectures and is the highest level of protection that you can achieve with MAA. It goes beyond the Gold protection (which I explained in my previous post), and you can have application continuity even during a version upgrade of your database.

The image above was taken from https://www.oracle.com/a/tech/docs/maa-overview-onpremise-2019.pdf

Platinum Architecture

The biggest difference in the Platinum architecture is the use of GoldenGate to sync both sides of the design. Everything used until now continues to be valid here. As I wrote before, we basically have the same design as Gold on each side, with both sides synced by GoldenGate.

As you can see above (and comparing with the Gold architecture), the focus goes beyond RPO and RTO. Below is the image from the old version of the MAA Reference Architecture doc; compare both images to see how the current reference evolved.

Note that with the new reference there is no single point of failure. And, as defined in the requirements for this architecture, you can upgrade your database to a new version while your application continues to run without interruption. So, we are talking about more than protection: the focus is application continuity at its finest, allowing you to survive several outages. If you want to check more technical details, please read this doc. If you look closely, you need four complete database failures to have unavailability.

The Architecture

The outage matrix is the same as for Gold:

Event	RPO	RTO
Hardware Error	Zero	Zero
Database Error (SW)	Zero	Zero
Database Corruption	Zero	Zero
Site Outage	Zero	Zero

Hardware Error: Even if a HW error hits the machines running the primary database, you can failover/switchover to the standby without losing data. And since your primary database runs over RAC, all nodes need to fail before you have to switch or fail over (which is not common). This is independent of the side where you are running production, since both sides are the same. So, with two-node RAC, you need eight HW failures (two sites, two databases per site, two nodes per database) before an interruption.
Database Error: With DG, a SW error is not an outage that will cause data loss or even an application continuity error. Remember that you can patch the standby. So, in case of some error, you can patch it, switch over, and the application continues to run without a gasp.
Database Corruption: For logical and physical corruption, both sides can be used to restore the exact block. So, there is no outage caused by this error (on either side).
Site Outage: The point about site outage is important because, even with a complete loss, your data is at another site and ready for use. But you need to guarantee that both databases (primary and standby) are synced. And going even further, the GoldenGate replication between both sides needs to be in sync as well.
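The count of eight HW failures in the Hardware Error item can be reproduced with simple arithmetic. The snippet below is only a minimal sketch under stated assumptions: the function and constant names are hypothetical (they do not come from any Oracle tool), and it assumes a worst-case count where every RAC node of every database at both sites has to fail, ignoring shared components such as storage or network.

```python
# Hypothetical topology parameters, chosen to match the example in the text.
SITES = 2               # two datacenters kept in sync by GoldenGate
DATABASES_PER_SITE = 2  # primary + standby kept in sync by Data Guard
RAC_NODES_PER_DB = 2    # two-node RAC


def databases_to_lose(sites: int, dbs_per_site: int) -> int:
    """Number of complete database failures needed for unavailability."""
    return sites * dbs_per_site


def node_failures_before_outage(sites: int, dbs_per_site: int, nodes_per_db: int) -> int:
    """Number of RAC node failures needed to lose every database (assumed worst case)."""
    return databases_to_lose(sites, dbs_per_site) * nodes_per_db


if __name__ == "__main__":
    print("Complete database failures:", databases_to_lose(SITES, DATABASES_PER_SITE))
    print("Node (HW) failures:", node_failures_before_outage(SITES, DATABASES_PER_SITE, RAC_NODES_PER_DB))
```

With the values above, this gives four complete database failures and eight node failures, which matches the numbers mentioned in the text. Changing RAC_NODES_PER_DB shows how the count grows with bigger clusters.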

The key point of the Platinum architecture is GoldenGate. It is there to reach zero RPO/RTO between datacenters and to provide application continuity even during database upgrades. But be aware that it also leaves you with some datatype restrictions. In addition, GoldenGate is more complex to administer than Data Guard because of the way it replicates the data.

So, if you want to use the Platinum architecture, be aware that you need to be mature enough to troubleshoot GoldenGate. If you have a problem with the synchronization (and both sides are not synced) and you face a site outage, you can have data loss.

ZDLRA to protect Platinum Architecture

Since we need to protect the database even in case of an outage of the entire datacenter, ZDLRA is in place to do that. The idea is the same as in the Gold architecture. Again, look at the schematic at the beginning of the post and imagine that you have a network outage between sites. If you then have a second failure inside the remaining site, what will your RPO be? It will be huge: with a HW failure you will lose data (think of an environment that is not running over Exadata). So, ZDLRA is your last line of defense. It is there to protect and guarantee ZERO RPO even when the second standby is offline.

The outage matrix when using ZDLRA is:

Event	RPO	RTO
Hardware Error	Zero	Zero
Database Error (SW)	Zero	Zero
Database Corruption	Zero	Zero
Site Outage	Zero	Zero

And again, in case of a database failure, ZDLRA will generate a partial archivelog up to the moment of failure. This means that for every failure (HW/SW/Corruption) ZDLRA will protect up to the last SCN generated by the database. So, whatever the outage, you have one copy of all your transactions at an external place that is not linked (at any level, like storage/HW) with your database.

How to use ZDLRA

The way to use ZDLRA is quite different here. GG and DG are responsible for syncing the databases, and even for providing the RPO/RTO between sites. They are the first two lines of protection. For the last line, we use ZDLRA.

So, more or less, the steps here are the same as for Gold. I don’t have anything published about GG, but there are several examples on the internet of how to configure full replication using GG:

Following these steps, you will have almost everything: primary and standby databases running with Oracle RAC and synced. Both will be protected by the corresponding ZDLRA. If you need more information about ZDLRA itself, you can read my blog series about ZDLRA. For ZDLRA and MAA integration, more details can be checked in this doc about ZDLRA and MAA.

Trying to protect without ZDLRA

So, if you don’t want to use ZDLRA here, you can still have more or less the same RPO/RTO, but you will have a single point of failure. If one site is down, the other side will be unprotected and with an RPO different from zero (it will be like the Gold architecture). Remember that only ZDLRA has real-time redo that externally protects you up to your last transaction.

Again, it is really hard to design, without using Oracle, several levels of protection against outages for the Platinum architecture. Several technologies (and layers of them) would need to be put in place to try to reach the same level that ZDLRA can provide. Remember, we are talking about a design that has, in one datacenter, RAC databases synced with Data Guard, and this database is synced by GG to another datacenter where the database also runs with RAC+RAC synced with DG. So, we are talking about at least 4 different storages to provide 4 NAS locations for the alternate archivelog destination. And again, that is just to have an outside solution for the archivelog copy, but what about what is still in the redo log buffer in memory? Who will protect it? So, not a cheap solution at all.

References

I will list some references for Platinum architecture that you can read from Oracle itself:

Oracle Maximum Availability Architecture (MAA)

Oracle MAA Reference Architectures (Old version)

Maximum Availability with Oracle Database 19c

Multitenant MAA Solutions (includes the “Aurous” Option)

Best Practices for Database Consolidation

Oracle® Database High Availability Overview

Disaster Recovery for Oracle Database Zero Data Loss Recovery Appliance, Active Data Guard and Oracle GoldenGate

Deploying the Zero Data Loss Recovery Appliance in a Data Guard Configuration

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community. Post protected by copyright.”

Fernando Simon
