The proceed to patch/upgrade ZDLRA is not complicated, but as usual, some details need to be checked before starting the procedure. Since it is one engineering system based at Exadata, the procedure has one part that (maybe) needs to upgrade this stack too. But, is possible to upgrade just the recovery appliance library.
Whatever if need or no to upgrade the Exadata stack, the upgrade for recovery appliance library is the same. The commands and checks are the same. The procedure described in this post cover the upgrade of the recovery appliance library. For Exadata stack, it is in another post.
Where we are
Before even start the patch/upgrade it is important to know exactly which version you are running. To do this execute the command racli version at you database node:
HAIP (High Availability IP) is not supported for the Exadata environment but can occur (if you did not create the cluster using OEDA) that HAIP became in use. And this particularity true for ZDLRA. So, during the upgrade from the previous version (12.2) to a higher version, it is needed to remove HAIP.
Usually, when we upgrading from 12.2 to 18c the HAIP is removed from Exadata. If the upgrade is from 12.1, and HAIP is there, it continues and is not removed by the upgrade process. If you are using HAIP and your GI is 12.1, this procedure as-is described here can’t be used (need some adaptation), because of some requirements from ASM+ACFS+DB. But since this is a preliminary step from a GI upgrade, the focus is to disable and remove it from GI.
The HAIP is not needed for Exadata because by architecture the InfiniBand network already defines (per server) two IP’s to avoid the single point of failure. So, it is not needed to create an additional layer (HAIP and virtual IP), that does the same that already exists by network design.
When configuring a database with Real-Time Redo at ZDLRA it is important to check the deletion policy for archivelog. This is even more important when the database is protected with dataguard. I already wrote about Real-time Redo in this previous post, and when using with dataguard in another post.
But sometimes (during maintenance as an example) you can face the error RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process if the deletion policy of archivelog is not aligned with your needs.
In my previous post, I showed how the clone to tape occurs for ZDLRA. But as explained, the clones occur through the scheduler and follow some rules. For full backup, as an example, it clones the last available.
But sometimes, it is needed to call the clone for some specific backup, maybe to do long-term storage to follow some regimentation/law. And if we leverage this for the clone, jobs can maybe take to long, or clone more that you need.
As you saw in my last post, the configuration to enable clone to tape for ZDLRA it is not complicated, but you need to take care of some details to avoid errors. Besides that, ZDLRA relies on OSB to do that (when configured with native tape support) and this has some details that you need to be aware of.
In this post, I will show how the clone to tape works for ZDLRA. And how you can check some details about OSB.
With ZDLRA you can clone your backups to tape using two ways. The first is using third-party software and the second is using Oracle Secure Backup (OSB). This integration from ZDLRA and OSB is the native way to do that.
Cloning backups to tape will help to offload backups from ZDLRA (reducing the space usage if you need to sustain long recovery windows), and add another layer of protection (since you can put tapes in a third site).
Here I will show how easy is to configure the OSB backup and how to integrate it into your backup policy.
For ZDLRA the protection policies have a significant role in the appliance management, but not just that, for the architecture design too. And usually (and unfortunately) policies do not take a lot of attention as deserved.
To create a good ZDLRA design, and avoid future problems, it is important to understand all the requirements for the protection policies and all the impacts. You can check the official documentation for this, but I will explain deeply the details that can pass without you notice them in the documentation.
The Virtual Private Catalog (VPC) user is a key piece for a good ZDLRA architecture design. The detail is not how to create it, but how to correctly integrate it in your design, and this is more important if you have replicated ZDLRA or using Real-Time redo transport.
Here I will show and discuss VPC implications for your architecture design when deploying ZDLRA. Even for a complete and new implementation (together with database) or adding ZDLRA at your already running environment. All points here try to show some perspectives and key points that can help you to correct use and define VPC’s.
ZDLRA can be used from a small single database environment to big environments where you need protection in more than one site at the same time. At every level, you can use different features of ZDLRA to provide desirable protection. Here I will show how to reach zero RPO for both primary and standby databases. All the steps, doc, and tech parts are covered.
You can check the examples the reference for every scenario int these two papers from the Oracle MAA team: MAA Overview On-Premises and Oracle MAA Reference Architectures. They provide good information on how to prepare to reduce RPO and improve RTO. In resume, the focus is the same, reduce the downtime and data loss in case of a catastrophe (zero RPO, and zero RPO).
If you looked both papers before, you saw that to provide good protection is desirable to have an additional site to, at least, send the backups. And if you go higher, for GOLD and PLATINUM environments, you start to have multiple sites synced with data guard. These Critical/Mission-critical environments need to be protected for every kind of catastrophic failure, from disk until complete site outage (some need to follow specific law’s requirements, bank as an example).
And the focus of this post is these big environments. I will show you how to use ZDLRA to protect both sites, reaching zero RPO even for standby databases. And doing that, you can survive for a catastrophic outage (like entire datacenter failure) and still have zero RPO. Going further, you can even have zero RPO if you lose completely on site when using real-time redo for ZDLRA, and this is not written in the docs by the way.