Quick post for today. Recently needed to upgrade to the last version of Autonomous Health Framework (AHF) from an Exadata running GI 19.5. In this particular case the GI was not even running AHF, but still using the standalone TFA that comes with it. So, here I will show how to upgrade to the last version of AHF and replacing the TFA as well.
Recently during the Exadata patch, one database node reported an issue during the patchmgr and stopped the patch apply. The error was related to missing volumes (LVDoNotRemoveOrUse) at LVM. During the post, you can check the error, but please take attention that it changes some LVM config file contents. So, check correctly the step executed and (if possible) open pro-active SR to be sure what you will be doing.
Sometimes even a perfectly running server can pass over issues after a simple reboot. And was exactly that occurred recently with one Exadata database node. And was not the first time that the same error appears (and since there is no well-documented step by step to fix it I documented them below). So, check how to fix the issue related to the read-only locking_type of LVM detected by dracut.
If you search around about how to patch Oracle Database you will find a lot of blog posts teaching how to patch your Oracle Home (OH) (I will not put the list here because it will be enormous – but just follow Mike Dietrich). But most of them write nothing about OCW, how to patch it, or if it is needed to patch OCW. And unfortunately, even Oracle is not clear about that.
Just to complement, recently Liron Amitzi got one issue related to OCW. And if you search more, you will find that Frits Hoogland wrote something about it too. But in the end, need I to concern about OCW? And, what is OCW?
OCW means Oracle Clusterware, and basically is the core for the Grid Infrastructure, it is everything there. But for OH is important too because if the database needs to communicate with GI Clusterware it uses the OCW binaries/libraries that are at OH (like srvctl, crstctl) to do that. So, if have some kind of bug at this portion of OCW, it needs to be patched.
The point is that the only place that you can find the OCW patch is under the GI RU patch. Look at the readme for last GI RU 188.8.131.52.200714 (Patch 31305339):
And if you look at the readme for DB RU 184.108.40.206.200714 (Patch 31281355) there is no reference to the OCW patch. So, if apply just the DB RU the OCW will not be updated.
And just to remember you that patch 31305087 does not exist alone to be downloaded:
The Platinum architecture is the last defined at MAA references and is the highest level of protection that you can achieve for MAA. It goes beyond the Gold protection (that I explained in my previous post) and you can have application continuity even version upgrade for your database.
The image above was taken from https://www.oracle.com/a/tech/docs/maa-overview-onpremise-2019.pdf
The Gold architecture for MAA is used to emphasis the application continuity. All the possible outages (planned or no) are protected by Oracle features. Here we are one step further and start to design using multi-site architecture. Data Guard, RAC, Oracle Clusterware, everything is there. But even with these, ZDLRA is still needed to allow complete protection.
The image above taken from https://www.oracle.com/a/tech/docs/maa-overview-onpremise-2019.pdf.
With the MAA references, we have the blueprints and highlights how to protect them since the standalone/single instance until the multiple site database. But for Gold we are beyond RPO and RTO, they are important but application continuity and data continuity join to complete the whole picture.
The MAA defined Silver architecture for database environments that use (or need) high availability to survive for outages. The idea is having more than one single instance running, and to do that, it relies on Oracle Clusterware and Engineered Systems to mitigate the single point of failure. But is not just a database that gains with this, the Silver architecture is the first step to have application continuity. And again, ZDLRA is there since the beginning.
As you can see above, the Silver by MAA blueprints improves compared with Bronze architecture that I spoke at the last post. But the basic points are there: RPO and RTO. They continue to base rule here. And the goals are the same: Data Availability, Data Protection, Performance (no impact), Cost (lower cost), and Risk (reduce). More technical details here at the MAA Overview doc.
Oracle Maximum Availability Architecture (MAA) means more than just Data Guard or Golden Gate to survive outages, is related to data protection, data availability, and application continuity. MAA defines four reference architectures that can be used to guide during the deploy/design of your environment, and ZDLRA is there for all architectures.
Image above taken from https://www.oracle.com/a/tech/docs/maa-overview-onpremise-2019.pdf.
With the MAA references, we have the blueprints and highlights how to protect them since the standalone/single instance until the multiple site database. The MAA goal is to survive an outage but also sustain: Data Availability, Data Protection, Performance (no impact), Cost (lower cost), and Risk (reduce).
The Oracle Maximum Availability Architecture (MAA) is the correct way to protect your Oracle database environment (and investment). It covers from a simple single instance to Exadata/Engineered Systems RAC and a multi-site database with Data Guard protection. But do you know that to reach the MAA (whatever the architecture level that you are protecting) you need to use ZDLRA?
So, I will start a series of posts to cover the MAA and ZDLRA. Discussing what you need to do (and how) to reach the maximum level of availability as is at the MAA architecture (as defined in the documentation and best practices: Oracle Maximum Availability Architecture (MAA) Blueprints for On-Premises, MAA Best Practices – Oracle Database, and Maximum Availability with Oracle Database 19c).
The question is why ZDLRA is needed? The point from ZDLRA is that it can (and needed to be used) to protect and reach zero RPO to all architectures. ZDLRA is more (much more) than just a backup appliance, is the core of every MAA design. You can’t reach zero RPO without using it.
The process of patch ZDLRA is not complicated, but it is important to be aware of some details. The most important is from where you are until where you want to go. This is crucial because it will define what commands you will need to execute.
If you read the previous post about the process, you can notice that I was running the ZDLRA 12.2 version, and forwarded to 19.2 version. In that case, I needed to use the upgrade path since I was changing the major release and the racli commands had the “upgrade” parameter.
In this post I will show how to do a simple update (or patch apply) for ZDLRA, this means that I will remain inside the same major release for recovery appliance library. Some steps and checks are the same.
Whatever you need to do (patch or upgrade), the startup point it is the note 1927416.1 that cover the supported versions for ZDLRA. There it is possible to find all the supported versions for the recovery appliance library as well as the Exadata versions. Please, not upgrade the Exadata stack with a version that is not listed on this page.