Recently I published two posts about patching/upgrading your 21c Grid Infrastructure (GI) while the databases keep running. The first post shows how to do this using the GUI, and the second one shows more details about the AFD/ACFS kernel driver update. In this post, I will show how to do the zero-downtime patch (zeroDowntimeGIPatching – ZDGIP) in silent mode.
Patching this way is important because it allows you to automate the process. You can create your own script and call it (using Ansible, Puppet, Chef, etc.) to upgrade your servers (or farms) remotely.
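As a minimal sketch of what such a script could wrap, the Python helper below only assembles the command line for a silent run of the installer from the patched GI home. The function name, both paths, and the exact combination of flags are my assumptions for illustration; confirm the flags against gridSetup.sh -help on your version before using anything like this.

```python
import shlex
from pathlib import Path


def build_zdgip_command(new_grid_home: str, response_file: str) -> list[str]:
    """Return the argv for a silent zero-downtime switch to the patched GI home.

    The flag combination is an assumption to be verified with `gridSetup.sh -help`.
    """
    installer = Path(new_grid_home) / "gridSetup.sh"
    return [
        str(installer),
        "-silent",
        "-switchGridHome",
        "-zeroDowntimeGIPatching",
        "-responseFile",
        response_file,
    ]


if __name__ == "__main__":
    # Both paths are hypothetical placeholders, not values from a real system.
    cmd = build_zdgip_command(
        "/u01/app/21.5.0.0/grid",
        "/u01/app/21.5.0.0/grid/install/response/zdgip.rsp",
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

Returning a list instead of a single string keeps the arguments safe to hand over to subprocess.run or to an Ansible task without extra quoting.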
The current environment is the same as in the first post:
OEL 8.4 Kernel 5.4.17-2102.201.3.el8uek.x86_64.
Oracle GI 21c, version 21.3 with no one-off or patches installed.
Oracle Database 21c, RU 21.5 (with OCW 21.5).
TFA version is 21.4 (the latest available in March 2022).
And I will apply the same RU 21.5 (21.5.0.0.220118) for GI, which is patch 33531909.
The patch process is almost the same as in the first post; the main changes are the response file and the way gridSetup.sh is called. For this reason, I recommend that you read the first (and second) post. Below you will see a quick review of the previous steps and a focus on the new
Recently I published a post about how to use the new -zeroDowntimeGIPatching feature when patching Grid Infrastructure 21c. It is a new feature/option that allows your databases to keep running while the grid is patched. You can see my post here. In that post I talked about the usage of -updateosfiles when calling rootcrs.sh, and here I want to clarify some details and provide better examples.
Before you think about upgrading the ACFS/AFD drivers, you need to check whether they are compatible with the Linux version and kernel that you are running. The only place to check this is the MOS note ACFS Support On OS Platforms (Certification Matrix) (Doc ID 1369107.1). In that note, you will see tables for each major version (18c, 19c, 21c) listing the compatible Linux versions and kernel versions. Below, the entry for OEL 8 is marked:
And you can see that my Linux kernel version is compatible. If your version is not compatible, do not update the ACFS/AFD kernel drivers.
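If you want to turn this check into a gate for an automated run, a small sketch in Python could look like the one below. It assumes you copy the certified kernel prefixes by hand from the MOS note; the function name and the (intentionally empty) list are hypothetical, and nothing here reads the note for you.

```python
import platform


def kernel_is_listed(running_kernel: str, certified_prefixes: list[str]) -> bool:
    """Return True when the running kernel starts with one of the given prefixes."""
    return any(running_kernel.startswith(prefix) for prefix in certified_prefixes)


if __name__ == "__main__":
    # Fill this list by hand from Doc ID 1369107.1; it is left empty on purpose
    # so that the sketch never claims a kernel is certified.
    certified = []
    running = platform.release()
    if kernel_is_listed(running, certified):
        print(f"{running}: listed, the driver update can be considered")
    else:
        print(f"{running}: not listed, do not update the ACFS/AFD kernel drivers")
```

With an empty list the answer is always "do not update", which is the safe default when the matrix was not consulted.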
Oracle 21c delivered a lot of new features, and for Grid Infrastructure one of the most interesting is the zero-downtime patch (zeroDowntimeGIPatching). It basically allows your databases to keep running while you patch/upgrade your GI. The official doc can be seen here. Let’s say that it is an evolution of the Out of Place (OOP) patch for GI.
In this post I will show how to do that, but some details before starting:
This post shows how to do the zero-downtime patch using GUI mode.
I will write another post showing how to do the same procedure in silent mode, so that it can be automated.
In a third post, I will detail how the zero-downtime patch works behind the scenes and will discuss some logs.
Recently I posted about the upgrade of AHF/TFA from version 19 to 21 on Exadata and also on ODA. But with version 21 of AHF, some collections are made automatically, and this can impact your space usage. Here you can see how to check this and disable/modify some of them.
The automatic collection for AHF/TFA is a feature that generates the diagnostic packages (to send to Oracle) when some specific errors appear in the database. The collected errors follow some patterns like ORA-00600, ORA-07445, and several others. The basic idea can be seen in the official doc here and in the image below (retrieved directly from the official doc).
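To make the idea of "errors that follow some patterns" more concrete, here is a minimal Python sketch that scans log lines for the two codes named above. This is only an illustration of pattern matching; it is not how TFA is implemented, the function name is mine, and the real list of events that trigger a collection is longer.

```python
import re

# Only the two codes named in the text; the real TFA event list is not reproduced here.
WATCHED_CODES = ("ORA-00600", "ORA-07445")

# Capture the numeric part so that short forms such as ORA-600 can be normalized.
ORA_PATTERN = re.compile(r"\bORA-(\d+)\b")


def find_watched_errors(lines):
    """Return (line_number, normalized_code) for every watched error found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in ORA_PATTERN.finditer(line):
            code = f"ORA-{int(match.group(1)):05d}"
            if code in WATCHED_CODES:
                hits.append((number, code))
    return hits


if __name__ == "__main__":
    # Made-up lines, just to exercise the function.
    sample = [
        "ORA-00600: internal error code, arguments: [example]",
        "Completed checkpoint",
        "ORA-01555: snapshot too old",
    ]
    print(find_watched_errors(sample))
```

Each hit found this way is the kind of event that can produce a diagnostic package, which is why a noisy database can consume space quickly.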
Recently I made a post about how to upgrade TFA to AHF on Oracle Exadata. Today's post is about how to upgrade AHF on ODA. The procedure is quite simple, but you need to check where to upgrade it and whether everything is up and running.
With the release of Oracle Database 21c, it is time to study the new features. The 21c version of Grid Infrastructure (and ASM) was released, and an upgrade from older versions can be executed. It is not a complex task, but some details need to be verified. In this post, I will show the steps to upgrade the Grid Infrastructure to 21c. If you need to upgrade from 18c to 19c you can check my previous post.
OS version: Check whether it is compatible with 21c and, if you are using ASMLib or ASM Filter, check the kernel modules and the certification matrix.
Current GI: Maybe you need to apply some patches. The best practice recommends using the latest version.
Used features (like AFD, HAIP, Resources): Check the compatibility of the old features with 21c. Maybe you need to remove HAIP or change your CRS resources.
21c requirements for GI: Check memory, space, and database versions.
Oracle Home patches (for databases running): Check if you need to apply some patches for your database to be compatible with GI 21c.
Backup of your Databases: Just in case you need to roll back something.
The environment that I am using for this example is:
Oracle Linux 8.4.
GI cluster with two nodes.
ASM Filter for disk access.
19.11 for GI.
19.12 for Oracle Home database.
I personally recommend upgrading your current GI to 19c before the upgrade, or applying one of the latest PSUs for your running version. This avoids a lot of errors since most of the known bugs will be patched. Check my environment below:
Quick post for today. Recently I needed to upgrade to the latest version of Autonomous Health Framework (AHF) on an Exadata running GI 19.5. In this particular case the GI was not even running AHF, but was still using the standalone TFA that comes with it. So, here I will show how to upgrade to the latest version of AHF and replace the TFA as well.
If you search around about how to patch Oracle Database you will find a lot of blog posts teaching how to patch your Oracle Home (OH) (I will not put the list here because it will be enormous – but just follow Mike Dietrich). But most of them say nothing about OCW, how to patch it, or whether OCW needs to be patched at all. And unfortunately, even Oracle is not clear about that.
Just to complement, recently Liron Amitzi hit an issue related to OCW. And if you search more, you will find that Frits Hoogland wrote something about it too. But in the end, do I need to be concerned about OCW? And what is OCW?
OCW means Oracle Clusterware, and it is basically the core of the Grid Infrastructure; it is everything there. But it is important for the OH too, because if the database needs to communicate with the GI Clusterware it uses the OCW binaries/libraries that are in the OH (like srvctl, crsctl) to do that. So, if there is some kind of bug in this portion of OCW, it needs to be patched.
The point is that the only place where you can find the OCW patch is inside the GI RU patch. Look at the readme for the latest GI RU 19.8.0.0.200714 (Patch 31305339):
And if you look at the readme for DB RU 19.8.0.0.200714 (Patch 31281355), there is no reference to the OCW patch. So, if you apply just the DB RU, the OCW will not be updated.
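One way to see whether an Oracle Home received the OCW patch is to look at what opatch lspatches reports for it. The Python sketch below only parses text in the "patch_id;description" shape printed by that command; the function names are mine, the sample lines are illustrative and not captured from a real system, and matching on a description that starts with "OCW" is an assumption.

```python
def parse_lspatches(output: str) -> dict[str, str]:
    """Map patch id -> description for every 'id;description' line."""
    patches = {}
    for line in output.splitlines():
        patch_id, separator, description = line.partition(";")
        # Skip blank lines and trailers such as "OPatch succeeded."
        if separator and patch_id.strip().isdigit():
            patches[patch_id.strip()] = description.strip()
    return patches


def has_ocw_patch(output: str) -> bool:
    """Assumption: OCW patches carry a description that starts with 'OCW'."""
    descriptions = parse_lspatches(output).values()
    return any(text.upper().startswith("OCW") for text in descriptions)


if __name__ == "__main__":
    # Illustrative text only, shaped like the output of `opatch lspatches`.
    sample = (
        "31281355;Database Release Update (illustrative description)\n"
        "\n"
        "OPatch succeeded.\n"
    )
    print("OCW patched:", has_ocw_patch(sample))
```

For a home that received only the DB RU, a check like this returns False, which is exactly the situation described above.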
And just to remind you, patch 31305087 cannot be downloaded on its own:
HAIP (High Availability IP) is not supported in the Exadata environment, but it can happen (if you did not create the cluster using OEDA) that HAIP is in use. This is particularly true for ZDLRA. So, during the upgrade from the previous version (12.2) to a higher version, HAIP needs to be removed.
Usually, when upgrading from 12.2 to 18c, HAIP is removed from Exadata. If the upgrade is from 12.1 and HAIP is there, it remains and is not removed by the upgrade process. If you are using HAIP and your GI is 12.1, the procedure as described here can’t be used as-is (it needs some adaptation) because of some requirements from ASM+ACFS+DB. But since this is a preliminary step for a GI upgrade, the focus is to disable and remove it from GI.
HAIP is not needed for Exadata because, by architecture, the InfiniBand network already defines two IPs (per server) to avoid a single point of failure. So there is no need to create an additional layer (HAIP and virtual IP) that does the same thing the network design already provides.
The REPLACE DISK command was released with 12.1 and allows an online replacement of a failed disk. This command is important because it reduces the rebalance time by doing just the SYNC phase. Compared with a normal disk replacement (DROP and ADD in the same command), REPLACE just does a mirror resync.
Basically, when the REPLACE command is called, the rebalance just copies/syncs the data from the surviving disk (the partner disk from the mirror). It is faster than the previous way, since drop/add executes a complete rebalance of all AUs of the diskgroup, doing both the REBALANCE and SYNC phases.
The REPLACE DISK command is important for the disk SWAP process on Exadata (where you add the new 14TB disks) since it makes the diskgroup rebalance faster.
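For reference, the statement has the form ALTER DISKGROUP ... REPLACE DISK ... WITH '...' POWER n. The Python sketch below only builds that text so it can be reviewed before being run in the ASM instance; the function name, the diskgroup name, the disk name, and the path are hypothetical placeholders.

```python
import re

# Conservative check for unquoted identifiers, to avoid building a broken statement.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def build_replace_disk(diskgroup, disk_name, new_path, power=None):
    """Return the ALTER DISKGROUP ... REPLACE DISK statement as text."""
    for identifier in (diskgroup, disk_name):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unexpected identifier: {identifier!r}")
    if "'" in new_path:
        raise ValueError("the disk path must not contain a single quote")
    statement = (
        f"ALTER DISKGROUP {diskgroup} REPLACE DISK {disk_name} WITH '{new_path}'"
    )
    if power is not None:
        # POWER is optional; when omitted, ASM uses its default rebalance power.
        statement += f" POWER {int(power)}"
    return statement


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    print(build_replace_disk("DATA", "DATA_0001", "/dev/mapper/newdisk", power=4))
```

The statement names the disk being replaced and the path of the new one, so ASM keeps the same disk name and only needs to resync its content from the mirror partner.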