Recently I posted about the upgrade of AHF/TAF from version 19 to 21 at Exadata and also for ODA. But with version 21 of AHF, some collections are made automatically and this can impact your space usage. Here you can see how to check this and disable/modify some of these.
The automatic collection for AHF/TFA is a feature that generates the diagnostic packages (to send to Oracle) when some specifics errors appear in the database. The collected errors follow some patterns like ORA-0600, ORA-07445, and several others. The basic idea can be seen in the official doc here and in the image below (retried directly from the official doc).
Recently I made a post about how to upgrade the TFA to AHF at Oracle Exadata. For today, the post is about how to upgrade AHF at ODA. The procedure is quite simple, but you need to check where to up it and if everything is up and running.
With the release of the 21c of Oracle Database is time to study new features. The 21c version of Grid Infrastructure (and ASM) was released and an upgrade from orders versions can be executed. It is not a complex task, but some details need to be verified. In this post, I will show the steps to upgrade the Grid Infrastructure to 21c. If you need to upgrade from 18c to 19c you can check my previous post.
OS version: If it is compatible with 21c and if you are using asmlib or asm filter, check kernel modules and certification matrix.
Current GI: Maybe you need to apply some patches. The best practice recommends using the last version.
Used features (like AFD, HAIP, Resources): Check compatibilities of the old features with 21c. Maybe you need to remove HAIP or change your crs resources.
21c requirements for GI: Check memory, space, and database versions.
Oracle Home patches (for databases running): Check if you need to apply some patches for your database to be compatible with GI 21c.
Backup of your Databases: Just in case you need to roll back something.
The environment that I am using for this example is:
Oracle Linux 8.4.
GI cluster with two nodes.
ASM Filter for disk access.
19.11 for GI.
19.12 for Oracle Home database.
I personally recommend upgrading your current GI to 19c before upgrade or apply one of the last PSU for your running version. This avoids a lot of errors since most of the know bugs will be patched. Check below my environment:
Quick post for today. Recently needed to upgrade to the last version of Autonomous Health Framework (AHF) from an Exadata running GI 19.5. In this particular case the GI was not even running AHF, but still using the standalone TFA that comes with it. So, here I will show how to upgrade to the last version of AHF and replacing the TFA as well.
If you search around about how to patch Oracle Database you will find a lot of blog posts teaching how to patch your Oracle Home (OH) (I will not put the list here because it will be enormous – but just follow Mike Dietrich). But most of them write nothing about OCW, how to patch it, or if it is needed to patch OCW. And unfortunately, even Oracle is not clear about that.
Just to complement, recently Liron Amitzi got one issue related to OCW. And if you search more, you will find that Frits Hoogland wrote something about it too. But in the end, need I to concern about OCW? And, what is OCW?
OCW means Oracle Clusterware, and basically is the core for the Grid Infrastructure, it is everything there. But for OH is important too because if the database needs to communicate with GI Clusterware it uses the OCW binaries/libraries that are at OH (like srvctl, crstctl) to do that. So, if have some kind of bug at this portion of OCW, it needs to be patched.
The point is that the only place that you can find the OCW patch is under the GI RU patch. Look at the readme for last GI RU 184.108.40.206.200714 (Patch 31305339):
And if you look at the readme for DB RU 220.127.116.11.200714 (Patch 31281355) there is no reference to the OCW patch. So, if apply just the DB RU the OCW will not be updated.
And just to remember you that patch 31305087 does not exist alone to be downloaded:
HAIP (High Availability IP) is not supported for the Exadata environment but can occur (if you did not create the cluster using OEDA) that HAIP became in use. And this particularity true for ZDLRA. So, during the upgrade from the previous version (12.2) to a higher version, it is needed to remove HAIP.
Usually, when we upgrading from 12.2 to 18c the HAIP is removed from Exadata. If the upgrade is from 12.1, and HAIP is there, it continues and is not removed by the upgrade process. If you are using HAIP and your GI is 12.1, this procedure as-is described here can’t be used (need some adaptation), because of some requirements from ASM+ACFS+DB. But since this is a preliminary step from a GI upgrade, the focus is to disable and remove it from GI.
The HAIP is not needed for Exadata because by architecture the InfiniBand network already defines (per server) two IP’s to avoid the single point of failure. So, it is not needed to create an additional layer (HAIP and virtual IP), that does the same that already exists by network design.
The REPLACE DISK command was released with 12.1 and allow to do an online replacement for a failed disk. This command is important because it reduces the rebalance time doing just the SYNC phase. Comparing with normal disk replacement (DROP and ADD in the same command), the REPLACE just do mirror resync.
Basically, when the REPLACE command is called, the rebalance just copy/sync the data from the survivor disk (the partner disk from the mirror). It is faster since the previous way with drop/add execute a complete rebalance from all AU of the diskgroup, doing REBALANCE and SYNC phase.
The replace disk command is important for the SWAP disk process for Exadata (where you add the new 14TB disks) since it is faster to do the rebalance of the diskgroup.
Survive to disk failures it is crucial to avoid data corruption, but sometimes, even with redundancy at ASM, multiple failures can happen. Check in this post how to use the undocumented feature “mount restricted force for recovery” to resurrect diskgroup and lose less data when multiple failures occur.
Diskgroup redundancy is a key factor for ASM resilience, where you can survive to disk failures and still continue to run databases. I will not extend about ASM disk redundancy here, but usually, you can configure your diskgroup without redundancy (EXTERNAL), double redundancy (NORMAL), triple redundancy (HIGH), and even fourth redundancy (EXTEND for stretch clusters).
If you want to understand more about redundancy you have a lot of articles at MOS and on the internet that provide useful information. One good is this. The idea is simple, spread multiple copies in different disks. And can even be better if you group disks in the same failgroups, so, your data will have multiple copies in separate places.
As an example, this a key for Exadata, where every storage cell is one independent failgroup and you can survive to one entire cell failure (or double full, depending on the redundancy of your diskgroup) without data loss. The same idea can be applied at a “normal” environment, where you can create failgroup to disks attached to controller A, and another attached to controller B (so the failure of one storage controller does not affect all failgroups). At ASM, if you do not create failgroup, each disk is a different one in diskgroups that have redundancy enabled.
Recently I executed the upgrade of Oracle GI to 19c version, from 18.104.22.168 to 22.214.171.124 version. But one step that was not showed there was that, because of requirements, the GI was upgraded from 126.96.36.199 to 188.8.131.52. This upgrade is a just Release Update (RU) apply and opatchauto command.
But during this upgrade, from 18.2 to 18.6, I faced (more than one time – 5 to be precise) errors during the update because of the MGMTDB errors. I got these errors:
ORA-12514, TNS: Listener does not currently know of service requested in connect descriptor
CRS-10407: (:CLSCRED1079:)Credential domain does not exist.
Here I will show how to solve these errors, how to identify if everything was fine and if you can continue. Be careful that it is an example, always open a support SR to identify the source of the error.
Upgrade GRID infrastructure is one activity that usually is postponed because it involves a sensible area that, when not works, causes big downtime until be fixed. But, in the last versions, it is not a complicated task and if you follow the basic rules, it works without problems.
Here I will show a little example of how to upgrade the GI from 18.6.0 to 19.5. The steps below were executed at Exadata running version 184.108.40.206.0.191012 and GI 220.127.116.11, but can be done in every environment that supports Oracle GI.