Category Archives: Grid Infrastructure

21c Grid Infrastructure Upgrade

With the release of the 21c of Oracle Database is time to study new features. The 21c version of Grid Infrastructure (and ASM) was released and an upgrade from orders versions can be executed. It is not a complex task, but some details need to be verified. In this post, I will show the steps to upgrade the Grid Infrastructure to 21c. If you need to upgrade from 18c to 19c you can check my previous post.

Planning

The first step that you need to do is plan everything. You need to check the requirements, read the docs, download files, and plan the actions. While I am writing this post, there is no official MOS docs about how to upgrade the GI to 19c. The first place to the procedure is the official doc for GI Installation and Upgrade, mainly chapter 11. And another good example is 19c Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running on Oracle Linux (Doc ID 2542082.1).

So, what you need to consider:

  • OS version: If it is compatible with 21c and if you are using asmlib or asm filter, check kernel modules and certification matrix.
  • Current GI: Maybe you need to apply some patches. The best practice recommends using the last version.
  • Used features (like AFD, HAIP, Resources): Check compatibilities of the old features with 21c. Maybe you need to remove HAIP or change your crs resources.
  • 21c requirements for GI: Check memory, space, and database versions.
  • Oracle Home patches (for databases running): Check if you need to apply some patches for your database to be compatible with GI 21c.
  • Backup of your Databases: Just in case you need to roll back something.

My environment

The environment that I am using for this example is:

  • Oracle Linux 8.4.
  • GI cluster with two nodes.
  • ASM Filter for disk access.
  • 19.11 for GI.
  • 19.12 for Oracle Home database.

I personally recommend upgrading your current GI to 19c before upgrade or apply one of the last PSU for your running version. This avoids a lot of errors since most of the know bugs will be patched. Check below my environment:

Click here to read more…

Upgrade AHF and TFA at Exadata

Quick post for today. Recently needed to upgrade to the last version of Autonomous Health Framework (AHF) from an Exadata running GI 19.5. In this particular case the GI was not even running AHF, but still using the standalone TFA that comes with it. So, here I will show how to upgrade to the last version of AHF and replacing the TFA as well.

Click here to read more…

Patching OCW for OH

If you search around about how to patch Oracle Database you will find a lot of blog posts teaching how to patch your Oracle Home (OH) (I will not put the list here because it will be enormous – but just follow Mike Dietrich). But most of them write nothing about OCW, how to patch it, or if it is needed to patch OCW.  And unfortunately, even Oracle is not clear about that.

Just to complement, recently Liron Amitzi got one issue related to OCW. And if you search more, you will find that Frits Hoogland wrote something about it too. But in the end, need I to concern about OCW? And, what is OCW?

OCW

OCW means Oracle Clusterware, and basically is the core for the Grid Infrastructure, it is everything there. But for OH is important too because if the database needs to communicate with GI Clusterware it uses the OCW binaries/libraries that are at OH (like srvctl, crstctl) to do that. So, if have some kind of bug at this portion of OCW, it needs to be patched.

The point is that the only place that you can find the OCW patch is under the GI RU patch. Look at the readme for last GI RU 19.8.0.0.200714 (Patch 31305339):

And if you look at the readme for DB RU 19.8.0.0.200714 (Patch 31281355) there is no reference to the OCW patch. So, if apply just the DB RU the OCW will not be updated.

And just to remember you that patch 31305087 does not exist alone to be downloaded:

Click here to read more…

Exadata and ZDLRA, Disable HAIP

HAIP (High Availability IP) is not supported for the Exadata environment but can occur (if you did not create the cluster using OEDA) that HAIP became in use. And this particularity true for ZDLRA. So, during the upgrade from the previous version (12.2) to a higher version, it is needed to remove HAIP.

Usually, when we upgrading from 12.2 to 18c the HAIP is removed from Exadata. If the upgrade is from 12.1, and HAIP is there, it continues and is not removed by the upgrade process. If you are using HAIP and your GI is 12.1, this procedure as-is described here can’t be used (need some adaptation), because of some requirements from ASM+ACFS+DB. But since this is a preliminary step from a GI upgrade, the focus is to disable and remove it from GI.

The HAIP is not needed for Exadata because by architecture the InfiniBand network already defines (per server) two IP’s to avoid the single point of failure. So, it is not needed to create an additional layer (HAIP and virtual IP), that does the same that already exists by network design.

Click here to read more…

ASM, REPLACE DISK Command

The REPLACE DISK command was released with 12.1 and allow to do an online replacement for a failed disk. This command is important because it reduces the rebalance time doing just the SYNC phase. Comparing with normal disk replacement (DROP and ADD in the same command), the REPLACE just do mirror resync.

Basically, when the REPLACE command is called, the rebalance just copy/sync the data from the survivor disk (the partner disk from the mirror). It is faster since the previous way with drop/add execute a complete rebalance from all AU of the diskgroup, doing REBALANCE and SYNC phase.

The replace disk command is important for the SWAP disk process for Exadata (where you add the new 14TB disks) since it is faster to do the rebalance of the diskgroup.

Click here to read more…

ASM, Mount Restricted Force For Recovery

Survive to disk failures it is crucial to avoid data corruption, but sometimes, even with redundancy at ASM, multiple failures can happen. Check in this post how to use the undocumented feature “mount restricted force for recovery” to resurrect diskgroup and lose less data when multiple failures occur.

Diskgroup redundancy is a key factor for ASM resilience, where you can survive to disk failures and still continue to run databases. I will not extend about ASM disk redundancy here, but usually, you can configure your diskgroup without redundancy (EXTERNAL), double redundancy (NORMAL), triple redundancy (HIGH), and even fourth redundancy (EXTEND for stretch clusters).

If you want to understand more about redundancy you have a lot of articles at MOS and on the internet that provide useful information. One good is this. The idea is simple, spread multiple copies in different disks. And can even be better if you group disks in the same failgroups, so, your data will have multiple copies in separate places.

As an example, this a key for Exadata, where every storage cell is one independent failgroup and you can survive to one entire cell failure (or double full, depending on the redundancy of your diskgroup) without data loss. The same idea can be applied at a “normal” environment, where you can create failgroup to disks attached to controller A, and another attached to controller B (so the failure of one storage controller does not affect all failgroups). At ASM, if you do not create failgroup, each disk is a different one in diskgroups that have redundancy enabled.

Click here to read more…

Solving MGMTDB errors during 18c GI RU apply

Recently I executed the upgrade of Oracle GI to 19c version, from 18.6.0.0 to 19.5.0.0 version. But one step that was not showed there was that, because of requirements, the GI was upgraded from 18.2.0.0 to 18.6.0.0. This upgrade is a just Release Update (RU) apply and opatchauto command.

But during this upgrade, from 18.2 to 18.6, I faced (more than one time – 5 to be precise) errors during the update because of the MGMTDB errors. I got these errors:

  • ORA-12514, TNS: Listener does not currently know of service requested in connect descriptor
  • ORA-01017: invalid username/password; logon denied
  • MGTCA-1005 : Could not connect to the GIMR.
  • CRS-10407: (:CLSCRED1079:)Credential domain does not exist.

Here I will show how to solve these errors, how to identify if everything was fine and if you can continue. Be careful that it is an example, always open a support SR to identify the source of the error.

Click here to read more…

19c Grid Infrastructure Upgrade

Upgrade GRID infrastructure is one activity that usually is postponed because it involves a sensible area that, when not works, causes big downtime until be fixed. But, in the last versions, it is not a complicated task and if you follow the basic rules, it works without problems.

Here I will show a little example of how to upgrade the GI from 18.6.0 to 19.5. The steps below were executed at Exadata running version 19.2.7.0.0.191012 and GI 18.6.0.0, but can be done in every environment that supports Oracle GI.

Click here to read more…

TFA error after GI upgrade to 19c

Recently I made an Exadata stack upgrade/update to the last 19.2 version (19.2.7.0.0.191012) and I upgraded the GI from 18c to 19c (last 19c version – 19.5.0.0.191015) and after that, TFA does not work.

Since I don’t want to complete execute a TFA clean and reinstallation I tried to find the error and the solution. Here I want to share with you the workaround (since there is no solution yet) that I discovered and used to fix the error.

The environment

The actual environment is:

  • Old Grid Infrastructure: Version 18.6.0.0.190416
  • New Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64

TFA error

After upgrade the GI from 18c to 19c, the TFA does not work. If you try to start it or collect log using it, you can receive errors. In the environment described here, the TFA was running fine with the 18c version, and the rootupgrade script from 18c to 19c does not report an error.

And to be more precise, the TFA upgrade from 18c to 19c called by rootupgrade was ok (according to the log – I will show later). But even after that, the error occurs.

The provided solution as usual (by MOS support): download the lastest TFA and reinstall the actual one. Unfortunately, I not like this approach because can lead to an error during GI upgrade for next releases (like 20) and updates (19.6 as an example).

Click here to read more…