Category Archives: Grid Infrastructure

Exadata and ZDLRA, Disable HAIP

HAIP (High Availability IP) is not supported for the Exadata environment but can occur (if you did not create the cluster using OEDA) that HAIP became in use. And this particularity true for ZDLRA. So, during the upgrade from the previous version (12.2) to a higher version, it is needed to remove HAIP.

Usually, when we upgrading from 12.2 to 18c the HAIP is removed from Exadata. If the upgrade is from 12.1, and HAIP is there, it continues and is not removed by the upgrade process. If you are using HAIP and your GI is 12.1, this procedure as-is described here can’t be used (need some adaptation), because of some requirements from ASM+ACFS+DB. But since this is a preliminary step from a GI upgrade, the focus is to disable and remove it from GI.

The HAIP is not needed for Exadata because by architecture the InfiniBand network already defines (per server) two IP’s to avoid the single point of failure. So, it is not needed to create an additional layer (HAIP and virtual IP), that does the same that already exists by network design.

Click here to read more…

ASM, REPLACE DISK Command

The REPLACE DISK command was released with 12.1 and allow to do an online replacement for a failed disk. This command is important because it reduces the rebalance time doing just the SYNC phase. Comparing with normal disk replacement (DROP and ADD in the same command), the REPLACE just do mirror resync.

Basically, when the REPLACE command is called, the rebalance just copy/sync the data from the survivor disk (the partner disk from the mirror). It is faster since the previous way with drop/add execute a complete rebalance from all AU of the diskgroup, doing REBALANCE and SYNC phase.

The replace disk command is important for the SWAP disk process for Exadata (where you add the new 14TB disks) since it is faster to do the rebalance of the diskgroup.

Click here to read more…

ASM, Mount Restricted Force For Recovery

Survive to disk failures it is crucial to avoid data corruption, but sometimes, even with redundancy at ASM, multiple failures can happen. Check in this post how to use the undocumented feature “mount restricted force for recovery” to resurrect diskgroup and lose less data when multiple failures occur.

Diskgroup redundancy is a key factor for ASM resilience, where you can survive to disk failures and still continue to run databases. I will not extend about ASM disk redundancy here, but usually, you can configure your diskgroup without redundancy (EXTERNAL), double redundancy (NORMAL), triple redundancy (HIGH), and even fourth redundancy (EXTEND for stretch clusters).

If you want to understand more about redundancy you have a lot of articles at MOS and on the internet that provide useful information. One good is this. The idea is simple, spread multiple copies in different disks. And can even be better if you group disks in the same failgroups, so, your data will have multiple copies in separate places.

As an example, this a key for Exadata, where every storage cell is one independent failgroup and you can survive to one entire cell failure (or double full, depending on the redundancy of your diskgroup) without data loss. The same idea can be applied at a “normal” environment, where you can create failgroup to disks attached to controller A, and another attached to controller B (so the failure of one storage controller does not affect all failgroups). At ASM, if you do not create failgroup, each disk is a different one in diskgroups that have redundancy enabled.

Click here to read more…

Solving MGMTDB errors during 18c GI RU apply

Recently I executed the upgrade of Oracle GI to 19c version, from 18.6.0.0 to 19.5.0.0 version. But one step that was not showed there was that, because of requirements, the GI was upgraded from 18.2.0.0 to 18.6.0.0. This upgrade is a just Release Update (RU) apply and opatchauto command.

But during this upgrade, from 18.2 to 18.6, I faced (more than one time – 5 to be precise) errors during the update because of the MGMTDB errors. I got these errors:

  • ORA-12514, TNS: Listener does not currently know of service requested in connect descriptor
  • ORA-01017: invalid username/password; logon denied
  • MGTCA-1005 : Could not connect to the GIMR.
  • CRS-10407: (:CLSCRED1079:)Credential domain does not exist.

Here I will show how to solve these errors, how to identify if everything was fine and if you can continue. Be careful that it is an example, always open a support SR to identify the source of the error.

Click here to read more…

19c Grid Infrastructure Upgrade

Upgrade GRID infrastructure is one activity that usually is postponed because it involves a sensible area that, when not works, causes big downtime until be fixed. But, in the last versions, it is not a complicated task and if you follow the basic rules, it works without problems.

Here I will show a little example of how to upgrade the GI from 18.6.0 to 19.5. The steps below were executed at Exadata running version 19.2.7.0.0.191012 and GI 18.6.0.0, but can be done in every environment that supports Oracle GI.

Click here to read more…

TFA error after GI upgrade to 19c

Recently I made an Exadata stack upgrade/update to the last 19.2 version (19.2.7.0.0.191012) and I upgraded the GI from 18c to 19c (last 19c version – 19.5.0.0.191015) and after that, TFA does not work.

Since I don’t want to complete execute a TFA clean and reinstallation I tried to find the error and the solution. Here I want to share with you the workaround (since there is no solution yet) that I discovered and used to fix the error.

The environment

The actual environment is:

  • Old Grid Infrastructure: Version 18.6.0.0.190416
  • New Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64

TFA error

After upgrade the GI from 18c to 19c, the TFA does not work. If you try to start it or collect log using it, you can receive errors. In the environment described here, the TFA was running fine with the 18c version, and the rootupgrade script from 18c to 19c does not report an error.

And to be more precise, the TFA upgrade from 18c to 19c called by rootupgrade was ok (according to the log – I will show later). But even after that, the error occurs.

The provided solution as usual (by MOS support): download the lastest TFA and reinstall the actual one. Unfortunately, I not like this approach because can lead to an error during GI upgrade for next releases (like 20) and updates (19.6 as an example).

Click here to read more…

Exadata, workaround for oracka.ko error

Recently I made an Exadata stack upgrade/update to the last 19.2 version (19.2.7.0.0.191012) released in October of 2019, and update the GI to the last 19c version (19.5.0.0.191015) and after that, I hade some issues to create 11G databases.

So, when I try to create an 11G RAC database, the error “File -oracka.ko- was not found” appears and creation fails. Here I want to share with you the workaround (since there is no solution yet) that I discovered and used to bypass the error.

The environment

The actual environment is:

  • Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64
  • 11G Database: Version 11.2.0.4.180717
  • ACFS: Used to store some files

oracka.ko

So, calling dbca:

[DEV-oracle@exsite1c1-]$ /u01/app/oracle/product/11.2.0.4/dbhome_1/bin/dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName D11TST19 -adminManaged -sid D11TST19 -sysPassword oracle11 -systemPassword oracle11 -characterSet WE8ISO8859P15 -emConfiguration NONE -storageType ASM -diskGroupName DATAC8 -recoveryGroupName RECOC8 -nodelist exsite1c1,exsite1c2 -sampleSchema false
Copying database files
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/D11TST19/D11TST19.log" for further details.
[DEV-oracle@exsite1c1-]$

Click here to read more…

Reimage ODA

The idea to reimage ODA is to refresh the environment without the need to jump from one by one to reach the last available version, or even rescue the system from S.O. failure/crash. The process to do a reimage can be check in the official documentation but unfortunately can be very tricky because the information (the order and steps) are not 100% clear. The idea is to show you how to reimage using version 18 (18.3 in this example), that represents the last available.

In resume the process is executed in the order:

  1. ILOM: Boot the ISO
  2. Prepare to create the appliance
  3. Upload GI and DB base version to the repository
  4. Linux: Create the appliance
  5. Firmware and patch
  6. Create Oracle Homes and the databases
  7. Finish and clean the install

More…

ODA, ACFS and ASM Dilemma

As you know, for ODA, you have two options for storage: ACFS or ASM. If you choose ACFS, you can create all versions for databases, from 11g to 18c (until this moment). But if you choose ASM, the 11g will not be compatible.

So, ASM or ACFS? If you choose ACFS, the diskgroup where ACFS runs will be sliced and you have one mount point for each database. If you have, as an example, one system with more than 30 databases, can be complicated to manage all the ACFS mount points. So, ASM it simple and easier solution to sustain. Besides the fact that it is more homogeneous with other database environments (Exadata, RAC’s …etc).

If you choose ASM you can’t use 11g versions or avoid the ACFS mount points for all databases, but you can do a little simple approach to use 11g databases and still use ASM for others. Took one example where just 3 or 4 databases will run over 11g version and all others 30 databases in the environment will be in 12/18. To achieve that, the option, in this case, is using a “manual” ACFS mount point, I will explain.

More…

Reaching Exadata 18c

Here I cover in raw, undocumented and uncommented mode the process to update and upgrade your Exadata using the last version of everything. AND since Oracle 18c was released to use with Oracle Exadata (from SQL Maria), this post include the Oracle 18c upgrade for Grid Infrastructure and Oracle database binaries installation.

Since one friend was decommissioning one old Exadata X2 (running after the end of life), I used to do some tests and here you will find the commands, outputs, images that I use to (in order):

  • Apply the patch 12.1.0.2.180116 for Oracle Grid Infrastructure 12.1.
  • Update the Infiniband switches to last version 2.2.7-1.
  • Apply the patch for Exadata Storage Server using the last version, 18.1.4.0.0.180125.3.
  • Apply the patch for Exadata Database Servers using the last version, 18.1.4.0.0.180125.3
  • Upgrade the Oracle Grid Infrastructure 12.1 to Grid Infrastructure 12.2, applying PSU 12.2.0.1.180116 at same time.
  • Upgrade the Oracle Grid Infrastructure 12.2 to Oracle Grid Infrastructure 18c.
  • Install Oracle database binaries for Oracle Database 18c and Create a test Oracle Rac Instance.

Continue lendo…