Tag Archives: Exadata

Oracle Engineered Systems since 2010

Recently I made a tweet about a new project with Oracle Engineered System (X9M) that remembered me about what I made with these systems until now. So, this opened the opportunity to tell my background and history until now working with these systems. Is not a show-off of ego boost post.

Click here to read more…

Upgrade AHF and TFA at Exadata

3 Replies

Quick post for today. Recently needed to upgrade to the last version of Autonomous Health Framework (AHF) from an Exadata running GI 19.5. In this particular case the GI was not even running AHF, but still using the standalone TFA that comes with it. So, here I will show how to upgrade to the last version of AHF and replacing the TFA as well.

Click here to read more…

Fixing Exadata Missing Volumes at LVM

Exadata, dracut and LVM locking_type as read-only

1 Reply

Sometimes even a perfectly running server can pass over issues after a simple reboot. And was exactly that occurred recently with one Exadata database node. And was not the first time that the same error appears (and since there is no well-documented step by step to fix it I documented them below). So, check how to fix the issue related to the read-only locking_type of LVM detected by dracut.

Click here to read more…

MAA, Blueprints and On-Premise Architecture Reference

1 Reply

The Oracle Maximum Availability Architecture (MAA) is the correct way to protect your Oracle database environment (and investment). It covers from a simple single instance to Exadata/Engineered Systems RAC and a multi-site database with Data Guard protection. But do you know that to reach the MAA (whatever the architecture level that you are protecting) you need to use ZDLRA?

So, I will start a series of posts to cover the MAA and ZDLRA. Discussing what you need to do (and how) to reach the maximum level of availability as is at the MAA architecture (as defined in the documentation and best practices: Oracle Maximum Availability Architecture (MAA) Blueprints for On-Premises, MAA Best Practices – Oracle Database, and Maximum Availability with Oracle Database 19c).

Why ZDLRA?

The question is why ZDLRA is needed? The point from ZDLRA is that it can (and needed to be used) to protect and reach zero RPO to all architectures. ZDLRA is more (much more) than just a backup appliance, is the core of every MAA design. You can’t reach zero RPO without using it.

Click here to read more…

Exadata and ZDLRA, Patch Exadata Stack

4 Replies

The process to patch Exadata stack and software changed in the last years and it became easier. Now, with patchmgr to be used for all (database servers, storage cells, and switches) the process is much easier to control the steps. Here I will show the steps that are involved in this process.

Independent if it is ZDLRA or Exadata, the process for Engineering System is the same. So, this post can be used as a guide for the Exadata patch apply as well. In 2018 I already made a similar process about how to patch/upgrade Exadata to 18c (you can access here) and even made a partial/incomplete post for 12c in 2015.

The process will be very similar and can be done in rolling and non-rolling mode. In the first, the services continue to run and you don’t need to shutdown databases, but will take more time because the patchmgr applies server by server. At the second, you need to shutdown the entire GI and the patch is applied in parallel and will be faster.

Click here to read more…

ZDLRA, Patch the Recovery Appliance

6 Replies

The proceed to patch/upgrade ZDLRA is not complicated, but as usual, some details need to be checked before starting the procedure. Since it is one engineering system based at Exadata, the procedure has one part that (maybe) needs to upgrade this stack too. But, is possible to upgrade just the recovery appliance library.

Whatever if need or no to upgrade the Exadata stack, the upgrade for recovery appliance library is the same. The commands and checks are the same. The procedure described in this post cover the upgrade of the recovery appliance library. For Exadata stack, it is in another post.

Where we are

Before even start the patch/upgrade it is important to know exactly which version you are running. To do this execute the command racli version at you database node:

[root@zeroinsg01 ~]# racli version
Recovery Appliance Version:
        exadata image: 19.2.3.0.0.190621
        rarpm version: ra_automation-12.2.1.1.2.201907-30111072.x86_64
        rdbms version: RDBMS_12.2.0.1.0_LINUX.X64_RELEASE
        transaction  : kadjei_julpsu_ip2
        zdlra version: ZDLRA_12.2.1.1.2.201907_LINUX.X64_RELEASE
[root@zeroinsg01 ~]#

With this, we can discover the ZDLRA version running (12.2.1.1.2.201907 in this case), and the Exadata image version (19.2.3.0.0.190621).

Click here to read more…

Exadata and ZDLRA, Disable HAIP

5 Replies

HAIP (High Availability IP) is not supported for the Exadata environment but can occur (if you did not create the cluster using OEDA) that HAIP became in use. And this particularity true for ZDLRA. So, during the upgrade from the previous version (12.2) to a higher version, it is needed to remove HAIP.

Usually, when we upgrading from 12.2 to 18c the HAIP is removed from Exadata. If the upgrade is from 12.1, and HAIP is there, it continues and is not removed by the upgrade process. If you are using HAIP and your GI is 12.1, this procedure as-is described here can’t be used (need some adaptation), because of some requirements from ASM+ACFS+DB. But since this is a preliminary step from a GI upgrade, the focus is to disable and remove it from GI.

The HAIP is not needed for Exadata because by architecture the InfiniBand network already defines (per server) two IP’s to avoid the single point of failure. So, it is not needed to create an additional layer (HAIP and virtual IP), that does the same that already exists by network design.

Click here to read more…

ASM, REPLACE DISK Command

1 Reply

The REPLACE DISK command was released with 12.1 and allow to do an online replacement for a failed disk. This command is important because it reduces the rebalance time doing just the SYNC phase. Comparing with normal disk replacement (DROP and ADD in the same command), the REPLACE just do mirror resync.

Basically, when the REPLACE command is called, the rebalance just copy/sync the data from the survivor disk (the partner disk from the mirror). It is faster since the previous way with drop/add execute a complete rebalance from all AU of the diskgroup, doing REBALANCE and SYNC phase.

The replace disk command is important for the SWAP disk process for Exadata (where you add the new 14TB disks) since it is faster to do the rebalance of the diskgroup.

Click here to read more…

ASM, Mount Restricted Force For Recovery

1 Reply

Survive to disk failures it is crucial to avoid data corruption, but sometimes, even with redundancy at ASM, multiple failures can happen. Check in this post how to use the undocumented feature “mount restricted force for recovery” to resurrect diskgroup and lose less data when multiple failures occur.

Diskgroup redundancy is a key factor for ASM resilience, where you can survive to disk failures and still continue to run databases. I will not extend about ASM disk redundancy here, but usually, you can configure your diskgroup without redundancy (EXTERNAL), double redundancy (NORMAL), triple redundancy (HIGH), and even fourth redundancy (EXTEND for stretch clusters).

If you want to understand more about redundancy you have a lot of articles at MOS and on the internet that provide useful information. One good is this. The idea is simple, spread multiple copies in different disks. And can even be better if you group disks in the same failgroups, so, your data will have multiple copies in separate places.

As an example, this a key for Exadata, where every storage cell is one independent failgroup and you can survive to one entire cell failure (or double full, depending on the redundancy of your diskgroup) without data loss. The same idea can be applied at a “normal” environment, where you can create failgroup to disks attached to controller A, and another attached to controller B (so the failure of one storage controller does not affect all failgroups). At ASM, if you do not create failgroup, each disk is a different one in diskgroups that have redundancy enabled.

Click here to read more…

Fernando Simon

Have you hugged your backup today?

Tag Archives: Exadata

Oracle Engineered Systems since 2010

Upgrade AHF and TFA at Exadata

Fixing Exadata Missing Volumes at LVM

Exadata, dracut and LVM locking_type as read-only

MAA, Blueprints and On-Premise Architecture Reference

Why ZDLRA?

Exadata and ZDLRA, Patch Exadata Stack

ZDLRA, Patch the Recovery Appliance

Where we are

Exadata and ZDLRA, Disable HAIP

ASM, REPLACE DISK Command

ASM, Mount Restricted Force For Recovery