Tag Archives: Engineered Systems

ZDLRA, Protection Policies

For ZDLRA, protection policies play a significant role not only in managing the appliance but also in the architecture design. Unfortunately, policies usually do not get the attention they deserve.

To create a good ZDLRA design and avoid future problems, it is important to understand all the requirements for protection policies and all their impacts. You can check the official documentation for this, but I will explain in depth the details that can easily go unnoticed there.


ZDLRA, Virtual Private Catalog User – VPC

The Virtual Private Catalog (VPC) user is a key piece of a good ZDLRA architecture design. The challenge is not how to create it, but how to integrate it correctly into your design, and this matters even more if you have replicated ZDLRAs or use Real-Time Redo transport.

Here I will show and discuss the implications of the VPC for your architecture design when deploying ZDLRA, whether for a completely new implementation (together with the database) or when adding ZDLRA to an environment that is already running. The points here offer some perspectives that can help you define and use VPCs correctly.


ZDLRA, Webinar

On 18/Dec/2019 I presented a webinar about ZDLRA: Understanding ZDLRA. It was hosted by the Luxembourg Oracle User Group (LUXOUG).

In this webinar, I covered some details about what ZDLRA is and some of its features. Besides that, I showed internal details about how it stores and indexes the backups.

If you missed it, you can watch it now. It is free. Just check these two links:

You can download the presentation here too: Understanding ZDLRA.

Feel free to follow my website and check more details about ZDLRA and other posts about EXADATA, MAA, and Oracle.


Exadata, Missing Metric

Understanding the metrics for Exadata Storage Server is important to see how the software features are being used. Here I will discuss one case where the FC_IO_BY_R_SEC metric can report imprecise values, and one missing metric that could save a lot of effort.

If you have doubts about metrics, you can check my post about metrics; it is an introduction, but it covers some aspects of how to read and use them. You can also check my other post, where I show how to use the DB_FC_IO_BY_SEC metric to identify database problems that can stay hidden when you check only from the database side.


Exadata, Understanding Metrics

Metrics for Exadata give you a way to see and understand in depth what is happening in the Exadata Storage Server and Exadata Software. Understanding them is fundamental to identifying and solving problems that can be hidden (or even invisible) from the database side. In this post, I will explain details about these metrics and what you can do with them.

My last article about Exadata Storage Server metrics was an example of how to use them to identify problems that do not appear on the database side. In that post, I showed how I used the metric DB_FC_IO_BY_SEC to identify bad queries.

The point about Exadata (which I made in that article) is that most of the time Exadata is so powerful that bad statements are handled without a problem because of the features that exist (Flash Cache, Smart I/O, and others). But another point is that Exadata is usually a highly consolidated environment, where you "consolidate" a lot of databases, and it is normal for some of them to have different workloads and needs. Using metrics can help you fine-tune your environment, and besides that, it gives you a way to check and control everything that is happening.

In this post, I will not explain each metric one by one; instead, I will guide you through understanding metrics and some interesting and important details about them.


TFA error after GI upgrade to 19c

Recently I upgraded an Exadata stack to the latest 19.2 version (19.2.7.0.0.191012) and upgraded the GI from 18c to 19c (latest 19c version: 19.5.0.0.191015). After that, TFA stopped working.

Since I did not want to do a complete TFA cleanup and reinstallation, I tried to find the error and a solution. Here I want to share with you the workaround (since there is no solution yet) that I discovered and used to fix the error.

The environment

The current environment is:

  • Old Grid Infrastructure: Version 18.6.0.0.190416
  • New Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64

TFA error

After upgrading the GI from 18c to 19c, TFA does not work. If you try to start it or collect logs with it, you can receive errors. In the environment described here, TFA was running fine with the 18c version, and the rootupgrade script from 18c to 19c did not report an error.

To be more precise, the TFA upgrade from 18c to 19c called by rootupgrade was OK (according to the log, which I will show later). But even after that, the error occurs.

The solution usually provided (by MOS support) is to download the latest TFA and reinstall over the current one. Unfortunately, I do not like this approach because it can lead to errors during GI upgrades to future releases (like 20) and updates (19.6, for example).


Exadata, workaround for oracka.ko error

Recently I upgraded an Exadata stack to the latest 19.2 version (19.2.7.0.0.191012), released in October 2019, and updated the GI to the latest 19c version (19.5.0.0.191015). After that, I had some issues creating 11G databases.

When I try to create an 11G RAC database, the error “File -oracka.ko- was not found” appears and the creation fails. Here I want to share with you the workaround (since there is no solution yet) that I discovered and used to bypass the error.

The environment

The current environment is:

  • Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64
  • 11G Database: Version 11.2.0.4.180717
  • ACFS: Used to store some files

oracka.ko

So, calling dbca:

[DEV-oracle@exsite1c1-]$ /u01/app/oracle/product/11.2.0.4/dbhome_1/bin/dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName D11TST19 -adminManaged -sid D11TST19 -sysPassword oracle11 -systemPassword oracle11 -characterSet WE8ISO8859P15 -emConfiguration NONE -storageType ASM -diskGroupName DATAC8 -recoveryGroupName RECOC8 -nodelist exsite1c1,exsite1c2 -sampleSchema false
Copying database files
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/D11TST19/D11TST19.log" for further details.
[DEV-oracle@exsite1c1-]$


Exadata, Using metrics to help you

It is well known that Exadata delivers a lot of power for databases and, besides that, has a lot of features that can be combined to reach the desired goals. But you need to understand how to use Exadata; it is not just knowing the internal hardware pieces, adding some SQL hints, or using Smart Scan that makes a better DBA (or DMA).

Think about the “traditional” environment (DB + storage) and how you check for performance problems there. Basically, you just have/receive the number of IOPS from the LUNs, the throughput in MB/s, and the latency from the storage side. But Exadata provides a lot of metrics that go beyond that and can be used to really understand what is happening between the database and the access to data blocks.

For me, one of the most underrated (and not even well explained on the web) features of Exadata is the metrics, because they can help you to really understand Exadata deeply. As an example, from metrics you can check the MB/s read from flash cache and from disks (per type), flash log writes, reads that bypassed flash cache and went to disk, and Smart I/O per database, PDB, or consumer group. Explaining all the metrics is outside the scope of this post (that will be in another one), but you can read more in Chapter 6 of the Exadata User Guide.

In this post, I will show you one example of how to use metrics to identify and solve database problems. Sometimes it can be a hide-and-seek game, but I will try to show you how to use metrics and how they can help you in your daily work.
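To make the per-database idea more concrete, here is a minimal Python sketch that ranks databases by their flash cache I/O. It assumes the text has the three-column layout (metric name, object name, value with unit) that CellCLI prints by default for LIST METRICCURRENT. The sample text, the database names (DBA01, DBB02, DBC03), and the values are hypothetical, invented only for illustration, and the helper names are mine and not part of any Oracle tool.

```python
import re
from typing import NamedTuple


class MetricSample(NamedTuple):
    name: str         # metric name, e.g. DB_FC_IO_BY_SEC
    object_name: str  # object the metric refers to (here: a database name)
    value: float
    unit: str


# Hypothetical sample (invented names and values), laid out like the default
# three-column metric listing: metric name, object name, value + unit.
SAMPLE_OUTPUT = """
     DB_FC_IO_BY_SEC    DBA01    12 MB/sec
     DB_FC_IO_BY_SEC    DBB02    1,250 MB/sec
     DB_FC_IO_BY_SEC    DBC03    0 MB/sec
"""

LINE_RE = re.compile(r"^\s*(\S+)\s+(\S+)\s+([\d.,]+)\s*(.*)$")


def parse_metrics(text):
    """Parse metric lines into MetricSample tuples, skipping lines that do not match."""
    samples = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        name, object_name, raw_value, unit = match.groups()
        # Values may carry thousands separators ("1,250"); drop them first.
        value = float(raw_value.replace(",", ""))
        samples.append(MetricSample(name, object_name, value, unit.strip()))
    return samples


def top_consumers(samples, metric, limit=3):
    """Return the objects with the highest value for one metric, highest first."""
    rows = [s for s in samples if s.name == metric]
    return sorted(rows, key=lambda s: s.value, reverse=True)[:limit]


if __name__ == "__main__":
    for sample in top_consumers(parse_metrics(SAMPLE_OUTPUT), "DB_FC_IO_BY_SEC"):
        print(f"{sample.object_name:10s} {sample.value:10.1f} {sample.unit}")
```

The same approach should work for any metric that is reported per object; only the metric name passed to top_consumers changes.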


ZDLRA, Multi-site protection – ZERO RPO for Primary and Standby

ZDLRA can be used from a small single-database environment to big environments where you need protection in more than one site at the same time. At every level, you can use different features of ZDLRA to provide the desired protection. Here I will show how to reach zero RPO for both primary and standby databases. All the steps, documentation, and technical parts are covered.

You can check the examples and the reference for every scenario in these two papers from the Oracle MAA team: MAA Overview On-Premises and Oracle MAA Reference Architectures. They provide good information on how to prepare to reduce RPO and improve RTO. In summary, the focus is the same: reduce downtime and data loss in case of a catastrophe (zero RPO and zero RTO).

Multi-site protection

If you looked at both papers, you saw that to provide good protection it is desirable to have an additional site to, at least, receive the backups. And if you go higher, for GOLD and PLATINUM environments, you start to have multiple sites synced with Data Guard. These critical/mission-critical environments need to be protected against every kind of catastrophic failure, from a disk failure to a complete site outage (some need to follow specific legal requirements, banks for example).

The focus of this post is these big environments. I will show you how to use ZDLRA to protect both sites, reaching zero RPO even for standby databases. Doing that, you can survive a catastrophic outage (like an entire datacenter failure) and still have zero RPO. Going further, you can even have zero RPO if you completely lose one site when using real-time redo for ZDLRA, and this is not written in the docs, by the way.


ZDLRA, Real-Time Redo and Zero RPO

The idea of Real-Time Redo is to reach zero RPO for every kind of database, and this includes ones with and without DG. As you can see in my last post, where I showed how to configure Real-Time Redo for one database, a few small steps need to be executed, and they are pretty similar to configuring a remote archive log destination for DG.

But if you noticed, the configuration for the remote destination was defined as ASYNC, as indicated in the ZDLRA docs (“Protection of Ongoing Transactions” or “How Real-Time Redo Transport Works”). In the same post, I described this as “controversial” because ASYNC does not guarantee zero RPO.

You can see more in the Data Guard docs (Oracle Data Guard Protection Modes and Oracle Data Guard Concepts and Administration), but the summary is:

  • ASYNC: The primary database does not wait for the response from a remote destination.
  • SYNC/NOAFFIRM: The primary database holds the commit until the remote destination reports that it received the redo data. It does not wait until the remote site reports that it wrote the data to disk.
  • SYNC/AFFIRM: The primary database holds the commit until the remote destination reports that it received the redo data and wrote it to disk.

You can read about the differences in more detail here: Best Practices for Synchronous Redo Transport and Best Practices for Asynchronous Redo Transport.

The idea is simple: if you use ASYNC, there is no guarantee of zero data loss between the primary database and the remote destination.
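The difference between the three modes can be sketched with a small toy model in Python. This is only an illustration of the summary above, not how Oracle implements redo transport: the function name, the counters, and the single-failure scenario (the primary is lost at one instant while the remote destination survives) are my own assumptions.

```python
from enum import Enum


class Transport(Enum):
    ASYNC = "ASYNC"
    SYNC_NOAFFIRM = "SYNC/NOAFFIRM"
    SYNC_AFFIRM = "SYNC/AFFIRM"


# What the primary waits for from the remote destination before a commit
# returns, following the three-mode summary in the text.
COMMIT_WAITS_FOR = {
    Transport.ASYNC: (),
    Transport.SYNC_NOAFFIRM: ("received",),
    Transport.SYNC_AFFIRM: ("received", "written_to_disk"),
}


def acknowledged_commits_lost(mode, commits_generated, commits_received_remotely):
    """Toy model (assumption): number of commits already acknowledged to the
    application that are absent at the remote destination if the primary is
    lost at this instant and the remote destination survives."""
    if commits_received_remotely > commits_generated:
        raise ValueError("remote cannot hold more commits than were generated")
    if mode is Transport.ASYNC:
        # Commits return without waiting, so every generated commit was
        # acknowledged, including the ones still in flight.
        return commits_generated - commits_received_remotely
    # SYNC: a commit returns only after the remote reports it received the
    # redo, so no acknowledged commit is missing remotely in this scenario.
    return 0


if __name__ == "__main__":
    for mode in Transport:
        lost = acknowledged_commits_lost(mode, 100, 97)
        print(f"{mode.value:14s} waits for {COMMIT_WAITS_FOR[mode]} -> lost {lost}")
```

The model deliberately ignores the case where the remote destination also fails before writing to disk, which is where SYNC/NOAFFIRM and SYNC/AFFIRM differ.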
