Survive to disk failures it is crucial to avoid data corruption, but sometimes, even with redundancy at ASM, multiple failures can happen. Check in this post how to use the undocumented feature “mount restricted force for recovery” to resurrect diskgroup and lose less data when multiple failures occur.
Diskgroup redundancy is a key factor for ASM resilience, where you can survive to disk failures and still continue to run databases. I will not extend about ASM disk redundancy here, but usually, you can configure your diskgroup without redundancy (EXTERNAL), double redundancy (NORMAL), triple redundancy (HIGH), and even fourth redundancy (EXTEND for stretch clusters).
If you want to understand more about redundancy you have a lot of articles at MOS and on the internet that provide useful information. One good is this. The idea is simple, spread multiple copies in different disks. And can even be better if you group disks in the same failgroups, so, your data will have multiple copies in separate places.
As an example, this a key for Exadata, where every storage cell is one independent failgroup and you can survive to one entire cell failure (or double full, depending on the redundancy of your diskgroup) without data loss. The same idea can be applied at a “normal” environment, where you can create failgroup to disks attached to controller A, and another attached to controller B (so the failure of one storage controller does not affect all failgroups). At ASM, if you do not create failgroup, each disk is a different one in diskgroups that have redundancy enabled.
When configuring a database with Real-Time Redo at ZDLRA it is important to check the deletion policy for archivelog. This is even more important when the database is protected with dataguard. I already wrote about Real-time Redo in this previous post, and when using with dataguard in another post.
But sometimes (during maintenance as an example) you can face the error RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process if the deletion policy of archivelog is not aligned with your needs.
In my previous post, I showed how the clone to tape occurs for ZDLRA. But as explained, the clones occur through the scheduler and follow some rules. For full backup, as an example, it clones the last available.
But sometimes, it is needed to call the clone for some specific backup, maybe to do long-term storage to follow some regimentation/law. And if we leverage this for the clone, jobs can maybe take to long, or clone more that you need.
As you saw in my last post, the configuration to enable clone to tape for ZDLRA it is not complicated, but you need to take care of some details to avoid errors. Besides that, ZDLRA relies on OSB to do that (when configured with native tape support) and this has some details that you need to be aware of.
In this post, I will show how the clone to tape works for ZDLRA. And how you can check some details about OSB.
With ZDLRA you can clone your backups to tape using two ways. The first is using third-party software and the second is using Oracle Secure Backup (OSB). This integration from ZDLRA and OSB is the native way to do that.
Cloning backups to tape will help to offload backups from ZDLRA (reducing the space usage if you need to sustain long recovery windows), and add another layer of protection (since you can put tapes in a third site).
Here I will show how easy is to configure the OSB backup and how to integrate it into your backup policy.
For ZDLRA the protection policies have a significant role in the appliance management, but not just that, for the architecture design too. And usually (and unfortunately) policies do not take a lot of attention as deserved.
To create a good ZDLRA design, and avoid future problems, it is important to understand all the requirements for the protection policies and all the impacts. You can check the official documentation for this, but I will explain deeply the details that can pass without you notice them in the documentation.
ZDLRA can be used from a small single database environment to big environments where you need protection in more than one site at the same time. At every level, you can use different features of ZDLRA to provide desirable protection. Here I will show how to reach zero RPO for both primary and standby databases. All the steps, doc, and tech parts are covered.
You can check the examples the reference for every scenario int these two papers from the Oracle MAA team: MAA Overview On-Premises and Oracle MAA Reference Architectures. They provide good information on how to prepare to reduce RPO and improve RTO. In resume, the focus is the same, reduce the downtime and data loss in case of a catastrophe (zero RPO, and zero RPO).
If you looked both papers before, you saw that to provide good protection is desirable to have an additional site to, at least, send the backups. And if you go higher, for GOLD and PLATINUM environments, you start to have multiple sites synced with data guard. These Critical/Mission-critical environments need to be protected for every kind of catastrophic failure, from disk until complete site outage (some need to follow specific law’s requirements, bank as an example).
And the focus of this post is these big environments. I will show you how to use ZDLRA to protect both sites, reaching zero RPO even for standby databases. And doing that, you can survive for a catastrophic outage (like entire datacenter failure) and still have zero RPO. Going further, you can even have zero RPO if you lose completely on site when using real-time redo for ZDLRA, and this is not written in the docs by the way.
For ZDLRA, the task type INDEX_BACKUP it is important (if it is not the most) because it is responsible to create the virtual full backup. This task runs for every backup that you ingest at ZDLRA and here, I will show with more details what occurs at ZDLRA: internals steps, phases, and tables involved.
As you saw in my previous post, ZDLRA opens every backup that you sent and read every block of it to generate one new virtual full backup. And this backup is validated block a block (physically and logically) against corruption. It differs from a snapshot because it is content-aware (in this case it is proprietary Oracle datafile blocks inside another proprietary Oracle rman block) and Oracle it is the only that can do this guaranteeing that result is valid.
One of the most knows features of ZDLRA is the virtual full backup, basically incremental forever strategy. But what this means in real life? In this post, I will show some details about that and how interesting they are, check what it is Virtual Full Backup and Incremental Forever strategy for ZDLRA.
This post is based on my previous one where I showed all the steps to configure the VPC and enroll database at ZDLRA.
Virtual Full Backup
A virtual full backup appears as an incremental level 0 backup in the recovery catalog. From the user’s perspective, a virtual full backup is indistinguishable from a non-virtual full backup. Using virtual backups, Recovery Appliance provides the protection of frequent level 0 backups with only the cost of frequent level 1 backups.
This definition (and image) are in the Zero Data Loss Recovery Appliance Administrator’s Guide and I think that represents the essence of the virtual full backup. ZDLRA receive the incremental level 1 backup, index it, and generate a level 0 to you that it is indistinguishable from a normal backup level 0.
Quick post how to check and identify done for INDEX_BACKUP task in ZDLRA. In one simple way, just to contextualize, INDEX_BACKUP is one task for ZDLRA that (after you input the backup of datafile) generate an index of the blocks and create the virtual backup for you.
Here I will start a new series about ZDLRA with some hints based on my usage experience (practically since the release in 2014). The post from today is just little scratch about ZDLRA internals, I will extend this post in others (and future posts), stay tuned.