Fixing Exadata Missing Volumes at LVM

Recently, during an Exadata patch, one database node failed during the patchmgr run and the patch apply stopped. The error was related to missing volumes (LVDoNotRemoveOrUse) at the LVM level. In this post you can follow the error and the fix, but be aware that the procedure changes the contents of an LVM config file. So, double-check every step before executing it and (if possible) open a proactive SR to confirm what you will be doing.

The error

Running patchmgr from node01 against node02 of the dbnodes, I got the error below:

[root@exavm01s4 dbnodeupdate]# ./dbserver_patch_20.210314/patchmgr --dbnodes /u01/patches/exadatapt/dbnode_exavm_exavm02s4 --upgrade --iso_repo /u01/patches/exadatapt/domU/p32459080_201000_Linux-x86-64.zip --target_version 20.1.8.0.0.210317 --skip_gi_db_validation

************************************************************************************************************
NOTE    patchmgr release: 21.210314 (always check MOS 1553103.1 for the latest release of dbserver.patch.zip)
NOTE
NOTE    Database nodes will reboot during the update process.
NOTE
WARNING Do not interrupt the patchmgr session.
WARNING Do not resize the screen. It may disturb the screen layout.
WARNING Do not reboot database nodes during update or rollback.
WARNING Do not open logfiles in write mode and do not try to alter them.
************************************************************************************************************
2021-04-22 14:42:48 +0200        :INFO   : Checking hosts connectivity via ICMP/ping
2021-04-22 14:42:49 +0200        :INFO   : Hosts Reachable: [exavm02s4]
2021-04-22 14:42:49 +0200        :INFO   : All hosts are reachable via ping/ICMP
2021-04-22 14:42:49 +0200        :Working: Verify SSH equivalence for the root user to exavm02s4
2021-04-22 14:42:50 +0200        :INFO   : SSH equivalency verified to host exavm02s4
2021-04-22 14:42:50 +0200        :SUCCESS: Verify SSH equivalence for the root user to exavm02s4
2021-04-22 14:42:52 +0200        :Working: Initiate prepare steps on node(s).
2021-04-22 14:42:53 +0200        :Working: Check free space on exavm02s4
2021-04-22 14:42:57 +0200        :SUCCESS: Check free space on exavm02s4
2021-04-22 14:43:23 +0200        :SUCCESS: Initiate prepare steps on node(s).
2021-04-22 14:43:23 +0200        :Working: Initiate update on 1 node(s).
2021-04-22 14:43:24 +0200        :Working: dbnodeupdate.sh running a backup on 1 node(s).
2021-04-22 16:45:26 +0200        :ERROR  : dbnodeupdate.sh backup failed on one or more nodes

    SUMMARY OF ERRORS FOR exavm02s4:

    exavm02s4: ERROR: Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log


2021-04-22 16:45:33 +0200        :FAILED : dbnodeupdate.sh running a backup on 1 node(s).
[INFO     ] Collected dbnodeupdate diag in file: Diag_patchmgr_dbnode_upgrade_220421144247.tbz
-rw-r--r-- 1 root root 1701698 Apr 22 16:45 Diag_patchmgr_dbnode_upgrade_220421144247.tbz
2021-04-22 16:45:35 +0200        :ERROR  : FAILED run of command:./dbserver_patch_20.210314/patchmgr --dbnodes /u01/patches/exadatapt/dbnode_exavm_exavm02s4 --upgrade --iso_repo /u01/patches/exadatapt/domU/p32459080_201000_Linux-x86-64.zip --target_version 20.1.8.0.0.210317 --skip_gi_db_validation
2021-04-22 16:45:35 +0200        :INFO   : Upgrade attempted on nodes in file /u01/patches/exadatapt/dbnode_exavm_exavm02s4: [exavm02s4]
2021-04-22 16:45:35 +0200        :INFO   : Current image version on dbnode(s) is:
2021-04-22 16:45:35 +0200        :INFO   : exavm02s4: 19.2.19.0.0.201013
2021-04-22 16:45:35 +0200        :INFO   : For details, check the following files in /u01/patches/exadatapt/dbnodeupdate/dbserver_patch_20.210314:
2021-04-22 16:45:35 +0200        :INFO   :  - <dbnode_name>_dbnodeupdate.log
2021-04-22 16:45:35 +0200        :INFO   :  - patchmgr.log
2021-04-22 16:45:35 +0200        :INFO   :  - patchmgr.trc
2021-04-22 16:45:35 +0200        :INFO   : Exit status:1
2021-04-22 16:45:35 +0200        :INFO   : Exiting.

If you check patchmgr.log you will find the same error message. But looking at /var/log/cellos/dbnodeupdate.log (on the target node being patched), the true error appears:

[root@exavm02s4 ~]# vi /var/log/cellos/dbnodeupdate.log
...
...
    Setting interval between checks to 0 seconds
    [INFO] Mount spare root partition /dev/VGExaDb/LVDbSys2 to /mnt_spare
      Failed to find logical volume "VGExaDb/LVDoNotRemoveOrUse"
    [INFO] Preserve and then reset label for the root partition /dev/VGExaDb/LVDbSys1
    [INFO] Total amount of space available for snapshot: 1 GB
    [INFO] Will be using snapshot of size: 1 GB
    [INFO] Create LVM snapshot with 1 GB size of the root partition /dev/VGExaDb/LVDbSys1
      WARNING: Missing device /dev/xvda2 reappeared, updating metadata for VG VGExaDb to version 44.
      WARNING: Device /dev/xvda2 still marked missing because of allocated data on it, remove volumes and consider vgreduce --removemissing.
      WARNING: Missing device /dev/xvdd1 reappeared, updating metadata for VG VGExaDb to version 44.
      WARNING: Device /dev/xvdd1 still marked missing because of allocated data on it, remove volumes and consider vgreduce --removemissing.
      Cannot change VG VGExaDb while PVs are missing.
      Consider vgreduce --removemissing.
      Cannot process volume group VGExaDb
    Unable to create LVM snapshot with 1Gb size of the root partition /dev/VGExaDb/LVDbSys1
[1619095405][2021-04-22 16:45:18 +0200][INFO][./dbnodeupdate.sh][DiaryEntry][]  Entering PrintGenError Backup failed investigate logfiles /var/log/cellos/dbnodeupdate.log and /var/log/cellos/dbserver_backup.sh.log
...

The error is clear: “Missing device” and “Cannot change VG VGExaDb while PVs are missing”. So, basically, LVM is reporting missing volumes and we need to repair the VG metadata and recreate the missing logical volume.
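The verbose log can be filtered down to just the failure lines with a small helper. This is a sketch of my own; show_failures is a hypothetical function and the grep pattern is my choice, not an Oracle tool:

```shell
# print only the lines that usually carry the real failure from a
# dbnodeupdate-style log file passed as the first argument
show_failures() {
    grep -iE 'error|fail|missing|cannot' "$1"
}

# on the failed node, for example:
#   show_failures /var/log/cellos/dbnodeupdate.log
```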

Recovering

The starting point is to get a baseline from a healthy node. For reference, a correct LVM layout for an Exadata VM looks like this:

[root@exavm01s4 dbserver_patch_20.210314]# pvs
  PV         VG      Fmt  Attr PSize    PFree
  /dev/xvda2 VGExaDb lvm2 a--   <24.50g    0
  /dev/xvdd1 VGExaDb lvm2 a--   <62.00g    0
  /dev/xvdf  VGExaDb lvm2 a--   <50.00g    0
  /dev/xvdg  VGExaDb lvm2 a--  <150.00g    0
[root@exavm01s4 dbserver_patch_20.210314]# vgs
  VG      #PV #LV #SN Attr   VSize   VFree
  VGExaDb   4   5   0 wz--n- 286.48g    0
[root@exavm01s4 dbserver_patch_20.210314]#
[root@exavm01s4 dbserver_patch_20.210314]# lvs
  LV                 VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  LVDbOra1           VGExaDb -wi-ao---- 221.48g
  LVDbSwap1          VGExaDb -wi-ao----  16.00g
  LVDbSys1           VGExaDb -wi-ao----  24.00g
  LVDbSys2           VGExaDb -wi-a-----  24.00g
  LVDoNotRemoveOrUse VGExaDb -wi-a-----   1.00g
[root@exavm01s4 dbserver_patch_20.210314]#

But when I check the failed node, I see:

[root@exavm02s4 ~]# pvs
  PV         VG      Fmt  Attr PSize    PFree
  /dev/xvda2 VGExaDb lvm2 a-m   <24.50g    0
  /dev/xvdd1 VGExaDb lvm2 a-m   <62.00g 1.00g
  /dev/xvdf  VGExaDb lvm2 a--   <50.00g    0
  /dev/xvdg  VGExaDb lvm2 a--  <150.00g    0
[root@exavm02s4 ~]#
[root@exavm02s4 ~]# lvs
  LV        VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  LVDbOra1  VGExaDb -wi-ao--p- 221.48g
  LVDbSwap1 VGExaDb -wi-ao--p-  16.00g
  LVDbSys1  VGExaDb -wi-ao--p-  24.00g
  LVDbSys2  VGExaDb -wi-a---p-  24.00g
[root@exavm02s4 ~]#
[root@exavm02s4 ~]# vgs
  VG      #PV #LV #SN Attr   VSize   VFree
  VGExaDb   4   4   0 wz-pn- 286.48g 1.00g
[root@exavm02s4 ~]#

As you can see, the failed node is missing the 1 GB LVDoNotRemoveOrUse logical volume, and the physical volumes /dev/xvda2 and /dev/xvdd1 are flagged as missing (the m in the pvs Attr column and the p in the lvs Attr column).
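The comparison between the two nodes can also be done automatically. missing_lvs below is a hypothetical helper of mine (not an Exadata tool), and the usage assumes the root SSH equivalence that patchmgr already requires:

```shell
# list the LV names present in the first (healthy) list but absent from
# the second (failed) list; both arguments are whitespace-separated names
missing_lvs() {
    for lv in $1; do
        case " $2 " in
            *" $lv "*) ;;            # present on the failed node too
            *) echo "$lv" ;;         # only on the healthy node: missing
        esac
    done
}

# usage, collecting lvs output from both nodes:
#   good=$(lvs --noheadings -o lv_name VGExaDb)
#   bad=$(ssh exavm02s4 lvs --noheadings -o lv_name VGExaDb)
#   missing_lvs "$good" "$bad"
```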

The first step is to try to remove the missing volumes with vgreduce. This step will fail, but it is crucial to execute it because it generates a backup of the current LVM configuration, and we need to edit this backup file in order to reload it.

[root@exavm02s4 ~]# vgreduce --removemissing --verbose VGExaDb
    There are 2 physical volumes missing.
    There are 2 physical volumes missing.
    Archiving volume group "VGExaDb" metadata (seqno 46).
  WARNING: Partial LV LVDbSys1 needs to be repaired or removed.
  WARNING: Partial LV LVDbSys2 needs to be repaired or removed.
  WARNING: Partial LV LVDbOra1 needs to be repaired or removed.
  WARNING: Partial LV LVDbSwap1 needs to be repaired or removed.
  There are still partial LVs in VG VGExaDb.
  To remove them unconditionally use: vgreduce --removemissing --force.
  WARNING: Proceeding to remove empty missing PVs.
    There are 2 physical volumes missing.
    Creating volume group backup "/etc/lvm/backup/VGExaDb" (seqno 47).
[root@exavm02s4 ~]#

Note that the file /etc/lvm/backup/VGExaDb was generated. And, as shown below, even trying to recreate or restore the missing volume generates errors as well:

[root@exavm02s4 ~]# lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb
  WARNING: Missing device /dev/xvda2 reappeared, updating metadata for VG VGExaDb to version 48.
  WARNING: Device /dev/xvda2 still marked missing because of allocated data on it, remove volumes and consider vgreduce --removemissing.
  WARNING: Missing device /dev/xvdd1 reappeared, updating metadata for VG VGExaDb to version 48.
  WARNING: Device /dev/xvdd1 still marked missing because of allocated data on it, remove volumes and consider vgreduce --removemissing.
  Cannot change VG VGExaDb while PVs are missing.
  Consider vgreduce --removemissing.
  Cannot process volume group VGExaDb
[root@exavm02s4 ~]#
[root@exavm02s4 ~]# vgextend --restoremissing LVDbSys1 VGExaDb
  Volume group "LVDbSys1" not found
  Cannot process volume group LVDbSys1
[root@exavm02s4 ~]#

Modifying the LVM

After /etc/lvm/backup/VGExaDb is created we can check its content and verify that some physical volumes are marked with the MISSING flag:

[root@exavm02s4 ~]# cd /etc/lvm/
[root@exavm02s4 lvm]#
[root@exavm02s4 lvm]# ls -l backup/
total 4
-rw------- 1 root root 3653 Apr 22 17:19 VGExaDb
[root@exavm02s4 lvm]#
[root@exavm02s4 lvm]#
[root@exavm02s4 lvm]# cat backup/VGExaDb
# Generated by LVM2 version 2.02.186(2)-RHEL7 (2019-08-27): Thu Apr 22 17:19:30 2021

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'vgreduce --removemissing --verbose VGExaDb'"

creation_host = "exavm02s4.mynt.simon.net"      # Linux exavm02s4.mynt.simon.net 4.1.12-124.42.4.el7uek.x86_64 #2 SMP Thu Sep 3 16:14:48 PDT 2020 x86_64
creation_time = 1619104770      # Thu Apr 22 17:19:30 2021

VGExaDb {
        id = "ynfwGi-HZPF-0fe9-38lq-MbKE-DMhp-40QUXh"
        seqno = 48
        format = "lvm2"                 # informational
        status = ["RESIZEABLE", "READ", "WRITE"]
        flags = []
        extent_size = 8192              # 4 Megabytes
        max_lv = 0
        max_pv = 0
        metadata_copies = 0

        physical_volumes {

                pv0 {
                        id = "o6OAXC-J3xd-Z8YT-mE2j-MZyN-UM4H-7cP3Q0"
                        device = "/dev/xvda2"   # Hint only

                        status = ["ALLOCATABLE"]
                        flags = ["MISSING"]
                        dev_size = 51380126     # 24.5 Gigabytes
                        pe_start = 384
                        pe_count = 6271 # 24.4961 Gigabytes
                }

                pv1 {
                        id = "eQDYEs-cwbA-OI58-R9hP-Sure-bsbI-rE5w0x"
                        device = "/dev/xvdd1"   # Hint only

                        status = ["ALLOCATABLE"]
                        flags = ["MISSING"]
                        dev_size = 130023326    # 62 Gigabytes
                        pe_start = 384
                        pe_count = 15871        # 61.9961 Gigabytes
                }

                pv2 {
                        id = "fE09JF-ajaK-7057-oiZL-6DRO-UGfL-l1vIFy"
                        device = "/dev/xvdf"    # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 104857600    # 50 Gigabytes
                        pe_start = 2048
                        pe_count = 12799        # 49.9961 Gigabytes
                }

                pv3 {
                        id = "SNJiFM-PEjR-xuyU-8kBK-vqGf-b5AT-kDfW2S"
                        device = "/dev/xvdg"    # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 314572800    # 150 Gigabytes
                        pe_start = 2048
                        pe_count = 38399        # 149.996 Gigabytes
                }
        }

        logical_volumes {
...
...

So, we edit the file and remove the MISSING value (not the entire flags line, just its value, leaving an empty list):

[root@exavm02s4 lvm]# cat /etc/lvm/backup/VGExaDb |grep MISSING
                        flags = ["MISSING"]
                        flags = ["MISSING"]
[root@exavm02s4 lvm]#
[root@exavm02s4 lvm]#
[root@exavm02s4 lvm]# cd /etc/lvm/backup/
[root@exavm02s4 backup]#
[root@exavm02s4 backup]#
[root@exavm02s4 backup]# vi /etc/lvm/backup/VGExaDb
[root@exavm02s4 backup]#
[root@exavm02s4 backup]#
[root@exavm02s4 backup]# cat VGExaDb |grep MISSING
[root@exavm02s4 backup]#
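Instead of hand-editing with vi, the same change can be scripted. strip_missing below is a hypothetical helper of mine, and the safety-copy path is an assumption; adapt both to your environment:

```shell
# remove the MISSING value from any flags line, leaving an empty flags
# list; reads the LVM metadata backup on stdin, writes the result on stdout
strip_missing() {
    sed 's/flags = \["MISSING"\]/flags = []/'
}

# on the failed node (always keep a safety copy first):
#   cp /etc/lvm/backup/VGExaDb /root/VGExaDb.orig
#   strip_missing < /root/VGExaDb.orig > /etc/lvm/backup/VGExaDb
```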

After removing the flags we can restore the edited config (the backup file). Please BE AWARE that this can damage your LVM if you have not edited the correct file. Never use a file generated on another node; the volume IDs will be different. Below, one parameter is the backup file VGExaDb itself and the other is the name of the volume group (VG). Both happen to have the same name; on Exadata the VG is VGExaDb.

[root@exavm02s4 backup]# vgcfgrestore -f VGExaDb VGExaDb
  Volume group VGExaDb has active volume: LVDbSys2.
  Volume group VGExaDb has active volume: LVDbSwap1.
  Volume group VGExaDb has active volume: LVDbSys1.
  Volume group VGExaDb has active volume: LVDbOra1.
  WARNING: Found 4 active volume(s) in volume group "VGExaDb".
  Restoring VG with active LVs, may cause mismatch with its metadata.
Do you really want to proceed with restore of volume group "VGExaDb", while 4 volume(s) are active? [y/n]: y
  Restored volume group VGExaDb
  Scan of VG VGExaDb from /dev/xvda2 found metadata seqno 49 vs previous 48.
  Scan of VG VGExaDb from /dev/xvdd1 found metadata seqno 49 vs previous 48.
  Scan of VG VGExaDb from /dev/xvdf found metadata seqno 49 vs previous 48.
  Scan of VG VGExaDb from /dev/xvdg found metadata seqno 49 vs previous 48.
[root@exavm02s4 backup]#

As you can see above, the volume group was restored and a new metadata sequence number was generated to identify it.

After that we can scan the volumes again to check if everything was added correctly and reboot the node:

[root@exavm02s4 backup]# vgscan
  Reading volume groups from cache.
  Found volume group "VGExaDb" using metadata type lvm2
[root@exavm02s4 backup]#
[root@exavm02s4 backup]#
[root@exavm02s4 backup]# vgs
  VG      #PV #LV #SN Attr   VSize   VFree
  VGExaDb   4   4   0 wz--n- 286.48g 1.00g
[root@exavm02s4 backup]#
[root@exavm02s4 backup]# reboot
...
...

After the reboot we can scan again and recreate the missing volume LVDoNotRemoveOrUse:

[root@exavm02s4 ~]# vgscan
  Reading volume groups from cache.
  Found volume group "VGExaDb" using metadata type lvm2
[root@exavm02s4 ~]#
[root@exavm02s4 ~]# lvscan
  ACTIVE            '/dev/VGExaDb/LVDbSys1' [24.00 GiB] inherit
  ACTIVE            '/dev/VGExaDb/LVDbSys2' [24.00 GiB] inherit
  ACTIVE            '/dev/VGExaDb/LVDbOra1' [221.48 GiB] inherit
  ACTIVE            '/dev/VGExaDb/LVDbSwap1' [16.00 GiB] inherit
[root@exavm02s4 ~]#
[root@exavm02s4 ~]# lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb
  Logical volume "LVDoNotRemoveOrUse" created.
[root@exavm02s4 ~]#
[root@exavm02s4 ~]# lvscan
  ACTIVE            '/dev/VGExaDb/LVDbSys1' [24.00 GiB] inherit
  ACTIVE            '/dev/VGExaDb/LVDbSys2' [24.00 GiB] inherit
  ACTIVE            '/dev/VGExaDb/LVDbOra1' [221.48 GiB] inherit
  ACTIVE            '/dev/VGExaDb/LVDbSwap1' [16.00 GiB] inherit
  ACTIVE            '/dev/VGExaDb/LVDoNotRemoveOrUse' [1.00 GiB] inherit
[root@exavm02s4 ~]#
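Before re-running patchmgr, a final sanity check helps: no LV should carry the partial (p) health bit anymore, and LVDoNotRemoveOrUse must be back. lv_sanity below is a sketch of mine around standard lvs output, not an Oracle-provided check:

```shell
# sanity-check `lvs --noheadings -o lv_name,lv_attr VGExaDb` output read on
# stdin: fail if any LV attr carries a 'p' bit or LVDoNotRemoveOrUse is absent
lv_sanity() {
    awk '$2 ~ /p/                   { bad = 1; print "partial LV: " $1 }
         $1 == "LVDoNotRemoveOrUse" { seen = 1 }
         END { if (!seen) { bad = 1; print "LVDoNotRemoveOrUse missing" }
               exit bad }'
}

# on the node, once the steps above are done:
#   lvs --noheadings -o lv_name,lv_attr VGExaDb | lv_sanity && echo "ready for patchmgr"
```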

It is not clear (and I could not investigate further) why this error appeared; it was not the first time that I hit it. It may be related to the dracut issue that I described in my previous post.

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community. Post protected by copyright.”
