TFA error after GI upgrade to 19c

Recently I upgraded the Exadata stack to the latest 19.2 version (19.2.7.0.0.191012) and the GI from 18c to 19c (the latest 19c version at the time, 19.5.0.0.191015). After that, TFA stopped working.

Since I did not want to do a complete TFA cleanup and reinstallation, I tried to find the cause of the error and a fix. Here I share the workaround (there is no official solution yet) that I discovered and used to fix the error.

The environment

The environment is:

  • Old Grid Infrastructure: Version 18.6.0.0.190416
  • New Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64

TFA error

After upgrading the GI from 18c to 19c, TFA stopped working. If you try to start it or collect logs with it, you receive errors. In the environment described here, TFA was running fine with 18c, and the rootupgrade script from 18c to 19c did not report any error.

To be more precise, the TFA upgrade from 18c to 19c called by rootupgrade completed OK (according to the log, which I show later). But even so, the error occurs.

The usual solution provided by MOS support is to download the latest TFA and reinstall it over the current one. Unfortunately, I do not like this approach because it can lead to errors during GI upgrades to future releases (like 20) and updates (like 19.6).

So, when I tried to collect diagnostics with TFA:

[root@exsite1c1 ~]# /u01/app/19.0.0.0/grid/tfa/bin/tfactl diagcollect
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.
TFA-00002 Oracle Trace File Analyzer (TFA) is not running
Please start TFA before running collections
[root@exsite1c1 ~]#

When checking for a running TFA with ps -ef, I saw no TFA process (only the init.tfa wrapper script):

[root@exsite1c1 ~]# ps -ef |grep tfa
root      10665      1  0 Nov21 ?        00:00:06 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
root      40285  37137  0 11:05 pts/0    00:00:00 grep --color=auto tfa
[root@exsite1c1 ~]#
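The init.tfa run line is only the init wrapper; when TFA is really up, a Java process running oracle.rat.tfa.TFAMain also appears (as seen later in this post, after the fix). A minimal check sketch, assuming the class name is the same in your TFA release; the function name tfa_java_running is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the TFA Java daemon appears in the
# process list read from stdin ("ps -eo args" format).
# The init.tfa wrapper alone does not mean TFA is up; the daemon is the
# Java process running oracle.rat.tfa.TFAMain (class name taken from a
# working 19c node; verify it on your release).
tfa_java_running() {
    # The escaped dots keep grep from matching its own command line,
    # which contains "oracle\.rat..." rather than "oracle.rat...".
    grep -q 'oracle\.rat\.tfa\.TFAMain'
}

if ps -eo args | tfa_java_running; then
    echo "TFA daemon (TFAMain) is running"
else
    echo "Only the init.tfa wrapper (or nothing) is running: TFA is down"
fi
```

Reading the process list from stdin keeps the check easy to test against saved ps output.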

And if I try to start TFA (as root), it reports nothing, neither an error nor a success:

[root@exsite1c1 ~]# /etc/init.d/init.tfa start
Starting TFA..
Waiting up to 100 seconds for TFA to be started..
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
[root@exsite1c1 ~]# 
[root@exsite1c1 ~]# ps -ef |grep tfa
root      10665      1  0 Nov21 ?        00:00:06 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
root      46031  37137  0 11:07 pts/0    00:00:00 grep --color=auto tfa
[root@exsite1c1 ~]#

Checking MOS, I found related problems caused by a bad Perl version: this TFA release needs at least Perl 5.10. But that was not the case here:

[root@exsite1c1 ~]# perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 39 registered patches, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

[root@exsite1c1 ~]#
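perl -v only checks the Perl in PATH, while TFA may use a different binary. A small sketch to check any Perl binary against the 5.10 minimum, assuming GNU sort -V is available (it is on Oracle Linux 7); the function names version_ge and check_perl are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on GNU "sort -V" (version sort).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: check a Perl binary (default: perl in PATH)
# against the 5.10 minimum. Returns 0 if OK, 1 if too old, 2 if the
# binary cannot be run (for example, it points to a removed GI home).
check_perl() {
    local perl_bin="${1:-perl}"
    local ver
    # sprintf "%vd" on $^V prints a dotted version such as 5.16.3
    ver=$("$perl_bin" -e 'printf "%vd", $^V' 2>/dev/null) || {
        echo "cannot run $perl_bin"
        return 2
    }
    if version_ge "$ver" 5.10; then
        echo "$perl_bin: Perl $ver (>= 5.10) OK"
    else
        echo "$perl_bin: Perl $ver is older than 5.10"
        return 1
    fi
}
```

For example, check_perl /usr/bin/perl checks the system Perl, and you can also pass the PERL path that TFA itself uses.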

Searching for the problem

Digging for the source of the problem, I checked the rootupgrade log, but it looked fine. The TFA upgrade completed successfully:

[root@exsite1c1 ~]# vi /u01/app/grid/crsdata/exsite1c2/crsconfig/rootcrs_exsite1c2_2019-11-15_12-12-21AM.log
...
...
2019-11-14 14:18:40: Executing the [UpgradeTFA] step with checkpoint [null] ...
2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 595 '1' '18' 'UpgradeTFA'
2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 595 '1' '18' 'UpgradeTFA'
2019-11-14 14:18:40: Command output:
>  CLSRSC-595: Executing upgrade step 1 of 18: 'UpgradeTFA'.
>End Command output
2019-11-14 14:18:40: CLSRSC-595: Executing upgrade step 1 of 18: 'UpgradeTFA'.
2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 4015
2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 4015
2019-11-14 14:18:40: Command output:
>  CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
>End Command output
2019-11-14 14:18:40: CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2019-11-14 14:18:40: Executing the [ValidateEnv] step with checkpoint [null] ...
2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/crs/install/tfa_setup -silent -crshome /u01/app/19.0.0.0/grid
2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 595 '2' '18' 'ValidateEnv'
2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 595 '2' '18' 'ValidateEnv'
2019-11-14 14:18:40: Command output:
>  CLSRSC-595: Executing upgrade step 2 of 18: 'ValidateEnv'.
...
...
2019-11-14 14:23:45: Command output:
>
>  TFA Installation Log will be written to File : /tmp/tfa_install_293046_2019_11_14-14_18_40.log
...
...
2019-11-14 14:23:45: Command output:
>  CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.
>End Command output
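Instead of scrolling through the whole rootcrs log, the TFA-related lines (CLSRSC-4003 and the TFA Installation Log path) can be pulled out directly. A minimal sketch; the function and script names are hypothetical and the message texts are the ones in the excerpt above. Keep in mind that, as this post shows, a CLSRSC-4003 success message does not guarantee that TFA works afterwards:

```shell
#!/bin/bash
# Hypothetical helpers for reading a rootcrs_<node>_<timestamp>.log.

# Print the TFA installation log path announced during UpgradeTFA.
tfa_log_from_rootcrs() {
    sed -n 's/.*TFA Installation Log will be written to File : *//p' "$1" | head -n 1
}

# Succeed if the log contains CLSRSC-4003 (TFA successfully patched).
tfa_patched_ok() {
    grep -q 'CLSRSC-4003' "$1"
}

# Usage: check_rootcrs_tfa.sh <rootcrs log>   (script name is hypothetical)
if [ -n "${1:-}" ]; then
    if tfa_patched_ok "$1"; then
        echo "rootupgrade reports TFA patched (CLSRSC-4003)"
    else
        echo "CLSRSC-4003 not found in $1"
    fi
    echo "TFA install log: $(tfa_log_from_rootcrs "$1")"
fi
```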

And the other related logs also reported complete success:

[root@exsite1c1 ~]# cat /tmp/tfa_install_293046_2019_11_14-14_18_40.log
[2019-11-14 14:18:40] Log File written to : /tmp/tfa_install_293046_2019_11_14-14_18_40.log
[2019-11-14 14:18:40]
[2019-11-14 14:18:40] Starting TFA installation
[2019-11-14 14:18:40]
[2019-11-14 14:18:40] TFA Version: 192000 Build Date: 201904260414
[2019-11-14 14:18:40]
[2019-11-14 14:18:40] About to check previous TFA installations ...
[2019-11-14 14:18:40] TFA HOME : /u01/app/18.0.0/grid/tfa/exsite1c1/tfa_home
[2019-11-14 14:18:40]
[2019-11-14 14:18:40] Installed Build Version: 184100 Build Date: 201902260236
[2019-11-14 14:18:40]
[2019-11-14 14:18:40] INSTALL_TYPE GI
[2019-11-14 14:18:40] Shutting down TFA for Migration...
[2019-11-14 14:20:24]
[2019-11-14 14:20:24] Removing /etc/init.d/init.tfa...
[2019-11-14 14:20:24]
[2019-11-14 14:20:24] Migrating TFA to /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home...
[2019-11-14 14:20:50]
[2019-11-14 14:20:50] Starting TFA on exsite1c1...
[2019-11-14 14:20:50]
[2019-11-14 14:21:05]
[2019-11-14 14:21:05] TFA_INSTALLER /u01/app/19.0.0.0/grid/crs/install/tfa_setup
[2019-11-14 14:21:05] TFA is already installed. Upgrading TFA
[2019-11-14 14:21:05]
[2019-11-14 14:21:05] TFA patching CRS or DB from zipfile extracted to /tmp/.293046.tfa
[2019-11-14 14:21:06] TFA Upgrade Log : /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfapatch.log
[2019-11-14 14:23:31] Patch Status : 0
[2019-11-14 14:23:31] Patching OK : Running install_ext
[2019-11-14 14:23:32] Installing oratop extension..
[2019-11-14 14:23:32]
.-----------------------------------------------------------------.
| Host      | TFA Version | TFA Build ID         | Upgrade Status |
+-----------+-------------+----------------------+----------------+
| exsite1c1 |  19.2.0.0.0 | 19200020190426041420 | UPGRADED       |
| exsite1c2 |  18.4.1.0.0 | 18410020190226023629 | NOT UPGRADED   |
'-----------+-------------+----------------------+----------------'

[2019-11-14 14:23:44] Removing Old TFA /u01/app/18.0.0/grid/tfa/exsite1c1/tfa_home...
[2019-11-14 14:23:45] Cleanup serializable files
[2019-11-14 14:23:45]
[root@exsite1c1 ~]#
[root@exsite1c1 ~]# cat /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfapatch.log

TFA will be upgraded on Node exsite1c1:


Upgrading TFA on exsite1c1 :

Stopping TFA Support Tools...

Shutting down TFA for Patching...

Shutting down TFA
. . . . .
. . .
Successfully shutdown TFA..

No Berkeley DB upgrade required

Copying TFA Certificates...


Starting TFA in exsite1c1...

Starting TFA..
Waiting up to 100 seconds for TFA to be started..
. . . . .
Successfully started TFA Process..
. . . . .
TFA Started and listening for commands

Enabling Access for Non-root Users on exsite1c1...

[root@exsite1c1 ~]#

One known problem occurs when, for some reason, the cluster nodes lose TFA synchronization. I tried to sync them, and this revealed a clue:

[root@exsite1c1 ~]# /u01/app/19.0.0.0/grid/tfa/bin/tfactl syncnodes
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.

/u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/bin/synctfanodes.sh: line 237: /u01/app/18.0.0/grid/perl/bin/perl: No such file or directory
TFA-00002 Oracle Trace File Analyzer (TFA) is not running

Current Node List in TFA :
1.

Unable to determine Node List to be synced. Please update manually.

Do you want to update this node list? [Y|N] [N]: ^C[root@exsite1c1 ~]#
[root@exsite1c1 ~]#

As you can see, synctfanodes.sh references the old 18c GI home. Inside the sync script, line 237 (my mark below) runs $PERL, and the PERL value is read from the file tfa_setup.txt (the TFA CHECK mark):

[root@exsite1c1 ~]# vi /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/bin/synctfanodes.sh
...
...
        if [ `$GREP -c '^PERL=' $tfa_home/tfa_setup.txt` -ge 1 ]    <== TFA CHECK
        then
                PERL=`$GREP '^PERL=' $tfa_home/tfa_setup.txt | $AWK -F"=" '{print $2}'`;
        fi

        if [ `$GREP -c '^CRS_HOME=' $tfa_home/tfa_setup.txt` -ge 1 ]
        then
                CRS_HOME=`$GREP '^CRS_HOME=' $tfa_home/tfa_setup.txt | $AWK -F"=" '{print $2}'`;
        fi

        if [ `$GREP -c '^RUN_MODE=' $tfa_home/tfa_setup.txt` -ge 1 ]
        then
                RUN_MODE=`$GREP '^RUN_MODE=' $tfa_home/tfa_setup.txt | $AWK -F"=" '{print $2}'`;
        fi
fi

RUSER=`$RUID | $AWK '{print $1}' | $AWK -F\( '{print $2}' | $AWK -F\) '{print $1}'`;

if [ $RUSER != $DAEMON_OWNER ]
then
        $ECHO "User '$RUSER' does not have permissions to run this script.";
        exit 1;
fi

SSH_USER="$DAEMON_OWNER";

HOSTNAME=`hostname | $CUT -d. -f1 | $PERL -ne 'print lc'`;    <===== LINE 237
...
...
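The lookup in the excerpt above can be reproduced to check, before running any TFA command, whether the paths stored in tfa_setup.txt still exist. A minimal sketch; the helper names are hypothetical, it reads values the same way as the synctfanodes.sh excerpt (grep plus awk -F"=") but keeps only the first match, and it checks only CRS_HOME, JAVA_HOME, and PERL:

```shell
#!/bin/bash
# Hypothetical helper: print the value of KEY from tfa_setup.txt,
# similar to synctfanodes.sh (first "^KEY=" line, text after "=").
tfa_setup_value() {
    local file="$1" key="$2"
    grep "^${key}=" "$file" | head -n 1 | awk -F"=" '{print $2}'
}

# Hypothetical helper: report path-valued keys whose target is missing.
# Returns 1 if at least one path does not exist.
check_tfa_setup_paths() {
    local file="$1" key val rc=0
    for key in CRS_HOME JAVA_HOME PERL; do
        val=$(tfa_setup_value "$file" "$key")
        if [ -z "$val" ]; then
            echo "$key: not set"
        elif [ -e "$val" ]; then
            echo "$key=$val: OK"
        else
            echo "$key=$val: MISSING"
            rc=1
        fi
    done
    return $rc
}
```

Run against /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt on the broken node, PERL should be reported as MISSING, matching the "No such file or directory" error from synctfanodes.sh.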

Checking tfa_setup.txt

Checking the file, we can see the error:

[root@exsite1c1 ~]# cat /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt
CRS_HOME=/u01/app/18.0.0/grid
exsite1c1%CRS_INSTALLED=1
NODE_NAMES=exsite1c1
ORACLE_BASE=/u01/app/grid
JAVA_HOME=/u01/app/18.0.0/grid/jdk/jre
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/OPatch/crs/log
exsite1c1%CFGTOOLS%DIAGDEST=/u01/app/12.1.0.2/grid/cfgtoollogs
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/crf/db/exsite1c1
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/crs/log
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/cv/log
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/evm/admin/log
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/evm/admin/logger
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/evm/log
exsite1c1%INSTALL%DIAGDEST=/u01/app/12.1.0.2/grid/install
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/log
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/network/log
exsite1c1%DBWLM%DIAGDEST=/u01/app/12.1.0.2/grid/oc4j/j2ee/home/log
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/opmn/logs
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/racg/log
exsite1c1%ASM%DIAGDEST=/u01/app/12.1.0.2/grid/rdbms/log
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/scheduler/log
exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/srvm/log
exsite1c1%ACFS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/acfs
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/core
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/crsconfig
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/crsdiag
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/cvu
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/evm
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/output
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/trace
exsite1c1%INSTALL%DIAGDEST=/u01/app/oraInventory/ContentsXML
exsite1c1%INSTALL%DIAGDEST=/u01/app/oraInventory/logs
TRACE_LEVEL=1
INSTALL_TYPE=GI
PERL=/u01/app/18.0.0/grid/perl/bin/perl
RDBMS_ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1||
RDBMS_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1||
RDBMS_ORACLE_HOME=/u01/app/12.2.0.1/grid||
TZ=Europe/Luxembourg
RDBMS_ORACLE_HOME=/u01/app/18.0.0/grid||
localnode%ADRBASE=/u01/app/grid
RDBMS_ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1||
localnode%ADRBASE=/u01/app/oracle
RDBMS_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/financ||
localnode%ADRBASE=/u01/app/oracle
RDBMS_ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/financ||
localnode%ADRBASE=/u01/app/oracle
DAEMON_OWNER=root
RDBMS_ORACLE_HOME=/u01/app/oracle/agent/13.2.0/agent_13.2.0.0.0||
RDBMS_ORACLE_HOME=/u01/app/12.1.0.2/grid||
RDBMS_ORACLE_HOME=/u01/app/19.0.0.0/grid||
localnode%ADRBASE=/u01/app/grid
CRS_ACTIVE_VERSION=
[root@exsite1c1 ~]#

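As you can see above, the CRS_HOME, JAVA_HOME, and PERL parameters point to the old 18c GI home, several DIAGDEST entries still point to the 12.1.0.2 home, the RDBMS_ORACLE_HOME list still includes the old GI homes, and CRS_ACTIVE_VERSION is empty. As a workaround, I edited tfa_setup.txt on both nodes, replaced the old GI folders with the 19c one (/u01/app/19.0.0.0/grid), removed the old GI homes from RDBMS_ORACLE_HOME, and set CRS_ACTIVE_VERSION: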

[root@exsite1c1 ~]# vi /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt
[root@exsite1c1 ~]#
[root@exsite1c1 ~]#
[root@exsite1c1 ~]#
[root@exsite1c1 ~]# cat /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt
CRS_HOME=/u01/app/19.0.0.0/grid
exsite1c1%CRS_INSTALLED=1
NODE_NAMES=exsite1c1
ORACLE_BASE=/u01/app/grid
JAVA_HOME=/u01/app/19.0.0.0/grid/jdk/jre
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/OPatch/crs/log
exsite1c1%CFGTOOLS%DIAGDEST=/u01/app/19.0.0.0/grid/cfgtoollogs
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/crf/db/exsite1c1
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/crs/log
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/cv/log
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/evm/admin/log
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/evm/admin/logger
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/evm/log
exsite1c1%INSTALL%DIAGDEST=/u01/app/19.0.0.0/grid/install
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/log
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/network/log
exsite1c1%DBWLM%DIAGDEST=/u01/app/19.0.0.0/grid/oc4j/j2ee/home/log
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/opmn/logs
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/racg/log
exsite1c1%ASM%DIAGDEST=/u01/app/19.0.0.0/grid/rdbms/log
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/scheduler/log
exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/srvm/log
exsite1c1%ACFS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/acfs
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/core
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/crsconfig
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/crsdiag
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/cvu
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/evm
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/output
exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/trace
exsite1c1%INSTALL%DIAGDEST=/u01/app/oraInventory/ContentsXML
exsite1c1%INSTALL%DIAGDEST=/u01/app/oraInventory/logs
TRACE_LEVEL=1
INSTALL_TYPE=GI
PERL=/u01/app/19.0.0.0/grid/perl/bin/perl
RDBMS_ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1||
RDBMS_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1||
TZ=Europe/Luxembourg
RDBMS_ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1||
localnode%ADRBASE=/u01/app/oracle
RDBMS_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/financ||
localnode%ADRBASE=/u01/app/oracle
RDBMS_ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/financ||
localnode%ADRBASE=/u01/app/oracle
DAEMON_OWNER=root
RDBMS_ORACLE_HOME=/u01/app/oracle/agent/13.2.0/agent_13.2.0.0.0||
RDBMS_ORACLE_HOME=/u01/app/19.0.0.0/grid||
localnode%ADRBASE=/u01/app/grid
CRS_ACTIVE_VERSION=19.0.0.0
[root@exsite1c1 ~]#
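On clusters with more nodes, the main path change can be scripted instead of edited with vi. This is a sketch under assumptions (GNU sed, and that only CRS_HOME, JAVA_HOME, and PERL need the old-to-new home rewrite); it is not what I ran (I edited the file by hand), and it does not reproduce the other changes visible in the edited file (DIAGDEST entries, removed RDBMS_ORACLE_HOME lines, CRS_ACTIVE_VERSION). The function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: back up tfa_setup.txt, then rewrite CRS_HOME,
# JAVA_HOME and PERL from the old GI home to the new one.
# DIAGDEST, RDBMS_ORACLE_HOME and CRS_ACTIVE_VERSION lines are NOT
# touched; review those by hand.
fix_tfa_setup_homes() {
    local file="$1" old_home="$2" new_home="$3" old_re
    cp -p "$file" "$file.bak.$(date +%Y%m%d%H%M%S)" || return 1
    # Escape the dots so "18.0.0" matches literally in the regex.
    old_re=$(printf '%s' "$old_home" | sed 's/[.]/\\./g')
    # "|" is the sed delimiter because the homes contain "/"
    # (assumes the homes contain no "|", "&" or "\" characters).
    sed -i \
        -e "s|^CRS_HOME=${old_re}\$|CRS_HOME=${new_home}|" \
        -e "s|^JAVA_HOME=${old_re}/|JAVA_HOME=${new_home}/|" \
        -e "s|^PERL=${old_re}/|PERL=${new_home}/|" \
        "$file"
}
```

For this environment the call would be fix_tfa_setup_homes /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt /u01/app/18.0.0/grid /u01/app/19.0.0.0/grid, repeated on each node with its own tfa_home.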

After the edit, TFA started correctly:

[root@exsite1c1 ~]# /etc/init.d/init.tfa start
Starting TFA..
Waiting up to 100 seconds for TFA to be started..
. . . . .
Successfully started TFA Process..
. . . . .
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.
TFA Started and listening for commands
[root@exsite1c1 ~]#
[root@exsite1c1 ~]#
[root@exsite1c1 ~]# ps -ef |grep tfa
root     113905      1  0 11:31 ?        00:00:00 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
root     115917      1 99 11:31 ?        00:00:24 /u01/app/19.0.0.0/grid/jdk/jre/bin/java -server -Xms256m -Xmx512m -Djava.awt.headless=true -Ddisable.checkForUpdate=true -XX:ParallelGCThreads=5 oracle.rat.tfa.TFAMain /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home
root     117853  37137  0 11:31 pts/0    00:00:00 grep --color=auto tfa
[root@exsite1c1 ~]#

And diagcollect worked again:

[root@exsite1c1 ~]# /u01/app/19.0.0.0/grid/tfa/bin/tfactl diagcollect
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.

By default TFA will collect diagnostics for the last 12 hours. This can result in large collections
For more targeted collections enter the time of the incident, otherwise hit <RETURN> to collect for the last 12 hours
[YYYY-MM-DD HH24:MI:SS,<RETURN>=Collect for last 12 hours] :

Collecting data for the last 12 hours for all components...
Collecting data for all nodes

Collection Id : 20191122124148exsite1c1

Detailed Logging at : /u01/app/grid/tfa/repository/collection_Fri_Nov_22_12_41_49_CET_2019_node_all/diagcollect_20191122124148_exsite1c1.log
2019/11/22 12:41:53 CET : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2019/11/22 12:41:53 CET : Collection Name : tfa_Fri_Nov_22_12_41_49_CET_2019.zip
2019/11/22 12:41:54 CET : Collecting diagnostics from hosts : [exsite1c1, exsite1c2]
2019/11/22 12:41:54 CET : Scanning of files for Collection in progress...
2019/11/22 12:41:54 CET : Collecting additional diagnostic information...
2019/11/22 12:44:13 CET : Completed collection of additional diagnostic information...
2019/11/22 13:15:39 CET : Getting list of files satisfying time range [11/22/2019 00:41:53 CET, 11/22/2019 13:15:39 CET]
2019/11/22 13:40:42 CET : Collecting ADR incident files...
2019/11/22 13:40:48 CET : Completed Local Collection
2019/11/22 13:40:48 CET : Remote Collection in Progress...
.---------------------------------------.
|           Collection Summary          |
+-----------+-----------+-------+-------+
| Host      | Status    | Size  | Time  |
+-----------+-----------+-------+-------+
| exsite1c2 | Completed | 412MB |  318s |
| exsite1c1 | Completed | 284MB | 3534s |
'-----------+-----------+-------+-------'

Logs are being collected to: /u01/app/grid/tfa/repository/collection_Fri_Nov_22_12_41_49_CET_2019_node_all
/u01/app/grid/tfa/repository/collection_Fri_Nov_22_12_41_49_CET_2019_node_all/exsite1c1.tfa_Fri_Nov_22_12_41_49_CET_2019.zip
/u01/app/grid/tfa/repository/collection_Fri_Nov_22_12_41_49_CET_2019_node_all/exsite1c2.tfa_Fri_Nov_22_12_41_49_CET_2019.zip
[root@exsite1c1 ~]#
[root@exsite1c1 ~]# /u01/app/19.0.0.0/grid/tfa/bin/tfactl diagcollect -since 1h
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.
Collecting data for all nodes

Collection Id : 20191122134319exsite1c1

Detailed Logging at : /u01/app/grid/tfa/repository/collection_Fri_Nov_22_13_43_20_CET_2019_node_all/diagcollect_20191122134319_exsite1c1.log
2019/11/22 13:43:24 CET : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2019/11/22 13:43:24 CET : Collection Name : tfa_Fri_Nov_22_13_43_20_CET_2019.zip
2019/11/22 13:43:24 CET : Collecting diagnostics from hosts : [exsite1c1, exsite1c2]
2019/11/22 13:43:24 CET : Scanning of files for Collection in progress...
2019/11/22 13:43:24 CET : Collecting additional diagnostic information...
2019/11/22 13:44:49 CET : Getting list of files satisfying time range [11/22/2019 12:43:24 CET, 11/22/2019 13:44:49 CET]
2019/11/22 13:45:50 CET : Completed collection of additional diagnostic information...
2019/11/22 13:59:19 CET : Collecting ADR incident files...
2019/11/22 13:59:19 CET : Completed Local Collection
2019/11/22 13:59:19 CET : Remote Collection in Progress...
.--------------------------------------.
|          Collection Summary          |
+-----------+-----------+-------+------+
| Host      | Status    | Size  | Time |
+-----------+-----------+-------+------+
| exsite1c2 | Completed | 230MB | 295s |
| exsite1c1 | Completed | 105MB | 955s |
'-----------+-----------+-------+------'

Logs are being collected to: /u01/app/grid/tfa/repository/collection_Fri_Nov_22_13_43_20_CET_2019_node_all
/u01/app/grid/tfa/repository/collection_Fri_Nov_22_13_43_20_CET_2019_node_all/exsite1c2.tfa_Fri_Nov_22_13_43_20_CET_2019.zip
/u01/app/grid/tfa/repository/collection_Fri_Nov_22_13_43_20_CET_2019_node_all/exsite1c1.tfa_Fri_Nov_22_13_43_20_CET_2019.zip
[root@exsite1c1 ~]#

TFA error #2

Another error, which I got on a different cluster that went through the same update/upgrade process, was related to the *.ser files in the TFA home. If I try to use TFA (diagcollect, for example), I receive this error:

[root@exsite2c1 ~]# /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/bin/tfactl diagcollect
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.
Storable binary image v2.10 contains data of type 101. This Storable is v2.9 and can only handle data types up to 30 at /usr/lib64/perl5/vendor_perl/Storable.pm line 381, at /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/bin/common/tfactlshare.pm line 25611.
[root@exsite2c1 ~]#

If you search MOS, the results will point to the Perl version. But that is not the case here: the Perl on this Exadata version is newer than 5.10. The solution was to move the *.ser files out of the TFA home (to another folder) or delete them. After that, the “Storable binary image” error was gone (but the tfa_setup.txt error remained):

[root@exsite2c1 ~]# mv /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/internal/*ser /tmp
[root@exsite2c1 ~]# ls -l /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/internal/*ser
ls: cannot access /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/internal/*ser: No such file or directory
[root@exsite2c1 ~]#
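Moving the files by hand as above works; a slightly safer variant keeps them in a dedicated, timestamped folder so they can be restored, and does nothing when no file matches. A sketch only; the function name and the backup folder layout are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: move <tfa_home>/internal/*.ser into a timestamped
# backup folder under $2 (default /tmp). Does nothing if no file matches.
move_tfa_ser_files() {
    local tfa_home="$1" backup_root="${2:-/tmp}"
    local dest
    local -a files
    # nullglob makes an unmatched pattern expand to nothing
    # (note: this resets nullglob to off afterwards).
    shopt -s nullglob
    files=("$tfa_home"/internal/*.ser)
    shopt -u nullglob
    if [ ${#files[@]} -eq 0 ]; then
        echo "No .ser files in $tfa_home/internal"
        return 0
    fi
    dest="$backup_root/tfa_ser_backup_$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dest" || return 1
    mv -- "${files[@]}" "$dest"/ && echo "Moved ${#files[@]} file(s) to $dest"
}
```

For this node the call would be move_tfa_ser_files /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home.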

Problem and Solution

The source of the problem is not clear in this case. As you saw above, the logs of the GI upgrade from 18c to 19c reported success, even for TFA. But tfa_setup.txt was clearly left with wrong parameters. And if you look closely, you can see that the original file already contained a reference to the new GI home (RDBMS_ORACLE_HOME=/u01/app/19.0.0.0/grid).

Unfortunately, the parameters that matter were left with the wrong values. The workaround was simply to edit tfa_setup.txt and fix the wrong folders in those parameters. I did not test running $GI_HOME/crs/install/tfa_setup -silent -crshome $GI_HOME (the same command rootupgrade called, as seen in the rootcrs log) to fix the file, but you can try it. The goal was to identify the issue instead of just removing and reinstalling TFA.

Again, this is a workaround that I tested and that worked in my environment. Check your logs and other files to see whether you hit the same issues. If you do, you can at least try it.

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community. Post protected by copyright.”
