AHF and TFA Management

I recently posted about upgrading AHF/TFA from version 19 to 21 on Exadata and on ODA. With AHF version 21, however, some diagnostic collections are made automatically, and this can affect your space usage. Here you can see how to check these collections and how to disable or modify some of them.

Automatic collection in AHF/TFA is a feature that generates diagnostic packages (to send to Oracle) when specific errors appear in the database. The errors that trigger a collection follow patterns such as ORA-00600, ORA-07445, and several others. The basic idea is described in the official documentation and shown in the image below (taken directly from the official documentation).

In my case, automatic collection caused a space usage problem. Here is the space consumed by AHF on my server:

[root@exdbsrv01 ~]# cd /u01/app/grid/
[root@exdbsrv01 grid]# du -chs oracle.ahf/data/*
90M     oracle.ahf/data/exdbsrv01
9.0G    oracle.ahf/data/repository
4.0K    oracle.ahf/data/work
9.1G    total
[root@exdbsrv01 grid]#

As you can see, AHF data collections use more than 9 GB. This happened because a single database problem raised a large number of ORA-600 errors, and AHF/TFA collected and generated traces for each one of them. This is AHF/TFA working as designed, but unfortunately it was not desirable in my case. As the documentation says, automatic collections are ON by default (as on my server):

[root@exdbsrv01 oracle.ahf]# /opt/oracle.ahf/bin/tfactl get autodiagcollect
.-------------------------------------------------.
|                    exdbsrv01                    |
+-----------------------------------------+-------+
| Configuration Parameter                 | Value |
+-----------------------------------------+-------+
| Auto Diagcollection ( autodiagcollect ) | ON    |
'-----------------------------------------+-------'

[root@exdbsrv01 oracle.ahf]#

Fortunately, automatic collection is easy to disable (the “-c” option propagates the change to all nodes of the cluster):

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/ahfctl set autodiagcollect=OFF -c
Successfully set autodiagcollect=OFF
.-------------------------------------------------.
|                    exdbsrv01                    |
+-----------------------------------------+-------+
| Configuration Parameter                 | Value |
+-----------------------------------------+-------+
| Auto Diagcollection ( autodiagcollect ) | OFF   |
'-----------------------------------------+-------'

[root@exdbsrv01 grid]#
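If you want to check this setting from a script (for example, in a monitoring job), a small helper can pull a single value out of the ASCII table that tfactl and ahfctl print. This is a minimal sketch, assuming the “| Parameter | Value |” row layout shown above; get_tfa_param is a hypothetical name, not part of AHF:

```shell
# Hypothetical helper (not part of AHF): print the value of one parameter from
# the ASCII table printed by "tfactl get" or "ahfctl print config" (stdin).
# A row matches if its name column is exactly the key ("chaautocollect") or
# contains "( key )", as in "Auto Diagcollection ( autodiagcollect )".
get_tfa_param() {
    awk -F'|' -v key="$1" '
        function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
        NF >= 4 {
            name = trim($2); value = trim($3)
            if (name == key || index(name, "( " key " )") > 0) print value
        }'
}

# Usage: /opt/oracle.ahf/bin/tfactl get autodiagcollect | get_tfa_param autodiagcollect
```

After the change above, that usage line should print OFF. Rows without a value column (the host name header and the border lines) are skipped because they do not have four “|”-separated fields.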

If you have already collected many diagnostic packages (like me), you can easily delete them directly from AHF/TFA with the “purge” command. But remember to purge on each node of your cluster: there is no option to run it from a single node and delete the collections on all of them:

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl purge -h

Delete collections from TFA repository

Usage : /opt/oracle.ahf/tfa/bin/tfactl purge -older x[h|d] [-force]
Examples:
/opt/oracle.ahf/tfa/bin/tfactl purge -older 30d    - To remove file(s) older than 30 days.
/opt/oracle.ahf/tfa/bin/tfactl purge -older 10h    - To remove file(s) older than 10 hours.
[root@exdbsrv01 grid]#
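Because purge only cleans the node where it runs, a loop over the cluster nodes saves some typing. A minimal sketch, assuming root can ssh to every node without a password prompt; the helper name purge_all_nodes and the DRY_RUN switch are hypothetical, and only the purge “-older” and “-force” options come from the usage above:

```shell
# Hypothetical helper (not part of AHF): "tfactl purge" only cleans the
# repository of the node where it runs, so run it on each node over ssh.
# Assumes passwordless root ssh between the nodes.
# Set DRY_RUN=1 to print the commands instead of running them.
TFACTL=/opt/oracle.ahf/bin/tfactl

purge_all_nodes() {
    age=$1
    shift
    # Accept only the <number>h / <number>d formats from the purge usage text.
    num=${age%[hd]}
    case $age in
        *h | *d) ;;
        *) echo "invalid age '$age' (use e.g. 30d or 10h)" >&2; return 1 ;;
    esac
    case $num in
        '' | *[!0-9]*) echo "invalid age '$age' (use e.g. 30d or 10h)" >&2; return 1 ;;
    esac

    for node in "$@"; do
        cmd="$TFACTL purge -older $age -force"   # -force skips the Y/N prompt
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh root@$node $cmd"
        else
            ssh "root@$node" "$cmd" || echo "purge failed on $node" >&2
        fi
    done
}
```

Since “-force” removes the confirmation prompt, it is worth running DRY_RUN=1 purge_all_nodes 5h exdbsrv01 exdbsrv02 first to review the commands, and only then running it without DRY_RUN.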

Here is an example that deletes everything older than 5 hours (you can add “-force” to skip the “Y/N” prompt):

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl purge -older 5h

List of files in the repository older than 5h:
/u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sun_Jul_25_10_13_17_CEST_2021_node_exdbsrv01
/u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Tue_Jul_27_08_40_39_CEST_2021_node_exdbsrv01
/u01/app/grid/oracle.ahf/data/repository/collection_Tue_Jul_27_20_00_22_CEST_2021_node_exdbsrv01
/u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sat_Jul_24_10_30_56_CEST_2021_node_exdbsrv01
…
…
/u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sat_Jul_24_15_11_23_CEST_2021_node_exdbsrv01

Do you want to delete the above files. [Y|y|N|n] [Y]: Y

Deleting /u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sun_Jul_25_10_13_17_CEST_2021_node_exdbsrv01 .....Deleted.
Deleting /u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Tue_Jul_27_08_40_39_CEST_2021_node_exdbsrv01 .....Deleted.
…
…
Deleting /u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sun_Jul_25_21_34_25_CEST_2021_node_exdbsrv01 .....Deleted.
Deleting /u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sat_Jul_24_15_11_23_CEST_2021_node_exdbsrv01 .....Deleted.
[root@exdbsrv01 grid]#
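Before purging, it can be useful to know which errors triggered the automatic collections. A minimal sketch, assuming the automatic collections keep the auto_srdc&lt;ORA-code&gt;_... naming seen in the listing above; count_auto_srdc is a hypothetical helper name:

```shell
# Hypothetical helper: count automatic collections per ORA error, assuming they
# are named auto_srdc<ORA-code>_... as in the repository listing above.
# Reads directory or file names on stdin and prints "<ORA-code> <count>".
count_auto_srdc() {
    grep -o 'auto_srdcORA-[0-9]*' |
        sed 's/^auto_srdc//' |
        sort |
        uniq -c |
        awk '{ print $2, $1 }'
}

# Usage: ls /u01/app/grid/oracle.ahf/data/repository | count_auto_srdc
```

Feed it the directory listing rather than the full purge transcript, because the transcript names each collection twice (once in the list and once in the “Deleting” lines), which would double the counts.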

But that is not all we can do: there are several other settings that we can check and enable/disable with the “get” and “set” commands:

[root@exdbsrv01 oracle.ahf]# /opt/oracle.ahf/bin/tfactl get collect
Invalid option specified for get

GET various TFA features

Usage : /opt/oracle.ahf/tfa/bin/tfactl get [ autodiagcollect | trimfiles | tracelevel| reposizeMB | repositorydir | logsize | logcount| maxcorefilesize | maxcorecollectionsize| maxfilecollectionsize| autopurge| publicip | redact | minSpaceForRTScan | rtscan| diskUsageMon| diskUsageMonInterval| manageLogsAutoPurge| manageLogsAutoPurgeInterval | manageLogsAutoPurgePolicyAge | minfileagetopurge | tfaIpsPoolSize | tfaDbUtlPurgeAge | tfaDbUtlPurgeMode | tfaDbUtlPurgeThreadDelay| tfaDbUtlCrsProfileDelay | indexRecoveryMode | collection.isa | discovery | inventory | unreachableNodeSleepTime| unreachableNodeTimeOut | ipsAlertlogTrimsizeMB| clustereventmonitor | rediscoveryInterval] [-node] [-match pattern ]
Examples:
/opt/oracle.ahf/tfa/bin/tfactl get autopurge
/opt/oracle.ahf/tfa/bin/tfactl get match-pattern -match
[root@exdbsrv01 oracle.ahf]#

Other important commands print the current collection settings and configuration. Below, I printed the report, changed a collection setting, and printed the report again (which correctly shows the change):

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl get collect  -match
.------------------------------------------------------------------------------.
|                                   exdbsrv01                                  |
+----------------------------------------------------------------------+-------+
| Configuration Parameter                                              | Value |
+----------------------------------------------------------------------+-------+
| ISA Data Gathering ( collection.isa )                                | ON    |
| collectTrm                                                           | OFF   |
| collectAllDirsByFile                                                 | ON    |
| Auto Diagcollection ( autodiagcollect )                              | ON    |
| Generation of Mini Collections ( minicollection )                    | ON    |
| chaautocollect                                                       | ON    |
| Maximum File Collection Size (MB) ( maxFileCollectionSize )          | 5120  |
| Maximum Collection Size of Core Files (MB) ( maxCoreCollectionSize ) | 200   |
| minTimeForAutoDiagCollection                                         | 300   |
'----------------------------------------------------------------------+-------'

[root@exdbsrv01 grid]#
[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/ahfctl set chaautocollect=OFF -c
Successfully set chaautocollect=OFF
.---------------------------------.
|            exdbsrv01            |
+-------------------------+-------+
| Configuration Parameter | Value |
+-------------------------+-------+
| chaautocollect          | OFF   |
'-------------------------+-------'

[root@exdbsrv01 grid]#
[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl get collect  -match
.------------------------------------------------------------------------------.
|                                   exdbsrv01                                  |
+----------------------------------------------------------------------+-------+
| Configuration Parameter                                              | Value |
+----------------------------------------------------------------------+-------+
| ISA Data Gathering ( collection.isa )                                | ON    |
| collectTrm                                                           | OFF   |
| collectAllDirsByFile                                                 | ON    |
| Auto Diagcollection ( autodiagcollect )                              | OFF   |
| Generation of Mini Collections ( minicollection )                    | ON    |
| chaautocollect                                                       | OFF   |
| Maximum File Collection Size (MB) ( maxFileCollectionSize )          | 5120  |
| Maximum Collection Size of Core Files (MB) ( maxCoreCollectionSize ) | 200   |
| minTimeForAutoDiagCollection                                         | 300   |
'----------------------------------------------------------------------+-------'

[root@exdbsrv01 grid]#
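To spot what changed between two reports without reading the tables side by side, you can save each one to a file and compare them. A minimal sketch, assuming the “| Parameter | Value |” layout above; diff_tfa_config is a hypothetical helper, and it matches repeated rows (such as Public IP Network, which appears twice in the full config) by occurrence:

```shell
# Hypothetical helper: compare two saved tfactl/ahfctl table outputs and print
# every parameter whose value differs, as "name: old -> new".
# Rows listed more than once are matched by their occurrence number.
diff_tfa_config() {
    awk -F'|' '
        function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
        NF >= 4 {
            name = trim($2); value = trim($3)
            if (name == "Configuration Parameter") next
            key = name SUBSEP (++seen[FILENAME, name])
            if (FNR == NR) { old[key] = value; next }     # first file: remember
            if ((key in old) && old[key] != value)        # second file: compare
                print name ": " old[key] " -> " value
        }' "$1" "$2"
}

# Usage:
#   /opt/oracle.ahf/bin/tfactl get collect -match > before.txt
#   /opt/oracle.ahf/bin/ahfctl set chaautocollect=OFF -c
#   /opt/oracle.ahf/bin/tfactl get collect -match > after.txt
#   diff_tfa_config before.txt after.txt
```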

A more comprehensive report is available with “ahfctl print config”:

[root@exdbsrv01 oracle.ahf]# /opt/oracle.ahf/bin/ahfctl print config
.---------------------------------------------------------------------------------------------------------------------.
|                                                      exdbsrv01                                                      |
+--------------------------------------------------------------------------------------------------------+------------+
| Configuration Parameter                                                                                | Value      |
+--------------------------------------------------------------------------------------------------------+------------+
| TFA Version ( tfaversion )                                                                             | 21.1.4.0.0 |
| Java Version ( javaVersion )                                                                           | 1.8        |
| Public IP Network ( publicIp )                                                                         | true       |
| Repository current size (MB) ( currentsizemegabytes )                                                  | 9209       |
| Repository maximum size (MB) ( maxsizemegabytes )                                                      | 10240      |
| Cluster Event Monitor ( clustereventmonitor )                                                          | ON         |
| scandiskmon                                                                                            | OFF        |
| scanacfslog                                                                                            | OFF        |
| File Data Collection ( inventory )                                                                     | ON         |
| Automatic Purging ( autoPurge )                                                                        | ON         |
| Internal Search String ( internalSearchString )                                                        | ON         |
| ISA Data Gathering ( collection.isa )                                                                  | ON         |
| Trim Files ( trimfiles )                                                                               | ON         |
| collectTrm                                                                                             | OFF        |
| chmdataapi                                                                                             | ON         |
| chanotification ( chanotification )                                                                    | ON         |
| Skip event if it was flood controlled ( floodcontrol_events )                                          | OFF        |
| Consolidate similar events (COUNT shows number of events occurences) ( consolidate_events )            | OFF        |
| Managelogs Auto Purge ( manageLogsAutoPurge )                                                          | OFF        |
| scanacfseventlog                                                                                       | OFF        |
| Alert Log Scan ( rtscan )                                                                              | ON         |
| debugips                                                                                               | OFF        |
| generateZipMetadataJson                                                                                | ON         |
| collectAllDirsByFile                                                                                   | ON         |
| scanvarlog                                                                                             | OFF        |
| Auto Diagcollection ( autodiagcollect )                                                                | ON         |
| Public IP Network ( publicIp )                                                                         | ON         |
| Flood Control ( floodcontrol )                                                                         | ON         |
| Generation of Mini Collections ( minicollection )                                                      | ON         |
| odscan                                                                                                 | ON         |
| Disk Usage Monitor ( diskUsageMon )                                                                    | ON         |
| Start consuming data provided by SQLTicker ( sqlticker )                                               | OFF        |
| Discovery ( discovery )                                                                                | ON         |
| analyze                                                                                                | OFF        |
| indexInventory                                                                                         | ON         |
| Generation of Telemetry Data ( telemetry )                                                             | OFF        |
| chaautocollect                                                                                         | ON         |
| Granular Tracing ( granulartracing )                                                                   | OFF        |
| minPossibleSpaceForPurge                                                                               | 1024       |
| disk.threshold                                                                                         | 90         |
| mem.swapfree                                                                                           | 5120       |
| mem.util.samples                                                                                       | 4          |
| inventoryThreadPoolSize                                                                                | 1          |
| mem.swaptotal.samples                                                                                  | 2          |
| maxFileAgeToPurge                                                                                      | 1440       |
| mem.free                                                                                               | 20480      |
| actionrestartlimit                                                                                     | 30         |
| Minimum Free Space to enable Alert Log Scan (MB) ( minSpaceForRTScan )                                 | 500        |
| cpu.io.samples                                                                                         | 30         |
| mem.util                                                                                               | 80         |
| Maximum single Zip File Size (MB) ( maxZipSize )                                                       | 2048       |
| Time interval between consecutive Disk Usage Snapshot(minutes) ( diskUsageMonInterval )                | 60         |
| TFA ISA Purge Thread Delay (minutes) ( tfaDbUtlPurgeThreadDelay )                                      | 60         |
| firstDiscovery                                                                                         | 1          |
| TFA IPS Pool Size ( tfaIpsPoolSize )                                                                   | 5          |
| Maximum File Collection Size (MB) ( maxFileCollectionSize )                                            | 5120       |
| Time interval between consecutive Managelogs Auto Purge(minutes) ( manageLogsAutoPurgeInterval )       | 60         |
| arc.backupmissing.samples                                                                              | 2          |
| cpu.util.samples                                                                                       | 2          |
| cpu.usr.samples                                                                                        | 2          |
| cpu.sys                                                                                                | 50         |
| Flood Control Limit Count ( fc.limit )                                                                 | 3          |
| Flood Control Pause Time (minutes) ( fc.pauseTime )                                                    | 120        |
| Maximum Number of TFA Logs ( maxLogCount )                                                             | 10         |
| DB Backup Delay Hours ( dbbackupdelayhours )                                                           | 27         |
| cdb.backup.samples                                                                                     | 1          |
| arc.backupstatus                                                                                       | 1          |
| purgeFrequency                                                                                         | 4          |
| TFA ISA Purge Age (seconds) ( tfaDbUtlPurgeAge )                                                       | 2592000    |
| Maximum Collection Size of Core Files (MB) ( maxCoreCollectionSize )                                   | 200        |
| cpu.util                                                                                               | 80         |
| mem.swapfree.samples                                                                                   | 2          |
| cdb.backupstatus                                                                                       | 1          |
| mem.swaputl.samples                                                                                    | 2          |
| arc.backup.samples                                                                                     | 3          |
| unreachablenodeTimeOut                                                                                 | 3600       |
| Flood Control Limit Time (minutes) ( fc.limitTime )                                                    | 60         |
| mem.swaputl                                                                                            | 10         |
| mem.free.samples                                                                                       | 2          |
| Maximum Size of Core File (MB) ( maxCoreFileSize )                                                     | 20         |
| disk.samples                                                                                           | 2          |
| cpu.sys.samples                                                                                        | 30         |
| cpu.usr                                                                                                | 98         |
| arc.backupmissing                                                                                      | 1          |
| cpu.io                                                                                                 | 20         |
| Archive Backup Delay Minutes ( archbackupdelaymins )                                                   | 40         |
| inventoryPurgeThreadInterval                                                                           | 720        |
| Age of Purging Collections (Hours) ( minFileAgeToPurge )                                               | 12         |
| cpu.idle.samples                                                                                       | 2          |
| unreachablenodeSleepTime                                                                               | 300        |
| cpu.idle                                                                                               | 20         |
| mem.swaptotal                                                                                          | 24         |
| TFA ISA CRS Profile Delay (minutes) ( tfaDbUtlCrsProfileDelay )                                        | 2          |
| cdb.backupmissing                                                                                      | 1          |
| cdb.backupmissing.samples                                                                              | 2          |
| Trim Size ( trimsize )                                                                                 | 500000     |
| Maximum Size of TFA Log (MB) ( maxLogSize )                                                            | 52428800   |
| minTimeForAutoDiagCollection                                                                           | 300        |
| skipScanThreshold                                                                                      | 100        |
| fileCountInventorySwitch                                                                               | 5000       |
| TFA ISA Purge Mode ( tfaDbUtlPurgeMode )                                                               | profile    |
| country                                                                                                | US         |
| Debug Mask (Hex) ( debugmask )                                                                         | 0x000000   |
| Setting for ACR redaction (none|SANITIZE|MASK) ( redact )                                              | none       |
| language                                                                                               | en         |
| AlertLogLevel                                                                                          | ALL        |
| BaseLogPath                                                                                            | ERROR      |
| encoding                                                                                               | UTF-8      |
| UserLogLevel                                                                                           | ALL        |
| Logs older than the time period will be auto purged(days[d]|hours[h]) ( manageLogsAutoPurgePolicyAge ) | 30d        |
| isaMode                                                                                                | enabled    |
'--------------------------------------------------------------------------------------------------------+------------'

[root@exdbsrv01 oracle.ahf]#
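This report also confirms the space problem: the repository was at 9209 MB of a 10240 MB maximum, roughly 90% full. A minimal sketch that computes this from the print config output, assuming the currentsizemegabytes and maxsizemegabytes rows shown above (repo_usage is a hypothetical name):

```shell
# Hypothetical helper: read "ahfctl print config" on stdin and report how full
# the TFA repository is, from the currentsizemegabytes / maxsizemegabytes rows.
repo_usage() {
    awk -F'|' '
        /\( currentsizemegabytes \)/ { cur = $3 + 0 }
        /\( maxsizemegabytes \)/     { max = $3 + 0 }
        END {
            if (max > 0)
                printf "%d MB of %d MB used (%.1f%%)\n", cur, max, 100 * cur / max
            else {
                print "repository size rows not found" > "/dev/stderr"
                exit 1
            }
        }'
}

# Usage: /opt/oracle.ahf/bin/ahfctl print config | repo_usage
```

For the report above, it prints “9209 MB of 10240 MB used (89.9%)”.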

Another important setting is related to AHF/TFA CPU usage. In several places (reports, questions, blog posts, and forums) we can find complaints about high CPU usage caused by TFA. Combined with automatic collection, the resource limits can lead to several problems. My example:

[root@exdbsrv01 oracle.ahf]# /opt/oracle.ahf/bin/ahfctl getresourcelimit

Tool: tfa, Resource: cpu, Limit value: 4.0
Tool: tfa, Resource: kmem no resource limit set
Tool: tfa, Resource: swmem no resource limit set
[root@exdbsrv01 oracle.ahf]#

As you can see above, the CPU limit is 4. This means that TFA can use up to 4 CPUs of my server to collect and generate data. This value can be changed to something more reasonable. The way to think about it is that 1 represents 100% of a single CPU, so 4 means 100% of 4 CPUs. To limit TFA to 50% of a single CPU, set the value to 0.5 (example from the documentation): ahfctl setresourcelimit -value 0.5.
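Since 1 means 100% of one CPU, converting between the limit value and a percentage is simple arithmetic: the current 4.0 is 400% of one CPU, and 50% of one CPU is 0.5. A minimal sketch of both conversions, assuming the getresourcelimit output format shown above (both helper names are hypothetical):

```shell
# Hypothetical helpers. Per the documentation, a CPU limit of 1 means 100% of
# a single CPU, so 4.0 allows up to 4 full CPUs and 0.5 half of one CPU.

# Read "ahfctl getresourcelimit" output on stdin; print the TFA CPU limit as a
# percentage of one CPU (e.g. "Limit value: 4.0" -> 400%).
tfa_cpu_limit_pct() {
    awk -F'Limit value:' '/Tool: tfa, Resource: cpu/ { printf "%.0f%%\n", ($2 + 0) * 100 }'
}

# Convert a target percentage of one CPU into the -value for setresourcelimit.
cpu_pct_to_limit_value() {
    awk -v pct="$1" 'BEGIN { printf "%g\n", pct / 100 }'
}

# Usage:
#   /opt/oracle.ahf/bin/ahfctl getresourcelimit | tfa_cpu_limit_pct
#   /opt/oracle.ahf/bin/ahfctl setresourcelimit -value "$(cpu_pct_to_limit_value 50)"
```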

There are several other things to change and set in AHF/TFA. The documentation is good and full of examples. Another good source of information is Markus Flechtner's presentation from 2019.

Some other examples of AHF/TFA management:

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl toolstatus

Running command tfactltoolstatus on exdbsrv02 ...

.------------------------------------------------------------------.
|                  TOOLS STATUS - HOST : exdbsrv02                 |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | exachk       |   20.2.2.0.0 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.3.2 | NOT RUNNING |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   20.2.2.0.0 | DEPLOYED    |
|                      | calog        |   20.2.2.0.0 | DEPLOYED    |
|                      | dbcheck      |   18.3.0.0.0 | DEPLOYED    |
|                      | dbglevel     |   20.2.2.0.0 | DEPLOYED    |
|                      | grep         |   20.2.2.0.0 | DEPLOYED    |
|                      | history      |   20.2.2.0.0 | DEPLOYED    |
|                      | ls           |   20.2.2.0.0 | DEPLOYED    |
|                      | managelogs   |   20.2.2.0.0 | DEPLOYED    |
|                      | menu         |   20.2.2.0.0 | DEPLOYED    |
|                      | param        |   20.2.2.0.0 | DEPLOYED    |
|                      | ps           |   20.2.2.0.0 | DEPLOYED    |
|                      | pstack       |   20.2.2.0.0 | DEPLOYED    |
|                      | summary      |   20.2.2.0.0 | DEPLOYED    |
|                      | tail         |   20.2.2.0.0 | DEPLOYED    |
|                      | triage       |   20.2.2.0.0 | DEPLOYED    |
|                      | vi           |   20.2.2.0.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.


.------------------------------------------------------------------.
|                  TOOLS STATUS - HOST : exdbsrv01                 |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | exachk       |   20.2.2.0.0 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.3.2 | NOT RUNNING |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   20.2.2.0.0 | DEPLOYED    |
|                      | calog        |   20.2.2.0.0 | DEPLOYED    |
|                      | dbcheck      |   18.3.0.0.0 | DEPLOYED    |
|                      | dbglevel     |   20.2.2.0.0 | DEPLOYED    |
|                      | grep         |   20.2.2.0.0 | DEPLOYED    |
|                      | history      |   20.2.2.0.0 | DEPLOYED    |
|                      | ls           |   20.2.2.0.0 | DEPLOYED    |
|                      | managelogs   |   20.2.2.0.0 | DEPLOYED    |
|                      | menu         |   20.2.2.0.0 | DEPLOYED    |
|                      | param        |   20.2.2.0.0 | DEPLOYED    |
|                      | ps           |   20.2.2.0.0 | DEPLOYED    |
|                      | pstack       |   20.2.2.0.0 | DEPLOYED    |
|                      | summary      |   20.2.2.0.0 | DEPLOYED    |
|                      | tail         |   20.2.2.0.0 | DEPLOYED    |
|                      | triage       |   20.2.2.0.0 | DEPLOYED    |
|                      | vi           |   20.2.2.0.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[root@exdbsrv01 grid]#
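With several nodes, the toolstatus output gets long. A minimal sketch that lists the tools in a given status for each host, assuming the table layout shown above (tools_with_status is a hypothetical helper; the default status is NOT RUNNING):

```shell
# Hypothetical helper: read "tfactl toolstatus" output on stdin and print
# "<host> <tool>" for every tool whose status equals $1 (default NOT RUNNING).
tools_with_status() {
    awk -F'|' -v want="${1:-NOT RUNNING}" '
        function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
        /TOOLS STATUS - HOST :/ {
            host = $0
            sub(/.*HOST :[ \t]*/, "", host)     # drop everything up to the host name
            sub(/[ \t]*[|][ \t]*$/, "", host)   # drop the closing border
        }
        NF >= 6 {
            tool = trim($3); status = trim($5)
            if (status == want) print host, tool
        }'
}

# Usage: /opt/oracle.ahf/bin/tfactl toolstatus | tools_with_status
```

For the output above, this should list oswbb and prw on both exdbsrv02 and exdbsrv01.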

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community. Post protected by copyright.”
