A LOT OF FALSE ‘Global Cache Blocks Lost’ ALERTS REPORTED BY ENTERPRISE MANAGER CLOUD CONTROL 13C RELEASE 4


SYMPTOMS

The alert notification looks like this: EM Event: Warning:TEST_PDB1 – Total global cache block lost is 15.

Host=db-test01.local 
Target type=Database Instance 
Target name=TEST_PDB1 
Categories=Error 
Message=Total global cache block lost is 15. 
Severity=Warning 
Event reported time=Sep 04, 2020 2:30:09 AM GMT
Operating System=Linux
Platform=x86_64
Associated Incident Id=348883 
Associated Incident Status=New 
Associated Incident Owner= 
Associated Incident Acknowledged By Owner=No 
Associated Incident Priority=None 
Associated Incident Escalation Level=0 
Event Type=Metric Alert 
Event name=rac_global_cache:lost 
Metric Group=Global Cache Statistics
Metric=Global Cache Blocks Lost
Metric value=15
Key Value= 
Rule Name=ROOT_NOTIFICATION_RULE,ALL TARGET EVENTS 
Rule Owner=SYSMAN 
Update Details:
Total global cache block lost is 15. 

DIAGNOSE

Your mailbox fills up with messages like EM Incident: Warning:New: – Total global cache block lost is 15.
The OMS repository database shows a high number of Global Cache Blocks Lost notifications sent for the target instance:

SELECT TRUNC(COLLECTION_TIMESTAMP) "RECEIVED AT",   -- strip the time part to group by day
       COUNT(COLLECTION_TIMESTAMP) "ALERTS"
FROM
       MGMT_VIEW.MGMT$ALERT_NOTIF_LOG
WHERE
       METRIC_NAME='rac_global_cache' AND
       METRIC_COLUMN='lost' AND
       COLUMN_LABEL = 'Global Cache Blocks Lost' AND
       TARGET_NAME = '&INSTANCE_NAME' 
GROUP BY 
       TRUNC(COLLECTION_TIMESTAMP)
ORDER BY 1;
Enter value for instance_name: TEST_PDB1
old   7:        TARGET_NAME = '&INSTANCE_NAME'
new   7:        TARGET_NAME = 'TEST_PDB1'

RECEIVED AT            ALERTS
------------------ ----------
29-AUG-20                 192
30-AUG-20                 355
31-AUG-20                 355
01-SEP-20                 355
02-SEP-20                 355
03-SEP-20                 355
04-SEP-20                 325
05-SEP-20                  90

8 rows selected.

 
The output shows that 355 alert notifications were sent on each full day for the database instance TEST_PDB1.

Yet during any given day the instance loses few blocks, or none at all:

SET PAGES 999
SET LINES 300
COL MESSAGE FOR A60

SELECT TO_CHAR(COLLECTION_TIMESTAMP, 'HH24:MI DD-MM-YYYY') ALERTED_AT,
       MESSAGE 
FROM
       MGMT_VIEW.MGMT$ALERT_NOTIF_LOG 
WHERE
      TRUNC(COLLECTION_TIMESTAMP)=TO_DATE('&DDMMYYYY', 'DD-MM-YYYY') AND 
      TARGET_NAME = '&INSTANCE_NAME'
ORDER BY COLLECTION_TIMESTAMP;
Enter value for ddmmyyyy: 04-09-2020
old   7:       TRUNC(COLLECTION_TIMESTAMP)=TO_DATE('&DDMMYYYY', 'DD-MM-YYYY') AND
new   7:       TRUNC(COLLECTION_TIMESTAMP)=TO_DATE('04-09-2020', 'DD-MM-YYYY') AND
Enter value for instance_name: TEST_PDB1
old   8:       TARGET_NAME = '&INSTANCE_NAME'
new   8:       TARGET_NAME = 'TEST_PDB1'

ALERTED_AT       MESSAGE
---------------- ------------------------------------------------------------
00:00 04-09-2020 Total global cache block lost is 15.
00:00 04-09-2020 Total global cache block lost is 15.
00:05 04-09-2020 Total global cache block lost is 15.

...

23:50 04-09-2020 Total global cache block lost is 15.
23:50 04-09-2020 Total global cache block lost is 15.
23:55 04-09-2020 Total global cache block lost is 15.
23:55 04-09-2020 Total global cache block lost is 15.

202 rows selected.

In this example no blocks were lost during the day (04-09-2020, from 00:00 to 23:55): the reported total stays at 15. Nevertheless, I received alerts throughout the day.
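
To confirm that the collected value really did not move during the day, the raw metric collections can be summarized in the repository. This is only a sketch: it assumes the repository view MGMT$METRIC_DETAILS is reachable through the same MGMT_VIEW schema as above and that the raw collections for that day are still retained. If MIN_LOST equals MAX_LOST, no block was lost that day.

```sql
-- Sketch: did the collected 'lost' value change during one day?
-- Assumption: MGMT$METRIC_DETAILS holds the raw collections for this metric
-- and is accessible via the MGMT_VIEW schema used elsewhere in this note.
SELECT MIN(TO_NUMBER(VALUE)) MIN_LOST,
       MAX(TO_NUMBER(VALUE)) MAX_LOST,
       COUNT(*)              COLLECTIONS
FROM
       MGMT_VIEW.MGMT$METRIC_DETAILS
WHERE
       METRIC_NAME   = 'rac_global_cache' AND
       METRIC_COLUMN = 'lost' AND
       TARGET_NAME   = '&INSTANCE_NAME' AND
       TRUNC(COLLECTION_TIMESTAMP) = TO_DATE('&DDMMYYYY', 'DD-MM-YYYY');
```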

SOLUTION

First, this note is not about how to troubleshoot and resolve a lost block issue. For that, follow Doc ID 563566.1 and Doc ID 2296681.1. This note explains why OEM keeps sending the alerts even when no blocks have been lost for hours, days, or weeks.

Well,
The Global Cache Blocks Lost alert is based on the Global Cache Blocks Lost Metric in Enterprise Manager.

By default the metric has the following thresholds: 1 lost block for WARNING and 3 lost blocks for CRITICAL. To get the number of lost blocks, the metric reads the gc blocks lost statistic from the V$SYSSTAT (GV$SYSSTAT) view of the target instance:

SET PAGES 999
SET LINES 300
COL NAME FOR A20
COL VALUE FOR 999999
SELECT NAME, VALUE FROM V$SYSSTAT WHERE NAME='gc blocks lost';
NAME                   VALUE
-------------------- -------
gc blocks lost            15

NOTE: V$SYSSTAT is based on GV$SYSSTAT:

SET LINES 300
SET PAGES 999
COL VIEW_DEFINITION FOR A50 WORD_WRAPPED
SELECT VIEW_DEFINITION 
FROM
       V$FIXED_VIEW_DEFINITION
WHERE
       VIEW_NAME='V$SYSSTAT';

VIEW_DEFINITION
--------------------------------------------------
select  STATISTIC# , NAME , CLASS , VALUE,
STAT_ID, CON_ID from GV$SYSSTAT where inst_id =
USERENV('Instance')

 
The V$SYSSTAT view accumulates the number of lost blocks (statistic gc blocks lost) since instance startup. The value can only increase, and it is not reset until the next instance restart.
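
Because the counter is cumulative, it helps to see how long it has been accumulating. A minimal sketch that puts the statistic next to the instance startup time (the column aliases are my own):

```sql
-- How long has the 'gc blocks lost' counter been accumulating on each instance?
SELECT I.INSTANCE_NAME,
       TO_CHAR(I.STARTUP_TIME, 'HH24:MI DD-MM-YYYY') STARTED_AT,
       ROUND(SYSDATE - I.STARTUP_TIME)               DAYS_UP,
       S.VALUE                                       GC_BLOCKS_LOST
FROM
       GV$INSTANCE I,
       GV$SYSSTAT  S
WHERE
       S.INST_ID = I.INST_ID AND
       S.NAME    = 'gc blocks lost'
ORDER BY I.INST_ID;
```

A small GC_BLOCKS_LOST value spread over many DAYS_UP suggests the blocks were lost long ago rather than continuously.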

When the number of lost blocks (gc blocks lost) exceeds the threshold of the Global Cache Blocks Lost metric (1 block for warning, 3 blocks for critical), OEM starts sending alerts. Because the statistic never decreases before the next instance restart, the value stays above the threshold from then on.

That’s why OEM keeps sending Global Cache Blocks Lost alerts every 5 minutes once the threshold has been exceeded.
Even if no blocks are lost for a long period (hours, days, weeks), you will still receive the Global Cache Blocks Lost messages.
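
To see how many blocks were actually lost recently, as opposed to the running total, the AWR history can be used to compute per-interval deltas. This is a sketch under two assumptions: the Diagnostics Pack is licensed, and AWR snapshots covering the last day exist in the container where you run it. The first row for each instance shows NULL because it has no earlier snapshot to compare with.

```sql
-- Lost blocks per AWR snapshot interval over the last day.
-- Partitioning by STARTUP_TIME keeps a delta from spanning an instance restart.
SELECT TO_CHAR(SN.END_INTERVAL_TIME, 'HH24:MI DD-MM-YYYY') SNAP_END,
       ST.INSTANCE_NUMBER,
       ST.VALUE - LAG(ST.VALUE) OVER (
                     PARTITION BY ST.DBID, ST.INSTANCE_NUMBER, SN.STARTUP_TIME
                     ORDER BY ST.SNAP_ID) LOST_IN_INTERVAL
FROM
       DBA_HIST_SYSSTAT  ST,
       DBA_HIST_SNAPSHOT SN
WHERE
       SN.SNAP_ID         = ST.SNAP_ID AND
       SN.DBID            = ST.DBID AND
       SN.INSTANCE_NUMBER = ST.INSTANCE_NUMBER AND
       ST.STAT_NAME       = 'gc blocks lost' AND
       SN.END_INTERVAL_TIME > SYSTIMESTAMP - INTERVAL '1' DAY
ORDER BY ST.INSTANCE_NUMBER, ST.SNAP_ID;
```

If every LOST_IN_INTERVAL is 0, the alerts received during that period reflect only the old cumulative value.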

If you are sure the cluster instance has no problem with losing blocks (gc blocks lost), disable the Global Cache Blocks Lost metric to stop it from spamming your mailbox. To do so, clear the warning and critical thresholds for the metric.
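
The thresholds currently in effect can be checked in the OMS repository before and after the change. This is a sketch only: it assumes the repository view MGMT$TARGET_METRIC_SETTINGS, with WARNING_THRESHOLD and CRITICAL_THRESHOLD columns, is available in your EM release and reachable through the MGMT_VIEW schema.

```sql
-- Sketch: show the thresholds set for the Global Cache Blocks Lost metric.
-- Assumption: MGMT$TARGET_METRIC_SETTINGS exposes these columns in this EM release.
SELECT TARGET_NAME,
       WARNING_THRESHOLD,
       CRITICAL_THRESHOLD
FROM
       MGMT_VIEW.MGMT$TARGET_METRIC_SETTINGS
WHERE
       METRIC_NAME   = 'rac_global_cache' AND
       METRIC_COLUMN = 'lost' AND
       TARGET_NAME   = '&INSTANCE_NAME';
```

Once the thresholds are cleared, both columns should come back empty for the target.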

REFERENCES

Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
WAITEVENT: “gc current/cr block lost” Reference Note (Doc ID 2296681.1)
Tuning Inter-Instance Performance in RAC and OPS (Doc ID 181489.1)

EM 13c: How to disable “Global Cache Blocks Lost Metric” Using EMCLI (Doc ID 2543134.1)
False increase of ‘Global Cache Blocks Lost’ or ‘gc blocks lost’ after upgrade to 12c (Doc ID 2096299.1)

 
NOTE: You can find the ratio of lost blocks to served blocks with the following query against the target instance:

SET PAGES 999
SET LINES 300
COL RATIO FOR 99999999

SELECT A.INST_ID "INSTANCE",
       A.VALUE "GC BLOCKS LOST",
       B.VALUE "GC CUR BLOCKS SERVED",
       C.VALUE "GC CR BLOCKS SERVED",
       A.VALUE/NULLIF(B.VALUE+C.VALUE, 0) RATIO   -- NULLIF avoids ORA-01476 when nothing was served
FROM
       GV$SYSSTAT A, 
       GV$SYSSTAT B,
       GV$SYSSTAT C
WHERE
       A.NAME='gc blocks lost' AND
       B.NAME='gc current blocks served' AND
       C.NAME='gc cr blocks served' AND
       B.INST_ID = A.INST_ID AND
       C.INST_ID = A.INST_ID;

  INSTANCE GC BLOCKS LOST GC CUR BLOCKS SERVED GC CR BLOCKS SERVED     RATIO
---------- -------------- -------------------- ------------------- ---------
         1             15             32576274               42979         0
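
The RATIO column above prints 0 only because the column format has no decimal places; with the numbers shown, 15 / (32576274 + 42979) is roughly 0.00005 percent. A sketch of an alternative that reads GV$SYSSTAT once and prints the ratio as a percentage (the LOST_PCT alias and the column format are my own choices):

```sql
-- Lost blocks as a percentage of served blocks, per instance.
COL LOST_PCT FOR 0.99999999

SELECT INST_ID "INSTANCE",
       ROUND(100 * LOST / NULLIF(CUR_SERVED + CR_SERVED, 0), 8) LOST_PCT
FROM  (SELECT INST_ID,
              MAX(DECODE(NAME, 'gc blocks lost', VALUE))           LOST,
              MAX(DECODE(NAME, 'gc current blocks served', VALUE)) CUR_SERVED,
              MAX(DECODE(NAME, 'gc cr blocks served', VALUE))      CR_SERVED
       FROM   GV$SYSSTAT
       WHERE  NAME IN ('gc blocks lost',
                       'gc current blocks served',
                       'gc cr blocks served')
       GROUP BY INST_ID)
ORDER BY INST_ID;
```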


Database : 12.2.0.1.0, 19.8.0.0.0
OEM      : 13.4.0.0.0
