技術分享 | Oracle 19c RAC環境下備庫node1 ADG異常、asm異常分析及處理詳情 - 數據庫,oracle,sql 墨天輪博客

本文為墨天輪數據庫管理服務團隊第146期技術分享，內容原創，作者為技術顧問達世德，如需轉載請聯繫小墨（VX：modb666）並註明來源。如需查看更多文章可關注【墨天輪】公眾號。

一、問題描述

巡檢發現：客户系統數據庫備庫ADG數據同步異常，具體信息如下：

查看複製錯誤信息：

查看複製相關的進程狀態：

二、分析過程

2.1、查看主節點alert日誌：

發現用户認證錯誤：

2.2、驗證SYS用户密碼，遠程訪問：

主庫-備庫：

備庫-主庫：

2.3 在備庫上查詢數據最後的同步時間：

select max(CHECKPOINT_TIME) from v$datafile_header;

在同一時間發生了網絡故障，因此初步判斷ADG的數據同步應該是網絡故障引發。

2.4、備庫rac節點1無法獲取磁盤組信息

發現當前節點無法獲取到磁盤組的信息：

2.5、查看磁盤組狀態信息：

通過命令驗證asm服務狀態。發現磁盤組無法連接，但是asm服務是正常運行的

2.6、分析asm alert日誌信息

日誌顯示 ORA-04031錯誤碼，提示shared memory太小，無法分配shared\_pool\_size 。

2.7、查詢node1實例啓動時間：

實例在2024-07-21進行了重啓操作，當前無法通過sqlplus / as sysasm連接到asm。

在node2（asm正常）上查看sga信息：

2.8、查看密碼文件

確認密碼文件位置：

可以看到密碼文件存儲在磁盤組中，但是node1節點的磁盤組當前法識別，因此導致密碼認證錯誤，ADG出現同步異常。

因此：ADG數據同步異常根本是備節點的ASM磁盤組異常，而ADG同步的密碼文件存儲在ASM中，所以數據同步發生認證錯誤，導致的異常，因此解決辦法即恢復備節點的ASM狀態即可。

三、問題處理

3.1 先解決node1節點的asm異常問題

3.1.1嘗試重啓node1的asm服務

# 
root用户執行：
srvctl status asm 
srvctl stop asm 
srvctl start asm

可以看到，asm服務正常運行，且無法停止。

3.1.2 重啓node1 的集羣服務

停止node1集羣：（請選擇在業務低峯期（23:00）執行如下操作），操作日誌如下：

# 
確認服務信息：
[root@testdb1 ~]# su - grid
testdb1:/home/grid(+ASM1)$asmcmd lsdg
Connected to an idle instance.
ASMCMD-8102: no connection to Oracle ASM; command requires Oracle ASM to run
testdb1:/home/grid(+ASM1)$srvctl status asm
ASM is running on testdb1,testdb2
# 
開始停止節點集羣（使用root用户執行）：
testdb1:/home/grid(+ASM1)$exit
logout
[root@testdb1 ~]# cd /oracle/app/19.0.0/grid/bin/
[root@testdb1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'testdb1'
CRS-2673: Attempting to stop 'ora.crsd' on 'testdb1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'testdb1'
CRS-2673: Attempting to stop 'ora.dsg.dsg.acfs' on 'testdb1'
CRS-2673: Attempting to stop 'ora.chad' on 'testdb1'
CRS-2673: Attempting to stop 'ora.testdbstd.db' on 'testdb1'
Error unmounting '/dsg'. Possible busy file system. Verify the logs.
Retrying unmount
CRS-2677: Stop of 'ora.testdbstd.db' on 'testdb1' succeeded
CRS-33673: Attempting to stop resource group 'ora.asmgroup' on server 'testdb1'
CRS-2679: Attempting to clean 'ora.OCR_VOTE.dg' on 'testdb1'
CRS-2679: Attempting to clean 'ora.ARCHIVEDG.dg' on 'testdb1'
CRS-2679: Attempting to clean 'ora.DATADG1.dg' on 'testdb1'
CRS-2679: Attempting to clean 'ora.DATADG2.dg' on 'testdb1'
CRS-2679: Attempting to clean 'ora.DSG.dg' on 'testdb1'
CRS-2679: Attempting to clean 'ora.MGMT.dg' on 'testdb1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'testdb1'
CRS-2673: Attempting to stop 'ora.LISTENER2.lsnr' on 'testdb1'
CRS-2673: Attempting to stop 'ora.LISTENER3.lsnr' on 'testdb1'
CRS-2673: Attempting to stop 'ora.LISTENER_DG.lsnr' on 'testdb1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'testdb1' succeeded
CRS-2677: Stop of 'ora.LISTENER2.lsnr' on 'testdb1' succeeded
CRS-2677: Stop of 'ora.LISTENER3.lsnr' on 'testdb1' succeeded
CRS-2677: Stop of 'ora.LISTENER_DG.lsnr' on 'testdb1' succeeded
CRS-2673: Attempting to stop 'ora.testdb1.vip' on 'testdb1'
CRS-5017: The resource action "ora.ARCHIVEDG.dg clean" encountered the following error: 
ORA-04031: unable to allocate 816 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","KGLHD")
. For details refer to "(:CLSN00106:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-2677: Stop of 'ora.testdb1.vip' on 'testdb1' succeeded
CRS-5017: The resource action "ora.ARCHIVEDG.dg check" encountered the following error: 
ORA-04031: unable to allocate 816 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","KGLHD")
. For details refer to "(:CLSN00109:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-2680: Clean of 'ora.ARCHIVEDG.dg' on 'testdb1' failed
CRS-5017: The resource action "ora.OCR_VOTE.dg clean" encountered the following error: 
ORA-04031: unable to allocate 5184 bytes of shared memory ("shared pool","unknown object","SO private sga","kss private so")
. For details refer to "(:CLSN00106:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-5017: The resource action "ora.DSG.dg clean" encountered the following error: 
ORA-04031: unable to allocate 5184 bytes of shared memory ("shared pool","unknown object","SO private sga","kss private so")
. For details refer to "(:CLSN00106:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-5017: The resource action "ora.DATADG2.dg clean" encountered the following error: 
ORA-04031: unable to allocate 5184 bytes of shared memory ("shared pool","unknown object","SO private sga","kss private so")
. For details refer to "(:CLSN00106:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-5017: The resource action "ora.DATADG1.dg clean" encountered the following error: 
ORA-04031: unable to allocate 5184 bytes of shared memory ("shared pool","unknown object","SO private sga","kss private so")
. For details refer to "(:CLSN00106:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-5017: The resource action "ora.MGMT.dg clean" encountered the following error: 
ORA-04031: unable to allocate 5184 bytes of shared memory ("shared pool","unknown object","SO private sga","kss private so")
. For details refer to "(:CLSN00106:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-5017: The resource action "ora.MGMT.dg check" encountered the following error: 
ORA-04031: unable to allocate 816 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","KGLHD")
. For details refer to "(:CLSN00109:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-2680: Clean of 'ora.MGMT.dg' on 'testdb1' failed
CRS-5017: The resource action "ora.DATADG2.dg check" encountered the following error: 
ORA-04031: unable to allocate 568 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","KKSSP")
. For details refer to "(:CLSN00109:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-2680: Clean of 'ora.DATADG2.dg' on 'testdb1' failed
CRS-5017: The resource action "ora.DSG.dg check" encountered the following error: 
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^28","kglseshtTable")
. For details refer to "(:CLSN00109:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-2680: Clean of 'ora.DSG.dg' on 'testdb1' failed
CRS-5017: The resource action "ora.DATADG1.dg check" encountered the following error: 
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2346","kglseshtTable")
. For details refer to "(:CLSN00109:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-2680: Clean of 'ora.DATADG1.dg' on 'testdb1' failed
CRS-5017: The resource action "ora.OCR_VOTE.dg check" encountered the following error: 
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^803","kglseshtTable")
. For details refer to "(:CLSN00109:)" in "/oracle/app/grid/diag/crs/testdb1/crs/trace/crsd_oraagent_grid.trc".
CRS-2680: Clean of 'ora.OCR_VOTE.dg' on 'testdb1' failed
CRS-2675: Stop of 'ora.dsg.dsg.acfs' on 'testdb1' failed
CRS-2679: Attempting to clean 'ora.dsg.dsg.acfs' on 'testdb1'
CRS-2675: Stop of 'ora.chad' on 'testdb1' failed
CRS-2679: Attempting to clean 'ora.chad' on 'testdb1'
CRS-2681: Clean of 'ora.chad' on 'testdb1' succeeded
Error unmounting '/dsg'. Possible busy file system. Verify the logs.
CRS-2681: Clean of 'ora.dsg.dsg.acfs' on 'testdb1' succeeded
CRS-2673: Attempting to stop 'ora.DSG.DSG.advm' on 'testdb1'
CRS-2677: Stop of 'ora.DSG.DSG.advm' on 'testdb1' succeeded
CRS-2673: Attempting to stop 'ora.proxy_advm' on 'testdb1'
CRS-2677: Stop of 'ora.proxy_advm' on 'testdb1' succeeded
CRS-2672: Attempting to start 'ora.testdb1.vip' on 'testdb2'
CRS-2676: Start of 'ora.testdb1.vip' on 'testdb2' succeeded
CRS-2799: Failed to shut down resource 'ora.ASMNET1LSNR_ASM.lsnr' on 'testdb1'
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'testdb1' has failed
CRS-2675: Stop of 'ora.crsd' on 'testdb1' failed
CRS-2799: Failed to shut down resource 'ora.crsd' on 'testdb1'
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'testdb1' has failed
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
[root@testdb1 bin]# 
// 由於 ASM實例下 shared_pool_size 不足，導致集羣停止報錯誤：
# 
登錄grid用户驗證集羣狀態
[root@testdb1 bin]# su - grid 
Last login: Tue Sep  9 23:11:19 CST 2025
testdb1:/home/grid(+ASM1)$srvctl status asm
ASM is running on testdb2
testdb1:/home/grid(+ASM1)$   
#
　再次停止集羣服務，這次已經能夠正常停止（原因是第一次停止時部分服務停止成功，釋放了共享內存）
[root@testdb1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'testdb1'
CRS-2673: Attempting to stop 'ora.crsd' on 'testdb1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'testdb1'
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'testdb1' has completed
CRS-2677: Stop of 'ora.crsd' on 'testdb1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'testdb1'
CRS-2673: Attempting to stop 'ora.crf' on 'testdb1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'testdb1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'testdb1'
CRS-2677: Stop of 'ora.asm' on 'testdb1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'testdb1' succeeded
CRS-2677: Stop of 'ora.crf' on 'testdb1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'testdb1'
CRS-2677: Stop of 'ora.mdnsd' on 'testdb1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'testdb1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'testdb1'
CRS-2673: Attempting to stop 'ora.evmd' on 'testdb1'
CRS-2677: Stop of 'ora.ctssd' on 'testdb1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'testdb1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'testdb1'
CRS-2677: Stop of 'ora.cssd' on 'testdb1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'testdb1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'testdb1'
CRS-2677: Stop of 'ora.gipcd' on 'testdb1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'testdb1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'testdb1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@testdb1 bin]#

驗證asm服務狀態：

檢查集羣狀態：

啓動集羣：

再次檢查集羣狀態，確保每個服務處於online狀態：

檢查監聽，確保ASM和數據庫實例均正常註冊到監聽：

驗證asm狀態：

登錄數據庫，檢查磁盤組信息：

檢查ADG數據同步狀態：

此時可以看到，數據開始同步，且目前數據傳輸延遲1小時53分鐘，數據重放延遲15小時25分鐘。

0.png)此時數據同步完成，數據仍在重放中：

此時數據已經同步、重放均已經完成

創建臨時表，測試數據複製：

主節點創建表，並寫入數據後提交：

備庫進行查詢：

四、總結

此次故障為備機node1節點ASM實例中 shared\_pool\_size 共享池內存不足導致磁盤組異常，無法調用，ADG數據同步異常則是，ADG認證的密碼文件存儲在磁盤組中，備節點磁盤組丟失，導致認證錯誤，錯誤日誌如下：

需要解決asm實例下共享內存不足的問題，否則後期還可能出現同樣的錯誤。

查看當前的share內存配置：

建議增加 shared\_pool\_size 的值到2G，無需重啓實例。

SQL> alter system set shared_pool_size = 2048M scope=both;

墨天輪從樂知樂享的數據庫技術社區蓄勢出發，全面升級，提供多類型數據庫管理服務。墨天輪數據庫管理服務旨在為用户構建信賴可託付的數據庫環境，併為數據庫廠商提供中立的生態支持。
墨天輪數據庫服務官網：https://www.modb.pro/service

墨天輪博客

墨天輪博客

博客 / 詳情