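I recently ran into ORA-00600 [kjctr_pbmsg:badbmsg2], which caused a RAC node instance to restart. The problem was eventually confirmed to be caused by an unstable private network (interconnect).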
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484
1. The detailed analysis follows. First, check the logs:
alert log
Mon Aug 11 23:53:10 2014
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc (incident=1104178):
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/cdrdb/orcl/incident/incdir_1104178/orcl_lms1_12379_i1104178.trc
Mon Aug 11 23:53:12 2014
Dumping diagnostic data in directory=[cdmp_20140811235312], requested by (instance=1, osid=12379 (LMS1)), summary=[incident=1104178].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Aug 11 23:53:13 2014
Sweep [inc][1104178]: completed
Sweep [inc2][1104178]: completed
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc:
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484
Mon Aug 11 23:53:22 2014
ORA-1092 : opitsk aborting process
orcl_lms1_12379_i1104178.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0/dbhome_1
System name: HP-UX
Node name: h7sd05da
Release: B.11.31
Version: U
Machine: ia64
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 14
Unix process pid: 12379, image: oracleh7sd05da (LMS1)

Dump continued from file: /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []

========= Dump for incident 1104178 (ORA 600 [kjctr_pbmsg:badbmsg2]) ========
*** 2014-08-11 23:53:10.339
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
skdstdst <- ksedst <- dbkedDefDump <- ksedmp <- ksfdmp <- $cold_dbgexPhaseII
 <- dbgexProcessError <- dbgeExecuteForError <- dbgePostErrorKGE <- 2352
 <- dbkePostKGE_kgsf <- 128 <- kgeadse <- kgerinv_internal <- kgerinv
 <- kgeasnmierr <- kjctr_pbmsg <- kjctr_rksxp <- kjctrcv <- kjcsrmg
 <- kjmsm <- ksbrdp <- opirip <- opidrv <- sou2o <- opimai_real
 <- ssthrdmain <- main <- main_opd_entry
--------------------- Binary Stack Dump ---------------------
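The alert log check in step 1 can be scripted when the log is large or when several nodes have to be compared. The following is only a minimal sketch, not an Oracle tool: the function name `find_ora600` is hypothetical, and it assumes the 11g text alert log layout shown above, where a timestamp line such as `Mon Aug 11 23:53:10 2014` precedes the messages it applies to. The sample lines are copied from the alert log excerpt in this post.

```python
import re
from datetime import datetime

# Assumed timestamp layout of the text alert log, e.g. "Mon Aug 11 23:53:10 2014".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
# Captures the first bracketed argument of an ORA-00600 message.
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")


def find_ora600(lines, first_arg="kjctr_pbmsg:badbmsg2"):
    """Return (timestamp, line) pairs for ORA-00600 entries with the given first argument."""
    hits = []
    current_ts = None  # most recent timestamp line seen so far
    for raw in lines:
        line = raw.strip()
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue  # a pure timestamp line carries no message
        except ValueError:
            pass
        match = ORA600_RE.search(line)
        if match and match.group(1) == first_arg:
            hits.append((current_ts, line))
    return hits


# Sample taken from the alert log excerpt above.
sample = """Mon Aug 11 23:53:10 2014
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc (incident=1104178):
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
Mon Aug 11 23:53:13 2014
Sweep [inc][1104178]: completed
Errors in file /oracle/app/oracle/diag/rdbms/cdrdb/orcl/trace/orcl_lms1_12379.trc:
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x9FFFFFFFFC996B58], [0x9FFFFFFFFC9976B8], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 12379): terminating the instance due to error 484
""".splitlines()

for ts, line in find_ora600(sample):
    print(ts, line[:64])
```

To scan a real alert log, pass an open file object instead of `sample`; the timestamps make it easier to line the incidents up with AWR snapshots and OS-level network data.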
2. Check the patch information. The current version is 11.2.0.2.1.
$ opatch lsinventory
Installed Top-level Products (1):
Oracle Database 11g                                    11.2.0.2.0
Patch 10248523 : applied on Fri Mar 25 09:33:02 GMT+08:00 2011
3. Searching for documents and bugs related to this error turns up the following bugs and descriptions:
Bug 18015296 : ORA-600 [KJCTR_PBMSG:BADBMSG2] in 11.2.0.3
The assert is triggered because the batch message is invalid/corrupt. This looks like some form of underlying infrastructure/network issue. Please work with customer to have this checked and tested.
Bug 18771858 : LMS0 TERMINATING THE INSTANCE DUE TO ERROR 484 (ORA-00600 [KJCTR_PBMSG:BADBMSG2]) in 11.2.0.3
From the past bug 16240464 & bug 18015296 , both were closed by dev as not a product defect.
It was suggested that the problem was outside the Oracle stack, at the network level. So please check with CT along the same lines to identify network problems (if any) with help from their OS/Net support. Refer to Doc ID 563566.1, Troubleshooting gc block lost and Poor Network Performance in a RAC Environment.
Bug 16240464 : INSTANCE CRASH WITH ORA-00600 [KJCTR_PBMSG:BADBMSG2] in 11.2.0.3
This looks like some form of underlying infrastructure/network issue, please work with customer to have this checked and tested.
Bug 17452853 : LNX64-12.1-EF,DB INST CRASH WITH LMS4 HIT ORA-600 [KJCTR_PBMSG:BADBMSG2] in 12.1.0.2
Bug 17049773 Diagnostic enhancement to give additional parameter in error ORA-600 [ kjctr_pbmsg:badbmsg2] in 12.1.0.1
Note: This fix will not address the root cause of the error but the additional information may help with diagnosis of the cause.
Bug 13917456 : LNX64-12.1-UD: ASM LMD HIT ORA-00600 KJCTR_PBMSG:BADBMSG2 IN NON-UPGRADED NODES in 12.1.0.0.2
This may occur during an upgrade from 11.2.0.3 to 12.1. Not related to this SR.
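4. At this point, I need to check the AWR reports, the OSWatcher data, and all LMS, LMD, LMON, LMHB and DIAG traces from the time of the problem to see whether more information was recorded.
The overall RAC environment is also checked with cluvfy and ORAchk.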
--. AWR report 22:00~23:00 on Aug 11 from both nodes.
--. Deploy OSWatcher, then collect the current OS information when the database workload is high.
--. All the LMS, LMD, LMON, LMHB and DIAG traces from both nodes.
--. CVU output: cluvfy stage -pre crsinst -n <node1,node2> -verbose
--. Please run ORAchk as root. ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2)
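5. While reviewing the AWR reports, I found "gc blocks lost". In theory, this statistic should not appear if the private network is healthy; its presence essentially indicates that the private network is unstable.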
awrrpt_2_29557_29558.html
              Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap:     29557  11-Aug-14 22:00:45       563              1.3
End Snap:       29558  11-Aug-14 23:01:00       551              1.3
Elapsed:        60.24 (mins)
DB Time:     4,835.90 (mins)

Top 5 Timed Foreground Events
Event                          Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read        6,269,185  185,621             30      63.97  User I/O
DB CPU                                     42,433                     14.62
gc current grant 2-way         3,251,636   25,671              8       8.85  Cluster
db file scattered read           550,524    9,873             18       3.40  User I/O
gc cr multi block request        637,442    6,790             11       2.34  Cluster

Instance Activity Stats
Statistic         Total  per Second  per Trans
gc blocks lost      269        0.07       0.01   <<<<<<<<<<<<
awrrpt_1_29557_29558.html
              Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap:     29557  11-Aug-14 22:00:44      2470              1.0
End Snap:       29558  11-Aug-14 23:00:59      2500              1.0
Elapsed:        60.25 (mins)
DB Time:     4,549.47 (mins)

Top 5 Timed Foreground Events
Event                          Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read        8,180,795  154,504             19      56.60  User I/O
DB CPU                                     44,994                     16.48
gc current grant 2-way         3,699,003   29,357              8      10.75  Cluster
db file scattered read           677,065   10,190             15       3.73  User I/O
gc cr multi block request        718,327    7,856             11       2.88  Cluster

Statistic         Total  per Second  per Trans
gc blocks lost      410        0.11       0.01   <<<<<<<<<<<<
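As a quick cross-check of the two AWR excerpts, the "per Second" column for "gc blocks lost" is simply the total divided by the elapsed time of the snapshot interval. The sketch below recomputes it from the figures quoted above. The helper name `per_second` is made up for illustration, and the result says nothing about what rate is acceptable; it only makes the two nodes easy to compare.

```python
def per_second(total, elapsed_minutes):
    """Average number of events per second over an AWR snapshot interval."""
    return total / (elapsed_minutes * 60.0)


# (gc blocks lost total, elapsed minutes) as quoted from the two AWR reports
reports = {
    "awrrpt_2_29557_29558": (269, 60.24),
    "awrrpt_1_29557_29558": (410, 60.25),
}

for name, (lost, minutes) in reports.items():
    print(f"{name}: {per_second(lost, minutes):.2f} gc blocks lost per second")
# awrrpt_2_29557_29558: 0.07 gc blocks lost per second
# awrrpt_1_29557_29558: 0.11 gc blocks lost per second
```

Both values match the AWR "per Second" column, so the excerpts are internally consistent: both instances lost blocks throughout the hour before the crash.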
6. This statistic further supports the likelihood of a private network problem. The final conclusion is as follows:
Bugs 16240464 and 18015296 were raised for similar issues, and both were closed as "Vendor OS Problem".
The bugs confirm that this issue is caused by logical block corruption during network transfer over the interconnect, or by an infrastructure issue.
The ORA-00600 [kjctr_pbmsg:badbmsg2] error is purely a result of an unstable network.
The AWR reports confirm that blocks were being lost ("gc blocks lost") during the problematic time frame. This is evidence that the network is either saturated or corrupting packets.
Please involve the OS team and the network team to identify the root cause of the issue. The note below will be helpful for the network issue.
Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
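7. The analysis still lacks one piece of stronger evidence: the OSWatcher logs. OSWatcher logs from the time of the incident would have exposed the private network problem more clearly. After all, both "gc blocks lost" and ORA-00600 [kjctr_pbmsg:badbmsg2] were reported from the Oracle Database side, which is not enough to convince the OS engineers. If the OSWatcher logs had recorded TCP and UDP packet loss at that time, the problem would be clearer and the responsibility easier to assign.
For installing and using OSWatcher, see: OSWatcher (Doc ID 301137.1)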
