hanganalyze 3 分析总结 一种是会话级别的: ALTER SESSION SET EVENTS immediate trace name HANGANALYZE level level; 一种是实例级别: ORADEBUG hanganalyze level 一种是集群范围的: ORADEBUG setmypid ORADEBUG setinst all ORADEBUG -g def hanganaly
hanganalyze 3 分析总结测试案例
C:\Users\Administrator>SQLPLUS / AS SYSDBA
SQL*Plus: Release 10.2.0.4.0 - Production on Sat May 28 15:30:03 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
数据库已经hang 了登录失败下面来做个 systemdump 吧 看看是什么原因
C:\Users\Administrator>sqlplus -prelim '/ as sysdba'
SQL*Plus: Release 10.2.0.4.0 - Production on Sat May 28 15:31:13 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Enter password:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit;
Statement processed.
SQL> oradebug dump systemstate 10
Statement processed.
SQL> oradebug tracefile_name
c:\oracle\product\10.2.0\admin\systex\udump\systex_ora_4636.trc
SQL> oradebug hanganalyze 3
Hang Analysis in c:\oracle\product\10.2.0\admin\systex\udump\systex_ora_6824.trc
先来看看 hanganalyze 的trace
Chain 1 :
//// 看来是 139 阻塞了一下的会话了.
--
Other chains found:
Chain 2 :
Chain 3 :
Chain 4 :
Chain 5 :
Chain 6 :
Chain 7 :
Chain 8 :
OK ,我们在 systemdump 里面搜索 2932
O/S info: user: SYSTEM, term: WILLSON-LI, ospid: 2932
OSD pid info: Windows threadid: 2932, image: ORACLE.EXE (SHAD)
Dump of memory from 0x000007FF18249EC0 to 0x000007FF1824A0C8
7FF18249EC0 00000006 00000000 17615728 000007FF [........(Wa.....]
7FF18249ED0 00000010 000313A7 183ADF60 000007FF [........`.:.....]
7FF18249EE0 00000003 000313A7 1798B0E8 000007FF [................]
7FF18249EF0 0000000B 000313A7 1836A1C0 000007FF [..........6.....]
7FF18249F00 00000004 0003129B 16835088 000007FF [.........P......]
7FF18249F10 00000007 000313A7 16836020 000007FF [........ `......]
7FF18249F20 00000007 000313A7 00000000 00000000 [................]
7FF18249F30 00000000 00000000 00000000 00000000 [................]
Repeat 24 times
7FF1824A0C0 00000000 00000000 [........]
(FOB) flags=2 fib=000007FF169BB290 incno=0 pending i/o cnt=0
fname=C:\ORACLE\PRODUCT\10.2.0\ORADATA\SYSTEX\CONTROL04.CTL
fno=0 lblksz=16384 fsiz=430
(FOB) flags=2 fib=000007FF169BB630 incno=0 pending i/o cnt=0
fname=C:\ORACLE\PRODUCT\10.2.0\ORADATA\SYSTEX\SYSTEM01.DBF
fno=1 lblksz=8192 fsiz=64000
----------------------------------------
SO: 000007FF1836A1C0, type: 4, owner: 000007FF18263170, flag: INIT/-/-/0x00
(session) sid: 139 trans: 000007FF16ADE418, creator: 000007FF18263170, flag: (100041) USR/- BSY/-/-/-/-/-
DID: 0001-0010-00000029, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: 0000000000000000, psql: 000007FF17EABB40, user: 0/SYS
service name: SYS$USERS
O/S info: user: WILLSON-LI\Administrator, term: WILLSON-LI, ospid: 5588:5372, machine: WORKGROUP\WILLSON-LI
program: sqlplus.exe
application name: sqlplus.exe, hash value=254292535
waiting for 'latch: shared pool' blocking sess=0x0000000000000000 seq=273 wait_time=0 seconds since wait started=129
address=c972f38, number=d6, tries=0 ////--- 在等待 latch c972f38
Dumping Session Wait History
for 'SQL*Net message from client' count=1 wait_time=61829702
driver id=42455100, #bytes=1, =0
for 'SQL*Net message to client' count=1 wait_time=3
driver id=42455100, #bytes=1, =0
for 'ksdxexeotherwait' count=1 wait_time=2
=0, =0, =0
for 'ksdxexeotherwait' count=1 wait_time=2
省略... 一直在等待 ksdxexeotherwait
---我们再搜索 c972f38
SO: 000007FF18263170, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=16, calls cur/top: 000007FF183ADF60/000007FF183ADF60, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=20
Location from where call was made: kghalo:
waiting for c972f38 Child shared pool level=7 child#=1
Location from where latch is held: kghupr1: Chunk Header
Context saved from call: 343209960
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
11 (283, 1306567910, 283)
15 (268, 1306567910, 268)
10 (268, 1306567910, 268)
16 (129, 1306567910, 129)
23 (120, 1306567910, 120)
24 (108, 1306567910, 108)
waiter count=6
gotten 586960 times wait, failed first 238 sleeps 8
gotten 0 times nowait, failed: 0
on wait list for c972f38----------------------------------------------
holding (efd=7) 15e479a0 Child library cache level=5 child#=2
Location from where latch is held: kglobpn: child:: latch
Context saved from call: 1
state=busy, wlstate=free
可以看出来 SO : 000007FF18263170 holding 了 Child library cache 在等待 Child shared pool
SQL> oradebug poke 0xc972f38 4 0
BEFORE: [00C972F38, 00C972F3C) = 00000001
AFTER: [00C972F38, 00C972F3C) = 00000000
其他:
1、10G以前,使用DBX,GDB之类的工具:
找到一个前台进程的PID(ps -ef|grep LOCAL),然后用gdb $ORACLE_HOME/bin/oracle进行跟踪
oracle@gurufl02:~> ps -ef|grep LOCAL
oracle 26358 26357 0 09:52 ? 00:00:00 oracleora9 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 27848 27810 0 10:41 pts/3 00:00:00 grep LOCAL
oracle@gurufl02:~> gdb $ORACLE_HOME/bin/oracle 26358
然后执行:print ksudss(10)
( gdb) print ksudss(10)
[Switching to Thread 4154533552 (LWP 26358)]
warning: Unable to restore previously selected frame.
$1 = 181773020
Current language: auto; currently asm
这个时候在UDUMP或者BDUMP下会生成一个TRACE文件。里面包含SYSTEM STATE DUMP
除了通过print ksudss(10)进行systemstate dump,还可以进行下面的dump
print ksdhng(3,1,0) 相当于oradebug hanganalyze 3
print ksudps(10) 相当于oradebug dump processstate 10
print curdmp() 相当于oradebug call curdmp(也就是oradebug dump cursordump)
print ksdtrc(4) 相当于oradebug dump events 4(这里参数表示level,1–session,2–process,4–system)
print ksdsel(10046,12) –相当于为attach的进程设置10046事件level 12
print skdxipc() –相当于oradebug ipc
print skdxprst() –相当于oradebug procstat
当然如果能用oradebug,就应该使用oradebug,毕竟方便得多,也更安全。
2、10G下sqlplus -prelim
如果10G,可以使用sqlplus -prelim选项强制登录
export ORACLE_SID=ora9
sqlplus -prelim '/ as sysdba'
oradebug setmypid
oradebug unlimit;
oradebug dump systemstate 10
3、如果上述都不行,可以使用pstack把堆栈信息取出来,另外也可以使用truss, tusc,strace 之类的工具看看到底在等什么
数据库HANG住是计较头痛的事情,如何找到HANG住的原因,是DBA必须面临的课题