Notes on Recovering a 1.4 TB ASM (RAC) Database After Disk Corruption

WBOY, 2016-06-07 15:29:25

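This week I spent two days helping a customer successfully recover a 10.2.0.5 RAC database on ASM, close to 1.4 TB in size. The database had crashed outright on March 4.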

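As you can see, the database first started throwing errors on redo log writes (ORA-00345) and on the control file (ORA-00202). The underlying cause was that the DATA disk group had been dismounted: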

    Tue Mar 04 18:09:59 CST 2014
    Errors in file /home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_lgwr_15943.trc:
    ORA-00345: redo log write error block 68145 count 5
    ORA-00312: online log 6 thread 2: '+DATA/xxxx/onlinelog/o2_t2_redo3.log'
    ORA-15078: ASM diskgroup was forcibly dismounted
    Tue Mar 04 18:09:59 CST 2014
    SUCCESS: diskgroup DATA was dismounted
    SUCCESS: diskgroup DATA was dismounted
    Tue Mar 04 18:10:00 CST 2014
    Errors in file /home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_lmon_15892.trc:
    ORA-00202: control file: '+DATA/xxxx/controlfile/o1_mf_4g1zr1yo_.ctl'
    ORA-15078: ASM diskgroup was forcibly dismounted
    Tue Mar 04 18:10:00 CST 2014
    KCF: write/open error block=0x1f41e online=1
      file=31 +DATA/xxxx/datafile/apps_ts_queues.310.692585175
      error=15078 txt: ''
    Tue Mar 04 18:10:00 CST 2014
    KCF: write/open error block=0x47d5d online=1
      file=51 +DATA/xxx/datafile/apps_ts_tx_data.353.692593409
      error=15078 txt: ''
    Tue Mar 04 18:10:00 CST 2014
    Errors in file /home/oraprod/10.2.0/db/admin/xxxx/bdump/xxxx_dbw2_15939.trc:
    ORA-00202: control file: '+DATA/prod/controlfile/o1_mf_4g1zr1yo_.ctl'
    ORA-15078: ASM diskgroup was forcibly dismounted
    Tue Mar 04 18:10:00 CST 2014
    KCF: write/open error block=0x47d5b online=1
      file=51 +DATA/prod/datafile/apps_ts_tx_data.353.692593409
      error=15078 txt: ''
    Tue Mar 04 18:10:00 CST 2014

After the database instance went down, we looked at the ASM instance's alert log:

    Tue Mar 04 18:10:04 CST 2014
    NOTE: SMON starting instance recovery for group 1 (mounted)
    Tue Mar 04 18:10:04 CST 2014
    WARNING: IO Failed. au:0 diskname:/dev/raw/raw5
      rq:0x200000000207b518 buffer:0x200000000235c600 au_offset(bytes):0 iosz:4096 operation:0
      status:2
    WARNING: IO Failed. au:0 diskname:/dev/raw/raw5
      rq:0x200000000207b518 buffer:0x200000000235c600 au_offset(bytes):0 iosz:4096 operation:0
      status:2
    NOTE: F1X0 found on disk 0 fcn 0.160230519
    WARNING: IO Failed. au:33 diskname:/dev/raw/raw5
      rq:0x60000000002d64f0 buffer:0x400405df000 au_offset(bytes):0 iosz:4096 operation:0
      status:2
    WARNING: cache failed to read gn 1 fn 3 blk 10752 count 1 from disk 2
    ERROR: cache failed to read fn=3 blk=10752 from disk(s): 2
    ORA-15081: failed to submit an I/O operation to a disk
    NOTE: cache initiating offline of disk 2 group 1
    WARNING: process 12863 initiating offline of disk 2.2526420198 (DATA_0002) with mask 0x3 in group 1
    NOTE: PST update: grp = 1, dsk = 2, mode = 0x6
    Tue Mar 04 18:10:04 CST 2014
    ERROR: too many offline disks in PST (grp 1)
    Tue Mar 04 18:10:04 CST 2014
    ERROR: PST-initiated MANDATORY DISMOUNT of group DATA
    Tue Mar 04 18:10:04 CST 2014
    WARNING: Disk 2 in group 1 in mode: 0x7,state: 0x2 was taken offline
    Tue Mar 04 18:10:05 CST 2014
    NOTE: halting all I/Os to diskgroup DATA
    NOTE: active pin found: 0x0x40045bb0fd0
    Tue Mar 04 18:10:05 CST 2014
    Abort recovery for domain 1
    Tue Mar 04 18:10:05 CST 2014
    NOTE: cache dismounting group 1/0xD916EC16 (DATA)
    Tue Mar 04 18:10:06 CST 2014

As you can see, ASM raised ORA-15081, and just before that it reported I/O errors against one of the disks, /dev/raw/raw5. Careful readers will also notice that, because of the failed I/O, that disk (disk 2, DATA_0002) was taken offline. This left too many offline disks in the PST, so ASM forced a mandatory dismount of DATA, and in the end the disk group could not be mounted again.

In our tests, kfed read could not read the disk, and dd against it failed as well. However, dd worked fine when run directly against the physical disk behind it.

The disk group would not mount, and the trace makes it clear that the disk header was corrupted:

    WARNING: cache read a corrupted block gn=1 dsk=2 blk=1 from disk 2
    OSM metadata block dump:
    kfbh.endian: 0 ; 0x000: 0x00
    kfbh.hard: 0 ; 0x001: 0x00
    kfbh.type: 0 ; 0x002: KFBTYP_INVALID
    kfbh.datfmt: 0 ; 0x003: 0x00
    kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
    kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0
    kfbh.check: 0 ; 0x00c: 0x00000000
    kfbh.fcn.base: 0 ; 0x010: 0x00000000
    kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
    kfbh.spare1: 0 ; 0x018: 0x00000000
    kfbh.spare2: 0 ; 0x01c: 0x00000000
      CE: (0x0x400417ee4e0) group=1 (DATA) obj=2 (disk) blk=1
      hashFlags=0x0002 lid=0x0002 lruFlags=0x0000 bastCount=1
      redundancy=0x11 fileExtent=-2147483648 AUindex=0 blockIndex=1
      copy #0: disk=2 au=0
      BH: (0x0x40041795000) bnum=4586 type=reading state=reading chgSt=not modifying
      flags=0x00000000 pinmode=excl lockmode=share bf=0x0x40041400000
      kfbh_kfcbh.fcn_kfbh = 0.0 lowAba=655.8572 highAba=0.0
      last kfcbInitSlot return code=null cpkt lnk is null

As many of you know, starting with 10.2.0.5 Oracle ASM automatically keeps a backup of the ASM disk header, so if only the header had been damaged, recovery would have been easy. It was not that simple, though: checking with dd showed that the first several blocks of the disk were actually damaged.

In the end we used ODU to copy the datafiles directly from the disks to a file system, started the database from those copies, and that completed the recovery.

Note: during the recovery we found that ODU could not directly copy certain files such as test201402.dbf, even though checking the ASM alias directory showed that their entries were intact. This may be a minor issue in how ODU handles such files. We manually read out the AUs holding that metadata, matched them up, and extracted all of the remaining files, including the redo logs and the control file, after which the database opened without further problems.

I have to say, Brother Xiong's ODU is remarkably powerful; it makes short work of all kinds of Oracle ASM database recovery cases!
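Lining up the two alert logs above (database side, then ASM side) is easier if the timestamps and ORA- codes are pulled out mechanically. The following is a minimal Python sketch, assuming alert-log text in the 10g format shown in this post, where a bare timestamp line such as `Tue Mar 04 18:10:04 CST 2014` precedes the messages it applies to. The function name `ora_timeline` is hypothetical, not part of any Oracle tool.

```python
import re
from datetime import datetime

# Timestamp lines in the 10g alert logs above look like "Tue Mar 04 18:10:04 CST 2014".
TS_RE = re.compile(r"^\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \w+ \d{4}$")
ORA_RE = re.compile(r"\bORA-\d{5}\b")


def ora_timeline(text):
    """Return (timestamp, ORA code) pairs in log order.

    Each ORA- code is attributed to the most recent timestamp line above it;
    codes that appear before any timestamp get None.
    """
    current = None
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            # Drop the time-zone token (e.g. "CST"): strptime's %Z does not accept
            # arbitrary zone abbreviations, so the result is a naive local time.
            parts = line.split()
            current = datetime.strptime(" ".join(parts[:4] + parts[5:]),
                                        "%a %b %d %H:%M:%S %Y")
            continue
        for code in ORA_RE.findall(line):
            events.append((current, code))
    return events


SAMPLE = """\
Tue Mar 04 18:09:59 CST 2014
ORA-00345: redo log write error block 68145 count 5
ORA-00312: online log 6 thread 2: '+DATA/xxxx/onlinelog/o2_t2_redo3.log'
ORA-15078: ASM diskgroup was forcibly dismounted
Tue Mar 04 18:10:00 CST 2014
ORA-00202: control file: '+DATA/xxxx/controlfile/o1_mf_4g1zr1yo_.ctl'
ORA-15078: ASM diskgroup was forcibly dismounted
"""

if __name__ == "__main__":
    for ts, code in ora_timeline(SAMPLE):
        print(ts, code)
```

Feeding it both alert logs gives one ordered list of errors per instance, which makes it easier to see the ASM-side ORA-15081 and disk offline next to the database-side ORA-15078 errors. The time zone is simply dropped, so compare logs from hosts in the same zone.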
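The metadata block dump above shows every field of the 32-byte `kfbh` block header as zero, which is why the block type decodes as KFBTYP_INVALID. Here is a rough Python sketch of checking a raw 4 KB block (for example, one read from a disk with dd) for that condition, using only the field offsets printed in the dump. Assumptions: the byte order of the multi-byte fields depends on the platform and is passed in by the caller, and only two type codes are mapped, 0 (KFBTYP_INVALID, from the dump) and 1 (KFBTYP_DISKHEAD, what kfed typically prints for a healthy disk header). `decode_kfbh` and `looks_zeroed` are hypothetical helpers, not part of kfed.

```python
import struct

# Field layout of the 32-byte kfbh header, following the offsets in the dump above:
# endian 0x000, hard 0x001, type 0x002, datfmt 0x003 (1 byte each), then
# block.blk 0x004, block.obj 0x008, check 0x00c, fcn.base 0x010, fcn.wrap 0x014,
# spare1 0x018, spare2 0x01c (4 bytes each).
KFBH_FIELDS = ("endian", "hard", "type", "datfmt", "block_blk", "block_obj",
               "check", "fcn_base", "fcn_wrap", "spare1", "spare2")

# Assumed subset of type codes: 0 appears in the dump, 1 is the usual disk-header type.
KFBTYP_NAMES = {0: "KFBTYP_INVALID", 1: "KFBTYP_DISKHEAD"}


def decode_kfbh(block, byteorder="<"):
    """Decode the kfbh header at the start of a raw ASM metadata block.

    byteorder is a struct prefix: "<" for little-endian platforms, ">" for
    big-endian ones. Which one applies is platform dependent (an assumption to verify).
    """
    if len(block) < 32:
        raise ValueError("need at least 32 bytes")
    values = struct.unpack(byteorder + "4B7I", block[:32])  # 4 bytes + 7 uint32 = 32 bytes
    header = dict(zip(KFBH_FIELDS, values))
    header["type_name"] = KFBTYP_NAMES.get(header["type"], "unknown")
    return header


def looks_zeroed(block):
    """True if every byte of the block is zero, like the corrupted block above."""
    return not any(block)


if __name__ == "__main__":
    print(decode_kfbh(bytes(4096))["type_name"])  # KFBTYP_INVALID
```

A zeroed header on its own only says the block is unusable. As the post notes, checking the neighbouring blocks with dd is what revealed that more than the header was gone.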
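On the automatic disk header backup mentioned above: it is commonly documented as the second-to-last metadata block of allocation unit (AU) 1. Treating that location as an assumption to confirm against Oracle's notes for your exact version, the arithmetic for where to look is short. With a 1 MB AU and 4 KB metadata blocks it lands at block 254 of AU 1, i.e. block 510 counted from the start of the disk. The helper name `header_backup_location` is hypothetical.

```python
def header_backup_location(au_size=1024 * 1024, block_size=4096):
    """Return (block_in_au1, absolute_block, byte_offset) of the ASM disk header backup.

    Assumption: the backup is the second-to-last metadata block of AU 1.
    Defaults are a 1 MB AU and a 4 KB ASM metadata block.
    """
    if au_size % block_size:
        raise ValueError("AU size must be a multiple of the block size")
    blocks_per_au = au_size // block_size
    block_in_au1 = blocks_per_au - 2                # second-to-last block, 0-based
    absolute_block = blocks_per_au + block_in_au1   # AU 0 holds blocks 0..blocks_per_au-1
    return block_in_au1, absolute_block, absolute_block * block_size


if __name__ == "__main__":
    print(header_backup_location())  # (254, 510, 2088960)
```

With the defaults, the corresponding raw read would be `dd if=<disk> bs=4096 skip=510 count=1`. In this case, though, the backup alone was not enough, because the damage extended past the header block.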
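The manual step in the note above (reading out the AUs that hold a file's metadata, then pulling the files out) ends with reassembling each file from its extent map. The sketch below covers only that last step, under loudly stated assumptions: an external-redundancy disk group (no mirror copies to choose from), one AU per extent, and an extent map already recovered as a list of `(disk_number, au_number)` pairs in file order. It does not parse real ASM file-directory or alias-directory metadata, it is not how ODU works internally, and `extract_file` is a hypothetical helper.

```python
import io


def extract_file(disks, extent_map, au_size, file_size, out):
    """Copy a file out of raw ASM disks by concatenating its AUs.

    disks:      dict mapping ASM disk number -> readable binary file object for that disk
    extent_map: list of (disk_number, au_number), one entry per extent, in file order
    au_size:    allocation unit size in bytes (assumed: one AU per extent)
    file_size:  logical file size in bytes; the last AU is truncated to it
    out:        writable binary file object
    """
    remaining = file_size
    for disk_no, au_no in extent_map:
        if remaining <= 0:
            break
        want = min(au_size, remaining)
        src = disks[disk_no]
        src.seek(au_no * au_size)
        chunk = src.read(want)
        if len(chunk) != want:
            raise OSError(f"short read on disk {disk_no}, AU {au_no}")
        out.write(chunk)
        remaining -= want
    if remaining > 0:
        raise ValueError("extent map is shorter than file_size")


if __name__ == "__main__":
    AU = 8  # tiny AU so the example is easy to follow
    disk0 = io.BytesIO(b"AAAAAAAA" + b"BBBBBBBB")
    disk1 = io.BytesIO(b"CCCCCCCC" + b"DDDDDDDD")
    out = io.BytesIO()
    extract_file({0: disk0, 1: disk1}, [(1, 1), (0, 0), (1, 0)], AU, 20, out)
    print(out.getvalue())  # b'DDDDDDDDAAAAAAAACCCC'
```

For real disks the `disks` dict would hold file objects opened in binary mode on the surviving devices; getting a trustworthy extent map is the hard part, and that is exactly what the manually read metadata AUs provided in this recovery.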