Home  >  Article  >  Database  >  初识全文索引

初识全文索引

WBOY
WBOYOriginal
2016-06-07 15:54:47973browse

通常来说,全文索引大多用在OLAP环境当中,全文索引擅长于词汇的快速搜索。 一、全文索引和普通b_tree索引对比 SQL create tablet1 (id int,name varchar(10)); Table created. SQL create indext1_ind on t1(name); Index created. SQL create tablet2 as s

通常来说,全文索引大多用在OLAP环境当中,全文索引擅长于词汇的快速搜索。

一、全文索引和普通b_tree索引对比

SQL> create tablet1 (id int,name varchar(10));

Table created.

SQL> create indext1_ind on t1(name);

Index created.

SQL> create tablet2 as select * from t1;

Table created.

SQL> create indext2_ind on t2(name) indextype is ctxsys.context;

Index created.

SQL> select *from t1 where name like '%tom%';

ID NAME

--------------------

1 tom

2 tom tom

2 tom tom

Execution Plan

----------------------------------------------------------

Plan hash value:3589342044

--------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 10 | 2 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 10 | 2 (0)| 00:00:01 |

|* 2 | INDEX FULL SCAN | T1_IND| 1 | | 1 (0)| 00:00:01 |

--------------------------------------------------------------------------------------

PredicateInformation (identified by operation id):

---------------------------------------------------

2 - filter("NAME" LIKE '%tom%' AND"NAME" IS NOT NULL)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

4 consistent gets

0 physical reads

0 redo size

676 bytes sent via SQL*Net to client

519 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3 rows processed

SQL> select *from t2 where contains(name,'tom')>0;

ID NAME

--------------------

1 tom

2 tom tom

2 tom tom

Execution Plan

----------------------------------------------------------

Plan hash value:785228215

--------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3 | 30 | 7 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| T2 | 3 | 30 | 7 (0)| 00:00:01 |

|* 2 | DOMAIN INDEX | T2_IND| | | 1 (0)| 00:00:01 |

--------------------------------------------------------------------------------------

PredicateInformation (identified by operation id):

---------------------------------------------------

2 -access("CTXSYS"."CONTAINS"("NAME",'tom')>0)

Statistics

----------------------------------------------------------

10 recursive calls

0 db block gets

14 consistent gets

0 physical reads

0 redo size

676 bytes sent via SQL*Net to client

519 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3 rows processed

SQL> selectobject_name,object_type from user_objects order by 1;

OBJECT_NAME OBJECT_TYPE

-------------------------------------------------

--DR开头的四张表为全文索引的基表

DR$T2_IND$X INDEX

DRC$T2_IND$R INDEX

SYS_IL0000236119C00006$$ INDEX

SYS_IL0000236124C00002$$ INDEX

SYS_IOT_TOP_236122 INDEX

SYS_IOT_TOP_236128 INDEX

SYS_LOB0000236119C00006$$ LOB

OBJECT_NAME OBJECT_TYPE

-------------------------------------------------

SYS_LOB0000236124C00002$$ LOB

T1 TABLE

T1_IND INDEX

T2 TABLE

T2_IND INDEX

二、DML操作对全文索引的影响

以context方式创建的全文索引并不是基于事务的,默认情况下,即使一个dml操作提交,信息也不会更新到全文索引中。

1、insert 操作

SQL> create tablet(name varchar2(30));

Table created.

SQL> create indext_ind on t(name) indextype is ctxsys.context;

Index created.

SQL> insert intot values('i am an oracle dba');

1 row created.

SQL> commit;

insert数据已提交,我们看看全文索引是否已更新

SQL> setautotrace on

SQL> select *from t where name like '%dba%';

NAME

------------------------------

i am an oracle dba 

Execution Plan

----------------------------------------------------------

Plan hash value:1601196873

--------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 17 | 2 (0)| 00:00:01 |

|* 1 | TABLE ACCESS FULL| T | 1 | 17 | 2 (0)| 00:00:01 |

--------------------------------------------------------------------------

PredicateInformation (identified by operation id):

---------------------------------------------------

1 - filter("NAME" IS NOT NULL AND"NAME" LIKE '%dba%')

Note

-----

- dynamic sampling used for this statement(level=2)

Statistics

----------------------------------------------------------

5 recursive calls

0 db block gets

15 consistent gets

0 physical reads

0 redo size

538 bytes sent via SQL*Net to client

520 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

SQL> set line 200

SQL> select *from t where contains(name,'dba') >0;

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value:315187259

-------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 29 | 4 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 29 | 4 (0)| 00:00:01 |

|* 2 | DOMAIN INDEX | T_IND| | | 4 (0)| 00:00:01 |

-------------------------------------------------------------------------------------

PredicateInformation (identified by operation id):

---------------------------------------------------

2 -access("CTXSYS"."CONTAINS"("NAME",'dba')>0)

Note

-----

- dynamic sampling used for this statement(level=2)

Statistics

----------------------------------------------------------

1829 recursive calls

0 db block gets

2696 consistent gets

30 physical reads

0 redo size

332 bytes sent via SQL*Net to client

509 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

164 sorts (memory)

0 sorts (disk)

0 rows processed

以上发现,全文索引并没有自动更新,而是把记录存放在线面的ctxsys.dr$pending表中。

SQL> setautotrace off

SQL> select *from ctxsys.dr$pending;

PND_CID PND_PID PND_ROWID PND_TIMESTAMP P

-------------------- ------------------ ------------------ -

1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N

SQL> insert intot values('he is an oracle dba');

1 row created.

SQL> commit;

Commit complete.

SQL> select *from ctxsys.dr$pending;

PND_CID PND_PID PND_ROWID PND_TIMESTAMP P

-------------------- ------------------ ------------------ -

1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N

1084 0 AABGmVAAEAAAADmAAB 03-APR-14 N

SQL> select *from t where contains(name,'dba') >0;

no rows selected

为了把信息同步到全文索引中,我们需要手工同步:

SQL> alter indext_ind rebuild parameters ('sync');

Index altered.

SQL> select *from t where contains(name,'dba') >0; 

NAME

------------------------------

i am an oracle dba

he is an oracle dba

SQL> select *from ctxsys.dr$pending;

no rows selected

2、delete操作

SQL> select *from t;

NAME

------------------------------

i am an oracle dba

he is an oracle dba

SQL> delete fromt where name='he is an oracle dba';

1 row deleted.

SQL> select *from t where contains(name,'dba') >0;

NAME

------------------------------

i am an oracle dba

SQL> select *from ctxsys.dr$pending;

no rows selected

SQL> select *from ctxsys.dr$delete;

DEL_IDX_IDDEL_IXP_ID DEL_DOCID

-------------------- ----------

1084 0 2

这里我们看到全文索引立即生效了,至于ctxsys.dr$delete里面的数据是delete操作的中间状态,用来维护一个事物,无论事物提交或者回滚。

SQL> rollback;

Rollback complete.

SQL> select *from t where contains(name,'dba') >0;

NAME

------------------------------

i am an oracle dba

he is an oracle dba

SQL> select *from ctxsys.dr$delete;

no rows selected

3、update操作

update操作相当于delete+insert操作,所以默认情况下需要手动刷新全文索引。

SQL> update t setname='oracle dba' where name='i am an oracle dba';

1 row updated.

SQL> select *from ctxsys.dr$delete;

DEL_IDX_IDDEL_IXP_ID DEL_DOCID

-------------------- ----------

1084 0 1

SQL> select *from ctxsys.dr$pending;

PND_CID PND_PID PND_ROWID PND_TIMESTAMP P

-------------------- ------------------ ------------------ -

1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N

SQL> select *from t where contains(name,'dba') > 0;

NAME

------------------------------

he is an oracle dba

SQL> alter indext_ind rebuild parameters ('sync');

Index altered.

SQL> select *from t where contains(name,'dba') > 0;

NAME

------------------------------

he is an oracle dba

oracle dba

由于全文索引创建对象大多是海量数据的表,dml操作如果实时更新会影响到系统性能。创建全文索引同步的三个选项:

manual:默认选项

every:在一个时间段后更新索引

on commitdml:在事务提交后更新索引

语法如下:

create index t_indon t(name) indextype is ctxsys.context parameters('sync (on commit)');

查看全文索引信息和性能的工具包ctx_report

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn