
What is a MySQL full-text index

青灯夜游 (Original)
2023-04-23 19:03:24

In MySQL, full-text indexing is a technique for finding arbitrary information in whole books or articles stored in the database. Most queries can be answered with numeric comparison, range filtering, and similar operations. Keyword matching, however, requires a query based on similarity rather than exact comparison, and full-text indexing is designed for that scenario.

Environment for this tutorial: Windows 7, MySQL 8, Dell G3 computer.

## Introducing the concept

Full-text indexing (full-text search) is a technique for finding arbitrary information in whole books or articles stored in the database. It can retrieve chapters, sections, paragraphs, sentences, words, and so on from the full text as needed, and can also support various statistics and analyses. Full-text indexes are generally implemented as inverted indexes.
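To make the inverted-index idea concrete: for InnoDB tables, MySQL exposes the contents of a full-text index through INFORMATION_SCHEMA. The sketch below assumes a hypothetical schema `mydb` containing an InnoDB table `articles` that already has a FULLTEXT index (neither name comes from this article). It needs a privileged account, and rows that have already been flushed to disk appear in INNODB_FT_INDEX_TABLE rather than in the cache table queried here.

```sql
-- Assumption: `mydb` and `articles` are hypothetical names for an
-- InnoDB table that already has a FULLTEXT index.
-- Tell InnoDB which table's full-text index to expose.
SET GLOBAL innodb_ft_aux_table = 'mydb/articles';

-- Each row maps a word to a document id and a position inside that
-- document, which is exactly the shape of an inverted index.
SELECT WORD, DOC_COUNT, DOC_ID, POSITION
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
LIMIT 10;
```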

Most queries can be answered with numeric comparison, range filtering, and similar operations. If you want to filter by keyword matching, however, you need a query based on similarity rather than exact comparison. Full-text indexing is designed for this scenario.

You may ask: like with % already gives fuzzy matching, so why use a full-text index? like with % is fine when the amount of text is small, but it becomes impractically slow on large amounts of text. On large data sets a full-text index can be many times faster than like with %; the two are not in the same order of magnitude. The trade-off is that full-text search may have accuracy issues.
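A minimal side-by-side sketch of the two query styles, assuming a hypothetical table `articles` with a text column `body` that carries a FULLTEXT index (these names are illustrative only). Note that the two queries are not equivalent: like matches any substring, while a full-text search matches whole indexed words, which is one source of the accuracy differences mentioned above.

```sql
-- Assumption: articles(id, body) with FULLTEXT(body) is a hypothetical table.

-- Pattern matching: the leading % prevents an ordinary index from
-- being used, so every row has to be examined.
SELECT id FROM articles WHERE body LIKE '%database%';

-- Full-text matching: answered from the FULLTEXT index on body.
SELECT id FROM articles WHERE MATCH(body) AGAINST('database');
```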

You may never have paid attention to full-text indexing, but you are certainly familiar with at least one full-text technology: search engines. Although search engines index enormous amounts of data and usually have no relational database behind them, the basic principles of full-text indexing are the same.

Version support

Before we begin, a word on which MySQL versions, storage engines, and data types support full-text indexes.

  1. Before MySQL 5.6, only the MyISAM storage engine supports full-text indexes;
  2. In MySQL 5.6 and later, both the MyISAM and InnoDB storage engines support full-text indexes;
  3. Full-text indexes can be built only on columns whose data type is char, varchar, text, or one of their variants.

Before testing or using a full-text index, first check that your MySQL version, storage engine, and column data types support it.
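The three checks above can be run directly from a MySQL client. This is a sketch only; `your_table` is a placeholder for whichever table you intend to index.

```sql
-- 1. Server version (full-text on InnoDB needs 5.6 or later).
SELECT VERSION();

-- 2. Storage engine of the table in the current schema.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'your_table';   -- placeholder name

-- 3. Column data types: only char, varchar, and text variants qualify.
SELECT COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'your_table';   -- placeholder name
```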

Operation of full-text index


Index operations are easy to look up anywhere, but they are repeated here for completeness.


Create

Create a full-text index when creating a table:

create table fulltext_test (
    id int(11) NOT NULL AUTO_INCREMENT,
    content text NOT NULL,
    tag varchar(255),
    PRIMARY KEY (id),
    FULLTEXT KEY content_tag_fulltext(content,tag)  -- composite full-text index over both columns
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Create a full-text index on an existing table:

create fulltext index content_tag_fulltext
    on fulltext_test(content,tag);

Create a full-text index with the ALTER TABLE statement:

alter table fulltext_test
    add fulltext index content_tag_fulltext(content,tag);
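Since this tutorial runs on MySQL 8, the same definition should also work on InnoDB. The variant below is a sketch: the table name `fulltext_test_innodb` and the utf8mb4 character set are choices made here for illustration, not something the article prescribes. SHOW INDEX is then used to confirm that the index exists.

```sql
-- Hypothetical InnoDB counterpart of fulltext_test.
CREATE TABLE fulltext_test_innodb (
    id INT NOT NULL AUTO_INCREMENT,
    content TEXT NOT NULL,
    tag VARCHAR(255),
    PRIMARY KEY (id),
    FULLTEXT KEY content_tag_fulltext (content, tag)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- List only the full-text indexes; expect one row per indexed column.
SHOW INDEX FROM fulltext_test_innodb WHERE Index_type = 'FULLTEXT';
```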

Modify

A full-text index cannot be modified in place; drop it and rebuild it.

Delete

Use DROP INDEX to delete the full-text index directly:

drop index content_tag_fulltext
    on fulltext_test;

Delete the full-text index with the ALTER TABLE statement:

alter table fulltext_test
    drop index content_tag_fulltext;
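Because changing a full-text index means dropping and rebuilding it, a "modification" is just these statements run in sequence. The sketch below narrows the index on fulltext_test to the content column only; the new index name `content_fulltext` is made up for this example.

```sql
-- Step 1: drop the existing composite index.
ALTER TABLE fulltext_test DROP INDEX content_tag_fulltext;

-- Step 2: build the replacement (hypothetical name, content column only).
ALTER TABLE fulltext_test ADD FULLTEXT INDEX content_fulltext (content);

-- Step 3: confirm which indexes the table has now.
SHOW INDEX FROM fulltext_test;
```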

Use full-text index


Unlike the commonly used fuzzy matching with like and %, full-text search has its own syntax, using the match and against keywords, for example:


select * from fulltext_test 
    where match(content,tag) against('xxx xxx');

Note: the columns specified in match() must be exactly the same as the columns of the full-text index; otherwise an error is reported and the full-text index cannot be used. This is because the full-text index does not record which column a keyword came from. If you want to search a single column with a full-text index, create a separate full-text index on that column.
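A short sketch of this rule using the fulltext_test table from earlier, whose index covers (content, tag). The search word 'mysql' and the index name `content_only_fulltext` are placeholders chosen for this example.

```sql
-- Works: the column list matches the index content_tag_fulltext exactly.
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('mysql');

-- Fails in natural-language mode, because no FULLTEXT index covers
-- exactly (content):
--   ERROR 1191 (HY000): Can't find FULLTEXT index matching the column list
SELECT * FROM fulltext_test
WHERE MATCH(content) AGAINST('mysql');

-- Fix: give the single column its own full-text index, then retry.
ALTER TABLE fulltext_test
    ADD FULLTEXT INDEX content_only_fulltext (content);
```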

Test full text index


Add test data

With the above knowledge, you can test the full-text index.

First create the test table and insert the test data:

create table test (
    id int(11) unsigned not null auto_increment,
    content text not null,
    primary key(id),
    fulltext key content_index(content)
) engine=MyISAM default charset=utf8;

insert into test (content) values ('a'),('b'),('c');
insert into test (content) values ('aa'),('bb'),('cc');
insert into test (content) values ('aaa'),('bbb'),('ccc');
insert into test (content) values ('aaaa'),('bbbb'),('cccc');

Execute the following queries according to the full-text search syntax:

select * from test where match(content) against('a');
select * from test where match(content) against('aa');
select * from test where match(content) against('aaa');

Intuitively, we would expect 4 records to be displayed, but in fact these queries return no records at all. Only the following query

select * from test where match(content) against('aaaa');

finds the aaaa record.

Why? There can be many reasons for this, the most common being the minimum search length. In addition, when using a full-text index, the test table should contain at least 4 records; otherwise, unexpected results can occur.

Full-text indexing in MySQL is governed by two variables, the minimum search length and the maximum search length. Words shorter than the minimum search length or longer than the maximum search length are not indexed. In plain terms, if you want to find a word through a full-text index, its length must fall within the range defined by these two variables.

The default values of these two variables can be viewed with the following command:

show variables like '%ft%';

The output shows the names and default values of these two variables under the MyISAM and InnoDB storage engines:

// MyISAM
ft_min_word_len = 4;
ft_max_word_len = 84;

// InnoDB
innodb_ft_min_token_size = 3;
innodb_ft_max_token_size = 84;

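As you can see, the minimum search length defaults to 4 under MyISAM and 3 under InnoDB. In other words, a MySQL full-text index only indexes words of length at least 4 (MyISAM) or 3 (InnoDB), and of the words searched above only aaaa has length at least 4.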
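Instead of scanning the whole '%ft%' listing, the four length limits can be read directly. The second statement is a rough illustration only: it compares the length of a sample word against the lower bounds and ignores the upper bounds and stopwords.

```sql
-- Read just the four length limits.
SHOW VARIABLES WHERE Variable_name IN (
    'ft_min_word_len', 'ft_max_word_len',
    'innodb_ft_min_token_size', 'innodb_ft_max_token_size'
);

-- Rough check: is the sample word 'aaa' long enough to be indexed
-- by each engine under the current settings? (1 = yes, 0 = no)
SELECT CHAR_LENGTH('aaa') >= @@ft_min_word_len          AS long_enough_for_myisam,
       CHAR_LENGTH('aaa') >= @@innodb_ft_min_token_size AS long_enough_for_innodb;
```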

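Configure the minimum search length

These length parameters cannot be changed dynamically at runtime; they must be set in the MySQL configuration file. To change the minimum search length to 1, open the MySQL configuration file /etc/my.cnf and append the following under [mysqld]:

[mysqld]
innodb_ft_min_token_size = 1
ft_min_word_len = 1

Then restart the MySQL server and repair the full-text index. Note that after changing the parameters you must repair the index, otherwise the new values will not take effect.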

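There are two ways to repair the index. You can use the following command (REPAIR TABLE applies to MyISAM tables, such as the test table here):

repair table test quick;

Or simply drop the index and create it again. Run the queries above once more, and a, aa, and aaa can all be found.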
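The drop-and-recreate route looks like this for the test table and its content_index index. It is a sketch of the second repair option; unlike REPAIR TABLE, it should also apply to InnoDB tables.

```sql
-- Rebuild the full-text index so the new minimum length takes effect.
ALTER TABLE test DROP INDEX content_index;
ALTER TABLE test ADD FULLTEXT INDEX content_index (content);

-- Re-run the earlier queries to confirm short words are now indexed.
SELECT * FROM test WHERE MATCH(content) AGAINST('a');
SELECT * FROM test WHERE MATCH(content) AGAINST('aa');
SELECT * FROM test WHERE MATCH(content) AGAINST('aaa');
```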

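However, one question remains: when searching for the keyword a, why do aa, aaa, and aaaa not appear in the results? Before answering that, let's look at the two kinds of full-text index.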

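Two kinds of full-text index


Natural-language full-text index

By default, or when the in natural language mode modifier is used, the match() function performs a natural-language search against the text collection. All of the examples above are natural-language full-text searches.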

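The natural-language search engine computes the relevance of each document to the query. Relevance is based on the number of matched keywords and on how many times the keywords occur in the document. The less often a word occurs across the whole index, the higher its relevance when it matches. Conversely, very common words are not searched at all: if a word occurs in more than 50% of the records, a natural-language search ignores it (this 50% threshold applies to MyISAM full-text indexes, the engine used by the test table). This is the reason, mentioned above, that the test table needs at least 4 records.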

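This mechanism is easy to understand. Suppose a table stores articles: common words, filler words, and the like are bound to occur in most of them, so searching for such words is pointless. What is worth searching for are the words that carry specific meaning, because those are the ones that tell articles apart.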
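The relevance value can be inspected by selecting the match() expression itself. This sketch runs against the test table from earlier; the alias `score` is just a label chosen here, and the actual numbers depend on the engine and the data.

```sql
-- MATCH ... AGAINST in the select list returns the relevance as a
-- non-negative floating-point number; 0 means no match.
SELECT id,
       content,
       MATCH(content) AGAINST('aaaa' IN NATURAL LANGUAGE MODE) AS score
FROM test
WHERE MATCH(content) AGAINST('aaaa' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;
```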

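Boolean full-text index

In a boolean search, we can customize the relevance of individual search words within the query. When writing a boolean search query, prefix modifiers can be used to tailor the search.

MySQL has built-in modifiers: in the output of the earlier query for the minimum search length, the value of the ft_boolean_syntax variable is the set of built-in modifiers. A few are explained briefly below; see the manual for the rest.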

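  • + the word must be present
  • - the word must not be present
  • > raises the relevance of the word, so matching rows rank higher
  • < lowers the relevance of the word, so matching rows rank lower
  • * (asterisk) wildcard; it can only be appended to the end of a word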
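A few hedged examples of these modifiers against the fulltext_test table from earlier. The search words (mysql, oracle, index, backup, data) are placeholders, not data from this article.

```sql
-- + and -: rows must contain "mysql" and must not contain "oracle".
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('+mysql -oracle' IN BOOLEAN MODE);

-- > and <: rows must contain "mysql"; rows that also contain "index"
-- rank higher, rows that also contain "backup" rank lower.
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('+mysql >index <backup' IN BOOLEAN MODE);

-- *: prefix match, finds words that start with "data".
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('data*' IN BOOLEAN MODE);
```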

The problem mentioned above can be solved with a boolean full-text search. With the following query, a, aa, aaa, and aaaa are all returned.

select * from test where match(content) against('a*' in boolean mode);

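Summary


That is about everything, so it is time to summarize.

MySQL full-text indexing originally supported only English-like text, because English words are separated by spaces and a space is a convenient delimiter for tokenizing. Asian scripts such as Chinese and Japanese have no spaces between words, which imposed a real limitation. Starting with MySQL 5.7.6, however, an ngram full-text parser was introduced to solve this problem, and it works with both the MyISAM and InnoDB engines.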
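A minimal sketch of using the ngram parser, assuming MySQL 5.7.6 or later. The table name `articles_cjk`, the index name, and the sample rows are all made up for this example.

```sql
-- Hypothetical table whose full-text index is tokenized by ngram.
CREATE TABLE articles_cjk (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    content TEXT NOT NULL,
    PRIMARY KEY (id),
    FULLTEXT KEY content_ngram (content) WITH PARSER ngram
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO articles_cjk (content) VALUES ('全文索引'), ('数据库索引');

-- Search for a two-character term.
SELECT * FROM articles_cjk
WHERE MATCH(content) AGAINST('索引' IN BOOLEAN MODE);

-- The token length used by the parser is controlled by this variable.
SHOW VARIABLES LIKE 'ngram_token_size';
```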

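In practice, MyISAM's support for full-text indexing comes with many limitations, such as the performance impact of table-level locking, data file corruption, and recovery after a crash, which make MyISAM full-text indexes unsuitable for many applications. In most cases the recommendation is therefore to use another solution, such as third-party tools like Sphinx or Lucene, or to use the full-text index of the InnoDB storage engine.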

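A few points to note

  1. Before using a full-text index, make sure you understand which versions support it;
  2. A full-text index is many times faster than like with %, but it may have accuracy issues;
  3. If a large amount of data needs a full-text index, load the data first and create the index afterwards;
  4. For Chinese text, use MySQL 5.7.6 or later, or a third-party tool.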
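Point 3 can be sketched as follows: create the table without a full-text index, load the rows, and only then add the index. The table `bulk_docs`, its index name, and the sample rows are hypothetical; in practice the load step would be a large INSERT batch or LOAD DATA.

```sql
-- 1. Create the table without any full-text index.
CREATE TABLE bulk_docs (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    content TEXT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- 2. Load the data (a tiny stand-in for a real bulk load).
INSERT INTO bulk_docs (content)
VALUES ('first document'), ('second document'), ('third document');

-- 3. Build the full-text index once, after the data is in place.
ALTER TABLE bulk_docs ADD FULLTEXT INDEX content_fulltext (content);
```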
