What is a MySQL full-text index

青灯夜游

Apr 23, 2023 pm 07:03 PM

Environment used in this tutorial: Windows 7, MySQL 8, Dell G3 computer.

Introducing the concept

Full-text search is a technique for finding any piece of information in a whole book or article stored in the database. It can retrieve chapters, sections, paragraphs, sentences, or words from the full text as needed, and can also support various kinds of statistics and analysis. Full-text indexes are generally implemented as inverted indexes.

Most of the queries we need can be handled with numeric comparison, range filtering, and so on. However, if you want to filter by keyword matching, you need a query based on similarity rather than on exact comparison. Full-text indexing is designed for this scenario.

You may ask: LIKE with % wildcards already gives fuzzy matching, so why use a full-text index? LIKE is fine when the amount of text is small, but it becomes impractical for searching large volumes of text. On large data sets a full-text index can be N times faster than LIKE; the two are not in the same order of magnitude. However, full-text indexes may have accuracy issues.
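As a rough illustration of the difference, the two queries below search the same column first with LIKE and then with a full-text predicate. The table articles, the column body, and the index name body_fulltext are hypothetical names chosen for this sketch, which assumes MySQL 5.6 or later.

```sql
-- Hypothetical table used only for this comparison.
CREATE TABLE articles (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    body TEXT NOT NULL,
    PRIMARY KEY (id),
    FULLTEXT KEY body_fulltext (body)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Substring match: the leading % wildcard prevents use of an ordinary
-- index, so every row has to be scanned.
SELECT id FROM articles WHERE body LIKE '%database%';

-- Word match: answered from the full-text index on body.
SELECT id FROM articles WHERE MATCH(body) AGAINST('database');
```

The two queries are not equivalent. LIKE matches any substring, while MATCH ... AGAINST matches whole indexed words, which is one source of the accuracy differences mentioned above.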

You may never have paid attention to full-text indexing, but you are certainly familiar with at least one full-text indexing technology: search engines. Although search engines index extremely large amounts of data and usually do not sit on top of a relational database, the basic principles of full-text indexing are the same.

Version support

Before we begin, here is which versions, storage engines, and data types support full-text indexes:

In MySQL 5.6 and later, both the MyISAM and InnoDB storage engines support full-text indexes (earlier versions support them only on MyISAM);
Full-text indexes can be built only on columns whose data type is char, varchar, or text (including the other members of the text family).

When testing or using a full-text index, first check that your MySQL version, storage engine, and column data types support it.
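One way to run these checks is sketched below. It assumes you are connected to the database that holds the table; fulltext_test is the example table created in the next section, so substitute your own table name.

```sql
-- Server version.
SELECT VERSION();

-- Storage engine of the table.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'fulltext_test';

-- Data types of the table's columns.
SELECT COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'fulltext_test';
```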

Operation of full-text index

The index operations are easy to find with any search engine, but they are repeated here for completeness.

Create

create table fulltext_test (
    id int(11) NOT NULL AUTO_INCREMENT,
    content text NOT NULL,
    tag varchar(255),
    PRIMARY KEY (id),
    FULLTEXT KEY content_tag_fulltext (content, tag)  -- composite full-text index
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- Alternatively, add the index to an existing table:
create fulltext index content_tag_fulltext
    on fulltext_test (content, tag);

-- or:
alter table fulltext_test
    add fulltext index content_tag_fulltext (content, tag);

Modify

A full-text index cannot be modified in place; drop it and create it again.

Delete

drop index content_tag_fulltext on fulltext_test;

alter table fulltext_test drop index content_tag_fulltext;
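Because a full-text index cannot be altered in place, changing one amounts to a drop followed by a create. A minimal sketch of doing both in a single statement follows; the new index name content_fulltext and the choice to index only the content column are illustrative assumptions.

```sql
-- Drop the old full-text index and create its replacement in one statement.
-- Here the new index covers only the content column (an illustrative change).
ALTER TABLE fulltext_test
    DROP INDEX content_tag_fulltext,
    ADD FULLTEXT INDEX content_fulltext (content);
```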

Use full-text index

Unlike the familiar fuzzy matching with LIKE and %, full-text search has its own syntax, using the match and against keywords, for example:

select * from fulltext_test
    where match(content, tag) against('xxx xxx');

Note: The columns specified in the match() function must be exactly the same as the columns specified in the full-text index; otherwise an error is reported and the full-text index cannot be used. This is because the full-text index does not record which column a keyword comes from. If you want to use a full-text index on a single column, create a separate full-text index for that column.
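The rule can be sketched as follows, assuming fulltext_test still has the content_tag_fulltext index from the Create section. The index name tag_fulltext is made up for this example.

```sql
-- Add a separate full-text index for the tag column alone.
ALTER TABLE fulltext_test
    ADD FULLTEXT INDEX tag_fulltext (tag);

-- Works: the column list matches tag_fulltext exactly.
SELECT * FROM fulltext_test
WHERE MATCH(tag) AGAINST('xxx');

-- Works: the column list matches content_tag_fulltext exactly.
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('xxx');

-- Fails in natural language mode: no full-text index covers content alone.
-- SELECT * FROM fulltext_test WHERE MATCH(content) AGAINST('xxx');
```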

Test full text index

Add test data

With the above knowledge, you can now test the full-text index.

First, create a test table and insert some test data:

create table test (
    id int(11) unsigned not null auto_increment,
    content text not null,
    primary key (id),
    fulltext key content_index (content)
) engine=MyISAM default charset=utf8;

insert into test (content) values ('a'),('b'),('c');
insert into test (content) values ('aa'),('bb'),('cc');
insert into test (content) values ('aaa'),('bbb'),('ccc');
insert into test (content) values ('aaaa'),('bbbb'),('cccc');

Run the following queries using the full-text search syntax:

select * from test where match(content) against('a');
select * from test where match(content) against('aa');
select * from test where match(content) against('aaa');

Intuitively, 4 records should be returned, but in fact these queries return nothing. Only the following query

select * from test where match(content) against('aaaa');

finds the aaaa record.

Why? There can be many reasons, the most common being the minimum search length. In addition, when using a full-text index, the test table must contain at least 4 records; otherwise unexpected results can occur.

MySQL full-text indexes have two variables: the minimum search length and the maximum search length. Words shorter than the minimum search length or longer than the maximum search length are not indexed. In other words, to find a word with a full-text search, its length must fall within the range defined by these two variables.

The default values of these two variables can be viewed with the following command:

show variables like '%ft%';

The output shows the names and default values of these two variables for the MyISAM and InnoDB storage engines:

// MyISAM
ft_min_word_len = 4;
ft_max_word_len = 84;

// InnoDB
innodb_ft_min_token_size = 3;
innodb_ft_max_token_size = 84;

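As you can see, the default minimum search length is 4 for MyISAM and 3 for InnoDB. In other words, a MySQL full-text index only indexes words whose length is at least 4 (MyISAM) or 3 (InnoDB), and of the terms searched above only aaaa has a length of at least 4.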

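Configuring the minimum search length

These full-text index parameters cannot be changed dynamically; they must be set in the MySQL configuration file. To change the minimum search length to 1, open the MySQL configuration file /etc/my.cnf and add the following under [mysqld]:

[mysqld]
innodb_ft_min_token_size = 1
ft_min_word_len = 1

Then restart the MySQL server and repair the full-text index. Note that after changing the parameters you must repair the index; otherwise the new values will not take effect.

There are two ways to repair it. One is the following command (the test table here uses MyISAM):

repair table test quick;

The other is to drop the index and create it again. After that, run the queries above once more, and a, aa, and aaa can all be found.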
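The drop-and-recreate route can be sketched as below, using the test table and content_index from the earlier example. This route also applies to InnoDB tables, where REPAIR TABLE is not supported.

```sql
-- Rebuild the full-text index on the test table after changing the
-- minimum word length and restarting the server.
ALTER TABLE test DROP INDEX content_index;
ALTER TABLE test ADD FULLTEXT INDEX content_index (content);

-- Short terms should now be searchable.
SELECT * FROM test WHERE MATCH(content) AGAINST('aaa');
```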

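However, one question remains: when searching for the keyword a, why do aa, aaa, and aaaa not appear in the results? Before answering that, let's look at the two types of full-text index.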

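Two types of full-text index

Natural language full-text index

By default, or when the in natural language mode modifier is used, the match() function performs a natural language search against the text collection. All of the examples above are natural language full-text searches.

The natural language search engine computes the relevance of each document to the query. Relevance is based on the number of matched keywords and on how many times they occur in the document. The less often a word appears across the whole index, the higher its relevance when it matches. Conversely, very common words are not searched: if a word appears in more than 50% of the records, natural language search ignores it (this 50% threshold applies to MyISAM full-text indexes; InnoDB does not apply it). This is why, as mentioned above, the test table must contain at least 4 records.

This mechanism is easy to understand. Suppose a table stores articles: common words, filler words, and the like appear in most of them, so searching for those words is pointless. What you want to search for are the words that carry special meaning in an article, because those are the ones that distinguish articles from each other.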
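To see the relevance values that natural language mode assigns, MATCH() can also be placed in the select list. The sketch below runs against the test table created earlier; the column alias score is an arbitrary name.

```sql
-- MATCH() in the select list returns the relevance score;
-- in the WHERE clause it filters out rows that do not match.
SELECT id,
       content,
       MATCH(content) AGAINST('aaaa' IN NATURAL LANGUAGE MODE) AS score
FROM test
WHERE MATCH(content) AGAINST('aaaa' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;
```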

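Boolean full-text index

In a boolean search, you can customize the relevance of individual search words within the query. When writing a boolean search query, you tailor the search with prefix modifiers.

MySQL has built-in modifiers. The ft_boolean_syntax variable, which appeared in the output when we looked up the minimum search length above, lists them. A few are explained briefly below; see the manual for the rest.

+ the word must be present
- the word must not be present
> increases the word's relevance, so matching rows rank higher
< decreases the word's relevance, so matching rows rank lower
* (asterisk) wildcard; can only be appended to the end of a word

The question raised above can be solved with a boolean full-text search. With the following command, a, aa, aaa, and aaaa are all returned.

select * from test where match(content) against('a*' in boolean mode);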
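A short sketch of the other modifiers follows, run against the fulltext_test table and its composite index on (content, tag). The search words mysql, oracle, index, and slow are placeholders.

```sql
-- Rows that must contain "mysql" and must not contain "oracle".
SELECT * FROM fulltext_test
WHERE MATCH(content, tag) AGAINST('+mysql -oracle' IN BOOLEAN MODE);

-- Rows that contain "mysql" together with "index" or "slow";
-- rows with "index" rank higher than rows with "slow".
SELECT id, content, tag,
       MATCH(content, tag)
           AGAINST('+mysql +(>index <slow)' IN BOOLEAN MODE) AS score
FROM fulltext_test
WHERE MATCH(content, tag)
          AGAINST('+mysql +(>index <slow)' IN BOOLEAN MODE)
ORDER BY score DESC;
```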

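Summary

That about covers it, so it is time to summarize.

MySQL full-text indexes originally supported only English, because English words are separated by spaces and a space is a convenient token delimiter. Asian languages such as Chinese, Japanese, and Korean do not use spaces between words, which imposed a real limitation. Starting with MySQL 5.7.6, however, an ngram full-text parser was introduced to solve this problem, and it works with both the MyISAM and InnoDB engines.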
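A minimal sketch of using the ngram parser is shown below, assuming MySQL 5.7.6 or later. The table articles_cn, its columns, and the index name title_body_ngram are hypothetical.

```sql
-- Hypothetical table for Chinese text, indexed with the built-in ngram parser.
CREATE TABLE articles_cn (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(200),
    body TEXT,
    PRIMARY KEY (id),
    FULLTEXT KEY title_body_ngram (title, body) WITH PARSER ngram
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Token length (n) used by the parser; the default is 2.
SHOW VARIABLES LIKE 'ngram_token_size';

-- Search for a Chinese term.
SELECT * FROM articles_cn
WHERE MATCH(title, body) AGAINST('全文索引' IN NATURAL LANGUAGE MODE);
```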

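In fact, full-text support in the MyISAM storage engine has many limitations, such as the performance impact of table-level locking, data file corruption, and recovery after a crash, which make MyISAM full-text indexes unsuitable for many use cases. So in most cases the advice is to use another solution, such as third-party tools like Sphinx or Lucene, or the full-text index of the InnoDB storage engine.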

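A few points to note

Before using full-text indexes, check which versions support them;
Full-text indexes are N times faster than LIKE with % wildcards, but may have accuracy issues;
If a large amount of data needs full-text indexing, load the data first and create the index afterwards;
For Chinese, use MySQL 5.7.6 or later, or a third-party plugin.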
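The advice to load data before creating the index can be sketched as follows. The table documents and the index name content_fulltext are hypothetical, and the two inserted rows stand in for a real bulk load.

```sql
-- 1. Create the table without the full-text index.
CREATE TABLE documents (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    content TEXT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- 2. Load the data (shown here as a plain INSERT; a real bulk load
--    would typically use LOAD DATA or batched inserts).
INSERT INTO documents (content) VALUES
    ('first document'),
    ('second document');

-- 3. Build the full-text index once, after the data is in place.
ALTER TABLE documents ADD FULLTEXT INDEX content_fulltext (content);
```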
