Home  >  Article  >  Database  >  mysql function full text search function

mysql function full text search function

伊谢尔伦
伊谢尔伦Original
2016-11-23 11:56:311498browse

Syntax:

 MATCH (col1,col2,...) AGAINST (expr [IN BOOLEAN MODE | WITH QUERY EXPANSION])

MySQL supports full-text indexing and search functions. The full-text index type FULLTEXT index in MySQL. FULLTEXT indexes are only available on MyISAM tables; they can be created from CHAR, VARCHAR, or TEXT columns as part of a CREATE TABLE statement, or added later using ALTER TABLE or CREATE INDEX. For larger data sets, entering your data into a table that does not have a FULLTEXT index and then creating the index is faster than entering the data into an existing FULLTEXT index.

Full text search is performed with the MATCH() function. The

mysql> CREATE TABLE articles (    ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,    ->   title VARCHAR(200),    ->   body TEXT,    ->   FULLTEXT (title,body)    -> );Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO articles (title,body) VALUES    -> ('MySQL Tutorial','DBMS stands for DataBase ...'),    -> ('How To Use MySQL Well','After you went through a ...'),    -> ('Optimizing MySQL','In this tutorial we will show ...'),    -> ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),    -> ('MySQL vs. YourSQL','In the following database comparison ...'),    -> ('MySQL Security','When configured properly, MySQL ...');Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0
mysql> SELECT * FROM articles    -> WHERE MATCH (title,body) AGAINST ('database');
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

MATCH() function performs a natural language search within the database for a string. A database is a set of 1 or 2 columns contained in FULLTEXT. The search string is given as a parameter to AGAINST(). For each row in the table, MATCH() returns a correlation value, that is, a similarity measure between the search string and the text in that row in the specified column in the MATCH() table.

By default, the search is performed in a case-insensitive manner. However, you can perform a case-sensitive full-text search by using a binary sort on the indexed columns. For example, you can give a latin1_bin sorting method to a column that uses the latin1 character set, making full-text searches case-sensitive.

As in the above example, when MATCH() is used in a WHERE statement, the relevant value is a non-negative floating point number. Zero correlation means no similarity. The correlation is calculated based on the number of words in the line, the number of uniques in the line, the total number of words in the database, and the number of files (lines) that contain the particular word.

For natural language full-text search, it is required that the columns named in the MATCH() function are the same as the columns contained in some FULLTEXT indexes in your table. For the above query, please note that the columns named in the MATCH() function (title and full text) are the same as the columns in the FULLTEXT index of the article table. If you want to search the title and full text separately, you should create a FULLTEXT index on each column.

Alternatively run a Boolean search or search using query expansion.

The above example basically shows how to use the MATCH() function that returns the correlation order of rows in decreasing order. The following example shows how to retrieve the relevant value explicitly. The order of returned rows is uncertain because the SELECT statement does not contain a WHERE or ORDER BY clause:

mysql> SELECT id, MATCH (title,body) AGAINST ('Tutorial')    -> FROM articles;
+----+-----------------------------------------+
| id | MATCH (title,body) AGAINST ('Tutorial') |
+----+-----------------------------------------+
|  1 |                        0.65545833110809 |
|  2 |                                       0 |
|  3 |                        0.66266459226608 |
|  4 |                                       0 |
|  5 |                                       0 |
|  6 |                                       0 |
+----+-----------------------------------------+
6 rows in set (0.00 sec)

The following example is more complicated. The query returns the relevant values, sorting the rows in order of decreasing relevance. To achieve this result, you should specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional housekeeping because the MySQL optimizer notices that the two MATCH() calls are identical and activates the full-text search code only once.

mysql> SELECT id, body, MATCH (title,body) AGAINST    -> ('Security implications of running MySQL as root') AS score    
-> FROM articles WHERE MATCH (title,body) AGAINST    -> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body           | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

There are 2 rows in the table (0.00 seconds)

MySQL FULLTEXT execution treats any sequence of single-word character prototypes (letters, numbers, and underscore parts) as a word. This sequence may also contain single quotes ('), but there will be no more than one on a line. This means aaa'bbb will be treated as one word, while aaa''bbb will be treated as 2 words. Single quotes before or after a word will be removed by the FULLTEXT parser; 'aaa'bbb' will become aaa'bbb.

The FULLTEXT parser determines where a word begins and ends by looking for certain delimiters, such as ' ' (space mark), , (comma), and . (period). If words are not separated by delimiters (such as in Chinese), the FULLTEXT parser cannot determine the start and end positions of a word. In order to be able to add words or other indexed terms to the FULLTEXT index in such a language, you have to preprocess them so that they are separated by some arbitrary delimiter like ".

Some words are in the full text Will be ignored in the search:

Any word that is too short will be ignored. The default minimum length of the word that can be found in the full-text search is 4 characters.

Words in stop words will be ignored. "the" or "some" are too common words to be considered unsemantic. There is a built-in stop word, but it can be overridden through a user-defined list.

Every correct word in the vocabulary and query is based on its In this way, a word that appears in many documents has a lower importance (and even many words have zero importance) because of the importance of this particular word. Its semantic value is lower in the library. On the contrary, if the word is rare, then it will get a higher importance. The importance of the word is then combined and used to calculate the relevance of the row.

Best used with large vocabularies (indeed, when it is carefully tuned), for very small tables the distribution of words does not fully reflect their semantic value, and this mode can sometimes produce strange results, e.g. , Although the word "MySQL" appears in every row in the articles table, a search for this word may not yield any results:

mysql> SELECT * FROM articles
-> WHERE MATCH (title,body) AGAINST ('MySQL');
找不到搜索的词(0.00 秒)

这个搜索的结果为空,原因是单词 “MySQL” 出现在至少全文的50%的行中。 因此, 它被列入停止字。对于大型数据集,使用这个操作最合适不过了----一个自然语言问询不会从一个1GB 的表每隔一行返回一次。对于小型数据集,它的用处可能比较小。

一个符合表中所有行的内容的一半的单词查找相关文档的可能性较小。事实上, 它更容易找到很多不相关的内容。我们都知道,当我们在因特网上试图使用搜索引擎寻找资料的时候,这种情况发生的频率颇高。可以推论,包含该单词的行因其所在特别数据集 而被赋予较低的语义价值。 一个给定的词有可能在一个数据集中拥有超过其50%的域值,而在另一个数据集却不然。

当你第一次尝试使用全文搜索以了解其工作过程时,这个50% 的域值提供重要的蕴涵操作:若你创建了一个表,并且只将文章的1、2行插入其中, 而文中的每个单词在所有行中出现的机率至少为  50% 。那么结果是你什么也不会搜索到。一定要插入至少3行,并且多多益善。需要绕过该50% 限制的用户可使用布尔搜索代码。

1. 布尔全文搜索

利用IN BOOLEAN MODE修改程序, MySQL 也可以执行布尔全文搜索:

mysql> SELECT * FROM articles WHERE MATCH (title,body)    -> AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
+----+-----------------------+-------------------------------------+
| id | title                 | body                                |
+----+-----------------------+-------------------------------------+
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...        |
|  2 | How To Use MySQL Well | After you went through a ...        |
|  3 | Optimizing MySQL      | In this tutorial we will show ...   |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ... |
|  6 | MySQL Security        | When configured properly, MySQL ... |
+----+-----------------------+-------------------------------------+

这个问询检索所有包含单词“MySQL”的行,但不检索包含单词“YourSQL”的行。

布尔全文搜索具有以下特点:

它们不使用 50% 域值。.

它们不会按照相关性渐弱的顺序将行进行分类。你可以从上述问询结果中看到这一点:相关性最高的行是一个包含两个“MySQL” 的行,但它被列在最后的位置,而不是开头位置。

即使没有FULLTEXT,它们仍然可以工作,尽管这种方式的搜索执行的速度非常之慢。

最小单词长度全文参数和最大单词长度全文参数均适用。

停止字适用。

布尔全文搜索的性能支持以下操作符:

+

一个前导的加号表示该单词必须 出现在返回的每一行的开头位置。

-

一个前导的减号表示该单词一定不能出现在任何返回的行中。

(无操作符)

在默认状态下(当没有指定 + 或–的情况下),该单词可有可无,但含有该单词的行等级较高。这和MATCH() ... AGAINST()不使用IN BOOLEAN MODE修改程序时的运作很类似。

> 021c18ccb59b6067bed130a15fabd8d0 操作符增强其影响,而 3a11c09e5eab1dd0cd1a25f5714e192bturnover 7f6cd08813b34d5cafb3afff1c848201 REPAIR TABLE tbl_name QUICK;

Note that if you use myisamchk to perform an operation that modifies the table index (such as repair or analysis), use Rebuild the FULLTEXT index with the default full-text parameter values ​​for minimum and maximum word length and stop words, unless you have specified otherwise. This will cause the query to fail.

This problem occurs because only the server knows these parameters. Their storage location is not in the MyISAM index file. If you have modified the minimum word length or maximum word length or stop words in the server, to avoid this problem, specify the same ft_min_word_len, ft_max_word_len, and ft_stopword_file values ​​for myisamchk that you use for mysqld. For example, if you have set the minimum word length to 3, you can modify a table with myisamchk like this:

shell> myisamchk --recover --ft_min_word_len=3 tbl_name.MYI

To ensure that myisamchk and the server have the full text Use the same value for the parameters, and place each item in the [mysqld] and [myisamchk] sections of the selection file:

[mysqld]

ft_min_word_len=3

[myisamchk]

ft_min_word_len=3

Use REPAIR TABLE, ANALYZE TABLE, OPTIMIZE TABLE or ALTER TABLE instead of using myisamchk. These statements are executed by the server, which knows which full-text parameter value is more appropriate.


Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn