Sphinx full-text search PHP tutorial_PHP tutorial

WBOY
2016-07-20 11:12:46


I wrote this article six months ago but never published it; I am sharing it now. It may contain inaccuracies or imprecise statements, and some of the wording is flippant. Please bear with me.
Take the email data table in the previous article as an example:

Data structure:


CREATE TABLE email (
emailid mediumint(8) unsigned NOT NULL auto_increment COMMENT 'email ID',
fromid int(10) unsigned NOT NULL default '0' COMMENT 'sender ID',
toid int(10) unsigned NOT NULL default '0' COMMENT 'recipient ID',
content text NOT NULL COMMENT 'email body',
subject varchar(100) NOT NULL COMMENT 'email subject',
sendtime int(10) NOT NULL COMMENT 'send time',
attachment varchar(100) NOT NULL COMMENT 'attachment IDs, comma-separated',
PRIMARY KEY (emailid)
) ENGINE=MyISAM;



Open a console and start searchd. The daemon must be running before PHP can connect to Sphinx (make sure you have already built the index):

d:\coreseek\bin\searchd -c d:\coreseek\bin\sphinx.conf



The PHP interface file sphinxapi.php is provided in the coreseek/api directory. This file contains a SphinxClient class.

Include this file in PHP and create a new instance:

$sphinx = new SphinxClient();

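// Sphinx host name and port
$sphinx->SetServer('localhost', 9312);

// Return the result set as a PHP array
$sphinx->SetArrayResult(true);

// Result window; the arguments are, in order: offset, number of results to return, maximum number of matches
$sphinx->SetLimits(0, 20, 1000);

// Maximum query time (the argument is in milliseconds)
$sphinx->SetMaxQueryTime(10);

// Run a simple search. This searches all fields; to search specific fields, read on.
// The index name is an index defined in the configuration file. To search several indexes, separate them with commas ('email,diary'), or use '*' for all indexes.
$index = 'email';
$result = $sphinx->Query('search keyword', $index);

echo '<pre>';
print_r($result);
echo '</pre>';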

$result is an array in which:

total is the total number of matches

matches holds the matched documents, including information such as id, weight and attrs

words holds the segmented search keywords



You may be wondering why the email content itself is missing. Sphinx does not return rows of data the way MySQL does, because it does not store the complete data, only the segmented (tokenized) index data.

The details are in the matches array. The id in matches is the first field of the sql_query SELECT statement in the configuration file. This is what our configuration file looks like:

sql_query = SELECT emailid,fromid,toid,subject,content,sendtime,attachment FROM email

So the id in matches is emailid.

weight is the weight of the match. Generally, the higher the weight, the earlier the match is returned. For details on match weights, please refer to the official documentation.

attrs holds the attributes declared with sql_attr_* in the configuration file. How to use these attributes is covered later.
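To make the layout concrete, here is a small sketch that walks a result array. The sample array is written by hand in the shape Query() returns when SetArrayResult(true) is set; every value in it is made up, and summarizeResult() is a hypothetical helper, not part of sphinxapi.php.

```php
<?php
// Hand-written sample in the SetArrayResult(true) layout.
// All IDs, weights and statistics are invented for illustration.
$result = array(
    'total'   => 2,
    'matches' => array(
        array('id' => 15, 'weight' => 2, 'attrs' => array('fromid' => 1, 'toid' => 8, 'sendtime' => 1468984366)),
        array('id' => 42, 'weight' => 1, 'attrs' => array('fromid' => 2, 'toid' => 8, 'sendtime' => 1468990000)),
    ),
    'words'   => array(
        'keyword' => array('docs' => 2, 'hits' => 3),
    ),
);

/**
 * Hypothetical helper: turn a result array into readable lines.
 */
function summarizeResult(array $result)
{
    $lines = array('total: ' . $result['total']);

    // Each match carries the document id (emailid), its weight and its attributes.
    foreach ($result['matches'] as $match) {
        $lines[] = sprintf(
            'emailid=%d weight=%d fromid=%d',
            $match['id'],
            $match['weight'],
            $match['attrs']['fromid']
        );
    }

    // Each segmented keyword comes with per-word statistics.
    foreach ($result['words'] as $word => $stats) {
        $lines[] = sprintf('word "%s": %d docs, %d hits', $word, $stats['docs'], $stats['hits']);
    }

    return $lines;
}

echo implode("\n", summarizeResult($result)), "\n";
```

Note that none of the lines can show the email subject or body, because the result array simply does not contain them.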


So the search results are still not the email data we actually want, because Sphinx does not store the real data. To get the real emails, you have to query the MySQL email table using the IDs in matches. Generally speaking, this round trip is still much faster than MySQL's LIKE, provided the table holds at least several hundred thousand rows; with less data, using Sphinx will only be slower.
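The round trip back to MySQL could look roughly like this. It is only a sketch: buildEmailLookupSql() is a hypothetical helper, it assumes the SetArrayResult(true) layout, and it only builds the SQL string; running it through your own database layer is left out.

```php
<?php
/**
 * Hypothetical helper: collect the document IDs from a Sphinx result
 * and build a MySQL query that fetches the full email rows.
 */
function buildEmailLookupSql(array $result)
{
    if (empty($result['matches'])) {
        return null; // nothing matched, so there is nothing to fetch
    }

    $ids = array();
    foreach ($result['matches'] as $match) {
        $ids[] = (int) $match['id']; // cast to int so the list is safe to inline
    }
    $idList = implode(',', $ids);

    // IN() alone gives no ordering guarantee; ORDER BY FIELD() returns
    // the rows in the order Sphinx ranked them.
    return "SELECT * FROM email WHERE emailid IN ($idList) ORDER BY FIELD(emailid, $idList)";
}

// Made-up sample: Sphinx ranked email 42 ahead of email 15.
$sample = array('matches' => array(array('id' => 42), array('id' => 15)));
echo buildEmailLookupSql($sample), "\n";
```

Keeping the ranking order matters here, because otherwise the relevance sorting done by Sphinx is lost on the way back.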



Next, we introduce some Sphinx features that work like MySQL conditions.

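// Range of emailid
$sphinx->SetIdRange($min, $max);

// Attribute filtering. Filterable attributes must be declared with sql_attr_* in the configuration file. Earlier we defined these:
    sql_attr_uint            = fromid
    sql_attr_uint            = toid
    sql_attr_timestamp  = sendtime
// If you change these attributes, remember to rebuild the index afterwards for the change to take effect

// Match a set of values
$sphinx->SetFilter('fromid', array(1,2));    // fromid must be 1 or 2

// For the opposite condition, pass true as the third argument ($exclude)
$sphinx->SetFilter('fromid', array(1,2), true);    // fromid must not be 1 or 2

// Match a range of values
$sphinx->SetFilterRange('toid', 5, 200);    // toid is between 5 and 200

// For the opposite condition, pass true as the fourth argument ($exclude)
$sphinx->SetFilterRange('toid', 5, 200, true);    // toid is outside 5-200

// Run the search
$result = $sphinx->Query('keyword', '*');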



Sort mode
Search results can be sorted using the following patterns:

SPH_SORT_RELEVANCE mode, sort in descending order of relevance (best matches first)

SPH_SORT_ATTR_DESC mode, sort by an attribute in descending order (larger attribute values come first)

SPH_SORT_ATTR_ASC mode, sort by an attribute in ascending order (smaller attribute values come first)

SPH_SORT_TIME_SEGMENTS mode, first descending order by time period (last hour/day/week/month), then descending order by relevance

SPH_SORT_EXTENDED mode, combines columns in a SQL-like manner and sorts them in ascending or descending order.

SPH_SORT_EXPR mode, sort by an arithmetic expression


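// Sort by an attribute
// Sort by fromid in descending order. Note that calling SetSortMode again overrides the previous sort.
// The mode is a constant, so it must not be quoted.
$sphinx->SetSortMode(SPH_SORT_ATTR_DESC, 'fromid');

// To sort by several columns, use SPH_SORT_EXTENDED mode
// @id is a built-in Sphinx keyword; here it refers to emailid (think about why that is)
$sphinx->SetSortMode(SPH_SORT_EXTENDED, 'fromid ASC, toid DESC, @id DESC');

// Run the search
$result = $sphinx->Query('keyword', '*');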


//For more information, please check the official document for instructions on sorting mode
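If the sort columns come from user input or configuration, the SPH_SORT_EXTENDED clause can be assembled rather than typed by hand. A minimal sketch, assuming a hypothetical helper named buildSortClause() that is not part of sphinxapi.php:

```php
<?php
/**
 * Hypothetical helper: build the sort clause for SPH_SORT_EXTENDED
 * from an ordered map of column => direction.
 */
function buildSortClause(array $columns)
{
    $parts = array();
    foreach ($columns as $column => $direction) {
        $direction = strtoupper($direction);
        // Reject anything that is not a plain ASC or DESC.
        if ($direction !== 'ASC' && $direction !== 'DESC') {
            throw new InvalidArgumentException("Bad sort direction for $column");
        }
        $parts[] = "$column $direction";
    }
    return implode(', ', $parts);
}

$clause = buildSortClause(array('fromid' => 'asc', 'toid' => 'desc', '@id' => 'desc'));
echo $clause, "\n";

// With a SphinxClient instance in $sphinx (not created in this sketch):
// $sphinx->SetSortMode(SPH_SORT_EXTENDED, $clause);
```

The array order decides the sort priority, so fromid is compared first and @id last.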

Match pattern
The following optional matching patterns are available:

SPH_MATCH_ALL, match all query terms (default mode);

SPH_MATCH_ANY, matches any one of the query terms;

SPH_MATCH_PHRASE, treats the entire query as a phrase and requires a complete match in order;

SPH_MATCH_BOOLEAN, treat the query as a Boolean expression

SPH_MATCH_EXTENDED, treats the query as an expression in the CoreSeek/Sphinx internal query language. Starting with CoreSeek 3/Sphinx 0.9.9, this option is superseded by SPH_MATCH_EXTENDED2, which provides more functionality and better performance. It is retained for compatibility with legacy code, so that old application code keeps working even after Sphinx and its components, including the API, are upgraded.

SPH_MATCH_EXTENDED2, uses the second version of the "extended matching mode" to match the query.

SPH_MATCH_FULLSCAN, forces the query to be matched in "full scan" mode. Note that in this mode all query terms are ignored: filters, filter ranges and grouping still work, but no text matching takes place.

The mode we mainly want to focus on is SPH_MATCH_EXTENDED2. The extended matching mode allows conditional expressions somewhat like those in MySQL.

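// Set the extended matching mode (a constant, so it must not be quoted)
$sphinx->SetMatchMode(SPH_MATCH_EXTENDED2);

// Use conditions in the query; field names start with @. Emails whose content contains 测试 ("test") and whose toid equals 1:
$result = $sphinx->Query('@content (测试) & @toid =1', '*');

// Use parentheses with & (AND), | (OR) and - (NOT, i.e. !=) to build more complex conditions
$result = $sphinx->Query('(@content (测试) & @subject =呃) | (@fromid -(100))', '*');

// For more syntax, see the matching mode section of the official documentation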


One thing worth noting about the extended matching mode is which fields it can search. If a column is declared as an attribute, it is by default not among the fields that extended matching searches; you can only apply SetFilter(), SetFilterRange() and so on to it.

We declared fromid, toid and sendtime as attributes earlier, so what should we do if we want to use them as conditions in the extended matching mode?

Just select the column a second time in the sql_query statement:

sql_query = SELECT emailid,fromid,fromid,toid,toid,subject,content,sendtime,sendtime,attachment FROM email

// Remember to rebuild the index after this change

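More condition tricks
These are only tricks, and they are not recommended in a production environment. For the reason, see the end of the article.



<, <=, >, >=
By default Sphinx has no such comparison operators.

What if I want emails whose send time is later than a certain date? Simulate it with SetFilterRange():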

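// Greater than or equal to a timestamp $time
$sphinx->SetFilterRange('sendtime', $time, 10000000000);    // the largest timestamp is ten 9s; one more than that can never be exceeded

// Greater than a timestamp $time
$sphinx->SetFilterRange('sendtime', $time + 1, 10000000000);

// Less than or equal to a timestamp $time
$sphinx->SetFilterRange('sendtime', -1, $time);    // the smallest timestamp is 0, so go one below it

// Less than a timestamp $time
$sphinx->SetFilterRange('sendtime', -1, $time - 1);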
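The four cases above can be folded into one small function. This is only a sketch: comparisonToRange() and the two constants are hypothetical names, the sentinel bounds are the ones used in the examples above, and it assumes integer values such as timestamps.

```php
<?php
// Sentinel bounds borrowed from the examples above (assumption: the
// attribute holds non-negative integers well below RANGE_CEILING).
const RANGE_FLOOR   = -1;
const RANGE_CEILING = 10000000000;

/**
 * Hypothetical helper: translate a comparison operator and an integer
 * value into the [min, max] pair expected by SetFilterRange().
 */
function comparisonToRange($operator, $value)
{
    switch ($operator) {
        case '>=':
            return array($value, RANGE_CEILING);
        case '>':
            return array($value + 1, RANGE_CEILING);
        case '<=':
            return array(RANGE_FLOOR, $value);
        case '<':
            return array(RANGE_FLOOR, $value - 1);
        default:
            throw new InvalidArgumentException("Unsupported operator: $operator");
    }
}

list($min, $max) = comparisonToRange('>', 1468984366);
echo "$min..$max\n";

// With a SphinxClient instance in $sphinx (not created in this sketch):
// $sphinx->SetFilterRange('sendtime', $min, $max);
```

The strict operators shift the bound by one, which only makes sense because the values are whole numbers.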


IS NOT NULL
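How do you search for an empty field? Say I want to find emails with no attachment. Some might think @attachment ('') would do it. In fact that searches for two single-quote characters... Strings in a Sphinx query do not need quotes.

Sphinx currently provides no such feature, but you can work around it in the MySQL statement:

sql_query = SELECT emailid,fromid,toid,subject,content,sendtime,attachment != '' AS attachisnotnull FROM email
// This returns a new field, attachisnotnull; when attachisnotnull is 1, the attachment is not empty

// Remember to rebuild the index after this change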



FIND_IN_SET()
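To search for emails that contain a given attachment, MySQL users would simply use FIND_IN_SET and be done. In Sphinx you must declare a multi-valued attribute (MVA) with sql_attr_multi in the configuration:

# attachment can hold attachment IDs separated by commas, or by spaces, semicolons and so on; Sphinx recognizes them all
sql_attr_multi = uint attachment from field

// Remember to rebuild the index after this change

 

Then in PHP you can use SetFilter()

// Find emails containing attachment ID 1 or 2; in MySQL this would be FIND_IN_SET('1', attachment) OR FIND_IN_SET('2', attachment)
$sphinx->SetFilter('attachment', array(1,2));

// SetFilterRange also works: find emails containing an attachment ID in the range 50-100
$sphinx->SetFilterRange('attachment', 50, 100);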
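As far as I understand MVA filtering, an email matches as soon as any one of its attachment IDs passes the filter. The plain-PHP sketch below imitates that "any value" behaviour on a comma-separated string, which can help when checking what a filter should return. The function names are hypothetical and nothing here talks to Sphinx.

```php
<?php
/** Parse a comma-separated ID string such as "3,7,12" into integers. */
function parseIdList($csv)
{
    $ids = array();
    foreach (explode(',', $csv) as $piece) {
        $piece = trim($piece);
        if ($piece !== '') {
            $ids[] = (int) $piece;
        }
    }
    return $ids;
}

/** True when at least one stored ID equals one of the wanted values. */
function mvaMatchesAny($csv, array $wanted)
{
    return count(array_intersect(parseIdList($csv), $wanted)) > 0;
}

/** True when at least one stored ID lies inside [$min, $max], bounds included. */
function mvaMatchesRange($csv, $min, $max)
{
    foreach (parseIdList($csv) as $id) {
        if ($id >= $min && $id <= $max) {
            return true;
        }
    }
    return false;
}

// Imitates SetFilter('attachment', array(1, 7)) on an email with attachments 3, 7 and 12.
var_dump(mvaMatchesAny('3,7,12', array(1, 7)));

// Imitates SetFilterRange('attachment', 50, 100) on the same email.
var_dump(mvaMatchesRange('3,7,12', 50, 100));
```

An email with an empty attachment string never matches either check, since it has no values to test.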


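Summary
If you want a free, easy-to-use and extremely fast full-text search engine, Sphinx is undoubtedly the best choice, but do not forget what Sphinx is for: full-text retrieval. Do not dwell on all those messy conditions. If you want to make Sphinx search as flexible as MySQL, so that it can handle complex multi-condition searches entirely on its own, such as the advanced search in some email systems, then I suggest you spend more time optimizing your PHP or MySQL code instead, because bending Sphinx that way may well make your search slower.

The best approach is to find the content with the simplest possible search and hand the IDs back to the MySQL database for the lookup.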
