Setting up Coreseek (Sphinx + mmseg3): detailed installation and configuration, plus the Sphinx extension for PHP

WBOY
Original
2016-06-06 19:52:54, 1299 views

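This single document covers installation, incremental indexing, the PHP extension, and API usage examples, so you do not have to hunt through many separate articles.

Installing Coreseek (Sphinx + mmseg3)

[Step 1] Install mmseg3 first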

cd /var/install
wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
tar zxvf coreseek-4.1-beta.tar.gz

cd coreseek-4.1-beta
cd mmseg-3.2.14
./bootstrap
./configure --prefix=/usr/local/mmseg3
make && make install

Problem encountered:
error: cannot find input file: src/Makefile.in
or another similar error.

Solution:
Run the commands below in order. When I ran 'aclocal' it failed as well, which is why libtool is installed first:

yum -y install libtool

aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean

Once 'libtool' is installed, run the commands above starting from 'aclocal', then repeat the original installation steps (./bootstrap, ./configure, make && make install).
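
Before re-running the regeneration sequence, it can help to confirm that every tool it calls is actually on the PATH. The following is a minimal sketch, assuming a POSIX shell; the function name `missing_tools` is made up for illustration and is not part of Coreseek.

```shell
# Print each named command that is not found on PATH, one per line.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# The tools used by the regeneration sequence in this step.
missing_tools aclocal libtoolize automake autoconf autoheader make \
    || echo "install the packages that provide the tools listed above, then retry" >&2
```

If the function prints nothing, all of the tools were found and the sequence can be run as written.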

[Step 2] Install Coreseek

## Install coreseek (run from the coreseek-4.1-beta directory)
$ cd csft-3.2.14    # or: cd csft-4.0.1, or: cd csft-4.1, depending on the version you unpacked
$ sh buildconf.sh    # warnings can be ignored; any error must be resolved before continuing
$ ./configure --prefix=/usr/local/coreseek  --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
## If configure reports a MySQL problem, see the MySQL data source installation notes: http://www.coreseek.cn/product_install/install_on_bsd_linux/#mysql
$ make && make install
$ cd ..


## Command-line test of mmseg segmentation and Coreseek search (set the locale to zh_CN.UTF-8 beforehand so that Chinese displays correctly)
$ cd testpack
$ cat var/test/test.xml    # Chinese text should display correctly at this point
$ /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf --all
$ /usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索
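
The command-line test above only displays Chinese correctly under a UTF-8 locale. Here is a small sketch for checking that first; `is_utf8_locale` is a hypothetical helper name, and it only inspects the locale name it is given rather than testing the terminal itself.

```shell
# Succeed if the given locale name declares a UTF-8 charset
# (for example zh_CN.UTF-8 or en_US.utf8).
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8|*.[Uu][Tt][Ff]8) return 0 ;;
        *) return 1 ;;
    esac
}

# LC_ALL takes precedence over LC_CTYPE, which takes precedence over LANG.
current="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
if ! is_utf8_locale "$current"; then
    echo "locale '$current' is not UTF-8; try: export LANG=zh_CN.UTF-8" >&2
fi
```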

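If indexing fails with the error "xmlpipe2 support NOT compiled in. To use xmlpipe2, install missing XML libra", run the following command:

yum -y install expat expat-devel

After installing these packages, recompile Coreseek and rebuild the index; the test then passes.

The result looks like this: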

Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

 using config file 'etc/csft.conf'...
index 'xml': query '网络搜索 ': returned 1 matches of 1 total in 0.000 sec

displaying matches:
1. document=1, weight=1590, published=Thu Apr  1 07:20:07 2010, author_id=1

words:
1. '网络': 1 documents, 1 hits
2. '搜索': 2 documents, 5 hits

Next, configure Sphinx to work with MySQL.


Create the Sphinx counter table by running the following in the coreseek_test database.

CREATE TABLE sph_counter
(
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);

Create the configuration file that connects Sphinx to MySQL:

# vi /usr/local/coreseek/etc/csft_mysql.conf

# MySQL data source configuration; for details see: http://www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#conf-reference

# Source definition
source main
{
    type                    = mysql

    sql_host                = localhost
    sql_user                = root
    sql_pass                = 123456
    sql_db                  = coreseek_test
    sql_port                = 3306
    sql_query_pre           = SET NAMES utf8
    sql_query_pre           = REPLACE INTO sph_counter SELECT 1,MAX(id) FROM hr_spider_company
    sql_query               = SELECT * FROM hr_spider_company WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
                                                         # the first column of sql_query (id) must be an integer
                                                         # title and content are string/text fields and are full-text indexed
    sql_attr_uint           = id                    # the value read from SQL must be an integer
    sql_attr_uint           = from_id               # must be an integer; not full-text searchable
    sql_attr_uint           = link_id               # must be an integer; not full-text searchable
    sql_attr_uint           = add_time              # must be an integer; not full-text searchable
    sql_field_string        = link_url              # string field (full-text searchable; the original text can be returned)
    sql_field_string        = company_name          # string field (full-text searchable; the original text can be returned)
    sql_field_string        = type_name             # string field (full-text searchable; the original text can be returned)
    sql_field_string        = trade_name            # string field (full-text searchable; the original text can be returned)
    sql_field_string        = email                 # string field (full-text searchable; the original text can be returned)
    sql_field_string        = description           # string field (full-text searchable; the original text can be returned)

    sql_query_info_pre      = SET NAMES utf8        # set the correct character set for command-line queries
    sql_query_info          = SELECT id,from_id,link_id,company_name,type_name,trade_name,address,description, FROM_UNIXTIME(add_time) AS add_time  FROM hr_spider_company  WHERE id=$id                     # for command-line queries, read the original data from the database
}

source delta : main
{
    sql_query_pre           = SET NAMES utf8
    sql_query               = SELECT * FROM hr_spider_company WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
    sql_query_post_index    = REPLACE INTO sph_counter SELECT 1,MAX(id) FROM hr_spider_company
}



# Index definition
index main
{
    source                = main                         # name of the corresponding source
    path                  = /usr/local/coreseek/var/data/mysql     # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    docinfo               = extern
    mlock                 = 0
    morphology            = none
    min_word_len          = 1
    html_strip            = 0

    # Chinese word segmentation settings; for details see: http://www.coreseek.cn/products-install/coreseek_mmseg/
    charset_dictpath     = /usr/local/mmseg3/etc/          # on BSD and Linux; the path must end with /
    charset_type        = zh_cn.utf-8
}

index delta : main
{
    source          = delta
    path            = /usr/local/coreseek/var/data/delta
}


# Global indexer settings
indexer
{
    mem_limit            = 128M
}

# searchd service definition
searchd
{
    listen              = 9312
    read_timeout        = 5
    max_children        = 30
    max_matches         = 1000
    seamless_rotate     = 0
    preopen_indexes     = 0
    unlink_old          = 1
    pid_file         = /usr/local/coreseek/var/log/searchd_mysql.pid   # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    log             = /usr/local/coreseek/var/log/searchd_mysql.log        # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    query_log         = /usr/local/coreseek/var/log/query_mysql.log    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    binlog_path     =                                              # disable the binlog
}

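My test table is named hr_spider_company; change it to your own table name as needed.

Command reference:

Start the search daemon (it must be running)

# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf

Build all indexes (must be run once before querying or testing)

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate

Build the delta (incremental) index

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate

Merge the delta index into the main index

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0

(--merge-dst-range deleted 0 0 is added so that several keywords do not end up pointing at the same document. It filters the destination index during the merge, keeping only documents whose deleted attribute is 0.)

Test a search from the command line

# /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf  aaa

Stop the search daemon

# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop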
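
Because --rotate only works while searchd is running, it can be useful to check the daemon before indexing. The sketch below reads the pid_file configured in the searchd section above; the helper name `pid_alive` is an assumption for illustration. It should be run as root or as the user that owns searchd, since `kill -0` fails for processes you are not allowed to signal.

```shell
# Succeed if the PID file exists and names a live process.
# kill -0 sends no signal; it only checks that the process can be signalled.
pid_alive() {
    pid_file=$1
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null
}

# pid_file path taken from the searchd section of csft_mysql.conf.
if pid_alive /usr/local/coreseek/var/log/searchd_mysql.pid; then
    echo "searchd is running"
else
    echo "searchd is not running"
fi
```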

Automation:

crontab -e

*/1 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
*/5 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0
30 1 * * *  /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate

The schedule above means: rebuild the delta index every minute, merge it into the main index every five minutes, and rebuild all indexes every day at 1:30. The indexer is called directly because it is a compiled binary, not a shell script, so it must not be passed to /bin/sh.
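
With jobs firing every minute, a slow run can overlap the next one. One common way to guard against that is a lock directory, sketched below under the assumption of a POSIX shell. The function name `run_locked`, the lock path, and the exit code 75 are all illustrative choices, not anything Coreseek provides.

```shell
# Run a command only if no other holder owns the lock directory.
# mkdir either creates the directory or fails, so two jobs cannot both win.
run_locked() {
    lock_dir=$1
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "skipped: lock $lock_dir is held" >&2
        return 75
    fi
    "$@"
    rc=$?
    rmdir "$lock_dir"
    return "$rc"
}

# Hypothetical use from a cron-invoked script:
# run_locked /tmp/csft_indexer.lock \
#     /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
```

If a run is killed before it reaches `rmdir`, the lock directory is left behind and has to be removed by hand, so this is a sketch rather than a hardened solution.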

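Installing the Sphinx extension


The official Coreseek tutorial recommends that PHP simply include a PHP file (sphinxapi.php) to talk to the server. In fact PHP has a standalone sphinx module that can operate Coreseek directly (Coreseek is Sphinx), and it is part of the official PHP function library. In the author's experience it is also considerably faster. The PHP module depends on the libsphinxclient package.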

[Step 1] Install the libsphinxclient dependency

# cd /var/install/coreseek-4.1-beta/csft-4.1/api/libsphinxclient/
# ./configure  --prefix=/usr/local/sphinxclient

configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: Makefile.in   # configure fails with this error

// Handling the configure error
If configure stops with "config.status: error: cannot find input file: src/Makefile.in" or a similar message, run the following commands and then configure again:
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# make clean

// Re-run configure with the same prefix, then build
# ./configure --prefix=/usr/local/sphinxclient

# make && make install

[Step 2] Install the Sphinx PHP extension

http://pecl.php.net/package/sphinx
# wget http://pecl.php.net/get/sphinx-1.3.0.tgz
# tar zxvf sphinx-1.3.0.tgz
# cd sphinx-1.3.0
# phpize
# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
# make && make install
# cd /etc/php.d/
# cp gd.ini  sphinx.ini
# vi sphinx.ini

extension=sphinx.so

# service php-fpm restart

Open phpinfo() and check that the sphinx module is now listed.
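
The same check can be approximated from the command line with `php -m`, which lists the modules loaded by the PHP CLI. This is a rough sketch; `has_module` is a made-up helper name. The CLI may read different ini files than php-fpm, so phpinfo() remains the authoritative check for the web side.

```shell
# Read a module list on stdin (one name per line, as printed by `php -m`)
# and succeed if the named module is present, ignoring case.
has_module() {
    grep -qixF -- "$1"
}

if command -v php >/dev/null 2>&1; then
    if php -m | has_module sphinx; then
        echo "sphinx extension is loaded in the CLI"
    else
        echo "sphinx extension is NOT loaded in the CLI"
    fi
fi
```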

Example: calling Sphinx from PHP

<?php
    $s = new SphinxClient;
    $s->setServer("127.0.0.1", 9312);

    $s->setMatchMode(SPH_MATCH_PHRASE);
    $s->setMaxQueryTime(30);                  // maximum query time, in milliseconds
    $res = $s->query("宝马", 'main');         // keyword [宝马], index [main]
    $err = $s->getLastError();
    if ($res === false) {
        die("query failed: " . $err);
    }
    // 'matches' can be absent when nothing matched
    $matches = isset($res['matches']) ? $res['matches'] : array();
    var_dump(array_keys($matches));
    echo "<br>"."Use the returned IDs to fetch the rows from the database."."<br>";

    echo '<pre>';
    var_dump($res);
    var_dump($err);
    echo '</pre>';

Example 2: with paging support

<?php
    header("Content-type: text/html; charset=utf-8");
    require("./sphinxapi.php");    // pure-PHP client; skip this line if the sphinx extension already provides SphinxClient
    $s = new SphinxClient;
    $s->setServer("192.168.252.132", 9312);

    // SPH_MATCH_ALL: match all query words (default); SPH_MATCH_ANY: match any of the query words; SPH_MATCH_EXTENDED2: supports query operators
    $s->setMatchMode(SPH_MATCH_ALL);
    $s->setMaxQueryTime(30);                                        // maximum search time, in milliseconds
    $s->SetArrayResult(false);                                      // false: matches are keyed by document ID
    $s->SetSelect("*");                                             // columns to return, like an SQL select list
    $s->SetRankingMode(SPH_RANK_BM25);                              // ranking mode; SPH_RANK_BM25 may lower result quality for multi-word queries
    //$s->SetSortMode(SPH_SORT_EXTENDED);                           // adding this was found to make the results inaccurate
    //$s->SetSortMode(SPH_SORT_EXTENDED,"from_id asc,id desc");     // sort mode; SPH_SORT_EXTENDED combines columns SQL-style, ascending or descending
    $weights = array('company_name' => 20);
    $s->SetFieldWeights($weights);                                  // per-field weights
    $s->SetLimits(0, 30, 1000, 0);                                  // SetLimits(offset, number of matches to return, max result set size (default 1000), cutoff: stop once this many matches are found)
    //$s->SetFilter($attribute, $values, $exclude=false);           // attribute filter
    //$s->SetGroupBy($attribute, $func, $groupsort="@group desc");  // group-by attribute
    $res = $s->query('@* "汽车"', 'main', '--single-0-query--');    // keyword [汽车], index [main]
    $err = $s->GetLastError();
    if ($res === false) {
        die("query failed: " . $err);
    }
    // 'matches' can be absent when nothing matched
    $matches = isset($res['matches']) ? $res['matches'] : array();

    // Keyword highlighting: collect the attributes of every match
    $tags = array();
    $company_name = array();
    $description = array();
    foreach ($matches as $key => $value) {
        $tags[] = $value['attrs'];
        $company_name[] = $value['attrs']['company_name'];
        $description[] = $value['attrs']['description'];
    }
    $company_name = $s->BuildExcerpts($company_name, 'main', '汽车', array());    // build highlighted excerpts; the index name here must never be *
    $description = $s->BuildExcerpts($description, 'main', '汽车', array());      // build highlighted excerpts; the index name here must never be *
    foreach ($tags as $k => $v)
    {
        $tags[$k]['company_name'] = $company_name[$k];    // overwrite with the highlighted text
        $tags[$k]['description'] = $description[$k];      // overwrite with the highlighted text
    }

    // Write the highlighted attributes back into the result set
    $i = 0;
    foreach ($matches as $key => $value) {
        $res['matches'][$key] = $tags[$i];
        $i++;
    }

    echo '<pre>';
    var_export($res);
    var_export($err);
    echo '</pre>';

There is much more worth consulting in the API reference: http://www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#api-reference
