
Setting up coreseek (sphinx + mmseg3): detailed installation and configuration, plus installing the PHP sphinx extension

WBOY (original)
2016-06-06 19:52:54


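This single document covers installation, incremental (delta) indexing, the PHP extension, and API usage examples, so you do not have to hunt through a pile of separate articles.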

Installing coreseek (sphinx + mmseg3)

[Step 1] Install mmseg3 first

cd /var/install
wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
tar zxvf coreseek-4.1-beta.tar.gz

cd coreseek-4.1-beta
cd mmseg-3.2.14
./bootstrap
./configure --prefix=/usr/local/mmseg3
make && make install

Problem encountered:
error: cannot find input file: src/Makefile.in
or some other similar error...

Solution:
Run the commands below in order. When I ran 'aclocal' it produced another error; the fix is described below.

yum -y install libtool

aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean

Once 'libtool' is installed, run the command sequence above again starting from 'aclocal', then rerun the original installation steps.
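The recovery sequence above can be wrapped in a small helper so that it is easy to repeat in another source directory. This is only a sketch: the names `run`, `regen_autotools` and the `DRY_RUN` switch are made up for illustration, and the commands themselves are exactly the ones listed above, run one after another as if typed by hand.

```shell
#!/bin/sh
# Sketch: rerun the autotools steps listed above inside a source directory.
# run, regen_autotools and DRY_RUN are illustrative names, not part of coreseek.

run() {
    # With DRY_RUN=1 only print the command; otherwise execute it and
    # report (but do not abort on) a failure, as when typing the steps by hand.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
        return 0
    fi
    "$@" || echo "warning: '$*' exited with status $?" >&2
}

regen_autotools() {
    # $1 = source directory (defaults to the current directory)
    (
        cd "${1:-.}" || exit 1
        run aclocal
        run libtoolize --force
        run automake --add-missing
        run autoconf
        run autoheader
        run make clean
    )
}
```

For example, `DRY_RUN=1; regen_autotools mmseg-3.2.14` prints the six commands without executing them; without `DRY_RUN` they are run for real, after which `./configure` and `make && make install` can be repeated.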

[Step 2] Install coreseek

##Install coreseek
$ cd csft-4.1                                            #or csft-3.2.14 / csft-4.0.1, depending on the version you downloaded
$ sh buildconf.sh                                         #warnings in the output can be ignored; any error must be fixed
$ ./configure --prefix=/usr/local/coreseek  --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
##If configure reports a MySQL problem, see the MySQL data source installation notes:   http://www.coreseek.cn/product_install/install_on_bsd_linux/#mysql
$ make && make install
$ cd ..


##Test mmseg word segmentation and coreseek search from the command line (set the locale to zh_CN.UTF-8 beforehand so that Chinese is displayed correctly)
$ cd testpack
$ cat var/test/test.xml    #Chinese text should be displayed correctly at this point
$ /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf --all
$ /usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索
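The command-line test assumes a UTF-8 locale. A quick way to check this before running the commands is sketched below; `is_utf8_locale` is an illustrative helper name, and the suggested `export LANG=zh_CN.UTF-8` assumes that locale is installed on your system.

```shell
#!/bin/sh
# Sketch: warn when the current locale is not a UTF-8 one before running the
# mmseg/coreseek command-line tests. is_utf8_locale is an illustrative name.

is_utf8_locale() {
    # Accept the spellings commonly seen in locale names.
    case "$1" in
        *.UTF-8|*.utf8|*.utf-8|*.UTF8) return 0 ;;
        *) return 1 ;;
    esac
}

# LC_ALL overrides LC_CTYPE, which overrides LANG.
current=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
if is_utf8_locale "$current"; then
    echo "locale OK: $current"
else
    echo "locale is '$current'; consider: export LANG=zh_CN.UTF-8" >&2
fi
```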

If you see the error "xmlpipe2 support NOT compiled in. To use xmlpipe2, install missing XML libra...",

run the following command:

yum -y install expat expat-devel

Once the packages are installed, recompile coreseek and rebuild the index; the test then passes.

The result looks like this:

Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file 'etc/csft.conf'...
index 'xml': query '网络搜索 ': returned 1 matches of 1 total in 0.000 sec

displaying matches:
1. document=1, weight=1590, published=Thu Apr  1 07:20:07 2010, author_id=1

words:
1. '网络': 1 documents, 1 hits
2. '搜索': 2 documents, 5 hits

Next, configure sphinx to work with MySQL


Create the sphinx counter table; run this in the coreseek_test database.

CREATE TABLE sph_counter
(
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);

Create the configuration file that connects sphinx to MySQL

# vi /usr/local/coreseek/etc/csft_mysql.conf

#MySQL data source configuration; for details see: http://www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#conf-reference

#source definition
source main
{
    type                    = mysql

    sql_host                = localhost
    sql_user                = root
    sql_pass                = 123456
    sql_db                  = coreseek_test
    sql_port                = 3306
    sql_query_pre           = SET NAMES utf8
    sql_query_pre           = REPLACE INTO sph_counter SELECT 1,MAX(id) FROM hr_spider_company
    sql_query               = SELECT * FROM hr_spider_company WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
                                                         #the first column of sql_query (id) must be an integer
                                                         #string/text columns are full-text indexed
    sql_attr_uint           = id                      #the value read from SQL must be an integer
    sql_attr_uint           = from_id                 #the value read from SQL must be an integer; not full-text searchable
    sql_attr_uint           = link_id                 #the value read from SQL must be an integer; not full-text searchable
    sql_attr_uint           = add_time                #the value read from SQL must be an integer; not full-text searchable
    sql_field_string        = link_url                #string field (full-text searchable; the original text can be returned)
    sql_field_string        = company_name            #string field (full-text searchable; the original text can be returned)
    sql_field_string        = type_name               #string field (full-text searchable; the original text can be returned)
    sql_field_string        = trade_name              #string field (full-text searchable; the original text can be returned)
    sql_field_string        = email                   #string field (full-text searchable; the original text can be returned)
    sql_field_string        = description             #string field (full-text searchable; the original text can be returned)

    sql_query_info_pre      = SET NAMES utf8          #set the correct character set for command-line queries
    sql_query_info          = SELECT id,from_id,link_id,company_name,type_name,trade_name,address,description, FROM_UNIXTIME(add_time) AS add_time  FROM hr_spider_company  WHERE id=$id                     #for command-line queries, fetch the original row from the database
}

source delta : main
{
    sql_query_pre           = SET NAMES utf8
    sql_query               = SELECT * FROM hr_spider_company WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
    sql_query_post_index    = REPLACE INTO sph_counter SELECT 1,MAX(id) FROM hr_spider_company
}



#index definition
index main
{
    source                = main                         #name of the corresponding source
    path                  = /usr/local/coreseek/var/data/mysql     #change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    docinfo               = extern
    mlock                 = 0
    morphology            = none
    min_word_len          = 1
    html_strip            = 0

    #Chinese word segmentation settings; for details see: http://www.coreseek.cn/products-install/coreseek_mmseg/
    charset_dictpath      = /usr/local/mmseg3/etc/          #setting for BSD/Linux; must end with /
    charset_type          = zh_cn.utf-8
}

index delta : main
{
    source          = delta
    path            = /usr/local/coreseek/var/data/delta
}


#global indexer settings
indexer
{
    mem_limit            = 128M
}

#searchd service definition
searchd
{
    listen              = 9312
    read_timeout        = 5
    max_children        = 30
    max_matches         = 1000
    seamless_rotate     = 0
    preopen_indexes     = 0
    unlink_old          = 1
    pid_file            = /usr/local/coreseek/var/log/searchd_mysql.pid   #change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    log                 = /usr/local/coreseek/var/log/searchd_mysql.log   #change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    query_log           = /usr/local/coreseek/var/log/query_mysql.log     #change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    binlog_path         =                                                 #disable the binlog
}

My test table is named hr_spider_company; replace it with your own table name as needed.
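The main/delta scheme in the configuration relies on the value stored in sph_counter: rows whose id is at or below max_doc_id belong to main, and newer rows are picked up by delta. The toy sketch below illustrates that split only; `classify_id` and the sample numbers are invented for the example, and the real work is done by the two sql_query statements.

```shell
#!/bin/sh
# Sketch of how the main/delta sources divide rows by id.
# classify_id is an illustrative helper; in the real setup the split is made
# by the sql_query statements comparing id against sph_counter.max_doc_id.

classify_id() {
    # $1 = row id, $2 = max_doc_id stored in sph_counter (counter_id = 1)
    if [ "$1" -le "$2" ]; then
        echo main      # id <= max_doc_id: covered by source main
    else
        echo delta     # id >  max_doc_id: picked up by source delta
    fi
}

max_doc_id=1000        # example value, recorded when main was last built
for id in 998 1000 1001 1500; do
    echo "id=$id -> $(classify_id "$id" "$max_doc_id")"
done
```

With the sample value, ids 998 and 1000 fall into main, while 1001 and 1500 fall into delta.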

Command reference:

Start the background service (it must be running)

# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf

Build the indexes (must be run once before querying or testing)

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate

Build the delta index

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate

Merge the indexes

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0

(--merge-dst-range deleted 0 0 is added so that multiple keywords do not end up pointing to the same document. The filter is applied to an attribute named deleted in main, keeping only documents where it is 0.)

Search test

# /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf  aaa

Stop the background service

# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop

Automating with cron:

crontab -e

*/1 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
*/5 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0
30 1 * * *  /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate

The cron entries above mean: build the delta index every minute, merge the indexes every five minutes, and rebuild all indexes every day at 1:30. indexer is a compiled binary, so it is invoked directly rather than through /bin/sh.
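Because the one-minute and five-minute schedules coincide every fifth minute, two indexer processes can be started at the same moment. One possible way to keep cron jobs from overlapping is a small lock wrapper, sketched below. `with_lock` and the lock path are hypothetical names chosen for the example; nothing here is provided by coreseek.

```shell
#!/bin/sh
# Sketch: serialize commands started from cron with a lock directory.
# with_lock and the lock path are illustrative, not part of coreseek.

with_lock() {
    # $1 = lock directory, remaining arguments = command to run
    lock=$1
    shift
    # mkdir is atomic: it fails if another run already holds the lock.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "another run holds $lock, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lock"      # a killed run leaves the directory behind; remove it by hand
    return $status
}

# When saved as a script, forward the script's arguments to the wrapper, e.g.:
#   with_lock /tmp/csft_indexer.lock "$@"
```

A cron entry would then call that script with the indexer command line as its arguments, so a run that finds the lock taken is skipped and retried on the next tick.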

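Installing the Sphinx extension


The official Coreseek tutorial suggests that PHP simply include a PHP file to talk to the search server. In fact, PHP has a standalone sphinx module that can work with coreseek directly (coreseek is sphinx!). The module is documented in the official PHP manual, and the performance gain over the include-based approach is substantial. The PHP module depends on the libsphinxclient package.

[Step 1] Install the libsphinxclient dependency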

# cd /var/install/coreseek-4.1-beta/csft-4.1/api/libsphinxclient/
# ./configure  --prefix=/usr/local/sphinxclient

configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: Makefile.in   #configure fails with this error

//Fixing the configure error
If configure fails with the "config.status: error: cannot find input file: Makefile.in" error shown above, run the following commands and the build will then go through:
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# make clean

//Run configure again and build (keep the same prefix, since the PHP extension is pointed at it in the next step)
# ./configure  --prefix=/usr/local/sphinxclient

# make && make install

[Step 2] Install the sphinx PHP extension

http://pecl.php.net/package/sphinx
# wget http://pecl.php.net/get/sphinx-1.3.0.tgz
# tar zxvf sphinx-1.3.0.tgz
# cd sphinx-1.3.0
# phpize
# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
# make && make install
# cd /etc/php.d/
# cp gd.ini  sphinx.ini
# vi sphinx.ini

Edit sphinx.ini so that it loads sphinx.so instead of gd.so:

extension=sphinx.so

# service php-fpm restart

Open a phpinfo() page and check that the sphinx module is now loaded.
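The same check can be made from the command line with `php -m`, which lists the loaded modules. The sketch below is only a convenience: `has_module` is an illustrative helper name, and the CLI may read a different configuration than php-fpm, so the phpinfo() page remains the authoritative check.

```shell
#!/bin/sh
# Sketch: check from the command line whether a PHP module is listed.
# has_module is an illustrative helper; it reads a module list on stdin,
# such as the output of `php -m`.

has_module() {
    # -i: ignore case, -x: the whole line must match,
    # -q: no output, -F: treat the name as a fixed string
    grep -i -x -q -F "$1"
}

# Only run the check when a php binary is available on this machine.
if command -v php >/dev/null 2>&1; then
    if php -m | has_module sphinx; then
        echo "sphinx extension is loaded (CLI)"
    else
        echo "sphinx extension is NOT loaded (CLI)"
    fi
fi
```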

Example of calling sphinx from PHP:

<?php
    $s = new SphinxClient;
    $s->setServer("127.0.0.1", 9312);

    $s->setMatchMode(SPH_MATCH_PHRASE);
    $s->setMaxQueryTime(30);                     //maximum query time, in milliseconds
    $res = $s->query("宝马", 'main');            //"宝马" is the keyword, 'main' is the index name
    $err = $s->getLastError();
    if ($res !== false && !empty($res['matches'])) {
        var_dump(array_keys($res['matches']));   //IDs of the matching documents
        echo "<br>"."Use the returned IDs to read the rows from the database."."<br>";
    }

    echo '<pre>';
    var_dump($res);
    var_dump($err);
    echo '</pre>';

Example 2: with paging support

<?php
    header("Content-type: text/html; charset=utf-8");
    require("./sphinxapi.php");                                       //pure-PHP API shipped with coreseek; remove this line if the sphinx extension is loaded, because the class would be declared twice
    $s = new SphinxClient;
    $s->setServer("192.168.252.132", 9312);

    //SPH_MATCH_ALL: match all query words (default mode); SPH_MATCH_ANY: match any of the query words; SPH_MATCH_EXTENDED2: supports the extended query operators
    $s->setMatchMode(SPH_MATCH_ALL);
    $s->setMaxQueryTime(30);                                        //maximum search time, in milliseconds
    $s->SetArrayResult(false);                                        //false: matches are keyed by document ID; true: matches are a plain list
    $s->SetSelect ( "*" );                                            //which columns to return, as in SQL
    $s->SetRankingMode(SPH_RANK_BM25);                                //ranking mode; SPH_RANK_BM25 may lower result quality for multi-word queries
    //$s->SetSortMode(SPH_SORT_EXTENDED);                            //adding this made the results inaccurate
    //$s->SetSortMode(SPH_SORT_EXTENDED,"from_id asc,id desc");        //sort mode for matches; SPH_SORT_EXTENDED combines columns SQL-style, ascending or descending
    $weights = array ('company_name' => 20);
    $s->SetFieldWeights($weights);                                    //per-field weights
    $s->SetLimits ( 0, 30, 1000, 0 );                                //SetLimits(offset, number of matches to return, max_matches (default 1000), cutoff: stop once this many matches are found)
    //$s->SetFilter ( $attribute, $values, $exclude=false );        //attribute filter
    //$s->SetGroupBy ( $attribute, $func, $groupsort="@group desc" );    //group-by attribute
    $res = $s->query('@* "汽车"','main','--single-0-query--'); //"汽车" is the keyword, 'main' is the index name, the third argument is a comment for the query log
    $err = $s->GetLastError();
    if ($res === false || empty($res['matches'])) {
        var_export($err);                                             //query failed or returned no matches
        exit;
    }

    //keyword highlighting
    $tags = array();
    $company_name = array();
    $description = array();
    foreach($res['matches'] as $key=>$value){
        $tags[] = $value['attrs'];
        $company_name[] = $value['attrs']['company_name'];
        $description[] = $value['attrs']['description'];
    }
    $company_name = $s->BuildExcerpts ($company_name, 'main', '汽车', array() );        //build highlighted excerpts; the index name here must not be *
    $description = $s->BuildExcerpts ($description, 'main', '汽车', array() );        //build highlighted excerpts; the index name here must not be *
    foreach($tags as $k=>$v)
    {
        $tags[$k]['company_name'] = $company_name[$k];    //overwrite with the highlighted text
        $tags[$k]['description'] = $description[$k];    //overwrite with the highlighted text
    }

    //replace each match with its highlighted attributes
    $i = 0;
    foreach($res['matches'] as $key=>$value){
        $res['matches'][$key] = $tags[$i];
        $i++;
    }

    echo '<pre>';
    var_export($res);
    var_export($err);
    echo '</pre>';
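Paging comes down to the first two arguments of SetLimits: the offset and the number of matches per page. The arithmetic is sketched below in shell to match the other command examples; `page_offset` and the sample page numbers are illustrative, and max_matches is assumed to be the 1000 used in this document. Results past max_matches are not returned, so very deep pages cannot be reached.

```shell
#!/bin/sh
# Sketch: turn a 1-based page number into the (offset, limit) pair that
# SetLimits() expects. page_offset and the values below are illustrative.

page_offset() {
    # $1 = page number (1-based), $2 = results per page
    echo $(( ($1 - 1) * $2 ))
}

per_page=30
max_matches=1000      # same value as max_matches in the searchd section

for page in 1 2 35; do
    offset=$(page_offset "$page" "$per_page")
    if [ "$offset" -ge "$max_matches" ]; then
        echo "page $page: offset $offset is not reachable with max_matches=$max_matches"
    else
        echo "page $page: SetLimits($offset, $per_page, $max_matches)"
    fi
done
```

In the PHP example the page number would typically come from a request parameter, with the computed offset passed as the first argument of SetLimits in place of the fixed 0.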

Many more details are covered in the API reference: http://www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#api-reference
