
Using LNMP + Sphinx for instant queries over big data

高洛峰 (original), 2016-12-01 13:58:41

Sphinx is a full-text search engine developed by Andrew Aksyonoff in Russia. It aims to give other applications fast, space-efficient, highly relevant full-text search. Sphinx integrates easily with SQL databases and scripting languages. It has built-in support for MySQL and PostgreSQL data sources, and it can also read XML data in a specific format from standard input.


The features of Sphinx are as follows:

a) High-speed indexing (peak performance can reach 10 MB/sec on modern CPUs);

b) High-performance search (on 2 to 4 GB of text data, the average response time per query is under 0.1 seconds);

c) Handles massive amounts of data (known to process more than 100 GB of text, or 100 M documents, on a single-CPU system);

d) Provides an excellent relevance algorithm: a composite ranking method based on phrase proximity and statistics (BM25);

e) Supports distributed search;

f) Supports phrase search;

g) Generates document summaries (excerpts);

h) Can be used as a MySQL storage engine to provide search services;

i) Supports multiple search modes such as Boolean, phrase, and word proximity;

j) Supports multiple full-text fields per document (32 at most);

k) Supports multiple additional attributes per document (such as grouping information and timestamps);

l) Supports word segmentation.


Although MySQL's MyISAM engine provides full-text indexing, its performance is underwhelming, and a database is not well suited to this kind of work in the first place. It is better to hand the task to a more suitable program and take the load off the database, which makes Sphinx a good choice as a full-text indexing tool for MySQL. This week I mainly studied how to use it. Below is a brief record of the process as a memo, which I hope helps others who are learning the tool.


1. Install sphinx

wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
tar -xf sphinx-2.2.11-release.tar.gz  && cd sphinx-2.2.11-release
./configure  --prefix=/usr/local/spinx --with-mysql
make && make install
ln -s /usr/local/mysql/lib/libmysqlclient.so.18 /usr/lib64/
# Install libsphinxclient (required by the PHP extension)
cd api/libsphinxclient
./configure --prefix=/usr/local/sphinx
make &&  make install

2. Install php extension

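wget http://pecl.php.net/get/sphinx-1.3.3.tgz
tar zxf sphinx-1.3.3.tgz && cd sphinx-1.3.3
# Generate the configure script for this PHP build before configuring
/usr/local/php/bin/phpize
./configure --with-php-config=/usr/local/php/bin/php-config --with-sphinx=/usr/local/sphinx/
make && make install
# Afterwards, enable the module with extension=sphinx.so in php.ini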
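
To confirm that the extension is really loaded, one option is to look for it in PHP's module list. The following is only a minimal sketch: `php_has_sphinx` is a hypothetical helper name, the default binary path is an assumption derived from the php-config location used above, and it presumes the module has been enabled with extension=sphinx.so in php.ini.

```shell
# Hypothetical helper: succeeds if the given PHP binary lists the sphinx module.
# The default path is an assumption based on the php-config location above.
php_has_sphinx() {
    php_bin="${1:-/usr/local/php/bin/php}"
    # `php -m` prints one loaded module per line; match the whole line, ignoring case
    "$php_bin" -m 2>/dev/null | grep -qix 'sphinx'
}
```

Usage: php_has_sphinx && echo "sphinx extension loaded"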


3. Create configuration file

cp /usr/local/spinx/etc/sphinx-min.conf.dist  /usr/local/spinx/etc/sphinx.conf

#
# Minimal Sphinx configuration sample (clean, simple, functional)
#
 
source src1
{
        type                    = mysql
 
        sql_host                = localhost
        sql_user                = root
        sql_pass                = www.123
        sql_db                  = test
        sql_port                = 3306  # optional, default is 3306
 
        sql_query               = \
                SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
                FROM documents
 
        sql_attr_uint           = group_id
        sql_attr_timestamp      = date_added
}
 
 
index test1
{
        source                  = src1
        path                    = /usr/local/spinx/var/data/test1
}
 
 
indexer
{
        mem_limit               = 32M
}
 
 
searchd
{
        listen                  = 9312
        listen                  = 9306:mysql41
        log                     = /usr/local/spinx/var/log/searchd.log
        query_log               = /usr/local/spinx/var/log/query.log
        read_timeout            = 5
        max_children            = 30
        pid_file                = /usr/local/spinx/var/log/searchd.pid
        seamless_rotate         = 1
        preopen_indexes         = 1
        unlink_old              = 1
        workers                 = threads # for RT to work
        binlog_path             = /usr/local/spinx/var/data
}
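
The sql_query above reads from a documents table in the test database, so that table has to exist before indexing. The Sphinx source distribution ships a sample schema, example.sql, for exactly this purpose. The sketch below assumes the file was installed into the same etc/ directory as sphinx-min.conf.dist under the prefix from step 1 (spelled /usr/local/spinx there) and that the test database already exists. `load_sample_data` is a hypothetical helper name.

```shell
# Hypothetical helper: feed Sphinx's bundled sample schema to the mysql client.
# Assumes example.sql sits next to sphinx-min.conf.dist under <prefix>/etc
# and that the `test` database already exists.
load_sample_data() {
    prefix="${1:-/usr/local/spinx}"   # install prefix as spelled in step 1
    mysql_bin="${2:-mysql}"           # mysql client to run
    sql_file="$prefix/etc/example.sql"
    if [ ! -r "$sql_file" ]; then
        echo "missing $sql_file" >&2
        return 1
    fi
    # -p prompts for the password configured as sql_pass
    "$mysql_bin" -u root -p < "$sql_file"
}
```

Usage: load_sample_data /usr/local/spinx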


4. Create the index and start searchd

/usr/local/spinx/bin/indexer  -c /usr/local/spinx/etc/sphinx.conf --all
/usr/local/spinx/bin/searchd  -c /usr/local/spinx/etc/sphinx.conf
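
Once searchd is running it holds the index files open, so a later rebuild needs indexer's --rotate flag, which builds new files and signals searchd to swap them in (the configuration above sets seamless_rotate = 1 for this). A minimal sketch of that and of a clean shutdown follows. The wrapper names and the SPHINX_PREFIX variable are hypothetical, and the default prefix is the one spelled in step 1.

```shell
# Hypothetical wrappers around the indexer and searchd binaries.
# SPHINX_PREFIX is an illustrative variable; it defaults to the step 1 prefix.
sphinx_prefix() {
    echo "${SPHINX_PREFIX:-/usr/local/spinx}"
}

# Rebuild all indexes and ask the running searchd to pick up the new files.
reindex() {
    p=$(sphinx_prefix)
    "$p/bin/indexer" -c "$p/etc/sphinx.conf" --all --rotate
}

# Shut down the searchd instance that was started with this config file.
stop_searchd() {
    p=$(sphinx_prefix)
    "$p/bin/searchd" -c "$p/etc/sphinx.conf" --stop
}
```

Usage: reindex after the documents table changes, and stop_searchd before editing the searchd section of sphinx.conf.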

5. Query verification

cd /root/sphinx-2.2.11-release/api
python test.py  test
DEPRECATED: Do not call this method or, even better, use SphinxQL instead of an API
Query 'test ' retrieved 3 of 3 matches in 0.000 sec
Query stats:
        'test' found 5 times in 3 documents
Matches:
1. doc_id=1, weight=2, group_id=1, date_added=2016-11-30 01:21:20
2. doc_id=2, weight=2, group_id=1, date_added=2016-11-30 01:21:20
3. doc_id=4, weight=1, group_id=2, date_added=2016-11-30 01:21:20


mysql> select * from documents;
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
| id | group_id | group_id2 | date_added          | title           | content                                                                   |
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
|  1 |        1 |         5 | 2016-11-30 01:21:20 | test one        | this is my test document number one. also checking search within phrases. |
|  2 |        1 |         6 | 2016-11-30 01:21:20 | test two        | this is my test document number two                                       |
|  3 |        2 |         7 | 2016-11-30 01:21:20 | another doc     | this is another group                                                     |
|  4 |        2 |         8 | 2016-11-30 01:21:20 | doc number four | this is to test groups                                                    |
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
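
The deprecation notice printed by test.py recommends SphinxQL, and the configuration above already exposes it through listen = 9306:mysql41, so the same search can be issued with an ordinary mysql client. The sketch below only builds the query text. `sphinxql_match` is a hypothetical helper, and its escaping covers just backslashes and single quotes, not the operators of Sphinx's extended query syntax.

```shell
# Hypothetical helper: build a SphinxQL full-text query against one index.
# Backslashes and single quotes in the search term are backslash-escaped.
sphinxql_match() {
    index="$1"
    term=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
    printf "SELECT id, group_id FROM %s WHERE MATCH('%s')\n" "$index" "$term"
}
```

Usage: mysql -h 127.0.0.1 -P 9306 -e "$(sphinxql_match test1 test)"

Using 127.0.0.1 rather than localhost makes the client connect over TCP to port 9306 instead of the local MySQL socket.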


