Sphinx is a full-text search engine developed by Andrew Aksyonoff, a Russian developer. It is designed to give other applications fast, space-efficient, and highly relevant full-text search. Sphinx integrates easily with SQL databases and scripting languages. It has built-in support for MySQL and PostgreSQL data sources, and it can also read XML data in a specific format from standard input.
The features of Sphinx are as follows:
a) High-speed indexing (peak performance can reach 10 MB/sec on modern CPUs);
b) High-performance search (on 2-4 GB of text data, the average response time per query is under 0.1 seconds);
c) Handles large data volumes (known to process more than 100 GB of text, or 100 M documents, on a single-CPU system);
d) Provides a strong relevance algorithm: a composite ranking method based on phrase proximity and statistical ranking (BM25);
e) Supports distributed search;
f) Supports phrase search;
g) Provides document summary (excerpt) generation;
h) Can be used as a MySQL storage engine to provide search services;
i) Supports multiple search modes, such as Boolean, phrase, and word similarity;
j) Documents can have multiple full-text search fields (up to 32);
k) Documents can carry multiple additional attributes (such as grouping information and timestamps);
l) Supports word segmentation.
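To give a feel for the statistical part of the ranking mentioned in item d), here is a minimal sketch of textbook Okapi BM25 in Python. It is not Sphinx's internal ranker, which differs in detail and also factors in phrase proximity. The function name `bm25_scores` and the parameter defaults `k1=1.2`, `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25.

    docs is a non-empty list of token lists. k1 and b are common defaults,
    not values taken from Sphinx.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF; the "1 +" keeps it positive for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "test one this is my test document".split(),
    "another doc this is another group".split(),
]
print(bm25_scores(["test"], docs))
```

Documents that do not contain the query term score 0, and repeated occurrences raise the score with diminishing returns.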
Although MySQL's MyISAM engine provides full-text indexing, its performance is unimpressive, and a database is not well suited to this kind of work in any case. It makes sense to hand these tasks to a more suitable program and reduce the load on the database, so using Sphinx as a full-text indexing tool for MySQL is a good choice. This week I am mainly learning how to use the tool. I will briefly record the process here as a memo, and I hope it helps others who are learning it too.
1. Install Sphinx
wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
tar -xf sphinx-2.2.11-release.tar.gz && cd sphinx-2.2.11-release
./configure --prefix=/usr/local/spinx --with-mysql
make && make install
ln -s /usr/local/mysql/lib/libmysqlclient.so.18 /usr/lib64/

Install libsphinxclient (required by the PHP module):

cd api/libsphinxclient
./configure --prefix=/usr/local/sphinx
make && make install
2. Install the PHP extension
wget http://pecl.php.net/get/sphinx-1.3.3.tgz
tar zxf sphinx-1.3.3.tgz && cd sphinx-1.3.3
/usr/local/php/bin/phpize
./configure --with-php-config=/usr/local/php/bin/php-config --with-sphinx=/usr/local/sphinx/
make && make install
3. Create configuration file
cp /usr/local/spinx/etc/sphinx-min.conf.dist /usr/local/spinx/etc/sphinx.conf
#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source src1
{
    type                = mysql

    sql_host            = localhost
    sql_user            = root
    sql_pass            = www.123
    sql_db              = test
    sql_port            = 3306  # optional, default is 3306

    sql_query           = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents

    sql_attr_uint       = group_id
    sql_attr_timestamp  = date_added
}

index test1
{
    source              = src1
    path                = /usr/local/spinx/var/data/test1
}

indexer
{
    mem_limit           = 32M
}

searchd
{
    listen              = 9312
    listen              = 9306:mysql41
    log                 = /usr/local/spinx/var/log/searchd.log
    query_log           = /usr/local/spinx/var/log/query.log
    read_timeout        = 5
    max_children        = 30
    pid_file            = /usr/local/spinx/var/log/searchd.pid
    seamless_rotate     = 1
    preopen_indexes     = 1
    unlink_old          = 1
    workers             = threads # for RT to work
    binlog_path         = /usr/local/spinx/var/data
}
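It may help to spell out how Sphinx interprets the columns returned by sql_query in the configuration above: the first column is the document ID, columns declared with sql_attr_* become attributes (usable for filtering, sorting, and grouping), and the remaining columns are indexed as full-text fields. The small Python sketch below only illustrates that rule; the helper `classify_columns` is a hypothetical name and not part of Sphinx.

```python
def classify_columns(columns, attr_columns):
    """Split sql_query result columns into document ID, attributes, and fields.

    columns: column names in SELECT order.
    attr_columns: names declared via sql_attr_* directives.
    """
    doc_id = columns[0]  # the first column is always the document ID
    attributes = [c for c in columns[1:] if c in attr_columns]
    fields = [c for c in columns[1:] if c not in attr_columns]
    return doc_id, attributes, fields


# Column names and attribute declarations copied from the configuration above.
columns = ["id", "group_id", "date_added", "title", "content"]
declared_attrs = {"group_id", "date_added"}

doc_id, attributes, fields = classify_columns(columns, declared_attrs)
print("document id:", doc_id)
print("attributes: ", attributes)
print("fields:     ", fields)
```

With this configuration, title and content are the only full-text fields, so a keyword search matches against those two columns.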
4. Create the index and start searchd
/usr/local/spinx/bin/indexer -c /usr/local/spinx/etc/sphinx.conf --all
/usr/local/spinx/bin/searchd -c /usr/local/spinx/etc/sphinx.conf
5. Query verification
cd /root/sphinx-2.2.11-release/api
python test.py test
DEPRECATED: Do not call this method or, even better, use SphinxQL instead of an API
Query 'test ' retrieved 3 of 3 matches in 0.000 sec
Query stats:
    'test' found 5 times in 3 documents

Matches:
1. doc_id=1, weight=2, group_id=1, date_added=2016-11-30 01:21:20
2. doc_id=2, weight=2, group_id=1, date_added=2016-11-30 01:21:20
3. doc_id=4, weight=1, group_id=2, date_added=2016-11-30 01:21:20
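The deprecation notice in the output above recommends SphinxQL, which searchd serves over the MySQL protocol on the 9306 listener defined in the configuration. As a rough sketch, the Python helper below builds an equivalent SphinxQL statement for the test1 index. The function names are hypothetical, and the escaping covers only string-literal quoting (backslashes and single quotes), not Sphinx's query-operator characters.

```python
def escape_sphinxql_string(value):
    """Escape backslashes and single quotes for a SphinxQL string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_match_query(index, text, limit=10):
    """Build a full-text SELECT against one index (illustrative helper)."""
    if not index.isidentifier():
        raise ValueError("index name must be a plain identifier")
    return "SELECT id, group_id, date_added FROM {} WHERE MATCH('{}') LIMIT {}".format(
        index, escape_sphinxql_string(text), int(limit)
    )


print(build_match_query("test1", "test"))
# SELECT id, group_id, date_added FROM test1 WHERE MATCH('test') LIMIT 10
```

The printed statement can be pasted into a session opened with an ordinary MySQL client, for example `mysql -h 127.0.0.1 -P 9306`.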
mysql> select * from documents;
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
| id | group_id | group_id2 | date_added          | title           | content                                                                   |
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
|  1 |        1 |         5 | 2016-11-30 01:21:20 | test one        | this is my test document number one. also checking search within phrases. |
|  2 |        1 |         6 | 2016-11-30 01:21:20 | test two        | this is my test document number two                                       |
|  3 |        2 |         7 | 2016-11-30 01:21:20 | another doc     | this is another group                                                     |
|  4 |        2 |         8 | 2016-11-30 01:21:20 | doc number four | this is to test groups                                                    |
+----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
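As a sanity check, the query statistics reported by test.py can be reproduced by hand from the rows of the documents table. The Python sketch below counts occurrences of the keyword in the title and content columns, the two full-text fields. It uses a simple lowercase, alphanumeric tokenizer, which is an assumption and only approximates Sphinx's own tokenization.

```python
import re

# Rows copied from the documents table: (id, title, content).
documents = [
    (1, "test one", "this is my test document number one. also checking search within phrases."),
    (2, "test two", "this is my test document number two"),
    (3, "another doc", "this is another group"),
    (4, "doc number four", "this is to test groups"),
]


def count_keyword(keyword, rows):
    """Return (total hits, matching document ids) for an exact keyword."""
    hits = 0
    matched_ids = []
    for doc_id, title, content in rows:
        # Lowercase and split on anything that is not a letter or digit.
        tokens = re.findall(r"[a-z0-9]+", (title + " " + content).lower())
        n = tokens.count(keyword)
        if n:
            hits += n
            matched_ids.append(doc_id)
    return hits, matched_ids


hits, ids = count_keyword("test", documents)
print("'test' found {} times in {} documents: {}".format(hits, len(ids), ids))
# 'test' found 5 times in 3 documents: [1, 2, 4]
```

This agrees with the searchd output: documents 1 and 2 contain the keyword in both title and content, document 4 only in content, and document 3 not at all.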