PHP Sphinx high-efficiency search engine configuration tutorial


WBOY

Jul 25, 2016 am 09:03 AM

tar -xvzf sphinx-2.0.1-beta.tar.gz
cd sphinx-2.0.1-beta
./configure --prefix=/usr/local/sphinx --with-mysql --with-iconv


Note: add the --enable-id64 parameter to ./configure if you need 64-bit document IDs.

make && make install
cd /usr/local/sphinx/etc/
cp sphinx.conf.dist sphinx.conf


Configuration:

#
# Sphinx configuration file sample
#
# WARNING! While this sample file mentions all available options,
# it contains (very) short helper descriptions only. Please refer to
# doc/sphinx.html for details.
#

##########################################
## data source definition
##########################################

source src1
{
    # data source type. mandatory, no default value
    # known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
    type = mysql

    ##################################################
    ## SQL settings (for 'mysql' and 'pgsql' types)
    ##################################################

    # some straightforward parameters for SQL source types
    sql_host = localhost
    sql_user = root
    sql_pass = ******
    sql_db = ******
    sql_port = 3306 # optional, default is 3306

    # UNIX socket name
    # optional, default is empty (reuse client library defaults)
    # usually '/var/lib/mysql/mysql.sock' on Linux
    # usually '/tmp/mysql.sock' on FreeBSD
    # sql_sock = /tmp/mysql.sock

    # MySQL specific client connection flags
    # optional, default is 0
    # Data transfer mode
    # mysql_connect_flags = 32 # enable compression

    # MySQL specific SSL certificate settings
    # optional, defaults are empty
    # SSL connection
    # mysql_ssl_cert = /etc/ssl/client-cert.pem
    # mysql_ssl_key = /etc/ssl/client-key.pem
    # mysql_ssl_ca = /etc/ssl/cacert.pem

    # MS SQL specific Windows authentication mode flag
    # MUST be in sync with charset_type index-level setting
    # optional, default is 0
    #
    # mssql_winauth = 1 # use currently logged on user credentials

    # MS SQL specific Unicode indexing flag
    # optional, default is 0 (request SBCS data)
    #
    # mssql_unicode = 1 # request Unicode data from server

    # ODBC specific DSN (data source name)
    # mandatory for odbc source type, no default value
    #
    # odbc_dsn = DBQ=C:\data;DefaultDir=C:\data;Driver={Microsoft Text Driver (*.txt; *.csv)};
    # sql_query = SELECT id, data FROM documents.csv

    # ODBC and MS SQL specific, per-column buffer sizes
    # optional, default is auto-detect
    #
    # sql_column_buffers = content=12M, comments=1M

    # pre-query, executed before the main fetch query
    # multi-value, optional, default is empty list of queries
    # Sent before the main SQL query is issued
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET SESSION query_cache_type=OFF

    # main document fetch query
    # mandatory, integer document ID field MUST be the first selected column
    # Query that fetches the rows to index from the table being searched
    # If several data sources feed one index, the order, number and data types
    # of the fields must match across all of them, otherwise indexing fails
    sql_query = SELECT id,target_type,genre,stars,sub_title,sports_team,music_band,music_album FROM ko_link

    # joined/payload field fetch query
    # joined fields let you avoid (slow) JOIN and GROUP_CONCAT
    # payload fields let you attach custom per-keyword values (eg. for ranking)
    #
    # syntax is FIELD-NAME 'from' ( 'query' | 'payload-query' ); QUERY
    # joined field QUERY should return 2 columns (docid, text)
    # payload field QUERY should return 3 columns (docid, keyword, weight)
    #
    # REQUIRES that query results are in ascending document ID order!
    # multi-value, optional, default is empty list of queries
    # Adds a field whose values come from another table and are joined automatically
    # The result set should look like:
    # (1,tags1)
    # (1,tags2)
    # (2,tags3)
    # (2,tags4)
    # The added field is searchable; if the result has a third column, that column
    # is the weight of the record, and the weight is a value greater than 1
    # sql_joined_field = tags from query; SELECT docid, CONCAT('tag',tagid) FROM tags ORDER BY docid ASC
    # sql_joined_field = wtags from payload-query; SELECT docid, tag, tagweight FROM tags ORDER BY docid ASC

    # file based field declaration
    #
    # content of this field is treated as a file name
    # and the file gets loaded and indexed in place of a field
    #
    # max file size is limited by max_file_field_buffer indexer setting
    # file IO errors are non-fatal and get reported as warnings
    # The column holds a file path; the file content is indexed as the field
    # sql_file_field = content_file_path

    # range query setup, query that must return min and max ID values
    # optional, default is empty
    #
    # sql_query will need to reference $start and $end boundaries
    # if using ranged query:
    # Ranged (partitioned) query, to avoid locking up MySQL with one huge fetch
    #
    # sql_query = \
    #   SELECT doc.id, doc.id AS group, doc.title, doc.data \
    #   FROM documents doc \
    #   WHERE id>=$start AND id<=$end
    #
    # sql_query_range = SELECT MIN(id),MAX(id) FROM documents

    # range query step
    # optional, default is 1024
    # Step size for the ranged query
    # sql_range_step = 1000

    # unsigned integer attribute declaration
    # multi-value (an arbitrary number of attributes is allowed), optional
    # optional bit size can be specified, default is 32
    # Declares unsigned integer attributes
    #sql_attr_uint = target_type
    # sql_attr_uint = forum_id:9 # 9 bits for forum_id
    #sql_attr_uint = group_id

    # boolean attribute declaration
    # multi-value (an arbitrary number of attributes is allowed), optional
    # equivalent to sql_attr_uint with 1-bit size
    # Declares boolean attributes
    #
    # sql_attr_bool = is_deleted

    # bigint attribute declaration
    # multi-value (an arbitrary number of attributes is allowed), optional
    # declares a signed (unlike uint!) 64-bit attribute
    # Declares bigint attributes
    # sql_attr_bigint = my_bigint_id

    # UNIX timestamp attribute declaration
    # multi-value (an arbitrary number of attributes is allowed), optional
    # similar to integer, but can also be used in date functions
    # Declares timestamp attributes
    # sql_attr_timestamp = posted_ts
    # sql_attr_timestamp = last_edited_ts
    #sql_attr_timestamp = date_added

    # string ordinal attribute declaration
    # multi-value (an arbitrary number of attributes is allowed), optional
    # sorts strings (bytewise), and stores their indexes in the sorted list
    # sorting by this attr is equivalent to sorting by the original strings
    # Declares a string attribute for sorting and similar uses; the string itself is not stored
    # sql_attr_str2ordinal = author_name

    # floating point attribute declaration
    # multi-value (an arbitrary number of attributes is allowed), optional
    # values are stored in single precision, 32-bit IEEE 754 format
    # Declares float attributes
    # sql_attr_float = lat_radians
    # sql_attr_float = long_radians

    # multi-valued attribute (MVA) attribute declaration
    # multi-value (an arbitrary number of attributes is allowed), optional
    # MVA values are variable length lists of unsigned 32-bit integers
    #
    # syntax is ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY]
    # ATTR-TYPE is 'uint' or 'timestamp'
    # SOURCE-TYPE is 'field', 'query', or 'ranged-query'
    # QUERY is SQL query used to fetch all ( docid, attrvalue ) pairs
    # RANGE-QUERY is SQL query used to fetch min and max ID values, similar to 'sql_query_range'
    # Declares multi-valued attributes
    # sql_attr_multi = uint tag from query; SELECT docid, tagid FROM tags
    # sql_attr_multi = uint tag from ranged-query; \
    #   SELECT docid, tagid FROM tags WHERE id>=$start AND id<=$end; \
    #   SELECT MIN(docid), MAX(docid) FROM tags

    # string attribute declaration
    # multi-value (an arbitrary number of these is allowed), optional
    # lets you store and retrieve strings
    # The data is only stored; the column is not full-text indexed
    # sql_attr_string = stitle

    # wordcount attribute declaration
    # multi-value (an arbitrary number of these is allowed), optional
    # lets you count the words at indexing time
    # The column is converted to a keyword count, which can help improve matching
    # sql_attr_str2wordcount = stitle

    # combined field plus attribute declaration (from a single column)
    # stores column as an attribute, but also indexes it as a full-text field
    # Unlike sql_attr_string, the column is also added to the full-text index
    # sql_field_string = author
    # sql_field_str2wordcount = title

    # post-query, executed on sql_query completion
    # optional, default is empty
    # Query run after the fetch completes
    # sql_query_post =

    # post-index-query, executed on successful indexing completion
    # optional, default is empty
    # $maxid expands to max document ID actually fetched from DB
    # Query run after indexing completes
    # sql_query_post_index = REPLACE INTO counters ( id, val ) \
    #   VALUES ( 'max_indexed_id', $maxid )

    # ranged query throttling, in milliseconds
    # optional, default is 0 which means no delay
    # enforces given delay before each query step
    # Delay between ranged query steps
    sql_ranged_throttle = 0

    # document info query, ONLY for CLI search (ie. testing and debugging)
    # optional, default is empty
    # must contain $id macro and must fetch the document by that id
    # Used to inspect query results when debugging from the command line
    sql_query_info = SELECT * FROM ko_link WHERE id=$id

    # kill-list query, fetches the document IDs for kill-list
    # k-list will suppress matches from preceding indexes in the same query
    # optional, default is empty
    # Lists the document IDs to suppress, for rows whose data has changed
    # sql_query_killlist = SELECT id FROM documents WHERE edited>=@last_reindex

    # columns to unpack on indexer side when indexing
    # multi-value, optional, default is empty list
    # Compressed columns can reduce load on the database, but Sphinx must be
    # built with the zlib library and its development package (zlib-dev) available
    # unpack_zlib = zlib_column
    # unpack_mysqlcompress = compressed_column
    # unpack_mysqlcompress = compressed_column_2

    # maximum unpacked length allowed in MySQL COMPRESS() unpacker
    # optional, default is 16M
    # The unpack buffer must not be smaller than the largest stored field value
    # unpack_mysqlcompress_maxsize = 16M

    #########################
    ## xmlpipe2 configuration
    #########################

    # type = xmlpipe

    # shell command to invoke xmlpipe stream producer
    # mandatory
    #
    # xmlpipe_command = cat /usr/local/sphinx/var/test.xml

    # xmlpipe2 field declaration
    # multi-value, optional, default is empty
    #
    # xmlpipe_field = subject
    # xmlpipe_field = content

    # xmlpipe2 attribute declaration
    # multi-value, optional, default is empty
    # all xmlpipe_attr_XXX options are fully similar to sql_attr_XXX
    #
    # xmlpipe_attr_timestamp = published
    # xmlpipe_attr_uint = author_id

    # perform UTF-8 validation, and filter out incorrect codes
    # avoids XML parser choking on non-UTF-8 documents
    # optional, default is 0
    #
    # xmlpipe_fixup_utf8 = 1
}

# inherited source example
# Source inheritance
# all the parameters are copied from the parent source,
# and may then be overridden in this source definition
#source src1throttled : src1
#{
#    sql_ranged_throttle = 100
#}

#####################
## index definition
#####################

# local index example
#
# this is an index which is stored locally in the filesystem
#
# all indexing-time options (such as morphology and charsets)
# are configured per local index
index test1
{
    # index type
    # optional, default is 'plain'
    # known values are 'plain', 'distributed', and 'rt' (see samples below)
    # Index type: local (plain), distributed, or real-time
    # type = plain

    # document source(s) to index
    # multi-value, mandatory
    # document IDs must be globally unique across all sources
    # Data source; several sources can be listed
    source = src1

    # index files path and file name, without extension
    # mandatory, path must be writable, extensions will be auto-appended
    # Where the index files are saved
    path = /usr/local/sphinx/var/data/test1

    # document attribute values (docinfo) storage mode
    # optional, default is 'extern'
    # known values are 'none', 'extern' and 'inline'
    # How attribute values are stored
    docinfo = extern

    # memory locking for cached data (.spa and .spi), to prevent swapping
    # optional, default is 0 (do not mlock)
    # requires searchd to be run from root
    # Memory locking needs sufficient privileges
    mlock = 0

    # a list of morphology preprocessors to apply
    # optional, default is empty
    #
    # builtin preprocessors are 'none', 'stem_en', 'stem_ru', 'stem_enru',
    # 'soundex', and 'metaphone'; additional preprocessors available from
    # libstemmer are 'libstemmer_XXX', where XXX is algorithm code
    # (see libstemmer_c/libstemmer/modules.txt)
    # Stemmers and other morphology processors
    # morphology = stem_en, stem_ru, soundex
    # morphology = libstemmer_german
    # morphology = libstemmer_sv
    morphology = stem_en

    # minimum word length at which to enable stemming
    # optional, default is 1 (stem everything)
    # min_stemming_len = 1

    # stopword files list (space separated)
    # optional, default is empty
    # contents are plain text, charset_table and stemming are both applied
    # Words listed here are excluded from indexing and searching
    # stopwords = /usr/local/sphinx/var/data/stopwords.txt

    # wordforms file, in "mapfrom > mapto" plain text format
    # optional, default is empty
    # The word forms dictionary can be generated with the spelldump tool
    # wordforms = /usr/local/sphinx/var/data/wordforms.txt

    # tokenizing exceptions file
    # optional, default is empty
    # Lists tokens that carry meaning as a whole and must not be split
    # when indexed, such as AT&T
    # plain text, case sensitive, space insensitive in map-from part
    # one "Map Several Words => ToASingleOne" entry per line
    #
    # exceptions = /usr/local/sphinx/var/data/exceptions.txt

    # minimum indexed word length
    # default is 1 (index everything)
    # Words shorter than this length are not indexed
    min_word_len = 1

    # charset encoding type
    # optional, default is 'sbcs'
    # known types are 'sbcs' (Single Byte CharSet) and 'utf-8'
    # Character encoding
    charset_type = utf-8

    # charset definition and case folding rules "table"
    # optional, default value depends on charset_type
    #
    # defaults are configured to include English and Russian characters only
    # you need to change the table to include additional ones
    # this behavior MAY change in future versions
    # Character mapping table
    #
    # 'sbcs' default value is
    # charset_table = 0..9, A..Z->a..z, _, a..z, U+A8->U+B8, U+B8, U+C0..U+DF->U+E0..U+FF, U+E0..U+FF
    #
    # 'utf-8' default value is
    # charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F

    # ignored characters list
    # optional, default value is empty
    # ignore_chars = U+00AD

    # minimum word prefix length to index
    # optional, default is 0 (do not index prefixes)
    # Use with care: prefix indexing makes both indexing and searching slower
    # min_prefix_len = 0

    # minimum word infix length to index
    # optional, default is 0 (do not index infixes)
    # Use with care: infix indexing makes both indexing and searching slower
    # min_infix_len = 0

    # list of fields to limit prefix/infix indexing to
    # optional, default value is empty (index all fields in prefix/infix mode)
    # prefix_fields = filename
    # infix_fields = url, domain

    # enable star-syntax (wildcards) when searching prefix/infix indexes
    # search-time only, does not affect indexing, can be 0 or 1
    # optional, default is 0 (do not use wildcard syntax)
    # Enables the asterisk (wildcard) syntax
    # enable_star = 1

    # expand keywords with exact forms and/or stars when searching fit indexes
    # search-time only, does not affect indexing, can be 0 or 1
    # optional, default is 0 (do not expand keywords)
    # Expands search keywords into the form: running -> ( running | *running* | =running )
    # expand_keywords = 1

    # n-gram length to index, for CJK indexing
    # only supports 0 and 1 for now, other lengths to be implemented
    # optional, default is 0 (disable n-grams)
    # Basic support for Chinese and other CJK languages
    # ngram_len = 1

    # n-gram characters list, for CJK indexing
    # optional, default is empty
    # Character range for Chinese and other CJK languages
    # ngram_chars = U+3000..U+2FA1F

    # phrase boundary characters list
    # optional, default is empty
    # phrase_boundary = ., ?, !, U+2026 # horizontal ellipsis

    # phrase boundary word position increment
    # optional, default is 0
    # phrase_boundary_step = 100

    # blended characters list
    # blended chars are indexed both as separators and valid characters
    # for instance, AT&T will result in 3 tokens ("at", "t", and "at&t")
    # optional, default is empty
    # blend_chars = +, &, U+23

    # blended token indexing mode
    # a comma separated list of blended token indexing variants
    # known variants are trim_none, trim_head, trim_tail, trim_both, skip_pure
    # optional, default is trim_none
    # blend_mode = trim_tail, skip_pure

    # whether to strip HTML tags from incoming documents
    # known values are 0 (do not strip) and 1 (do strip)
    # optional, default is 0
    # Strips HTML tags (take care, text can be removed along with them)
    html_strip = 0

    # what HTML attributes to index if stripping HTML
    # optional, default is empty (do not index anything)
    # HTML attributes whose values are kept and indexed
    # html_index_attrs = img=alt,title; a=title;

    # what HTML elements contents to strip
    # optional, default is empty (do not strip element contents)
    # Removes not only the tags but also the text they contain
    # html_remove_elements = style, script

    # whether to preopen index data files on startup
    # optional, default is 0 (do not preopen), searchd-only
    # Open the index files in advance instead of on every query
    # preopen = 1

    # whether to keep dictionary (.spi) on disk, or cache it in RAM
    # optional, default is 0 (cache in RAM), searchd-only
    # Set to 1 to keep the dictionary file on disk instead of in memory
    # ondisk_dict = 1

    # whether to enable in-place inversion (2x less disk, 90-95% speed)
    # optional, default is 0 (use separate temporary files), indexer-only
    # In-place inversion reduces disk usage at the cost of a slight slowdown
    # inplace_enable = 1

    # in-place fine-tuning options
    # optional, defaults are listed below
    # Fine-tuning for in-place inversion
    # inplace_hit_gap = 0 # preallocated hitlist gap size
    # inplace_docinfo_gap = 0 # preallocated docinfo gap size
    # inplace_reloc_factor = 0.1 # relocation buffer size within arena
    # inplace_write_factor = 0.1 # write buffer size within arena

    # whether to index original keywords along with stemmed versions
    # enables "=exactform" operator to work
    # optional, default is 0
    # Indexes the original keyword in addition to its stemmed/remapped form
    # index_exact_words = 1

    # position increment on overshort (less than min_word_len) words
    # optional, allowed values are 0 and 1, default is 1
    # Position increment applied after a word shorter than min_word_len
    # overshort_step = 1

    # position increment on stopword
    # optional, allowed values are 0 and 1, default is 1
    # Position increment applied after a stopword
    # stopword_step = 1

    # hitless words list
    # positions for these keywords will not be stored in the index
    # optional, allowed values are 'all', or a list file name
    # hitless_words = all
    # hitless_words = hitless.txt # word list file

    # detect and index sentence and paragraph boundaries
    # required for the SENTENCE and PARAGRAPH operators to work
    # optional, allowed values are 0 and 1, default is 0
    # index_sp = 1

    # index zones, delimited by HTML/XML tags
    # a comma separated list of tags and wildcards
    # required for the ZONE operator to work
    # optional, default is empty string (do not index zones)
    # index_zones = title, h*, th
}

# inherited index example
# Index inheritance
# all the parameters are copied from the parent index,
# and may then be overridden in this index definition
#index test1stemmed : test1
#{
#    path = /usr/local/sphinx/var/data/test1stemmed
#    morphology = stem_en
#}

# distributed index example
#
# this is a virtual index which can NOT be directly indexed,
# and only contains references to other local and/or remote indexes
#index dist1
#{
    # Distributed index configuration
    # 'distributed' index type MUST be specified
    # type = distributed

    # local index to be searched
    # there can be many local indexes configured
    # local = test1
    # local = test1stemmed

    # remote agent
    # multiple remote agents may be specified
    # syntax for TCP connections is 'hostname:port:index1,[index2[,...]]'
    # syntax for local UNIX connections is '/path/to/socket:index1,[index2[,...]]'
    # agent = localhost:9313:remote1
    # agent = localhost:9314:remote2,remote3
    # agent = /var/run/searchd.sock:remote4

    # blackhole remote agent, for debugging/testing
    # network errors and search results will be ignored
    #
    # agent_blackhole = testbox:9312:testindex1,testindex2

    # remote agent connection timeout, milliseconds
    # optional, default is 1000 ms, ie. 1 sec
    # agent_connect_timeout = 1000

    # remote agent query timeout, milliseconds
    # optional, default is 3000 ms, ie. 3 sec
    # agent_query_timeout = 3000
#}

# realtime index example
#
# you can run INSERT, REPLACE, and DELETE on this index on the fly
# using MySQL protocol (see 'listen' directive below)
#index rt
#{
    # 'rt' index type must be specified to use RT index
    # type = rt

    # index files path and file name, without extension
    # mandatory, path must be writable, extensions will be auto-appended
    # path = /usr/local/sphinx/var/data/rt

    # RAM chunk size limit
    # RT index will keep at most this much data in RAM, then flush to disk
    # optional, default is 32M
    #
    # rt_mem_limit = 512M

    # full-text field declaration
    # multi-value, mandatory
    # rt_field = title
    # rt_field = content

    # unsigned integer attribute declaration
    # multi-value (an arbitrary number of attributes is allowed), optional
    # declares an unsigned 32-bit attribute
    # rt_attr_uint = gid

    # RT indexes currently support the following attribute types:
    # uint, bigint, float, timestamp, string
    #
    # rt_attr_bigint = guid
    # rt_attr_float = gpa
    # rt_attr_timestamp = ts_added
    # rt_attr_string = author
#}

######################
## indexer settings
######################

indexer
{
    # Memory usage limit for the indexing process
    # memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
    # optional, default is 32M, max is 2047M, recommended is 256M to 1024M
    mem_limit = 32M

    # maximum IO calls per second (for I/O throttling)
    # optional, default is 0 (unlimited)
    # max_iops = 40

    # maximum IO call size, bytes (for I/O throttling)
    # optional, default is 0 (unlimited)
    # max_iosize = 1048576

    # maximum xmlpipe2 field length, bytes
    # optional, default is 2M
    # Maximum field size allowed for an xmlpipe2 data source
    # max_xmlpipe2_field = 4M

    # write buffer size, bytes
    # several (currently up to 4) buffers will be allocated
    # write buffers are allocated in addition to mem_limit
    # optional, default is 1M
    # write_buffer = 1M

    # maximum file field adaptive buffer size
    # optional, default is 8M, minimum is 1M
    #
    # max_file_field_buffer = 32M
}

#######################
## searchd settings
#######################

searchd
{
    # [hostname:]port[:protocol], or /unix/socket/path to listen on
    # known protocols are 'sphinx' (SphinxAPI) and 'mysql41' (SphinxQL)
    #
    # multi-value, multiple listen points are allowed
    # optional, defaults are 9312:sphinx and 9306:mysql41, as below
    #
    # listen = 127.0.0.1
    # listen = 192.168.0.1:9312
    # listen = 9312
    # listen = /var/run/searchd.sock
    listen = 9312
    #listen = 9306:mysql41

    # log file, searchd run info is logged here
    # optional, default is 'searchd.log'
    # All searchd runtime events are recorded in this file
    log = /usr/local/sphinx/var/log/searchd.log

    # query log file, all search queries are logged here
    # optional, default is empty (do not log queries)
    query_log = /usr/local/sphinx/var/log/query.log

    # client read timeout, seconds
    # optional, default is 5
    # Read timeout for network client requests
    read_timeout = 5

    # request timeout, seconds
    # optional, default is 5 minutes
    # With persistent connections, the maximum time to wait between two queries
    client_timeout = 300

    # maximum amount of children to fork (concurrent searches to run)
    # optional, default is 0 (unlimited)
    # Caps the number of child processes to control server load. No more than
    # this many searches run at once; when the limit is reached, new clients are
    # rejected with a temporary failure (SEARCHD_RETRY) status and a message
    # saying the server has reached its connection limit
    max_children = 30

    # PID file, searchd process ID file name
    # mandatory
    pid_file = /usr/local/sphinx/var/log/searchd.pid

    # max amount of matches the daemon ever keeps in RAM, per-index
    # WARNING, THERE'S ALSO PER-QUERY LIMIT, SEE SetLimits() API CALL
    # default is 1000 (just like Google)
    # Maximum number of matches kept in memory per index and returned to the client
    max_matches = 1000

    # seamless rotate, prevents rotate stalls if precaching huge datasets
    # optional, default is 1
    # Keeps searchd responsive during rotation when indexes with large
    # amounts of data have to be precached
    seamless_rotate = 1

    # whether to forcibly preopen all indexes on startup
    # optional, default is 1 (preopen everything)
    preopen_indexes = 1

    # whether to unlink .old index copies on successful rotation.
    # optional, default is 1 (do unlink)
    # Deletes the index copies with the .old extension after a successful rotation
    unlink_old = 1

    # attribute updates periodic flush timeout, seconds
    # updates will be automatically dumped to disk this frequently
    # optional, default is 0 (disable periodic flush)
    # How often changes made through UpdateAttributes() are written to disk
    # attr_flush_period = 900

    # instance-wide ondisk_dict defaults (per-index value take precedence)
    # optional, default is 0 (precache all dictionaries in RAM)
    # Global default for the ondisk_dict directive
    # ondisk_dict_default = 1

    # MVA updates pool size
    # shared between all instances of searchd, disables attr flushes!
    # optional, default size is 1M
    # Size of the shared pool used for multi-valued attribute (MVA) updates
    mva_updates_pool = 1M

    # max allowed network packet size
    # limits both query packets from clients, and responses from agents
    # optional, default size is 8M
    # Maximum packet size allowed in network communication
    max_packet_size = 8M

    # crash log path
    # searchd will (try to) log crashed query to 'crash_log_path.PID' file
    # optional, default is empty (do not create crash logs)
    # crash_log_path = /usr/local/sphinx/var/log/crash

    # max allowed per-query filter count
    # optional, default is 256
    # Used only as a sanity check; does not directly affect memory use or performance
    max_filters = 256

    # max allowed per-filter values count
    # optional, default is 4096
    # Used only as a sanity check; does not directly affect memory use or performance
    max_filter_values = 4096

    # socket listen queue length
    # optional, default is 5
    # Length of the TCP listen backlog. Connections that cannot be queued
    # fail immediately with a "Connection refused" error
    # listen_backlog = 5

    # per-keyword read buffer size
    # optional, default is 256K
    # read_buffer = 256K

    # unhinted read size (currently used when reading hits)
    # optional, default is 32K
    # read_unhinted = 32K

    # max allowed per-batch query count (aka multi-query count)
    # optional, default is 32
    # Limits the number of queries in one batch
    max_batch_queries = 32

    # max common subtree document cache size, per-query
    # optional, default is 0 (disable subtree optimization)
    #
    # subtree_docs_cache = 4M

    # max common subtree hit cache size, per-query
    # optional, default is 0 (disable subtree optimization)
    # Limits the RAM used by the common subtree optimization; off by default
    # subtree_hits_cache = 8M

    # multi-processing mode (MPM)
    # known values are none, fork, prefork, and threads
    # optional, default is fork
    # Working mode
    workers = threads # for RT to work

    # max threads to create for searching local parts of a distributed index
    # optional, default is 0, which means disable multi-threaded searching
    # should work with all MPMs (ie. does NOT require workers=threads)
    #
    # dist_threads = 4

    # binlog files path; use empty string to disable binlog
    # optional, default is build-time configured data directory
    # Binary log path
    # binlog_path = # disable logging
    # binlog_path = /usr/local/sphinx/var/data # binlog.001 etc will be created there

    # binlog flush/sync mode
    # 0 means flush and sync every second
    # 1 means flush and sync every transaction
    # 2 means flush every transaction, sync every second
    # optional, default is 2
    # binlog_flush = 2

    # binlog per-file size limit
    # optional, default is 128M, 0 means no limit
    # binlog_max_log_size = 256M

    # per-thread stack size, only affects workers=threads mode
    # optional, default is 64K
    # thread_stack = 128K

    # per-keyword expansion limit (for dict=keywords prefix searches)
    # optional, default is 0 (no limit)
    # Maximum number of keywords a single term may expand to
    # expansion_limit = 1000

    # RT RAM chunks flush period
    # optional, default is 0 (no periodic flush)
    # How often RT index RAM chunks are flushed to disk, in seconds
    # rt_flush_period = 900

    # query log file format
    # optional, known values are plain and sphinxql, default is plain
    # query_log_format = sphinxql

    # version string returned to MySQL network protocol clients
    # optional, default is empty (use Sphinx version)
    # mysql_version_string = 5.0.37

    # trusted plugin directory
    # optional, default is empty (disable UDFs)
    # plugin_dir = /usr/local/sphinx/lib

    # default server-wide collation
    # optional, default is libc_ci
    # collation_server = utf8_general_ci

    # server-wide locale for libc based collations
    # optional, default is C
    # collation_libc_locale = ru_RU.UTF-8

    # threaded server watchdog (only used in workers=threads mode)
    # optional, values are 0 and 1, default is 1 (watchdog on)
    # Whether to enable the server watchdog process
    # watchdog = 1

    # SphinxQL compatibility mode (legacy columns and their names)
    # optional, default is 0 (SQL compliant syntax and result sets)
    # compat_sphinxql_magics = 1
}

# --eof--
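The ranged query settings in the source section (sql_query_range and sql_range_step) are easier to follow with concrete numbers. The sketch below only prints the $start/$end windows that would be substituted into sql_query; it does not talk to Sphinx or MySQL. The function name print_ranges and the ID values 1 and 2345 are made up for illustration.

```shell
#!/bin/sh
# Print the [$start, $end] windows a ranged query is split into.
# Arguments: minimum ID, maximum ID, step (all illustrative values).
print_ranges() {
    min_id=$1
    max_id=$2
    step=$3
    start=$min_id
    while [ "$start" -le "$max_id" ]; do
        end=$((start + step - 1))
        # The last window is clipped to the maximum ID.
        if [ "$end" -gt "$max_id" ]; then
            end=$max_id
        fi
        echo "WHERE id>=$start AND id<=$end"
        start=$((end + 1))
    done
}

# IDs 1..2345 with sql_range_step = 1000 give three fetches.
print_ranges 1 2345 1000
```

With these numbers the main query runs three times (1 to 1000, 1001 to 2000, 2001 to 2345), so each fetch touches a bounded slice of the table instead of the whole table at once.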

Create the index: /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf test1

Rebuild all indexes while searchd keeps serving queries: /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all --rotate

Start the search daemon so that the PHP client can connect: /usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf
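The indexing and startup commands can be chained in a small wrapper so that searchd is only started once indexing has succeeded. This is a minimal sketch, not part of Sphinx: SPHINX_HOME, DRY_RUN, run and build_and_start are names introduced here, and it assumes indexer exits with a non-zero status on failure. With DRY_RUN=1 it only prints the commands, so it can be tried without an installation.

```shell
#!/bin/sh
# Sketch: build all indexes, then start searchd.
# SPHINX_HOME and DRY_RUN are example variables, not Sphinx settings.
SPHINX_HOME=${SPHINX_HOME:-/usr/local/sphinx}
CONF="$SPHINX_HOME/etc/sphinx.conf"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_and_start() {
    # Assumption: indexer returns non-zero when indexing fails,
    # in which case searchd is not started.
    run "$SPHINX_HOME/bin/indexer" --config "$CONF" --all || return 1
    run "$SPHINX_HOME/bin/searchd" --config "$CONF"
}

# Preview the commands without touching a real installation.
DRY_RUN=1
build_and_start
```

Setting DRY_RUN=0 runs the two commands for real, using the same paths as the commands above.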

Create PHP test files

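<?php
include_once 'sphinxapi.php'; // shipped in the api/ directory of the Sphinx source; omit when using the PECL sphinx extension
$s = new SphinxClient();
$s->setServer("localhost", 9312);
$s->setMatchMode(SPH_MATCH_ANY);
$s->setMaxQueryTime(3); // the limit is given in milliseconds
$result = $s->query("test"); // Query
print_r($result);
?>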


Then simply run it.

searchd commands:

Stop: searchd --config /home/myuser/sphinx.conf --stop

Stop and wait until shutdown has finished: searchd --config /home/myuser/sphinx.conf --stopwait

Status: searchd --config /home/myuser/sphinx.conf --status

Specify the PID file: searchd --config /home/myuser/sphinx.conf --pidfile /home/myuser/sphinx.pid

Start in console mode: searchd --config /home/myuser/sphinx.conf --console

Serve only the specified index: searchd --index myindex

spelldump: a tool for extracting dictionary entries, for example to build a wordforms file

indextool: a tool for dumping and checking index information

Search string rules:

* operator OR: hello | world

* operator NOT: hello -world, hello !world

* field search operator: @title hello @body world

* field position limit modifier (introduced in version 0.9.9-rc1): @body[50] hello

* multiple-field search operator: @(title,body) hello world

* all-field search operator: @* hello

* phrase search operator: "hello world"

* proximity search operator: "hello world"~10

* quorum matching operator: "the world is a wonderful place"/3

* strict order operator (aka operator "before"): aaa

* exact form modifier (introduced in version 0.9.9-rc1): raining =cats and =dogs

* field-start and field-end modifier (introduced in version 0.9.9-rc2): ^hello world$

* NEAR, generalized proximity operator (introduced in version 2.0.1-beta): hello NEAR/3 world NEAR/4 "my test"

* SENTENCE operator (introduced in version 2.0.1-beta): all SENTENCE words SENTENCE "in one sentence"

* PARAGRAPH operator (introduced in version 2.0.1-beta): "Bill Gates" PARAGRAPH "Steve Jobs"

* zone limit operator: ZONE:(h3,h4) only in these titles
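Because the characters used by these operators have special meaning, raw user input can change what a query does. A minimal sketch of one way to handle that follows: a hypothetical escapeQuery() helper that backslash-escapes operator characters, and a hypothetical fieldQuery() helper that builds an @field query from the escaped text. The bundled sphinxapi.php client has its own EscapeString() method; the standalone functions here are illustrative assumptions, and the character list may need adjusting for your Sphinx version.

```php
<?php
// Hypothetical helper: backslash-escape characters that the extended
// query syntax treats as operators, so user input is matched literally.
function escapeQuery(string $input): string
{
    // The backslash must come first so that the backslashes added for
    // the other characters are not escaped a second time.
    $special = ['\\', '(', ')', '|', '-', '!', '@', '~', '"', '&', '/', '^', '$', '='];
    $escaped = [];
    foreach ($special as $ch) {
        $escaped[] = '\\' . $ch;
    }
    return str_replace($special, $escaped, $input);
}

// Hypothetical helper: limit the escaped input to a single field.
function fieldQuery(string $field, string $input): string
{
    return '@' . $field . ' ' . escapeQuery($input);
}

echo fieldQuery('title', 'hello-world (test)'), "\n";
```

The returned string can be passed to query() when an extended matching mode such as SPH_MATCH_EXTENDED2 is selected.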

Expressions, supported operators and functions. Dates are handled as timestamps (this apparently matters only when Sphinx is used as a MySQL storage engine).

* Arithmetic operators: +, -, *, /, %, DIV, MOD
* Comparison operators: <, >, <=, >=, =, <>
* Boolean operators: AND, OR, NOT
* Bitwise operators: &, |
* Functions: ABS(), BIGINT(), CEIL(), COS(), CRC32(), DAY(), EXP(), FLOOR(), GEODIST(), IDIV(), IF(), IN(), INTERVAL(), LN(), LOG10(), LOG2(), MAX(), MIN(), MONTH(), NOW(), POW(), SIN(), SINT(), SQRT(), YEAR(), YEARMONTH(), YEARMONTHDAY()
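Since dates are handled as timestamps, a calendar date range has to be converted to integers before it can be used in an expression or a range filter. The sketch below shows one way to do that in plain PHP. dayRange() is a hypothetical helper, UTC is an assumption, and the attribute name in the note after the code is made up.

```php
<?php
// Hypothetical helper: convert two calendar dates (YYYY-MM-DD) into an
// inclusive [min, max] pair of UNIX timestamps. UTC is assumed here.
function dayRange(string $from, string $to): array
{
    $utc   = new DateTimeZone('UTC');
    $start = new DateTimeImmutable($from . ' 00:00:00', $utc);
    $end   = new DateTimeImmutable($to . ' 23:59:59', $utc);
    return [$start->getTimestamp(), $end->getTimestamp()];
}

[$min, $max] = dayRange('2016-07-01', '2016-07-25');
echo $min, ' ', $max, "\n";
```

The pair could then be passed to a range filter, for example $s->SetFilterRange('date_added', $min, $max), where date_added stands for a timestamp attribute in your own index.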

Introduction to client methods:

<?php
include_once 'sphinxapi.php';
$s = new SphinxClient();
$s->setServer("localhost", 9312);
$s->SetConnectTimeout(1); // connection timeout, in seconds
/*
$s->AddQuery(); // add a query to a batch
$s->RunQueries(); // run the batched queries
$s->ResetFilters(); // clear all filters
$s->BuildExcerpts($docs, $index, $words); // build excerpts (snippets)
$s->BuildKeywords($query, $index, $hits); // extract keywords from a query
$s->GetLastError(); // last error message
$s->GetLastWarning(); // last warning message
$s->FlushAttributes(); // flush attribute updates to disk
$s->IsConnectError(); // whether the last error was a connection error
$s->ResetGroupBy(); // clear the grouping settings
$s->SetFieldWeights(array('sub_title' => 1)); // per-field weights; the minimum weight is 1
$s->SetIDRange($min, $max); // document ID range
$s->SetIndexWeights(array('test1' => 1)); // per-index weights
$s->Status(); // query the searchd status
$s->UpdateAttributes($index, $attrs, $values); // update attribute values in the index
*/
/*
SPH_MATCH_ALL, matches all query words (default mode);
SPH_MATCH_ANY, matches any of the query words;
SPH_MATCH_PHRASE, matches query as a phrase, requiring perfect match;
SPH_MATCH_BOOLEAN, matches query as a boolean expression (see Section 5.2, “Boolean query syntax”);
SPH_MATCH_EXTENDED, matches query as an expression in Sphinx internal query language (see Section 5.3, “Extended query syntax”). As of 0.9.9, this has been superseded by SPH_MATCH_EXTENDED2, providing additional functionality and better performance. The ident is retained for legacy application code that will continue to be compatible once Sphinx and its components, including the API, are upgraded.
SPH_MATCH_EXTENDED2, matches query using the second version of the Extended matching mode.
SPH_MATCH_FULLSCAN, m
*/
$s->setMatchMode(SPH_MATCH_ANY);//Matching mode
$s->setMaxQueryTime(3);//Query timeout
//$s->SetSelect ( $select ); //Set the returned fields
/*
$cl->SetSelect ( "*, @weight+(user_karma+ln(pageviews))*0.1 AS myweight" );
$cl->SetSelect ( "exp_years, salary_gbp* {$gbp_usd_rate} AS salary_usd,
IF(age>40,1,0) AS over40" );
$cl->SetSelect ( "*, AVG(price) AS avgprice" );
*/
/*
$cl->SetGroupBy ( "category", SPH_GROUPBY_ATTR, "@count desc" );
$cl->SetGroupDistinct ( "vendor" );
is equivalent to:
SELECT id, weight, all-attributes,
COUNT(DISTINCT vendor) AS @distinct,
COUNT(*) AS @count
FROM products
GROUP BY category
ORDER BY @count DESC
*/
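//$s->SetGroupBy($groupby, SPH_GROUPBY_ATTR, $groupsort); // group results by an attribute
//$s->SetGroupDistinct($distinct); // attribute for COUNT(DISTINCT) within each group
$s->SetArrayResult(true); // return matches as a list, each with its own 'id', instead of an array keyed by document ID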
/*
SPH_SORT_RELEVANCE mode, that sorts by relevance in descending order (best matches first);
SPH_SORT_ATTR_DESC mode, that sorts by an attribute in descending order (bigger attribute values first);
SPH_SORT_ATTR_ASC mode, that sorts by an attribute in ascending order (smaller attribute values first);
SPH_SORT_TIME_SEGMENTS mode, that sorts by time segments (last hour/day/week/month) in descending order, and then by relevance in descending order;
SPH_SORT_EXTENDED mode, that sorts by SQL-like combination of columns in ASC/DESC order;
SPH_SORT_EXPR mode, that sorts by an arithmetic expression.
*/
//$s->SetSortMode(SPH_SORT_EXTENDED, $sortby); // sort mode
/*
$s->SetOverride($attrname, $attrtype, $values);
$s->ResetOverrides();*/
/*
$s->SetRetries($count); // number of retries on failure
$s->SetRankingMode($ranker); // ranking mode; applies only to SPH_MATCH_EXTENDED2 queries
// When the third parameter of SetFilter() is true, matching documents are excluded ($attribute != $value); it defaults to false
$s->SetFilter('target_type', $filtervals); // filter by a list of values
$s->SetFilterFloatRange($attribute, $min, $max); // filter by a floating-point range
$s->SetFilterRange($attribute, $min, $max); // filter by an integer range
$s->SetGeoAnchor($attrlat, $attrlong, $lat, $long); // anchor point for geodistance calculations
*/
//link
//$s->SetFilter ( 'target_type', array(1),true );
$s->SetLimits(0, 10); // offset and limit: return at most 10 matches, starting from the first
$result = $s->query("good", "team"); // search for "good" in the index named "team"
print_r($result);
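With SetArrayResult(true) each match carries its own id; without it, matches are keyed by document ID. As a sketch, the hypothetical extractIds() helper below collects the document IDs from either layout. The layouts are assumptions based on the bundled sphinxapi.php client, and the sample arrays are made up.

```php
<?php
// Hypothetical helper: collect document IDs from a result array,
// whichever layout the 'matches' entry uses.
function extractIds(array $result): array
{
    $ids = [];
    foreach ($result['matches'] ?? [] as $key => $match) {
        // The array-result layout stores the ID inside the match;
        // the default layout uses the document ID as the array key.
        $ids[] = (int) ($match['id'] ?? $key);
    }
    return $ids;
}

// Made-up samples of both layouts.
$asList = ['matches' => [
    ['id' => 7, 'weight' => 3, 'attrs' => []],
    ['id' => 9, 'weight' => 1, 'attrs' => []],
]];
$asHash = ['matches' => [
    7 => ['weight' => 3, 'attrs' => []],
    9 => ['weight' => 1, 'attrs' => []],
]];

print_r(extractIds($asList));
print_r(extractIds($asHash));
```

The collected IDs are typically used to load the full rows from the original MySQL table, since the result contains only IDs, weights and attributes.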


