


A while ago I tried out Sphinx, a full-text search engine that can easily be called from various languages (PHP/Python/Ruby/etc.). Most of the material online covers installing and using it on Linux. A production deployment certainly belongs on a *nix system, but for learning and testing, Windows is more convenient.
This article describes a convenient way to install and configure Sphinx on Windows with support for Chinese full-text search. The configuration part applies equally to Linux.
1. About Sphinx
Sphinx is a full-text search engine released under GPLv2. Commercial use (for example, embedding it into other programs) requires obtaining a commercial license from the author (Sphinxsearch.com).
In general, Sphinx is a standalone search engine. Its aim is to give other applications fast, space-efficient full-text search with highly relevant results. It integrates easily with SQL databases and scripting languages.
The current version has built-in support for MySQL and PostgreSQL data sources. It can also read XML data in a specific format from standard input. By modifying the source code, users can add new data sources (for example, native support for other DBMSs).
The search API supports PHP, Python, Perl, Ruby and Java, and Sphinx can also be used as a MySQL storage engine. The search API is very simple and can be ported to a new language within a few hours.
Sphinx features:
- Fast indexing (peak speed of up to 10 MB/sec on modern CPUs);
- Fast searching (average query time under 0.1 seconds on 2–4 GB of text);
- Handles large data sets (known to handle over 100 GB of text and 100M documents on a single-CPU system);
- Good relevance ranking: a composite method that combines phrase proximity with statistical ranking (BM25);
- Supports distributed search;
- Generates document excerpts (snippets);
- Can be used as a MySQL storage engine to provide search services;
- Supports multiple search modes, such as boolean, phrase and word proximity;
- Documents can have multiple full-text fields (up to 32);
- Documents can carry multiple additional attributes (for example, group IDs and timestamps);
- Supports stop words;
- Supports single-byte encodings and UTF-8;
- Native MySQL support (both MyISAM and InnoDB);
- Native PostgreSQL support.
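The following Python sketch makes the "phrase" half of the composite ranking concrete. It rests on two assumptions of ours, not taken from the Sphinx source:

- A field's phrase rank is the length of the longest run of query words that appears verbatim, in order, in that field.
- A document's weight is the sum of its per-field phrase ranks, each field weighted 1.

The sketch leaves out BM25 entirely. Even so, it reproduces the weights 2, 2 and 1 that search.exe reports for the query test on the sample documents shown later in this article. The `field_weights` parameter and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters, digits and underscores."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def phrase_rank(query: List[str], field: List[str]) -> int:
    """Length of the longest run of consecutive query words found verbatim in the field."""
    best = 0
    prev = [0] * (len(field) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(field) + 1)
        for j in range(1, len(field) + 1):
            if query[i - 1] == field[j - 1]:
                # extend the run that ended at the previous (query, field) word pair
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def document_weight(query: str, fields: Dict[str, str],
                    field_weights: Optional[Dict[str, int]] = None) -> int:
    """Sum of per-field phrase ranks, each multiplied by its field weight (default 1)."""
    words = tokenize(query)
    weights = field_weights or {}
    return sum(weights.get(name, 1) * phrase_rank(words, tokenize(text))
               for name, text in fields.items())


# The three sample documents that match 'test' in the search.exe output
SAMPLE_DOCS = {
    1: {"title": "test one",
        "content": "this is my test document number one. also checking search within phrases."},
    2: {"title": "test two", "content": "this is my test document number two"},
    4: {"title": "doc number four", "content": "this is to test groups"},
}

if __name__ == "__main__":
    for doc_id, fields in SAMPLE_DOCS.items():
        print(doc_id, document_weight("test", fields))
```

A two-word query shows why phrase matches rank higher. For "test document", document 1 scores 1 (title) + 2 (content) = 3, while document 4 scores only 1.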
A Chinese translation of the manual is also available; many thanks to the translator for the hard work.
2. Installation of Sphinx on Windows
1. Download the latest Windows release from http://www.sphinxsearch.com/downloads.html. I used the Win32 release binaries with MySQL support. After downloading, unzip them to D:\sphinx;
2. Under D:\sphinx, create a data directory for the index files and a log directory for the log files. Copy D:\sphinx\sphinx.conf.in to D:\sphinx\bin\sphinx.conf (note the changed file name);
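The directory setup in step 2 is a few lines of file handling. As a minimal sketch using only the Python standard library, the hypothetical helper below does three things under a given Sphinx root:

- creates the data and log directories;
- copies sphinx.conf.in to bin/sphinx.conf;
- leaves an already edited sphinx.conf alone.

It assumes the release has already been unzipped, so that sphinx.conf.in and the bin directory exist under the root.

```python
import shutil
from pathlib import Path


def prepare_sphinx_layout(root: Path) -> Path:
    """Create data/ and log/ under root and copy sphinx.conf.in to bin/sphinx.conf.

    Hypothetical helper; assumes root already contains the unzipped release.
    Returns the path of the config file.
    """
    (root / "data").mkdir(exist_ok=True)  # index files go here
    (root / "log").mkdir(exist_ok=True)   # searchd.log and searchd.pid go here
    target = root / "bin" / "sphinx.conf"
    if not target.exists():               # never overwrite a config you have already edited
        shutil.copyfile(root / "sphinx.conf.in", target)
    return target
```

For the layout used in this article, you would call `prepare_sphinx_layout(Path(r"D:\sphinx"))` on the Windows machine.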
3. Edit D:\sphinx\bin\sphinx.conf. The settings that need to be changed are:
# in the source section:
type = mysql # data source type; MySQL here
sql_host = localhost # database server
sql_user = root # database user name
sql_pass = # database password (empty here)
sql_db = test # database name
sql_port = 3306 # database port
sql_query_pre = SET NAMES utf8 # uncomment this line if your database uses UTF-8

index test1
{
    # where to put the index: directory plus base file name (without extension)
    path = D:/sphinx/data/test1
    # character encoding
    charset_type = utf-8
    # character table used with UTF-8
    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    # simple segmentation; only 0 and 1 are supported; set it to 1 to search Chinese
    ngram_len = 1
    # characters to segment; uncomment this line to search Chinese
    ngram_chars = U+3000..U+2FA1F
}

# index test1stemmed : test1
# {
#     path = @CONFDIR@/data/test1stemmed
#     morphology = stem_en
# }

# If you have no distributed index, comment out the following:

# index dist1
# {
#     # 'distributed' index type MUST be specified
#     type = distributed
#
#     # local index to be searched
#     # there can be many local indexes configured
#     local = test1
#     local = test1stemmed
#
#     # remote agent
#     # multiple remote agents may be specified
#     # syntax is 'hostname:port:index1,[index2[,...]]
#     agent = localhost:3313:remote1
#     agent = localhost:3314:remote2,remote3
#
#     # remote agent connection timeout, milliseconds
#     # optional, default is 1000 ms, ie. 1 sec
#     agent_connect_timeout = 1000
#
#     # remote agent query timeout, milliseconds
#     # optional, default is 3000 ms, ie. 3 sec
#     agent_query_timeout = 3000
# }

# the part of the search service that needs changing
searchd
{
    # log file
    log = D:/sphinx/log/searchd.log
    # PID file, searchd process ID file name
    pid_file = D:/sphinx/log/searchd.pid
    # on Windows this must stay commented out, or searchd will not start
    # seamless_rotate = 1
}
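With ngram_len = 1, Sphinx does no real Chinese word segmentation. Every character in the ngram_chars range is indexed as a keyword of its own, which is what lets a search for 中文 match at all.

The Python sketch below imitates that behaviour under simplifying assumptions:

- characters are folded according to the charset_table above;
- every character outside the table is treated as a separator;
- each character in U+3000..U+2FA1F is emitted as a single token.

It is an illustration with hypothetical function names, not Sphinx's actual tokenizer.

```python
from typing import List, Optional


def in_ngram_chars(ch: str) -> bool:
    """ngram_chars = U+3000..U+2FA1F"""
    return 0x3000 <= ord(ch) <= 0x2FA1F


def fold(ch: str) -> Optional[str]:
    """Apply charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F.

    Returns None for characters outside the table, which then act as separators.
    """
    cp = ord(ch)
    if "0" <= ch <= "9" or "a" <= ch <= "z" or ch == "_":
        return ch
    if "A" <= ch <= "Z":
        return ch.lower()
    if 0x410 <= cp <= 0x42F:   # uppercase Cyrillic -> lowercase Cyrillic
        return chr(cp + 0x20)
    if 0x430 <= cp <= 0x44F:   # lowercase Cyrillic
        return ch
    return None


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    word: List[str] = []

    def flush() -> None:
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if in_ngram_chars(ch):
            flush()
            tokens.append(ch)  # ngram_len = 1: each CJK character is its own token
            continue
        folded = fold(ch)
        if folded is None:
            flush()            # character not in charset_table: word boundary
        else:
            word.append(folded)
    flush()
    return tokens


if __name__ == "__main__":
    print(tokenize("Test 中文, Sphinx_0.9"))
```

Under these assumptions, the title 测试中文 is indexed as four one-character keywords, and the query 中文 becomes the two keywords 中 and 文. Note that accented Latin letters such as é are not in this charset_table, so they split words.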
4. Import the test data:
C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot test
5. Create the index:
D:\sphinx\bin>indexer.exe --all
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.101 sec, 1916.30 bytes/sec, 39.72 docs/sec
D:\sphinx\bin>
6. Search for 'test' to try it out:
D:\sphinx\bin>search.exe test
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec
displaying matches:
1. document=1, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
id=1
group_id=1
group_id2=5
date_added=2008-11-26 14:58:59
title=test one
content=this is my test document number one. also checking search within phrases.
2. document=2, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
id=2
group_id=1
group_id2=6
date_added=2008-11-26 14:58:59
title=test two
content=this is my test document number two
3. document=4, weight=1, group_id=2, date_added=Wed Nov 26 14:58:59 2008
id=4
group_id=2
group_id2=8
date_added=2008-11-26 14:58:59
title=doc number four
content=this is to test groups
words:
1. 'test': 3 documents, 5 hits
D:\sphinx\bin>
All the matching documents show up.
7. Test Chinese search. Modify the documents table in the test database ('测试中文' means "Test Chinese"):
UPDATE `test`.`documents` SET `title` = '测试中文', `content` = 'this is my test document number two, you should be able to find it' WHERE `documents`.`id` = 2;
Rebuild the index:
D:\sphinx\bin>indexer.exe --all
Then try searching for '中文' ("Chinese"):
D:\sphinx\bin>search.exe 中文
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'test1': query '中文': returned 0 matches of 0 total in 0.000 sec
words:
D:\sphinx\bin>
Nothing is found. The Windows command line uses GBK encoding while the index is UTF-8, so the query cannot match. Let's try it from a program instead. Create a new file foo.php under D:\sphinx\api and make sure it is saved as UTF-8:
<?php
// sphinxapi.php ships with Sphinx in the same api directory
require 'sphinxapi.php';
$s = new SphinxClient();
// searchd listens on port 3312 in this setup (see its startup output)
$s->SetServer('localhost', 3312);
// this file is saved as UTF-8, so the query string is UTF-8 too
$result = $s->Query('中文');
if ($result === false) {
    echo 'Query failed: ' . $s->GetLastError() . "\n";
} else {
    var_dump($result);
}
?>
Start the Sphinx searchd service:
D:\sphinx\bin>searchd.exe
WARNING: forcing --console mode on Windows
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
creating server socket on 0.0.0.0:3312
accepting connections
Then run the PHP query:
php d:/sphinx/api/foo.php
Do the results show up now? What remains is to read the manual and gradually explore the advanced configuration.
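The console search for 中文 failed because of encoding. The index stores text as UTF-8, but the Windows console passes the query to search.exe as GBK (code page 936). A short Python check shows that the two encodings produce completely different bytes for the same two characters. The GBK bytes are not even valid UTF-8, so they cannot match the indexed keywords. foo.php, saved as UTF-8, sends the right bytes. The names below are illustrative only.

```python
QUERY = "中文"

# bytes stored in the UTF-8 index (and sent by foo.php when saved as UTF-8)
utf8_bytes = QUERY.encode("utf-8")
# bytes a GBK (code page 936) console passes to search.exe
gbk_bytes = QUERY.encode("gbk")


def decodes_as_utf8(data: bytes) -> bool:
    """True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print("UTF-8:", utf8_bytes.hex(" "))
    print("GBK:  ", gbk_bytes.hex(" "))
    print("GBK bytes valid as UTF-8?", decodes_as_utf8(gbk_bytes))
```

It prints e4 b8 ad e6 96 87 for UTF-8 and d6 d0 ce c4 for GBK, and reports that the GBK bytes are not valid UTF-8 (bytes.hex with a separator needs Python 3.8 or later).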
