I need to build a full-text search feature. What are the approaches and technologies for this?

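I need to build a full-text search feature: the user types keywords into an input box and gets back the articles that match them.
It should match against both article content and article titles. Is this complicated to implement?
What are some good solutions?

Language: PHP; database: MySQL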

Replies:

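One option for the original poster: http://www.xunsearch.com/site/usercase
It is open source and also offers commercial support. If you have plenty of time you could build your own; otherwise go with an open-source solution, and this one has a fairly active community.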

coreseek, the Sphinx build with Chinese word segmentation.
http://www.coreseek.cn/

I think Elasticsearch is a good choice. It is a search engine written in Java, it is distributed, and it can also be used for log search.

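  1. A database-based implementation does not scale well. As the data volume grows, performance degrades.

  2. There are many open-source options, such as Lucene; if the requirements are simple, it is quick to write against. You can also use Solr, which is built on Lucene (http://lucene.apache.org/solr/)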


For the most convenient and most scalable option, I recommend Alibaba Cloud's OpenSearch; it is really simple and convenient to use.

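XunSearch, an open-source Chinese search engine:
http://www.cloud-sun.com/view/product
http://www.xunsearch.com/doc/php/guide/start.installation
1. Performance: a single XunSearch index supports up to 4 billion documents; searching about 1.5 TB of data across 500 million web pages takes under 1 second (uncached).
2. Easy to use: the front end is an SDK written in the scripting language PHP. The API is simple and clear, the development effort is very low, and it ships with sample code, documentation, and helper scripts, all in Chinese.
3. Feature-rich: besides basic custom word segmentation, field search, and boolean search, it directly supports related searches, pinyin search, search suggestions, and other specialized features that users often need.
The author of XunSearch, hightman (马明练), is also the author of the Chinese word segmenter SCWS (available as a PECL extension and a pure-PHP implementation, with a complete Chinese dictionary).
http://www.xunsearch.com/scws/index.php
The site search on segmentfault.com, which runs on PHP, uses XunSearch.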

XunSearch search suggestions and spelling correction (e.g. pinyin search):
http://www.xunsearch.com/doc/php/guide/search.fix

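Alternatively, you can use the FULLTEXT index built into MySQL InnoDB/MyISAM (FULLTEXT is an index type, not a column type, and InnoDB supports it from MySQL 5.6 onward). Use PECL SCWS to segment the article body and title, then store the segmented words in a dedicated column that has a FULLTEXT index, e.g. article_fc text, FULLTEXT (article_fc). When the user submits a query, segment it with PECL SCWS as well, then run a full-text search with a MATCH AGAINST statement: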

<code>SELECT * FROM articles WHERE MATCH(article_fc) AGAINST('word1 word2');</code>
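The query side of this approach could be sketched in PHP roughly as follows. This is a minimal sketch under stated assumptions: `segmentPlaceholder` is a hypothetical stand-in that only splits on whitespace (real Chinese segmentation would come from PECL SCWS, whose API is not shown here), `buildFulltextQuery` is a hypothetical helper, and the `articles` table and `article_fc` column follow the example above. The search string is passed as a bound parameter rather than concatenated into the SQL.

```php
<?php
declare(strict_types=1);

/**
 * Stand-in for PECL SCWS segmentation (assumption): splits on whitespace only.
 *
 * @return string[]
 */
function segmentPlaceholder(string $input): array
{
    $tokens = preg_split('/\s+/u', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? [] : $tokens;
}

/**
 * Build a parameterized MATCH ... AGAINST query from pre-segmented tokens.
 * Hypothetical helper; table and column names follow the example above.
 *
 * @param string[] $tokens
 * @return array{0: string, 1: string[]} SQL text and its bound parameters
 */
function buildFulltextQuery(array $tokens): array
{
    // Remove duplicate tokens and reindex the array
    $tokens = array_values(array_unique($tokens));
    if ($tokens === []) {
        throw new InvalidArgumentException('No search keywords given.');
    }

    // The segmented words are joined with spaces into one search string
    $sql = 'SELECT * FROM articles WHERE MATCH(article_fc) AGAINST(?)';
    return [$sql, [implode(' ', $tokens)]];
}

// Usage with an existing PDO connection ($pdo is assumed, not created here):
// [$sql, $params] = buildFulltextQuery(segmentPlaceholder($_GET['q'] ?? ''));
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Returning the SQL and parameters as a pair keeps the helper independent of any database connection, so it can be checked without a running MySQL server.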

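The table holding article_fc can also be kept separate from the articles table that holds the title and body: run the search, then join against the articles table to read the title and body. You could even build the token table in SQLite and store all the segmented content there to take load off MySQL, since SQLite also supports full-text search, and full-text search is a read operation, where SQLite performs very well.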

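Cruder still: rely on neither PHP SCWS segmentation nor MySQL (InnoDB/MyISAM)/SQLite/XunSearch full-text search. Simply ask users to enter their keywords separately, then run a fuzzy query with SQL LIKE. With a small amount of data this is a workable and simple solution: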

<code>SELECT * FROM articles WHERE content LIKE '%word1%' OR content LIKE '%word2%';
SELECT * FROM articles WHERE content REGEXP 'word1|word2';</code>
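When the keywords come from user input, the LIKE query is better built with bound parameters than by string concatenation. The following is a minimal sketch, assuming keywords are separated by whitespace and that the articles table has `title` and `content` columns (`title` is an assumption based on the question; the queries above only use `content`). `buildLikeQuery` is a hypothetical helper, not part of any library.

```php
<?php
declare(strict_types=1);

/**
 * Build a parameterized LIKE query that matches any keyword in title or content.
 * Hypothetical helper; column names are assumptions.
 *
 * @return array{0: string, 1: string[]} SQL text and its bound parameters
 */
function buildLikeQuery(string $input): array
{
    // Split the raw input on whitespace, dropping empty pieces
    $keywords = preg_split('/\s+/u', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    if ($keywords === false || $keywords === []) {
        throw new InvalidArgumentException('No search keywords given.');
    }

    $clauses = [];
    $params = [];
    foreach (array_unique($keywords) as $keyword) {
        // Escape LIKE wildcards so input such as "100%" is matched literally
        $pattern = '%' . addcslashes($keyword, '\\%_') . '%';
        $clauses[] = '(title LIKE ? OR content LIKE ?)';
        $params[] = $pattern;
        $params[] = $pattern;
    }

    return ['SELECT * FROM articles WHERE ' . implode(' OR ', $clauses), $params];
}

// Usage with an existing PDO connection ($pdo is assumed, not created here):
// [$sql, $params] = buildLikeQuery($_GET['q'] ?? '');
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```

A leading `%` wildcard prevents MySQL from using an ordinary index on the column, which is one reason this approach only suits small data sets.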

Solr, an Apache project.
