search
HomeBackend DevelopmentPHP TutorialPHP中文分词的简单实现代码分享_php技巧

当然, 本文不是要对中文搜索引擎做研究, 而是分享如果用 PHP 做一个站内搜索引擎。 本文是这个系统中的一篇。
我使用的分词工具是中科院计算所的开源版本的 ICTCLAS。 另外还有开源的 Bamboo, 我随后也会对该工具进行调研。
从 ICTCLAS 出发是个不错的选择, 因为其算法传播比较广泛, 有公开的学术文档, 并且编译简单, 库依赖少。 但目前只提供了 C/C++, Java 和 C# 版本的代码, 并没有 PHP 版本的代码。 怎么办呢? 也许可以学习它的 C/C++ 源码和学术文档中, 然后再开发一个 PHP 版本出来。 不过, 我要使用进程间通信, 在 PHP 代码里调用 C/C++ 版本的可执行文件。
下载源码解压后, 在有 C++ 开发库和编译环境的机器上直接 make ictclas 即可。 它的 Makefile 脚本有个错误, 执行测试的代码没有加上'。/', 当然不能像 Windows 下执行成功了。 但也不影响编译结果。
进行中文分词的 PHP 类就在下面了, 用 proc_open() 函数来执行分词程序, 并通过管道和其交互, 输入要进行分词的文本, 读取分词结果。

复制代码 代码如下:

class NLP{
private static $cmd_path;
// 不以'/'结尾
static function set_cmd_path($path){
self::$cmd_path = $path;
}
private function cmd($str){
$descriptorspec = array(
0 => array("pipe", "r"),
1 => array("pipe", "w"),
);
$cmd = self::$cmd_path . "/ictclas";
$process = proc_open($cmd, $descriptorspec, $pipes);
if (is_resource($process)) {
$str = iconv('utf-8', 'gbk', $str);
fwrite($pipes[0], $str);
$output = stream_get_contents($pipes[1]);
fclose($pipes[0]);
fclose($pipes[1]);
$return_value = proc_close($process);
}
/*
$cmd = "printf '$input' | " . self::$cmd_path . "/ictclas";
exec($cmd, $output, $ret);
$output = join("\n", $output);
*/
$output = trim($output);
$output = iconv('gbk', 'utf-8', $output);
return $output;
}
/**
* 进行分词, 返回词语列表.
*/
function tokenize($str){
$tokens = array();
$output = self::cmd($input);
if($output){
$ps = preg_split('/\s+/', $output);
foreach($ps as $p){
list($seg, $tag) = explode('/', $p);
$item = array(
'seg' => $seg,
'tag' => $tag,
);
$tokens[] = $item;
}
}
return $tokens;
}
}
NLP::set_cmd_path(dirname(__FILE__));
?>

使用起来很简单(确保 ICTCLAS 编译后的可执行文件和词典在当前目录):
复制代码 代码如下:

require_once('NLP.php');
var_dump(NLP::tokenize('Hello, World!'));
?>
Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
PHP Email: Step-by-Step Sending GuidePHP Email: Step-by-Step Sending GuideMay 09, 2025 am 12:14 AM

PHPisusedforsendingemailsduetoitsintegrationwithservermailservicesandexternalSMTPproviders,automatingnotificationsandmarketingcampaigns.1)SetupyourPHPenvironmentwithawebserverandPHP,ensuringthemailfunctionisenabled.2)UseabasicscriptwithPHP'smailfunct

How to Send Email via PHP: Examples & CodeHow to Send Email via PHP: Examples & CodeMay 09, 2025 am 12:13 AM

The best way to send emails is to use the PHPMailer library. 1) Using the mail() function is simple but unreliable, which may cause emails to enter spam or cannot be delivered. 2) PHPMailer provides better control and reliability, and supports HTML mail, attachments and SMTP authentication. 3) Make sure SMTP settings are configured correctly and encryption (such as STARTTLS or SSL/TLS) is used to enhance security. 4) For large amounts of emails, consider using a mail queue system to optimize performance.

Advanced PHP Email: Custom Headers & FeaturesAdvanced PHP Email: Custom Headers & FeaturesMay 09, 2025 am 12:13 AM

CustomheadersandadvancedfeaturesinPHPemailenhancefunctionalityandreliability.1)Customheadersaddmetadatafortrackingandcategorization.2)HTMLemailsallowformattingandinteractivity.3)AttachmentscanbesentusinglibrarieslikePHPMailer.4)SMTPauthenticationimpr

Guide to Sending Emails with PHP & SMTPGuide to Sending Emails with PHP & SMTPMay 09, 2025 am 12:06 AM

Sending mail using PHP and SMTP can be achieved through the PHPMailer library. 1) Install and configure PHPMailer, 2) Set SMTP server details, 3) Define the email content, 4) Send emails and handle errors. Use this method to ensure the reliability and security of emails.

What is the best way to send an email using PHP?What is the best way to send an email using PHP?May 08, 2025 am 12:21 AM

ThebestapproachforsendingemailsinPHPisusingthePHPMailerlibraryduetoitsreliability,featurerichness,andeaseofuse.PHPMailersupportsSMTP,providesdetailederrorhandling,allowssendingHTMLandplaintextemails,supportsattachments,andenhancessecurity.Foroptimalu

Best Practices for Dependency Injection in PHPBest Practices for Dependency Injection in PHPMay 08, 2025 am 12:21 AM

The reason for using Dependency Injection (DI) is that it promotes loose coupling, testability, and maintainability of the code. 1) Use constructor to inject dependencies, 2) Avoid using service locators, 3) Use dependency injection containers to manage dependencies, 4) Improve testability through injecting dependencies, 5) Avoid over-injection dependencies, 6) Consider the impact of DI on performance.

PHP performance tuning tips and tricksPHP performance tuning tips and tricksMay 08, 2025 am 12:20 AM

PHPperformancetuningiscrucialbecauseitenhancesspeedandefficiency,whicharevitalforwebapplications.1)CachingwithAPCureducesdatabaseloadandimprovesresponsetimes.2)Optimizingdatabasequeriesbyselectingnecessarycolumnsandusingindexingspeedsupdataretrieval.

PHP Email Security: Best Practices for Sending EmailsPHP Email Security: Best Practices for Sending EmailsMay 08, 2025 am 12:16 AM

ThebestpracticesforsendingemailssecurelyinPHPinclude:1)UsingsecureconfigurationswithSMTPandSTARTTLSencryption,2)Validatingandsanitizinginputstopreventinjectionattacks,3)EncryptingsensitivedatawithinemailsusingOpenSSL,4)Properlyhandlingemailheaderstoa

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools