Simple code sharing for Chinese word segmentation in PHP
Of course, the aim of this article is not to research Chinese search engines but to share how to build an on-site search engine with PHP. It is one article in a series on that topic.
The word segmentation tool I use is the open-source version of ICTCLAS from the Institute of Computing Technology, Chinese Academy of Sciences. There is also the open-source Bamboo, which I plan to investigate later.
ICTCLAS is a good starting point: its algorithm is widely known, it is backed by published academic papers, it is easy to compile, and it has few library dependencies. However, only C/C++, Java, and C# versions of the code are currently provided; there is no PHP version. What can we do? One option is to study the C/C++ source code and the academic papers and then write a PHP port. Instead, I chose to call the compiled C/C++ executable from PHP code through inter-process communication.
After downloading and unpacking the source code, run make ictclas on a machine that has a C++ compiler and development libraries installed. The Makefile contains an error: the command that runs the test program is not prefixed with './', so, unlike on Windows, it fails to run because the current directory is normally not on the search path. This does not affect the compiled output.
The PHP class for Chinese word segmentation is below. It uses the proc_open() function to launch the word segmentation program, communicates with it through pipes, writes the text to be segmented to its input, and reads the segmentation results from its output.
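Independently of the class itself, the underlying proc_open() pattern can be sketched as follows. This is a minimal illustration, not the author's implementation. The function name segment_via_pipe() and the $command parameter are hypothetical. It assumes the compiled segmenter reads text from standard input and writes its result to standard output, which may not match how the real ICTCLAS executable behaves.

```php
<?php
/**
 * Send text to an external program over pipes and return what it prints.
 *
 * $command is a hypothetical command line for the compiled segmenter.
 * Assumption: the program reads from stdin and writes to stdout.
 */
function segment_via_pipe(string $command, string $text): string
{
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin: the child reads, we write
        1 => ['pipe', 'w'], // child's stdout: the child writes, we read
        // stderr is not listed, so the child inherits ours
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }

    // Write the input, then close the pipe so the child sees end-of-file.
    fwrite($pipes[0], $text);
    fclose($pipes[0]);

    // Read everything the child wrote to its stdout.
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    // Close all pipes before proc_close() to avoid a deadlock.
    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException("Command exited with status $exitCode");
    }

    return $output === false ? '' : $output;
}
```

On a Unix-like system the function can be tried with cat standing in for the segmenter, for example segment_via_pipe('cat', 'hello world'). Because all of the input is written before any output is read, very large inputs could block once the pipe buffers fill. A more robust version would interleave reads and writes or feed the text in smaller chunks.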