Home  >  Article  >  Backend Development  >  Simple Chinese word segmentation code made in PHP_PHP tutorial

Simple Chinese word segmentation code made in PHP_PHP tutorial

WBOY
WBOYOriginal
2016-07-20 11:09:401064browse

For Chinese search engines, Chinese word segmentation is one of the most basic parts of the entire system, because the current Chinese search algorithm based on single characters is not very good. Of course, this article is not to do research on Chinese search engines, but to share what if Use PHP to build an on-site search engine. This article is an article in this system

The PHP class for Chinese word segmentation is below. Use the proc_open() function to execute the word segmentation program and interact with it through pipes. Enter the text to be segmented and read the segmentation results.

class NLP{
private static $cmd_path;

// Not '/' ends with
static function set_cmd_path($path){
self::$cmd_path = $path;
}

private function cmd($str){
$descriptorspec = array(
                             ); self::$cmd_path . "/ictclas";
$process = proc_open($cmd, $descriptorspec, $pipes);

if (is_resource($process)) {
$str = iconv('utf-8', 'gbk', $str);

      fwrite($pipes[0], $str); 🎜> fclose($pipes[0]);

fclose($pipes[1]);

$return_value = proc_close($process);
}

/*
        $cmd = "printf '$input' | " . self::$cmd_path . "/ictclas";

                                                                                                                                               $cmd =                           $cmd = " n", $output);

*/

$output = trim($output);

$output = iconv('gbk', 'utf-8', $output);

            return $output;
$output = self::cmd($input);
if($output){

$ps tutorial = preg_split('/s+/', $output);

foreach($ps as $p){
list($seg, $tag) = explode('/', $p);

$item = array(

'seg' => $seg,
'tag' => $tag,

                                                                             return $tokens;

}
}
NLP::set_cmd_path(dirname(__FILE__));
?>
It is very simple to use (make sure the ICTCLAS compiled executable file and dictionary are in the current directory):

require_once('NLP.php');
var_dump(NLP::tokenize('Hello, world!'));
?>



Webmaster experience,

If you want to achieve word segmentation for search engines, you need a powerful vocabulary library and a more intelligent Chinese Pinyin, writing, habits and other functional libraries.

www.bkjia.comtruehttp: //www.bkjia.com/PHPjc/444776.htmlTechArticleFor Chinese search engines, Chinese word segmentation is one of the most basic parts of the entire system, because currently it is based on single-character Chinese The search algorithm is not very good. Of course, this article is not to guide Chinese search...
Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn