PHP code for Chinese word segmentation

WBOY
2016-07-20 11:10:01

I used the word segmentation function from dedecms before, but in testing the results were not ideal; with some post-processing they were just about acceptable. Today I came across this segmentation method again, so I am sharing it here.

<?php
/**
 * Thin wrapper around the ICTCLAS command-line segmenter.
 */
class NLP {
    // Directory containing the ictclas executable; does not end with '/'.
    private static $cmd_path;

    static function set_cmd_path($path) {
        self::$cmd_path = $path;
    }

    // Pipe $str through ictclas and return its output as UTF-8.
    private static function cmd($str) {
        $descriptorspec = array(
            0 => array("pipe", "r"), // child's stdin: we write to it
            1 => array("pipe", "w"), // child's stdout: we read from it
        );
        $cmd = self::$cmd_path . "/ictclas";
        $output = '';
        $process = proc_open($cmd, $descriptorspec, $pipes);
        if (is_resource($process)) {
            // The text is converted to GBK on the way in and back to UTF-8 below.
            $str = iconv('utf-8', 'gbk', $str);
            fwrite($pipes[0], $str);
            // Close stdin before reading so the child sees end of input.
            fclose($pipes[0]);
            $output = stream_get_contents($pipes[1]);
            fclose($pipes[1]);
            $return_value = proc_close($process);
        }
        /*
        // Alternative using exec():
        $cmd = "printf '%s' " . escapeshellarg($str) . " | " . self::$cmd_path . "/ictclas";
        exec($cmd, $output, $ret);
        $output = join("\n", $output);
        */
        $output = trim($output);
        $output = iconv('gbk', 'utf-8', $output);
        return $output;
    }

    /**
     * Perform word segmentation and return a word list.
     */
    static function tokenize($str) {
        $tokens = array();
        $output = self::cmd($str);
        if ($output) {
            $ps = preg_split('/\s+/', $output);
            foreach ($ps as $p) {
                // Each piece has the form "word/tag"; pad in case the tag is missing.
                list($seg, $tag) = array_pad(explode('/', $p), 2, '');
                $item = array(
                    'seg' => $seg,
                    'tag' => $tag,
                );
                $tokens[] = $item;
            }
        }
        return $tokens;
    }
}
NLP::set_cmd_path(dirname(__FILE__));
?>

Usage is simple. Make sure the compiled ICTCLAS executable and its dictionary are in the same directory as NLP.php, since the class sets its command path to dirname(__FILE__):
<?php
require_once('NLP.php');
var_dump(NLP::tokenize('Hello, World!'));
?>
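
The shape of the array that tokenize() returns can be illustrated without running ICTCLAS. The sketch below applies the same "split on whitespace, then split on /" step to a hard-coded string. The helper name parse_tagged() and the sample string are made up for illustration; real ICTCLAS output may use different tags and spacing.

```php
<?php
// Hypothetical helper (not part of the NLP class): splits whitespace-separated
// "word/tag" pairs, mirroring the parsing step inside tokenize().
function parse_tagged(string $output): array
{
    $tokens = [];
    // The /u modifier treats the input as UTF-8; empty pieces are dropped.
    $pieces = preg_split('/\s+/u', trim($output), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($pieces as $piece) {
        // Pad so that a piece without "/" still yields an (empty) tag.
        [$seg, $tag] = array_pad(explode('/', $piece, 2), 2, '');
        $tokens[] = ['seg' => $seg, 'tag' => $tag];
    }
    return $tokens;
}

// Assumed sample in "word/tag" form; it was not captured from a real run.
$sample = "我/r  爱/v  北京/ns";
print_r(parse_tagged($sample));
```

Each element is an associative array with a 'seg' key (the word) and a 'tag' key (the label that followed the slash), which is the structure var_dump() shows for the real call.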

The PHP class above performs Chinese word segmentation. It uses the proc_open() function to run the segmentation program and communicates with it through pipes: it writes the text to be segmented to the program's input and reads the segmentation results from its output.
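
The pipe interaction can be tried in isolation, without ICTCLAS installed. The following is a minimal sketch, assuming it runs from the PHP command line on PHP 7.4 or later (the array form of the proc_open() command needs that version). The helper run_filter() is hypothetical, and the child process is the PHP binary itself standing in for the segmenter.

```php
<?php
// Hypothetical helper: send $input to a child's stdin and return its stdout.
// Suited to small inputs; very large ones would need non-blocking I/O.
function run_filter(array $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'], // the child reads its stdin from this pipe
        1 => ['pipe', 'w'], // the child writes its stdout to this pipe
    ];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start child process');
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]); // signal end of input so the child can finish
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);
    return $output === false ? '' : $output;
}

// Stand-in child (an assumption, not ICTCLAS): upper-cases whatever it reads.
$child = [PHP_BINARY, '-r', 'echo strtoupper(file_get_contents("php://stdin"));'];
echo run_filter($child, "hello pipes\n");
```

Closing the write end before reading matters here: a child that reads until end of input would otherwise never finish, and the parent would wait on its output indefinitely.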

