PHP simple Chinese word segmentation system (1/2)
Structure: a hash table keyed on the first character, with Trie index tree nodes beneath it.
Advantages: during word segmentation there is no need to know the length of the word being looked up in advance; the text is matched character by character along a path in the tree.
Disadvantages: the tree is relatively complex to build and maintain, and its many branches waste a certain amount of space.
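To make the structure concrete, the following is a minimal sketch (not the author's code) that builds a tiny trie by hand for the two words 中国 and 中文. It assumes the node layout used by the trie class in this article, an array with 'children' and 'isword' keys. The helper names utf8_chars and trie_insert are hypothetical, and the sketch keys nodes on whole UTF-8 characters, whereas the article's class indexes strings byte by byte.

```php
<?php
// Root node; layout assumed from the article's class: 'children' + 'isword'.
$root = array('children' => array(), 'isword' => false);

// Split a UTF-8 string into characters (hypothetical helper).
function utf8_chars($s)
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Insert a word by creating one node per character along the path
// (hypothetical helper, not part of the article's class).
function trie_insert(array &$root, $word)
{
    $node = &$root;
    foreach (utf8_chars($word) as $ch) {
        if (!isset($node['children'][$ch])) {
            $node['children'][$ch] = array('children' => array(), 'isword' => false);
        }
        $node = &$node['children'][$ch];
    }
    // The node reached by the last character marks the end of a word.
    $node['isword'] = true;
}

trie_insert($root, '中国');
trie_insert($root, '中文');

// Both words share the first-level entry and branch at the second character.
echo implode(',', array_keys($root['children'])), "\n";                   // 中
echo implode(',', array_keys($root['children']['中']['children'])), "\n"; // 国,文
```

Because both words start with 中, they share one first-level entry and only diverge at the second character, which is why a lookup can simply follow the chain without knowing the word length in advance.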
<?php
/**
 * trie dictionary tree
 *
 * @version 0.1
 * @todo built a general-purpose dictionary algorithm and wrote a simple word segmenter
 * @author shjuto@gmail.com
 */
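class trie
{
    private $trie;

    function __construct()
    {
        // Root node: no children yet, and not itself a word
        $this->trie = array('children' => array(), 'isword' => false);
    }

    /**
     * Add a word to the dictionary
     */
    function setword($word = '')
    {
        $trienode = &$this->trie;
        for ($i = 0; $i < strlen($word); $i++)
        {
            $character = $word[$i];
            if (!isset($trienode['children'][$character]))
            {
                $trienode['children'][$character] = array('children' => array(), 'isword' => false);
            }
            // Flag the last node as a word without discarding its existing children
            if ($i == strlen($word) - 1)
            {
                $trienode['children'][$character]['isword'] = true;
            }
            $trienode = &$trienode['children'][$character];
        }
    }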
    /**
     * Determine whether it is a dictionary word
     *
     * @param string $word
     * @return bool true/false
     */
    function isword($word)
    {
        $trienode = &$this->trie;
        for ($i = 0; $i < strlen($word); $i++)
        {
            $character = $word[$i];
            if (!isset($trienode['children'][$character]))
            {
                return false;
            }
            // At the end of the word, the answer is that node's isword flag
            if ($i == (strlen($word) - 1))
            {
                return $trienode['children'][$character]['isword'] == true;
            }
            $trienode = &$trienode['children'][$character];
        }
        // Reached only for an empty string
        return false;
    }
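    /**
     * Find the positions at which dictionary words occur in $text
     * (the method name search is assumed; the original header is missing)
     *
     * @param string $text
     * @return array list of array('position' => $position, 'word' => $word)
     */
    function search($text = '')
    {
        $textlen = strlen($text);
        $trienode = $tree = $this->trie;
        $find = array();
        $wordrootposition = 0; // offset at which the current candidate word starts
        // Backtracking flag: with "ab" in the dictionary and the text "aab",
        // $i has to step back once so the second "a" is retried from the root.
        $prenode = false;
        $word = '';
        for ($i = 0; $i < $textlen; $i++)
        {
            if (isset($trienode['children'][$text[$i]]))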
            {
                $word = $word . $text[$i];
                $trienode = $trienode['children'][$text[$i]];
                if ($prenode == false)
                {
                    $wordrootposition = $i;
                }
                $prenode = true;
                if ($trienode['isword'])
                {
                    $find[] = array('position' => $wordrootposition, 'word' => $word);
                }
            }
            else
            {
                $trienode = $tree;
                $word = '';
                if ($prenode)
                {
                    $i = $i - 1;
                    $prenode = false;
                }
            }
        }
        return $find;
    }
}
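One practical consequence of indexing strings with $text[$i] is that PHP walks the text byte by byte, so the 'position' values recorded in $find are byte offsets, and a Chinese character encoded in UTF-8 occupies three bytes. The following is a minimal sketch, assuming UTF-8 input, of how such a byte offset might be converted to a character offset. The function name byte_to_char_offset is hypothetical, and strpos stands in here for a position produced by the class.

```php
<?php
// Convert a byte offset within a UTF-8 string to a character offset.
// Hypothetical helper; it is not part of the article's class.
function byte_to_char_offset($text, $byteOffset)
{
    // Count the UTF-8 characters that precede the byte offset.
    return preg_match_all('/./su', substr($text, 0, $byteOffset));
}

$text = '我爱中国';
$bytePos = strpos($text, '中国'); // a byte offset, like a 'position' value in $find

echo $bytePos, "\n";                             // 6
echo byte_to_char_offset($text, $bytePos), "\n"; // 2
```

The conversion only holds when the offset falls on a character boundary, which is the case for matches of whole dictionary words in valid UTF-8 text.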