
A simple Chinese word segmentation system in PHP (1/2)

WBOY
Original, 2016-07-20 11:08:30

This simple PHP Chinese word segmentation system is built on a first-character hash table whose entries are trie index tree nodes. Advantage: during segmentation there is no need to know the length of the candidate word in advance, because the text is matched character by character along a branch of the tree. Disadvantage: the trie is relatively complex to build and maintain, and its many branches waste some space.
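As a rough illustration of that structure, here is what the nested array might look like after the words "ab" and "ac" are inserted. This is a sketch under an assumption: the root node's `children` array is taken to play the role of the first-character hash table (PHP arrays are hash tables), and the key names `children` and `isword` mirror the class in this article.

```php
<?php
// Hypothetical illustration of the trie contents after inserting "ab" and "ac".
$trie = array(
    'isword'   => false,
    'children' => array(
        // First-character level: one entry per distinct leading character
        'a' => array(
            'isword'   => false,
            'children' => array(
                'b' => array('isword' => true),
                'c' => array('isword' => true),
            ),
        ),
    ),
);

// A lookup follows one array key per character, so the length of the
// word does not have to be known before the walk starts.
$node = $trie;
foreach (str_split('ab') as $ch) {
    $node = $node['children'][$ch];
}
var_dump($node['isword']); // bool(true)
```

Both words share the single `a` node, which is where the space cost comes from in the other direction: every distinct prefix gets its own array.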


<?php
/**
 * Trie dictionary tree
 *
 * Structure: first-character hash table with trie index tree nodes.
 * Advantages: during segmentation there is no need to know the length of
 *             the candidate word in advance; matching proceeds character
 *             by character along a branch of the tree.
 * Disadvantages: construction and maintenance are relatively complex, and
 *                the many branches waste some space.
 *
 * @version 0.1
 * @todo Build a general-purpose dictionary algorithm and a simple word segmenter on top of it
 * @author shjuto@gmail.com
 */

class trie
{
        private $trie;

        function __construct()
        {
                // Root node: no children yet; assign to the property, not a local variable
                $this->trie = array('children' => array(), 'isword' => false);
        }

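        /**
         * Add a word to the dictionary
         *
         * @param string $word
         */
        function setword($word = '')
        {
                $trienode = &$this->trie;
                for($i = 0;$i < strlen($word);$i++)
                {
                        $character = $word[$i];
                        // Create the child node the first time this character is seen
                        if(!isset($trienode['children'][$character]))
                        {
                                $trienode['children'][$character] = array('isword' => false);
                        }
                        // Mark the end of the word without discarding existing children
                        if($i == strlen($word)-1)
                        {
                                $trienode['children'][$character]['isword'] = true;
                        }
                        $trienode = &$trienode['children'][$character];
                }
        }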

        /**
         * Determine whether $word is a dictionary word
         *
         * @param string $word
         * @return bool true/false
         */
        function isword($word)
        {
                $trienode = &$this->trie;
                for($i = 0;$i < strlen($word);$i++)
                {
                        $character = $word[$i];
                        if(!isset($trienode['children'][$character]))
                        {
                                return false;
                        }
                        else
                        {
                                // Check whether the word ends at this character
                                if($i == (strlen($word)-1) && $trienode['children'][$character]['isword'] == true)
                                {
                                        return true;
                                }
                                elseif($i == (strlen($word)-1) && $trienode['children'][$character]['isword'] == false)
                                {
                                        return false;
                                }
                                $trienode = &$trienode['children'][$character];       
                        }
                }
                // An empty $word never reaches a return inside the loop
                return false;
        }


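        /**
         * Find dictionary words in $text
         *
         * @param string $text
         * @return array list of array('position' => $position, 'word' => $word)
         */
        function search($text = "")
        {
                $textlen = strlen($text);
                $trienode = $tree = $this->trie;
                $find = array();
                $wordrootposition = 0; // position where the current candidate word starts
                $prenode = false; // backtracking flag: with "ab" in the dictionary and the text "aab", $i has to step back once
                $word = '';
                for($i = 0;$i < $textlen;$i++)
                {
                        if(isset($trienode['children'][$text[$i]]))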
                        {
                                $word = $word .$text[$i];
                                $trienode = $trienode['children'][$text[$i]];
                                if($prenode == false)
                                {
                                        $wordrootposition = $i;
                                }
                                $prenode = true;
                                if($trienode['isword'])
                                {
                                        $find[] = array('position'=>$wordrootposition,'word' =>$word);
                                }
                        }
                        else
                        {
                                $trienode = $tree;
                                $word = '';
                                if($prenode)
                                {
                                        $i = $i -1;
                                        $prenode = false;
                                }
                        }
                }
                return $find;
        }
}
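
The listing above indexes strings with `strlen()` and `$word[$i]`, which work on bytes, so in a multibyte encoding such as UTF-8 each Chinese character occupies several trie levels and the reported positions are byte offsets. The following is a minimal sketch, assuming UTF-8 input, of the same idea working on whole characters. The class name `CharTrie` and its method names are hypothetical and not part of the original code. Note that the scan here restarts from every character position rather than using the single-step backtracking shown above; that is simpler, and it also reports overlapping matches.

```php
<?php
// Hypothetical sketch: a character-level trie for UTF-8 text.
class CharTrie
{
    private $root = array('children' => array(), 'isword' => false);

    // Split a UTF-8 string into characters; invalid UTF-8 yields an empty list.
    private static function chars($s)
    {
        $parts = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
        return $parts === false ? array() : $parts;
    }

    // Insert a word, creating one node per character.
    public function add($word)
    {
        $chars = self::chars($word);
        if (!$chars) {
            return; // never mark the root itself as a word
        }
        $node = &$this->root;
        foreach ($chars as $ch) {
            if (!isset($node['children'][$ch])) {
                $node['children'][$ch] = array('children' => array(), 'isword' => false);
            }
            $node = &$node['children'][$ch];
        }
        $node['isword'] = true;
        unset($node); // drop the reference into the tree
    }

    // Return every dictionary word found in $text with its character offset.
    public function search($text)
    {
        $chars = self::chars($text);
        $n = count($chars);
        $found = array();
        for ($start = 0; $start < $n; $start++) {
            $node = $this->root;
            $word = '';
            for ($i = $start; $i < $n && isset($node['children'][$chars[$i]]); $i++) {
                $node = $node['children'][$chars[$i]];
                $word .= $chars[$i];
                if ($node['isword']) {
                    $found[] = array('position' => $start, 'word' => $word);
                }
            }
        }
        return $found;
    }
}

$trie = new CharTrie();
foreach (array('中文', '分词', '中文分词') as $w) {
    $trie->add($w);
}
$hits = $trie->search('简易中文分词系统');
foreach ($hits as $hit) {
    echo $hit['position'], ' ', $hit['word'], "\n";
}
// Prints:
// 2 中文
// 2 中文分词
// 4 分词
```

Restarting at every position costs more comparisons than the backtracking scan, but for short dictionary words the difference is small, and a real segmenter would still need a rule (for example, longest match) to choose among the overlapping candidates.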

