Detailed explanation of implementing the dictionary tree (Trie) structure in Python code

高洛峰
2017-03-03 15:48:28

A dictionary tree (Trie) stores string->value mappings. It serves much the same purpose as Java's HashMap, a key-value map, except that the keys of a Trie are strings.

The strength of a Trie lies in its time complexity. Insertion and lookup both take O(k) time, where k is the length of the key, regardless of how many elements the Trie holds. A hash table is usually described as O(1), but computing the hash of a string key is itself O(k), and collisions add further cost. The drawback of a Trie is its high memory consumption.
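To make the O(k) claim concrete, the sketch below counts how many characters a lookup examines in a small and in a much larger trie. It is only an illustration: the helper names `build_trie` and `lookup_steps` and the end marker `'/'` are assumptions made for this example.

```python
END = '/'  # end-of-word marker (assumed never to appear inside a word)

def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = None
    return root

def lookup_steps(root, word):
    """Return (found, steps), where steps is the number of characters examined."""
    node = root
    steps = 0
    for ch in word:
        steps += 1
        if ch not in node:
            return False, steps
        node = node[ch]
    return END in node, steps

small = build_trie(["tea"])
large = build_trie(["tea"] + ["w%d" % i for i in range(10000)])
print(lookup_steps(small, "tea"))  # (True, 3)
print(lookup_steps(large, "tea"))  # (True, 3)
```

The step count depends only on the length of the key, not on the 10,000 other words stored in the larger trie.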
A Trie can be implemented with arrays or with dynamically allocated, pointer-linked nodes. When solving problems, I used arrays with statically allocated space for convenience.
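The array approach might look like the following in Python. This is a minimal sketch, not the author's original solution: the class name `ArrayTrie`, the `max_nodes` parameter, and the restriction to lowercase a-z are all assumptions chosen for illustration.

```python
class ArrayTrie:
    """Array-backed trie for lowercase a-z words, with node storage allocated up front."""

    ALPHABET = 26

    def __init__(self, max_nodes=1000):
        # children[n][i] is the index of node n's child for letter i, or 0 if absent.
        # Node 0 is the root and is never anyone's child, so 0 can mean "no child".
        self.children = [[0] * self.ALPHABET for _ in range(max_nodes)]
        self.is_end = [False] * max_nodes
        self.size = 1  # number of nodes in use, counting the root

    def add(self, word):
        node = 0
        for ch in word:
            i = ord(ch) - ord('a')
            if not 0 <= i < self.ALPHABET:
                raise ValueError("only lowercase a-z is supported")
            if self.children[node][i] == 0:
                if self.size == len(self.children):
                    raise RuntimeError("node pool exhausted")
                self.children[node][i] = self.size
                self.size += 1
            node = self.children[node][i]
        self.is_end[node] = True

    def find(self, word):
        node = 0
        for ch in word:
            i = ord(ch) - ord('a')
            if not 0 <= i < self.ALPHABET:
                return False
            node = self.children[node][i]
            if node == 0:
                return False
        return self.is_end[node]

trie = ArrayTrie(max_nodes=32)
for w in ["to", "tea", "ted", "ten", "inn"]:
    trie.add(w)
print(trie.find("tea"))  # True
print(trie.find("te"))   # False
print(trie.size)         # 10
```

Every node reserves 26 slots whether or not they are used, which illustrates the space cost mentioned above.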
A Trie, also known as a word search tree, prefix tree, or key tree, is a tree structure (sometimes described as a variant of the hash tree). It is typically used for counting and sorting large numbers of strings (though it is not limited to strings), so search engine systems often use it for word frequency statistics over text. Its advantage is that it minimizes unnecessary string comparisons, which can make queries faster than with a hash table.
The core idea of the Trie is to trade space for time: it uses the common prefixes of strings to reduce query time.
Each word in the Trie is stored character by character, and words with the same prefix share the nodes of that prefix.
Each path from the root to a marked node forms a word. For example, a tree that stores the words to/tea/ted/ten/inn shares the node t among the first four words and the node e among tea/ted/ten.
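The effect of prefix sharing can be checked with a short sketch that stores the five example words in a nested-dict trie and counts the nodes. The helpers `build_trie` and `count_nodes` and the marker `'/'` are hypothetical names used only for this example.

```python
END = '/'  # end-of-word marker (assumed never to appear inside a word)

def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = None
    return root

def count_nodes(node):
    """Count the character nodes below `node` (the END marker is not a node)."""
    return sum(1 + count_nodes(child)
               for key, child in node.items() if key != END)

words = ["to", "tea", "ted", "ten", "inn"]
root = build_trie(words)
print(sum(len(w) for w in words))  # 14 characters in total
print(count_nodes(root))           # 9 nodes, because prefixes are shared
print(sorted(root))                # ['i', 't']
print(sorted(root['t']['e']))      # ['a', 'd', 'n']
```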

The basic properties of the Trie can be summarized as:
(1) The root node contains no character. Every other node contains exactly one character.
(2) Concatenating the characters on the path from the root to a node gives the string that corresponds to that node.
(3) The child nodes of any given node all contain different characters.
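Property (2) can be illustrated by walking a trie and joining the characters along each path. The sketch below assumes a nested-dict representation with `'/'` as the end-of-word marker; the function name `iter_words` is hypothetical. Because dictionary keys are unique, property (3) holds automatically in this representation.

```python
END = '/'  # end-of-word marker (assumed never to appear inside a word)

def iter_words(node, prefix=""):
    """Yield every stored word below `node`, visiting keys in sorted order."""
    for key in sorted(node):
        if key == END:
            yield prefix  # the path walked so far spells a stored word
        else:
            yield from iter_words(node[key], prefix + key)

# A hand-built trie holding "to", "tea", "ted", "ten" and "inn".
root = {
    't': {
        'o': {END: None},
        'e': {'a': {END: None}, 'd': {END: None}, 'n': {END: None}},
    },
    'i': {'n': {'n': {END: None}}},
}
print(list(iter_words(root)))  # ['inn', 'tea', 'ted', 'ten', 'to']
```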

Basic idea (taking a letter tree as an example):
1. Insertion process
For a word, start from the root and follow the branch that corresponds to each letter of the word in turn, creating nodes that do not yet exist, until the whole word has been traversed. Mark the last node (shown in red) to indicate that the word has been inserted into the Trie.
2. Query process
Similarly, start from the root and walk down the Trie following the letters of the word in order. If a required node does not exist, or the whole word has been traversed but the last node is not marked red, the word does not exist. If the last node is marked red, the word exists.

Application
(1) Word frequency statistics
Can use less space than storing every word directly in a hash table when many words share prefixes.
(2) Search suggestions
Given an input prefix, suggest the words that can be formed from it.
(3) Auxiliary structure
Serves as a building block for other structures such as suffix trees and AC (Aho-Corasick) automata.
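The first two applications can be sketched on top of a nested-dict trie as follows. This is an illustrative sketch rather than a tuned implementation: the function names `add`, `count`, and `suggest`, and the use of `'/'` as the key that holds a word's count, are assumptions made here.

```python
COUNT = '/'  # hypothetical key holding a word's count; assumed absent from the words

def add(root, word):
    """Insert word and increment its frequency."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node[COUNT] = node.get(COUNT, 0) + 1

def count(root, word):
    """Return how many times word was added (0 if never)."""
    node = root
    for ch in word:
        if ch not in node:
            return 0
        node = node[ch]
    return node.get(COUNT, 0)

def suggest(root, prefix):
    """Return all stored words that start with prefix, sorted."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    results = []
    stack = [(node, prefix)]
    while stack:  # iterative depth-first walk below the prefix node
        current, text = stack.pop()
        for key, child in current.items():
            if key == COUNT:
                results.append(text)
            else:
                stack.append((child, text + key))
    return sorted(results)

root = {}
for w in "tea ten tea to inn tea ten".split():
    add(root, w)
print(count(root, "tea"))   # 3
print(count(root, "te"))    # 0
print(suggest(root, "te"))  # ['tea', 'ten']
```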

Implementation
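Python has no explicit pointers, but nested dictionaries can serve as the tree structure: each node is a dict that maps a character to its child node. For non-ASCII words, insert and search with Unicode strings (str in Python 3, unicode in Python 2) so that each node holds one character rather than one byte.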

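# coding=utf-8
class Trie:
  END = '/'  # end-of-word marker; words are assumed not to contain '/'

  def __init__(self):
    # One root per instance; a class-level root = {} would be shared by every Trie.
    self.root = {}

  def add(self, word):
    # Walk from the root character by character, creating missing nodes,
    # then add the end-of-word marker to the last node.
    node = self.root
    for c in word:
      node = node.setdefault(c, {})
    node[self.END] = None

  def find(self, word):
    # Follow the characters of word; fail as soon as one is missing.
    node = self.root
    for c in word:
      if c not in node:
        return False
      node = node[c]
    # The word is present only if the last node carries the end marker.
    return self.END in node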
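The article opens by describing the Trie as a string->value map, while the class above only records whether a word is present. One possible extension, sketched below, stores the value in the end-marker slot. The class name `TrieMap` and its `put`/`get` methods are hypothetical and not part of the original code; the example assumes Python 3, where every str is Unicode.

```python
class TrieMap:
    """Nested-dict trie that maps string keys to arbitrary values."""

    END = '/'  # slot holding the value; keys are assumed not to contain '/'

    def __init__(self):
        self.root = {}

    def put(self, key, value):
        """Store value under key, overwriting any previous value."""
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self.END] = value

    def get(self, key, default=None):
        """Return the value stored under key, or default if key is absent."""
        node = self.root
        for ch in key:
            if ch not in node:
                return default
            node = node[ch]
        return node.get(self.END, default)

m = TrieMap()
m.put("tea", 1)
m.put("茶", 2)       # non-ASCII keys work the same way
print(m.get("tea"))  # 1
print(m.get("te"))   # None
print(m.get("茶"))   # 2
```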
