Home  >  Article  >  Web Front-end  >  Detailed interpretation of trie prefix tree in JavaScript

Detailed interpretation of trie prefix tree in JavaScript

亚连
亚连Original
2018-06-09 10:21:221682browse

This article mainly introduces the example of javascript trie word search tree, and introduces the concept and implementation of trie in detail. It has certain reference value. Interested friends can refer to it

Introduction

Trie tree (from the word retrieval), also known as prefix word, word search tree, dictionary tree, is a tree structure, a hash tree A variant of , a multi-tree structure used for fast retrieval.

Its advantage is: it minimizes unnecessary string comparisons and has higher query efficiency than hash tables.

The core idea of ​​Trie is to exchange space for time. Use the common prefix of strings to reduce query time overhead to improve efficiency.

Trie tree also has its shortcomings. Assuming that we only process letters and numbers, then each node has at least 52+10 child nodes. To save memory, we can use linked lists or arrays. In JS, we use arrays directly, because JS arrays are dynamic and come with built-in optimization.

Basic properties

  1. The root node does not contain characters, except for the root node Each child node of contains a character

  2. from the root node to a certain node. The characters passing on the path are connected to form the string corresponding to the node

  3. All sub-nodes of each node contain different characters

Program implementation

// by 司徒正美
class Trie {
 constructor() {
  this.root = new TrieNode();
 }
 isValid(str) {
  return /^[a-z1-9]+$/i.test(str);
 }
 insert(word) {
  // addWord
  if (this.isValid(word)) {
   var cur = this.root;
   for (var i = 0; i < word.length; i++) {
    var c = word.charCodeAt(i);
    c -= 48; //减少”0“的charCode
    var node = cur.son[c];
    if (node == null) {
     var node = (cur.son[c] = new TrieNode());
     node.value = word.charAt(i);
     node.numPass = 1; //有N个字符串经过它
    } else {
     node.numPass++;
    }
    cur = node;
   }
   cur.isEnd = true; //樯记有字符串到此节点已经结束
   cur.numEnd++; //这个字符串重复次数

   return true;
  } else {
   return false;
  }
 }
 remove(word){
   if (this.isValid(word)) {
     var cur = this.root;
     var array = [], n = word.length
     for (var i = 0; i < n; i++) {
       var c = word.charCodeAt(i);
       c = this.getIndex(c)
       var node = cur.son[c];
       if(node){
         array.push(node)
         cur = node
       }else{
         return false
       }
 
     }
     if(array.length === n){
       array.forEach(function(){
         el.numPass--
       })
       cur.numEnd --
       if( cur.numEnd == 0){
         cur.isEnd = false
       } 
     }
   }else{
     return false
   }
 }
 preTraversal(cb){//先序遍历
    function preTraversalImpl(root, str, cb){ 
      cb(root, str);
      for(let i = 0,n = root.son.length; i < n; i ++){
        let node = root.son[i];
        if(node){
          preTraversalImpl(node, str + node.value, cb);
        }
      }
    } 
    preTraversalImpl(this.root, "", cb);
  }
 // 在字典树中查找是否存在某字符串为前缀开头的字符串(包括前缀字符串本身)
 isContainPrefix(word) {
  if (this.isValid(word)) {
   var cur = this.root;
   for (var i = 0; i < word.length; i++) {
    var c = word.charCodeAt(i);
    c -= 48; //减少”0“的charCode
    if (cur.son[c]) {
     cur = cur.son[c];
    } else {
     return false;
    }
   }
   return true;
  } else {
   return false;
  }
 }
 isContainWord(str) {
  // 在字典树中查找是否存在某字符串(不为前缀)
  if (this.isValid(word)) {
   var cur = this.root;
   for (var i = 0; i < word.length; i++) {
    var c = word.charCodeAt(i);
    c -= 48; //减少”0“的charCode
    if (cur.son[c]) {
     cur = cur.son[c];
    } else {
     return false;
    }
   }
   return cur.isEnd;
  } else {
   return false;
  }
 }
 countPrefix(word) {
  // 统计以指定字符串为前缀的字符串数量
  if (this.isValid(word)) {
   var cur = this.root;
   for (var i = 0; i < word.length; i++) {
    var c = word.charCodeAt(i);
    c -= 48; //减少”0“的charCode
    if (cur.son[c]) {
     cur = cur.son[c];
    } else {
     return 0;
    }
   }
   return cur.numPass;
  } else {
   return 0;
  }
 }
 countWord(word) {
  // 统计某字符串出现的次数方法
  if (this.isValid(word)) {
   var cur = this.root;
   for (var i = 0; i < word.length; i++) {
    var c = word.charCodeAt(i);
    c -= 48; //减少”0“的charCode
    if (cur.son[c]) {
     cur = cur.son[c];
    } else {
     return 0;
    }
   }
   return cur.numEnd;
  } else {
   return 0;
  }
 }
}

class TrieNode {
 constructor() {
  this.numPass = 0;//有多少个单词经过这节点
  this.numEnd = 0; //有多少个单词就此结束
  this.son = [];
  this.value = ""; //value为单个字符
  this.isEnd = false;
 }
}

Let’s focus on the insert method of TrieNode and Trie. Since the dictionary tree is mainly used for word frequency statistics, it has many node attributes, including numPass, numEnd but very important attributes.

The insert method is used to insert repeated words. Before starting, we must determine whether the word is legal and special characters and blanks cannot appear. When inserting, characters are broken up and placed into each node. numPass must be modified every time a node is passed.

Optimization

Now in each of our methods, there is an operation of c=-48. In fact, there are other characters between numbers, uppercase letters, and lowercase letters. Yes, this will cause unnecessary waste of space

// by 司徒正美
getIndex(c){
   if(c < 58){//48-57
     return c - 48
   }else if(c < 91){//65-90
     return c - 65 + 11
   }else {//> 97 
     return c - 97 + 26+ 11
   }
 }

Then the relevant method is to change c-= 48 to c = this.getIndex(c)

Test

var trie = new Trie(); 
  trie.insert("I"); 
  trie.insert("Love"); 
  trie.insert("China"); 
  trie.insert("China"); 
  trie.insert("China"); 
  trie.insert("China"); 
  trie.insert("China"); 
  trie.insert("xiaoliang"); 
  trie.insert("xiaoliang"); 
  trie.insert("man"); 
  trie.insert("handsome"); 
  trie.insert("love"); 
  trie.insert("Chinaha"); 
  trie.insert("her"); 
  trie.insert("know"); 
  var map = {}
  trie.preTraversal(function(node, str){
    if(node.isEnd){
     map[str] = node.numEnd
    }
  })
  for(var i in map){
    console.log(i+" 出现了"+ map[i]+" 次")
  }
  console.log("包含Chin(包括本身)前缀的单词及出现次数:"); 
  //console.log("China")
  var map = {}
  trie.preTraversal(function(node, str){
    if(str.indexOf("Chin") === 0 && node.isEnd){
      map[str] = node.numEnd
    }
   })
  for(var i in map){
    console.log(i+" 出现了"+ map[i]+" 次")
  }

Comparison of Trie tree and other data structures

Trie tree and binary Search tree

The binary search tree should be the earliest tree structure we come into contact with. We know that when the data size is n, the time complexity of the binary search tree insertion, search, and deletion operations is usually only O(log n). In the worst case, all nodes in the entire tree have only one child node, which degenerates into a linear list. At this time, the time complexity of insertion, search, and deletion operations is O(n).

Normally, the height n of the Trie tree is much greater than the length m of the search string, so the time complexity of the search operation is usually O(m), and the time complexity in the worst case is O (n). It is easy to see that the worst-case search in a Trie tree is faster than a binary search tree.

The Trie tree in this article uses strings as examples. In fact, it has strict requirements on the suitability of the key. If the key is a floating point number, the entire Trie tree may be extremely long and the nodes may be The readability is also very poor. In this case, it is not suitable to use Trie tree to save data; this problem does not exist with binary search tree.

Trie tree and Hash table

Consider the problem of Hash conflict. We usually say that the complexity of a Hash table is O(1). In fact, strictly speaking, this is the complexity of a Hash table that is close to perfect. In addition, we also need to consider that the hash function itself needs to traverse the search string, and the complexity is O(m ). When different keys are mapped to the "same position" (considering closed hashing, this "same position" can be replaced by an ordinary linked list), the complexity of the search depends on the number of nodes under the "same position" number, therefore, in the worst case, the Hash table can also become a one-way linked list.

Trie trees can be sorted according to the alphabetical order of keys more easily (the entire tree is traversed once in order), which is different from most Hash tables (Hash tables generally have different keys for different keys). is disordered).

Under ideal circumstances, the Hash table can quickly hit the target at O(1) speed. If the table is very large and needs to be placed on the disk, the search access of the Hash table will be faster under ideal circumstances. It only needs to be done once; but the number of disk accesses by the Trie tree needs to be equal to the node depth.

Many times a Trie tree requires more space than a Hash table. If we consider the situation where one node stores one character, there is no way to save it as a separate block when saving a string. . Node compression of the Trie tree can significantly alleviate this problem, which will be discussed later.

Improvements of Trie tree

Bitwise Trie (Bitwise Trie)

The principle is similar to the ordinary Trie tree, except that the smallest unit stored in the ordinary Trie tree is a character, but the Bitwise Trie stores bits That’s all. The access of bit data is directly implemented once by the CPU instruction. For binary data, it is theoretically faster than the ordinary Trie tree.

Node compression.

Branch compression: For stable Trie trees, they are basically search and read operations, and some branches can be compressed. For example, the inn of the rightmost branch in the previous figure can be directly compressed into a node "inn" without existing as a regular subtree. The Radix tree is based on this principle to solve the problem of the Trie tree being too deep.

Node mapping table: This method is also used when the nodes of the Trie tree may have been almost completely determined. For each state of the node in the Trie tree, if the total number of states is repeated many times, through an element Represented as a multi-dimensional array of numbers (such as Triple Array Trie), the space overhead of storing the Trie tree itself will be smaller, although an additional mapping table is introduced.

The application of prefix tree

The prefix tree is still easy to understand, and its application is also very wide.

(1) Fast retrieval of strings

The query time complexity of the dictionary tree is O(logL), L is the length of the string. So the efficiency is still relatively high. Dictionary trees are more efficient than hash tables.

(2) String sorting

From the above figure we can easily see that the words are sorted, and the alphabetical order is traversed first. Reduce unnecessary common substrings.

(3) The longest common prefix

The longest common prefix of inn and int is in. When traversing the dictionary tree to letter n, the common prefix of these words is in.

(4) Automatically match prefixes and display suffixes

When we use a dictionary or search engine, enter appl, and a bunch of stuff with the prefix of appl will automatically be displayed. Then it may be achieved through a dictionary tree. As mentioned earlier, the dictionary tree can find the common prefix. We only need to traverse and display the remaining suffixes.

The above is what I compiled for everyone. I hope it will be helpful to everyone in the future.

Related articles:

How to implement the streamlined style in Vue (detailed tutorial)

How to customize global components in vue?

How to implement multi-page development in vue2.0

The above is the detailed content of Detailed interpretation of trie prefix tree in JavaScript. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn