Home  >  Article  >  Web Front-end  >  JavaScript instant matching code for tens of thousands of keywords_javascript skills

JavaScript instant matching code for tens of thousands of keywords_javascript skills

WBOY
WBOYOriginal
2016-05-16 17:29:53944browse

When it comes to keyword search, the first thing that comes to mind is nothing more than using some character functions such as indexOf and replace, or at most adding some regular expressions. Although it is very simple to implement, have you ever carefully considered the efficiency issues behind it? For example, in keyword filtering in forums, generally the number of keywords to be filtered and the length of text to be detected are not large, so there is not much worth paying attention to in this instant process. But if the number of keywords is no longer a handful, but thousands, and the text to be detected is also a long one, the results are no longer so optimistic. Everyone knows that for every additional keyword, a full-text search will be added, and the final time spent will be far beyond the acceptable range.
 
Since we are considering that extreme keyword search, the usual one-by-one traversal search is obviously not feasible. Nowadays, JavaScript is used. It would be a shame for this language if it does not use Hash tables. With unique support for watches, you might as well take out a small amount of space in exchange for a large amount of time.
 
Let’s look at an example first. For example, there are the following keywords: foo1, foo2, bar1, bar2. Since space is exchanged for time, they must be preprocessed before searching. As mentioned earlier, JS’s flexible and efficient tables are obviously the most advantageous to use the tree structure. Even if you don’t understand it, it doesn’t matter. The final implementation structure is just like the following code. It’s also very familiar to be familiar with JSON:

Copy the code The code is as follows:

var Root =
{
f:
{
o:
o:
: true ,
: true
}
}
},
b:
{
a:
r :
                                      {
: true,
: true
}
}
}
};



This layer-by-layer structure is just like a tree, each character is A branch of a tree, the last character is a leaf, and there are no new nodes.
At this point you should understand that all you need to do is search down the tree for each word of the article. If you can reach the leaves, it means that the current character is one of the keywords; if you can't find the corresponding branch halfway, of course it is not a keyword.

For example, foo1 is accessed downward along the Root structure, and finally reaches Root['f']['o']['o']['1'], that is, a match is completed. Then skip the length of foo1 and continue searching later.

Therefore, the entire article only needs one search to find out the position of each keyword.
Since the hash table performance of JS is very high, the so-called search for branches is very fast. Because of the flexibility of JS, the code to achieve this effect is also very short.
 
In fact, it can be found that the number of keywords has little relationship with the search time. It only affects the width of the tree. Only the length of the article determines the search time.
 
Let’s take a limit test:
Keywords: Complete Collection of Idioms (19830 items)
Content: Complete Collection of Zhuxian.txt (1659219 words)
Time: 935ms
(Chrome26 / i3- 2312 CPU)
An article of 1.6 million words, matching 20,000 keywords, takes less than 1 second. It can be seen that by fully utilizing the flexibility of JavaScript, there is still great potential.
Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn