


Data structures and algorithms in JavaScript (5): The classic KMP algorithm
KMP algorithm and BM algorithm
KMP is the classic prefix-matching algorithm and BM (Boyer-Moore) is the classic suffix-matching algorithm. The only difference between prefix matching and suffix matching is the order of comparison.
Prefix matching means: the pattern string is compared with the text string from left to right, and the pattern string also moves from left to right.
Suffix matching means: the pattern string is compared with the text string from right to left, while the pattern string still moves from left to right.
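To make the two comparison orders concrete, here is a minimal sketch that checks a single alignment of the pattern against the text in both directions. The function names `firstMismatchLeftToRight` and `firstMismatchRightToLeft` are hypothetical and chosen only for this illustration; the strings are the T and P used in the example further on.

```javascript
// Compare pattern against text at a fixed offset, scanning left to right
// (prefix-matching order). Returns the pattern index of the first mismatch,
// or -1 if the whole pattern matches at this offset.
function firstMismatchLeftToRight(text, pattern, offset) {
  for (var j = 0; j < pattern.length; j++) {
    if (text.charAt(offset + j) !== pattern.charAt(j)) {
      return j;
    }
  }
  return -1;
}

// Same check, scanning right to left (suffix-matching order).
function firstMismatchRightToLeft(text, pattern, offset) {
  for (var j = pattern.length - 1; j >= 0; j--) {
    if (text.charAt(offset + j) !== pattern.charAt(j)) {
      return j;
    }
  }
  return -1;
}

var T = "abababaabab";
var P = "ababacb";
console.log(firstMismatchLeftToRight(T, P, 0)); // 5 (the letter c)
console.log(firstMismatchRightToLeft(T, P, 0)); // 6 (the last letter b)
```

Both functions look at the same alignment; only the scanning direction differs, which is why they report different mismatch positions.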
As the previous chapter showed, the BF (brute force) algorithm is also a prefix algorithm, but matching one position at a time is very inefficient: its worst-case cost is O(mn). There are many explanations of KMP online, but most of them take a fairly theoretical route and can leave you confused. Here I try to describe it in the most down-to-earth way, based on my own understanding.
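For reference, a minimal sketch of the BF approach mentioned above. This is not the code from the previous chapter; the name `bfSearch` is an assumption made for this example.

```javascript
// Brute-force search: try every start position in the text and compare
// the pattern character by character. Worst case is O(m * n) comparisons.
function bfSearch(text, pattern) {
  for (var start = 0; start + pattern.length <= text.length; start++) {
    var j = 0;
    while (j < pattern.length && text.charAt(start + j) === pattern.charAt(j)) {
      j++;
    }
    if (j === pattern.length) {
      return start; // every pattern character matched
    }
    // On a mismatch the pattern always moves by exactly one position.
  }
  return -1;
}

console.log(bfSearch("BBC ABCDAB ABCDABCDABDE", "ABCDABD")); // 15
console.log(bfSearch("abababaabab", "ababacb")); // -1
```

The fixed shift of one position after every mismatch is the part KMP improves on.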
KMP
KMP is an optimized version of the prefix algorithm. The name comes from the initials of its three authors: Knuth, Morris, and Pratt. Compared with BF, the optimization in KMP is the distance the pattern string moves after each mismatch: KMP adjusts this distance dynamically. BF always moves by 1; KMP does not necessarily.
The figure compares how far BF and KMP move the pattern after a mismatch.
Comparing the two in the figure shows the following:
We search for the pattern string P in the text string T. When matching reaches the sixth letter, c, a mismatch is found. BF then moves the entire pattern string P by one position, while KMP moves it by two.
We already know how BF matches, but why does KMP move two positions rather than one, three, or four?
Let's go back to the figure. The pattern string P matches correctly up to ababa and fails at c. The idea of KMP is this: ababa has already been matched correctly, and that is useful information. Can we use it so that the search position does not go back over characters that have already been compared, but keeps moving forward instead? Doing so improves efficiency.
The question then is: how do we know how many positions to move?
The authors of KMP have already summarized the offset rule for us:
Shift distance = number of matched characters - corresponding partial match value
Note in particular that the offset depends only on the pattern string (the substring), not on the text string.
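The rule can be sketched as a one-line function. The name `shiftDistance` is hypothetical, and the two tables below are written out by hand for this example (partial match values are explained later in the article); they are assumptions of the sketch, not output of the author's code.

```javascript
// Shift distance = number of matched characters - corresponding partial match value.
// `table[k]` is assumed to hold the partial match value of the first k + 1
// pattern characters, so the value for `matched` characters is table[matched - 1].
function shiftDistance(table, matched) {
  return matched - table[matched - 1];
}

// Pattern "ababacb": hand-computed partial match values.
var tableAbabacb = [0, 0, 1, 2, 3, 0, 0];
// Five characters (ababa) matched before the mismatch at c.
console.log(shiftDistance(tableAbabacb, 5)); // 2

// Pattern "aaronaac": hand-computed partial match values.
var tableAaronaac = [0, 1, 0, 0, 0, 1, 2, 0];
// Seven characters (aaronaa) matched before the mismatch at c.
console.log(shiftDistance(tableAaronaac, 7)); // 5
```

The first result agrees with the two-position move described for the figure. Only the pattern's table is consulted; the text string does not appear anywhere in the calculation.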
So how should we understand "the number of matched characters" in the substring and "the corresponding partial match value"?
Matched characters:
T : abababaabab
P : ababacb
The matched characters in P are the leading ababa (marked in red in the original figure), so the number of matched characters is 5. This part is easy to understand.
Partial match value:
This is the core of the algorithm, and also the hardest part to understand.
Suppose:
T: aaronaabbcc
P: aaronaac
Look at these two strings. If the match fails at c, where should the pattern move next, given the structure of what has already been matched? Which move is the most reasonable?
aaronaabbcc
     aaronaac
In other words: within the pattern, if the beginning and the end of the matched portion are the same, the positions in between can be skipped on the next move. Here the matched part aaronaa both starts and ends with aa, so the pattern can jump forward until its leading aa lines up with the aa at the end of the matched part. This idea is quite reasonable.
Knowing this rule, the partial matching table is computed as follows:
First you need to understand two concepts: "prefix" and "suffix". The "prefixes" of a string are all of its leading substrings that do not include the last character; the "suffixes" are all of its trailing substrings that do not include the first character.
The "partial match value" is the length of the longest element that the set of prefixes and the set of suffixes have in common.
Let's first look at how aaronaac is split up for BF matching.
BF decomposition: a, aa, aar, aaro, aaron, aarona, aaronaa, aaronaac
And what does the decomposition look like for KMP? This is where prefixes and suffixes come in.
Let's first look at the resulting KMP partial matching table:
a a r o n a a c
[0, 1, 0, 0, 0, 1, 2, 0]
This may look confusing at first, but don't worry. Let's break it down into prefixes and suffixes.
Matched string: "aaron"
Prefixes: a, aa, aar, aaro
Suffixes: aron, ron, on, n
Move distance: for each matched string, compare its prefixes with its suffixes to see which ones are equal, and take the length of the longest common one.
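The prefix and suffix definitions above can be sketched directly in code. The helper names `prefixes`, `suffixes`, and `partialMatchValue` are hypothetical and exist only for this illustration; this is a literal reading of the definition, not an efficient implementation.

```javascript
// All leading substrings that do not include the last character.
function prefixes(str) {
  var out = [];
  for (var k = 1; k < str.length; k++) {
    out.push(str.slice(0, k));
  }
  return out;
}

// All trailing substrings that do not include the first character.
function suffixes(str) {
  var out = [];
  for (var k = 1; k < str.length; k++) {
    out.push(str.slice(k));
  }
  return out;
}

// Length of the longest string that appears in both sets (0 if none).
function partialMatchValue(str) {
  var p = prefixes(str);
  var s = suffixes(str);
  var best = 0;
  for (var a = 0; a < p.length; a++) {
    if (s.indexOf(p[a]) !== -1 && p[a].length > best) {
      best = p[a].length;
    }
  }
  return best;
}

console.log(prefixes("aaron").join(",")); // a,aa,aar,aaro
console.log(suffixes("aaron").join(",")); // aron,ron,on,n
console.log(partialMatchValue("aaron"));   // 0
console.log(partialMatchValue("aaronaa")); // 2
```

For "aaron" the two sets share nothing, so the value is 0; for "aaronaa" both a and aa are shared, and the longer one gives 2.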
Decomposition of partial matching table
The matching table algorithm in KMP, where p is the set of prefixes, n is the set of suffixes, and r is the result:
a, p=>[], n=>[], r = 0
aa, p=>[a], n=>[a], r = a.length = 1
aar, p=>[a,aa], n=>[r,ar], r = 0
aaro, p=>[a,aa,aar], n=>[o,ro,aro], r = 0
aaron, p=>[a,aa,aar,aaro], n=>[n,on,ron,aron], r = 0
aarona, p=>[a,aa,aar,aaro,aaron], n=>[a,na,ona,rona,arona], r = a.length = 1
aaronaa, p=>[a,aa,aar,aaro,aaron,aarona], n=>[a,aa,naa,onaa,ronaa,aronaa], r = Math.max(a.length, aa.length) = 2
aaronaac, p=>[a,aa,aar,aaro,aaron,aarona,aaronaa], n=>[c,ac,aac,naac,onaac,ronaac,aronaac], r = 0
As with the BF decomposition, we first work out the value for every possible mismatch position and cache it. During matching, this "partial matching table" is consulted to find how many positions to move.
So the final matching table for aaronaac is 0,1,0,0,0,1,2,0.
The JS version of KMP is implemented below in two ways:
KMP implementation (1): KMP with a cached matching table
KMP implementation (2): KMP that computes next dynamically
KMP implementation (1)
Matching table
The most important thing in the KMP algorithm is the matching table. Without the matching table, the code is just the BF implementation; adding the matching table turns it into KMP.
The matching table determines how far the pattern moves next.
Based on the matching table rules above, we design a kmpGetStrPartMatchValue method:
function kmpGetStrPartMatchValue(str) {
    var prefix = [];
    var suffix = [];
    var partMatch = [];
    for (var i = 0, j = str.length; i < j; i++) {
        // Current sub-pattern: the first i + 1 characters
        var newStr = str.substring(0, i + 1);
        if (newStr.length == 1) {
            partMatch[i] = 0;
        } else {
            for (var k = 0; k < i; k++) {
                // Prefix of length k + 1
                prefix[k] = newStr.slice(0, k + 1);
                // Suffix of length k + 1
                suffix[k] = newStr.slice(-k - 1);
                // If they are equal, record the length; a later (longer) match overwrites an earlier one
                if (prefix[k] == suffix[k]) {
                    partMatch[i] = prefix[k].length;
                }
            }
            if (!partMatch[i]) {
                partMatch[i] = 0;
            }
        }
    }
    return partMatch;
}
This follows the KMP matching table rules exactly: str.substring(0, i + 1) decomposes the pattern into a -> aa -> aar -> aaro -> aaron -> aarona -> aaronaa -> aaronaac.
Then, for each decomposition, the length of the longest common element is computed from its prefixes and suffixes.
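The function above follows the definition literally, re-slicing the string for every candidate length. For comparison, here is a sketch of the linear-time construction commonly used for this table. It is a different technique from the author's, and the name `buildTableLinear` is an assumption made for this example.

```javascript
// Linear-time construction of the partial matching table.
// `len` tracks the length of the longest proper prefix of pattern[0..i]
// that is also a suffix of it; on a mismatch we fall back through the
// already-computed entries instead of re-slicing the string.
function buildTableLinear(pattern) {
  var table = [];
  var len = 0;
  if (pattern.length > 0) {
    table[0] = 0; // a single character has no proper prefix or suffix
  }
  for (var i = 1; i < pattern.length; i++) {
    while (len > 0 && pattern.charAt(i) !== pattern.charAt(len)) {
      len = table[len - 1]; // fall back to the next shorter candidate
    }
    if (pattern.charAt(i) === pattern.charAt(len)) {
      len++;
    }
    table[i] = len;
  }
  return table;
}

console.log(buildTableLinear("aaronaac").join(",")); // 0,1,0,0,0,1,2,0
console.log(buildTableLinear("ABCDABD").join(","));  // 0,0,0,0,1,2,0
```

Both versions should produce the same table; the difference is only in how much work is repeated while building it.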
Backtracking algorithm
KMP is also a prefix algorithm, so the BF code can be carried over almost entirely. The only change is in backtracking: BF simply moves forward by 1, whereas KMP computes the next position from the matching table.
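// Inner loop over the pattern
for (var j = 0; j < searchLength; j++) {
    // The current characters match
    if (searchStr.charAt(j) == sourceStr.charAt(i)) {
        if (j == searchLength - 1) {
            // The whole pattern has been matched
            result = i - j;
            break;
        } else {
            // Keep matching; i++ advances the index into the text string
            i++;
        }
    } else {
        // Mismatch: during this attempt i has been advanced j times
        if (j > 0) {
            // Shift = j - part[j - 1], so the next attempt starts at i - part[j - 1].
            // Subtract 1 more because the outer loop's i++ runs after the break.
            i = i - part[j - 1] - 1;
        }
        // When j == 0 the outer i++ alone moves the pattern by one position
        break;
    }
}
The else branch is the core of KMP: the next value = number of matched characters - corresponding partial match value. Here j is the number of matched characters and part[j - 1] is the corresponding partial match value.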
Complete KMP algorithm
<!doctype html><div id="test2"><div><script type="text/javascript">
function kmpGetStrPartMatchValue(str) {
    var prefix = [];
    var suffix = [];
    var partMatch = [];
    for (var i = 0, j = str.length; i < j; i++) {
        var newStr = str.substring(0, i + 1);
        if (newStr.length == 1) {
            partMatch[i] = 0;
        } else {
            for (var k = 0; k < i; k++) {
                // Take the prefix and the suffix of length k + 1
                prefix[k] = newStr.slice(0, k + 1);
                suffix[k] = newStr.slice(-k - 1);
                if (prefix[k] == suffix[k]) {
                    partMatch[i] = prefix[k].length;
                }
            }
            if (!partMatch[i]) {
                partMatch[i] = 0;
            }
        }
    }
    return partMatch;
}

function KMP(sourceStr, searchStr) {
    // Build the matching table
    var part = kmpGetStrPartMatchValue(searchStr);
    var sourceLength = sourceStr.length;
    var searchLength = searchStr.length;
    var result;
    var i = 0;
    var j = 0;
    // Outer loop over the text string
    for (; i < sourceLength; i++) {
        // Inner loop over the pattern
        for (j = 0; j < searchLength; j++) {
            if (searchStr.charAt(j) == sourceStr.charAt(i)) {
                if (j == searchLength - 1) {
                    // The whole pattern has been matched
                    result = i - j;
                    break;
                } else {
                    // Keep matching; i++ advances the index into the text string
                    i++;
                }
            } else {
                if (j > 0) {
                    // Shift = j - part[j - 1]; subtract 1 more for the outer i++
                    i = i - part[j - 1] - 1;
                }
                // When j == 0 the outer i++ moves the pattern by one position
                break;
            }
        }
        if (result || result == 0) {
            break;
        }
    }
    if (result || result == 0) {
        return result;
    } else {
        return -1;
    }
}

var s = "BBC ABCDAB ABCDABCDABDE";
var t = "ABCDABD";
show('indexOf', function() { return s.indexOf(t); });
show('KMP', function() { return KMP(s, t); });

function show(bf_name, fn) {
    var myDate = +new Date();
    var r = fn();
    var div = document.createElement('div');
    div.innerHTML = bf_name + ' algorithm, position found: ' + r + ", time taken: " + (+new Date() - myDate) + "ms";
    document.getElementById("test2").appendChild(div);
}
</script></div></div>
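The implementation above restarts the pattern index at 0 after every shift. For comparison, here is a sketch of the single-loop formulation often seen in textbooks, where the pattern index resumes from the partial match value so the text index never moves backwards. This is a different formulation from the article's code; the name `kmpSearchWithTable` is hypothetical, and the table passed in the usage lines is written out by hand.

```javascript
// Single-pass KMP search. `table[k]` is assumed to be the partial match
// value of the first k + 1 pattern characters.
function kmpSearchWithTable(text, pattern, table) {
  if (pattern.length === 0) {
    return 0; // same convention as String.prototype.indexOf
  }
  var j = 0; // number of pattern characters currently matched
  for (var i = 0; i < text.length; i++) {
    // On a mismatch, keep the longest prefix that is still known to match.
    while (j > 0 && text.charAt(i) !== pattern.charAt(j)) {
      j = table[j - 1];
    }
    if (text.charAt(i) === pattern.charAt(j)) {
      j++;
    }
    if (j === pattern.length) {
      return i - j + 1; // start index of the match
    }
  }
  return -1;
}

var text = "BBC ABCDAB ABCDABCDABDE";
// Hand-written partial matching table for "ABCDABD".
console.log(kmpSearchWithTable(text, "ABCDABD", [0, 0, 0, 0, 1, 2, 0])); // 15
console.log(text.indexOf("ABCDABD")); // 15
```

Because i only ever increases, the text is scanned once, and the fallback through the table plays the role of the shift.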
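KMP implementation (2)
The first KMP implementation clearly works by looking values up in a cached matching table, which is the familiar trade of space for time. The other approach computes the value on the fly: pass in the specific string that has been matched so far and compute its partial match value. The principle is the same.
When the cache table is generated, every entry is computed up front. Now we only need one of those entries, so the algorithm only has to target the current match.
The next algorithm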
function next(str) {
    var prefix = [];
    var suffix = [];
    var partMatch;
    var i = str.length;
    var newStr = str.substring(0, i + 1);
    // Stop at length i - 1 so the whole string is never compared with itself
    for (var k = 0; k < i - 1; k++) {
        // Take the prefix and the suffix of length k + 1
        prefix[k] = newStr.slice(0, k + 1);
        suffix[k] = newStr.slice(-k - 1);
        if (prefix[k] == suffix[k]) {
            partMatch = prefix[k].length;
        }
    }
    if (!partMatch) {
        partMatch = 0;
    }
    return partMatch;
}
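This is really the same as the matching table; the outer loop is removed and the function goes straight to the string that has been matched so far.
Complete KMP.next algorithm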
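To illustrate that the on-demand value is just one entry of the cached table, here is a small sketch that compares the two for every matched length of a pattern. The name `nextFor` is hypothetical, and the `cached` array is a hand-written table for "ABCDABD"; neither comes from the article's code.

```javascript
// Partial match value of `str`, computed on demand: the length of the
// longest proper prefix of str that is also a suffix (0 if there is none).
function nextFor(str) {
  for (var len = str.length - 1; len > 0; len--) {
    if (str.slice(0, len) === str.slice(-len)) {
      return len;
    }
  }
  return 0;
}

var pattern = "ABCDABD";
var cached = [0, 0, 0, 0, 1, 2, 0]; // hand-written cached table for the pattern

var allEqual = true;
for (var j = 1; j <= pattern.length; j++) {
  var matched = pattern.slice(0, j);
  // The value for j matched characters is entry j - 1 of the cached table.
  if (nextFor(matched) !== cached[j - 1]) {
    allEqual = false;
  }
}
console.log(allEqual); // true
```

The cached version pays once for the whole table, while the on-demand version repeats a little work at every mismatch but stores nothing.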
<!doctype html><div id="testnext"><div><script type="text/javascript">
function next(str) {
    var prefix = [];
    var suffix = [];
    var partMatch;
    var i = str.length;
    var newStr = str.substring(0, i + 1);
    // Stop at length i - 1 so the whole string is never compared with itself
    for (var k = 0; k < i - 1; k++) {
        // Take the prefix and the suffix of length k + 1
        prefix[k] = newStr.slice(0, k + 1);
        suffix[k] = newStr.slice(-k - 1);
        if (prefix[k] == suffix[k]) {
            partMatch = prefix[k].length;
        }
    }
    if (!partMatch) {
        partMatch = 0;
    }
    return partMatch;
}

function KMP(sourceStr, searchStr) {
    var sourceLength = sourceStr.length;
    var searchLength = searchStr.length;
    var result;
    var i = 0;
    var j = 0;
    // Outer loop over the text string
    for (; i < sourceLength; i++) {
        // Inner loop over the pattern
        for (j = 0; j < searchLength; j++) {
            if (searchStr.charAt(j) == sourceStr.charAt(i)) {
                if (j == searchLength - 1) {
                    // The whole pattern has been matched
                    result = i - j;
                    break;
                } else {
                    // Keep matching; i++ advances the index into the text string
                    i++;
                }
            } else {
                if (j > 0) {
                    // Shift = j - next(matched part); subtract 1 more for the outer i++
                    i = i - next(searchStr.slice(0, j)) - 1;
                }
                // When j == 0 the outer i++ moves the pattern by one position
                break;
            }
        }
        if (result || result == 0) {
            break;
        }
    }
    if (result || result == 0) {
        return result;
    } else {
        return -1;
    }
}

var s = "BBC ABCDAB ABCDABCDABDE";
var t = "ABCDAB";
show('indexOf', function() { return s.indexOf(t); });
show('KMP.next', function() { return KMP(s, t); });

function show(bf_name, fn) {
    var myDate = +new Date();
    var r = fn();
    var div = document.createElement('div');
    div.innerHTML = bf_name + ' algorithm, position found: ' + r + ", time taken: " + (+new Date() - myDate) + "ms";
    document.getElementById("testnext").appendChild(div);
}
</script></div></div>
