
How to implement the longest common subsequence in JavaScript

Jun 07, 2018 pm 05:01 PM

The longest common subsequence and the longest common substring are not the same thing. This article walks through implementing the longest common subsequence in JavaScript; readers who need it can use it as a reference.

Introduction

The longest common subsequence (LCS) problem is: given two sequences X and Y, find the longest sequence that is a subsequence of both. The elements kept in a subsequence stay in the order in which they appear in the original sequence. LCS algorithms have a wide range of uses. For example, when managing different versions of software, LCS is used to find the similarities and differences between the old and new versions; in software testing, it is used to compare recorded and replayed sequences; in genetic engineering, it is used to compare a patient's DNA strand with a healthy DNA strand; in anti-plagiarism systems, it is used to estimate how much of a paper is copied. LCS can also be used to measure program code similarity, retrieve human motion sequences, match video segments, and so on, so studying the LCS algorithm has high practical value.

Basic concepts

Subsequence: a subsequence of a given sequence is what remains after removing zero or more elements from it, without changing the relative order of the remaining elements. For example, the subsequences of the sequence [A,B,C,B,D,A,B] include [A,B], [B,C,A] and [A,B,C,D,A].

Common subsequence: given sequences X and Y, if sequence Z is a subsequence of X and also a subsequence of Y, then Z is a common subsequence of X and Y. For example, with X=[A,B,C,B,D,A,B] and Y=[B,D,C,A,B,A], the sequence Z=[B,C,A] is a common subsequence of X and Y, and its length is 3. But Z is not a longest common subsequence of X and Y: the sequences [B,C,B,A] and [B,D,A,B] are both longest common subsequences of X and Y, with length 4, and X and Y have no common subsequence of length 5 or more. The sequences [A,B,C] and [E,F,G] have only one common subsequence, the empty sequence [].

Longest common subsequence: given sequences X and Y, the one (or several) of greatest length among all their common subsequences.

Substring: a new sequence formed by deleting zero or more characters from the front of a sequence, from its end, or from both. The difference is that a subsequence may also have characters removed from the middle. How many subsequences does the string cnblogs have? Its 7 characters are all distinct, so there are 2^7 = 128 of them (including the empty one); cb, cgs and so on are all subsequences.

In short, a subsequence is not necessarily contiguous; what is contiguous is a substring.
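
The distinction can be checked mechanically. The following is a minimal sketch that is not part of the original article; the helper names isSubsequence and isSubstring are made up for illustration.

```javascript
// Hypothetical helpers, for illustration only.
// True if `sub` can be obtained from `str` by deleting zero or more characters.
function isSubsequence(sub, str) {
  var k = 0; // index of the next character of `sub` still to be matched
  for (var i = 0; i < str.length && k < sub.length; i++) {
    if (str[i] === sub[k]) {
      k++;
    }
  }
  return k === sub.length;
}

// True if `sub` appears in `str` as one contiguous run.
function isSubstring(sub, str) {
  return str.indexOf(sub) !== -1;
}

console.log(isSubsequence("cgs", "cnblogs")); // true
console.log(isSubstring("cgs", "cnblogs"));   // false
console.log(isSubstring("blog", "cnblogs"));  // true
```

Every substring is also a subsequence, but not the other way round, which is why "cgs" passes the first check and fails the second.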

Problem Analysis

As usual, we start the analysis from a matrix and derive the state transition equation ourselves.

First, we recast the problem in terms familiar to front-end developers. Instead of a "sequence", think of an array or a string. To keep things simple, we assume that two strings are being compared.

We focus on the concept of "subsequence": it is obtained by deleting zero, several, or all of the characters. So the first subsequence is always the empty string (if the sequence is not a string, the same holds for the empty sequence)! This really needs attention! Many people cannot make sense of the table in "Introduction to Algorithms", and many bloggers pretend to understand it when they do not. We always compare from left to right, and the first string is laid out vertically, because it determines the height of the matrix.

Let X = "ABCDAB" and Y = "BDCABA". Each first contributes its shortest subsequence, that is, we compare the empty string with the empty string. The solution of the LCS equation is a number, so the table can only be filled with numbers. The common part of two empty strings has length 0.

      ""  B   D   C   A   B   A
""    0
A
B
C
D
A
B

Next we keep X at the empty string and let Y contribute "B". Obviously the length of their common part is still 0. Replacing "B" with other characters (D, C, ...) or with combinations such as DC or DDC changes nothing; the result is still 0. Therefore the first row is all 0. Likewise, if we keep Y at the empty string and let X vary, the same analysis applies, so the first column is all 0.

      ""  B   D   C   A   B   A
""    0   0   0   0   0   0   0
A     0
B     0
C     0
D     0
A     0
B     0

The LCS problem differs a little from the knapsack problem here. In the knapsack problem you can still introduce a row -1, whereas in LCS the empty subsequence fixes the top row and the left column from the very beginning.
Then we enlarge the problem a little: this time each side contributes one character. Obviously, only when the two characters are the same can there be a common subsequence that is not the empty string, and its length is then 1.

X is "A"; Y grows from "B" to "BDCA".

      ""  B   D   C   A   B   A
""    0   0   0   0   0   0   0
A     0   0   0   0   1
B     0
C     0
D     0
A     0
B     0

Continue filling in to the right. How? Obviously the LCS cannot be longer than X. Once Y contains "A", comparing any longer prefix of Y with X = "A" still gives 1.

      ""  B   D   C   A   B   A
""    0   0   0   0   0   0   0
A     0   0   0   0   1   1   1
B     0
C     0
D     0
A     0
B     0
Now X contributes two characters, "AB". Look at the B column first: ${X_1}$ == ${Y_0}$, so we get a new common subsequence and should add 1. Why? Because our matrix is a state table that describes how states evolve from left to right and from top to bottom, and each state is built on states that already exist. What we need to establish is the relationship between the value of the cell we want to fill in and the values of the neighboring cells that are already filled in. For now there is too little information and this is an isolated point, so we simply fill in 1.

      ""  B   D   C   A   B   A
""    0   0   0   0   0   0   0
A     0   0   0   0   1   1   1
B     0   1
C     0
D     0
A     0
B     0

Then we give Y an extra D: the subsequence sets are {"",A,B,AB} versus {"",B,D,BD}. Obviously we still fill in 1. The value stays 1 until the second B of Y, because with "BDCAB" the two strings gain another common subsequence, AB.

      ""  B   D   C   A   B   A
""    0   0   0   0   0   0   0
A     0   0   0   0   1   1   1
B     0   1   1   1   1   2
C     0
D     0
A     0
B     0

At this step we can summarize some rules. We will then verify our ideas through calculation and add new rules or constraints to refine them.

Y contributes all of its characters while X still has 2; after careful observation, we still fill in 2.

Now look at the next row, where X contributes one more character, C. The subsequence set of ABC is larger than that of AB, so what it shares with the subsequence set of Y = "B" can only grow; it can never be smaller than before. Obviously the newly added C contributes nothing here, because it is not a character common to both, so the value equals the one for AB.

      ""  B   D   C   A   B   A
""    0   0   0   0   0   0   0
A     0   0   0   0   1   1   1
B     0   1   1   1   1   2   2
C     0   1
D     0
A     0
B     0

So we can be sure that if the two characters being compared are different, the cell to fill depends on its left and upper neighbors and takes the larger of the two.

What if the compared characters are the same? As it happens, the C of X now has to be compared with the C of Y, that is, the subsequence set of ABC {"",A,B,C,AB,AC,BC,ABC} is compared with the subsequence set of BDC {"",B,D,C,BD,BC,DC,BDC}, and the common subsequences are "", B, C and BC, so the cell is 2. The conclusion is the same as before: when the characters are equal, the cell value is the value of the upper-left cell plus 1, and the left and upper neighbors can never exceed that value. These observations need more rigorous mathematics to prove.
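
The subsequence sets above can be enumerated by brute force for short strings. This is only an illustrative sketch, not the approach the article goes on to use; the names allSubsequences and commonSubsequences are made up here, and the method is exponential, so it is only suitable for tiny inputs.

```javascript
// Illustrative brute force; both function names are hypothetical.
// Collects every distinct subsequence of `str` (up to 2^n of them).
function allSubsequences(str) {
  var result = new Set([""]);
  for (var i = 0; i < str.length; i++) {
    // Every subsequence found so far can also be extended with str[i].
    Array.from(result).forEach(function (s) {
      result.add(s + str[i]);
    });
  }
  return result;
}

// Keeps the subsequences of `a` that are also subsequences of `b`.
function commonSubsequences(a, b) {
  var setB = allSubsequences(b);
  return Array.from(allSubsequences(a)).filter(function (s) {
    return setB.has(s);
  });
}

console.log(commonSubsequences("ABC", "BDC")); // [ '', 'B', 'C', 'BC' ]
```

The longest entry, "BC", has length 2, which is the value the table cell for ABC versus BDC should hold.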

Suppose there are two arrays, A and B. A[i] is the i-th element of A, and A(i) is the prefix consisting of the first element to the i-th element of A. m(i, j) is the longest common subsequence length of A(i) and B(j).

Due to the recursive nature of the algorithm itself, in fact, we only need to prove that for a certain i and j:

m(i, j) = m(i-1, j-1) + 1 (when A[i] = B[j])

m(i, j) = max( m(i-1, j), m(i, j-1) ) (when A[i] != B[j])

The first formula, for the case A[i] = B[j], is easy to prove by contradiction. Assume m(i, j) > m(i-1, j-1) + 1 (m(i, j) cannot be less than m(i-1, j-1) + 1, for obvious reasons); then we can deduce the contradictory result that m(i-1, j-1) is not the longest.

The second one is a bit trickier. When A[i] != B[j], we again argue by contradiction, assuming m(i, j) > max( m(i-1, j), m(i, j-1) ).

From this assumption we get m(i, j) > m(i-1, j). It follows that A[i] must be in the LCS corresponding to m(i, j) (this too can be shown by contradiction). And since A[i] != B[j], B[j] cannot be in the LCS corresponding to m(i, j). So m(i, j) = m(i, j-1), which contradicts the assumption.

Q.E.D.

We can now use the two equations above to fill in the rest of the table.
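
As a check on the hand-filled cells, the whole table for X = "ABCDAB" and Y = "BDCABA" can be generated from the two equations. This is a minimal sketch; the function name lcsTable is made up for illustration and is not the article's own code.

```javascript
// Illustrative sketch; `lcsTable` is a hypothetical name.
// Builds the (x.length + 1) x (y.length + 1) table of LCS lengths.
function lcsTable(x, y) {
  var table = [];
  for (var i = 0; i <= x.length; i++) {
    table[i] = [];
    for (var j = 0; j <= y.length; j++) {
      if (i === 0 || j === 0) {
        table[i][j] = 0; // an empty prefix has nothing in common
      } else if (x[i - 1] === y[j - 1]) {
        table[i][j] = table[i - 1][j - 1] + 1; // equal characters: upper-left + 1
      } else {
        table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]); // larger of upper and left
      }
    }
  }
  return table;
}

lcsTable("ABCDAB", "BDCABA").forEach(function (row) {
  console.log(row.join(" "));
});
// Prints:
// 0 0 0 0 0 0 0
// 0 0 0 0 1 1 1
// 0 1 1 1 1 2 2
// 0 1 1 2 2 2 2
// 0 1 2 2 2 2 2
// 0 1 2 2 3 3 3
// 0 1 2 2 3 4 4
```

The lower-right cell, 4, is the length of the longest common subsequence of the two strings.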

Program implementation

//by 司徒正美
function LCS(str1, str2){
  var rows = str1.split("")
  rows.unshift("") // row 0 stands for the empty subsequence
  var cols = str2.split("")
  cols.unshift("") // column 0 stands for the empty subsequence
  var m = rows.length
  var n = cols.length
  var dp = []
  for(var i = 0; i < m; i++){
    dp[i] = []
    for(var j = 0; j < n; j++){
      if(i === 0 || j === 0){
        dp[i][j] = 0
        continue
      }

      if(rows[i] === cols[j]){
        dp[i][j] = dp[i-1][j-1] + 1 // diagonal + 1
      }else{
        dp[i][j] = Math.max( dp[i-1][j], dp[i][j-1]) // larger of upper and left
      }
    }
    console.log(dp[i].join("")) // debug output
  }
  return dp[m-1][n-1]
}

LCS can be simplified further: by shifting the indices, we no longer need to build the padded rows and cols arrays.

//by 司徒正美
function LCS(str1, str2){
 var m = str1.length
 var n = str2.length
 var dp = [new Array(n+1).fill(0)] // row 0 is all zeros
 for(var i = 1; i <= m; i++){ // m+1 rows in total
  dp[i] = [0] // column 0 is all zeros
  for(var j = 1; j <= n; j++){ // n+1 columns in total
   if(str1[i-1] === str2[j-1]){
    // note: the first character of str1 belongs to table row 1, hence the -1; same for str2
    dp[i][j] = dp[i-1][j-1] + 1 // diagonal + 1
   } else {
    dp[i][j] = Math.max( dp[i-1][j], dp[i][j-1])
   }
  }
 }
 return dp[m][n];
}

Print an LCS

We first give a function that prints a single LCS. We start from the lower-right corner of the table and stop at the top row, so the target string is built in reverse order. To avoid troublesome intermediate buffers such as a StringBuffer, we can implement it recursively: each call contributes either one character or an empty string, and concatenating the result of the recursive printLCS call with the matched character gives the string we want.

We also write another method to verify that the string we get really is a common subsequence. As someone who already works in the industry, I cannot write code like a student and put it online without unit tests, leaving others to fall into the pit.

//by 司徒正美, print one LCS
function printLCS(dp, str1, str2, i, j){
 if (i == 0 || j == 0){
  return "";
 }
 if( str1[i-1] == str2[j-1] ){
  return printLCS(dp, str1, str2, i-1, j-1) + str1[i-1];
 }else{
  if (dp[i][j-1] > dp[i-1][j]){
   return printLCS(dp, str1, str2, i, j-1);
  }else{
   return printLCS(dp, str1, str2, i-1, j);
  }
 }
}
//by 司徒正美, turn the target string into a regular expression and check that it is
// a common subsequence of both strings (this does not check that it is the longest)
function validateLCS(el, str1, str2){
 var escaped = el.split("").map(function(ch){
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
 })
 var re = new RegExp( escaped.join(".*") )
 console.log(el, re.test(str1), re.test(str2))
 return re.test(str1) && re.test(str2)
}

Usage:

function LCS(str1, str2){
 var m = str1.length
 var n = str2.length
 //.... omitted; fill in the dp table yourself as shown earlier
 var s = printLCS(dp, str1, str2, m, n)
 validateLCS(s, str1, str2)
 return dp[m][n]
}
var c1 = LCS( "ABCBDAB","BDCABA");
console.log(c1) //4 BCBA, BCAB, BDAB
var c2 = LCS("13456778" , "357486782" );
console.log(c2) //5 34678
var c3 = LCS("ACCGGTCGAGTGCGCGGAAGCCGGCCGAA" ,"GTCGTTCGGAATGCCGTTGCTCTGTAAA" );
console.log(c3) //20 GTCGTCGGAAGCCGGCCGAA
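
For readers who want something runnable end to end, here is one possible way to combine the table with the trace-back. It is a sketch under my own assumptions, not the author's exact code: the names lcsWithString and traceBack are made up, and the trace-back is written as a loop that follows the same rule as printLCS (on a tie it moves up).

```javascript
// One possible complete version; `lcsWithString` and `traceBack` are hypothetical names.
function lcsWithString(str1, str2) {
  var m = str1.length;
  var n = str2.length;
  var dp = [new Array(n + 1).fill(0)]; // row 0 is all zeros
  for (var i = 1; i <= m; i++) {
    dp[i] = [0]; // column 0 is all zeros
    for (var j = 1; j <= n; j++) {
      dp[i][j] = str1[i - 1] === str2[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return { length: dp[m][n], lcs: traceBack(dp, str1, str2, m, n) };
}

// Walks from the lower-right corner towards the top-left, collecting matched characters.
function traceBack(dp, str1, str2, i, j) {
  var chars = [];
  while (i > 0 && j > 0) {
    if (str1[i - 1] === str2[j - 1]) {
      chars.unshift(str1[i - 1]); // the walk runs backwards, so prepend
      i--;
      j--;
    } else if (dp[i][j - 1] > dp[i - 1][j]) {
      j--; // left neighbor is strictly larger
    } else {
      i--; // otherwise move up
    }
  }
  return chars.join("");
}

console.log(lcsWithString("ABCBDAB", "BDCABA")); // { length: 4, lcs: 'BCBA' }
```

Returning an object keeps the length and one concrete LCS together, so a caller can check both without touching the table.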

Print all LCS

The idea is similar to the above. Note that the LCS method contains a Math.max, which actually merges three cases (left larger, upper larger, or both equal), so the search may fork into several strings. Our method returns an ES6 Set so that duplicates are removed automatically. At each step a new set is built by merging the strings of the old sets.

//by 司徒正美, print all LCS
function printAllLCS(dp, str1, str2, i, j){
 if (i == 0 || j == 0){
  return new Set([""])
 }else if(str1[i-1] == str2[j-1]){
  var newSet = new Set()
  printAllLCS(dp, str1, str2, i-1, j-1).forEach(function(el){
   newSet.add(el + str1[i-1])
  })
  return newSet
 }else{
  var set = new Set()
  if (dp[i][j-1] >= dp[i-1][j]){
   printAllLCS(dp, str1, str2, i, j-1).forEach(function(el){
    set.add(el)
   })
  }
  if (dp[i-1][j] >= dp[i][j-1]){ // must use >=; a plain else would miss the tie case
   printAllLCS(dp, str1, str2, i-1, j).forEach(function(el){
    set.add(el)
   })
  }
  return set
 }
}

Usage:

function LCS(str1, str2){
 var m = str1.length
 var n = str2.length
 //.... omitted; fill in the dp table yourself as shown earlier
 var s = printAllLCS(dp, str1, str2, m, n)
 console.log(s)
 s.forEach(function(el){
  validateLCS(el,str1, str2)
  console.log("LCS output", el)
 })
 return dp[m][n]
}
var c1 = LCS( "ABCBDAB","BDCABA");
console.log(c1) //4 BCBA, BCAB, BDAB
var c2 = LCS("13456778" , "357486782" );
console.log(c2) //5 34678
var c3 = LCS("ACCGGTCGAGTGCGCGGAAGCCGGCCGAA" ,"GTCGTTCGGAATGCCGTTGCTCTGTAAA" );
console.log(c3) //20 GTCGTCGGAAGCCGGCCGAA

Space optimization

Using rolling array:

function LCS(str1, str2){
 var m = str1.length
 var n = str2.length
 var dp = [new Array(n+1).fill(0)],now = 1,row // row 0 is all zeros
 for(var i = 1; i <= m; i++){ // only 2 rows are kept in memory
  row = dp[now] = [0] // column 0 is always 0
  for(var j = 1; j <= n; j++){ // n+1 columns in total
   if(str1[i-1] === str2[j-1]){
    // note: table row i corresponds to str1[i-1], hence the -1; same for str2
    dp[now][j] = dp[1-now][j-1] + 1 // diagonal + 1; dp[1-now] is the previous row
   } else {
    dp[now][j] = Math.max( dp[1-now][j], dp[now][j-1])
   }
  }
  now = 1- now; //1-1=>0;1-0=>1; 1-1=>0 ...
 }
 return row ? row[n]: 0
}
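
The two rolling rows can be squeezed into a single row if the upper-left value is saved before it is overwritten. This is a sketch of that variation, not something the original article presents; the name lcsLengthOneRow is made up for illustration.

```javascript
// Illustrative single-row variant; `lcsLengthOneRow` is a hypothetical name.
function lcsLengthOneRow(str1, str2) {
  var n = str2.length;
  var row = new Array(n + 1).fill(0); // starts as row 0, then is overwritten in place
  for (var i = 1; i <= str1.length; i++) {
    var diag = 0; // upper-left cell: the old value of row[j - 1]
    for (var j = 1; j <= n; j++) {
      var up = row[j]; // still the value from the previous row
      if (str1[i - 1] === str2[j - 1]) {
        row[j] = diag + 1;
      } else {
        row[j] = Math.max(up, row[j - 1]); // row[j - 1] already holds the new left cell
      }
      diag = up; // becomes the upper-left cell for column j + 1
    }
  }
  return row[n];
}

console.log(lcsLengthOneRow("ABCBDAB", "BDCABA")); // 4
```

Like the rolling-array version, this only yields the length; recovering an actual LCS string still needs the full table.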

Dangerous recursive solution

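A subsequence of str1 corresponds to a subsequence of the index sequence {1, 2, …, m}, so str1 has ${2^m}$ different subsequences (likewise, str2 has ${2^n}$). Comparing them by brute force therefore takes an astonishing exponential time, on the order of ${2^m \cdot 2^n}$. The naive recursion below does not enumerate subsequences, but it recomputes the same subproblems over and over, so it is also exponential in the worst case.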

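// Warning: unusable once the strings get long (exponential time; the original comment warns of a stack overflow)
function LCS(str1, str2, a, b) {
  if(a === void 0){
    a = str1.length - 1 // default: start from the last character of str1
  }
  if(b === void 0){
    b = str2.length - 1 // default: start from the last character of str2
  }
  if(a == -1 || b == -1){
    return 0 // one of the prefixes is empty
  }
  if(str1[a] == str2[b]) {
    return LCS(str1, str2, a-1, b-1)+1;
  }
  // characters differ: take the better of dropping one character from either side
  var x = LCS(str1, str2, a, b-1)
  var y = LCS(str1, str2, a-1, b)
  return x >= y ? x : y
}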
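
If the recursive shape is preferred, caching each (a, b) result removes the repeated work. This is a minimal sketch of that idea, not part of the original article; the name lcsMemo and the "a,b" key format are assumptions made for illustration. The recursion depth is still up to m + n, so very long inputs are better served by the table versions.

```javascript
// Illustrative memoized recursion; `lcsMemo` is a hypothetical name.
function lcsMemo(str1, str2) {
  var memo = new Map(); // key "a,b" -> LCS length of str1[0..a] and str2[0..b]
  function solve(a, b) {
    if (a < 0 || b < 0) {
      return 0; // one of the prefixes is empty
    }
    var key = a + "," + b;
    if (memo.has(key)) {
      return memo.get(key);
    }
    var result;
    if (str1[a] === str2[b]) {
      result = solve(a - 1, b - 1) + 1;
    } else {
      result = Math.max(solve(a, b - 1), solve(a - 1, b));
    }
    memo.set(key, result);
    return result;
  }
  return solve(str1.length - 1, str2.length - 1);
}

console.log(lcsMemo("ABCBDAB", "BDCABA")); // 4
```

Each (a, b) pair is computed once, so the running time drops to the order of m * n, the same as the table-based versions.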

That is what I have compiled for you. I hope it proves helpful.
