最长公共子序列算法
最长的公共子序列 (Longest common Subsequence)
-
Let X and Y be two subsequences and by LCS algorithm we can find a maximum length common subsequence of X and Y.
令X和Y为两个子序列,通过LCS算法,我们可以找到X和Y的最大长度公共子序列 。
-
If |X|= m and |Y|= n then there is a 2m subsequence of x; we just compare each with Y (n comparisons).
如果| X | = m且| Y | = n,则存在x的2m子序列; 我们只将每个与Y比较(n个比较) 。
-
So the running time of this particular algorithm will be O(n 2m).
因此,该特定算法的运行时间将为O(n 2m) 。
-
LCS problem has an optimal substructure which means that the solution of the subproblem is parts of the final solution.
LCS问题具有最佳子结构,这意味着子问题的解决方案是最终解决方案的一部分。
-
In this case, firstly it’s necessary to find the length of LCS then later we will modify the algorithm to find LCS itself.
在这种情况下,首先必须找到LCS的长度,然后我们将修改算法以找到LCS本身。
-
For this define Xi, Y¬i to the prefixes of X and Y of length i and j respectively. Then define c[i,j] to be the length of LCS of Xi and Yi.
为此, 分别将X i , Y¬i定义为长度为i和j的X和Y的前缀。 然后将c [i,j]定义为X i和Y i的LCS的长度。
-
Then the length of LCS of X and Y will be c[m, n].
那么X和Y的LCS的长度将为c [m,n] 。
i.e. c[i,j]= c[i-1,j-1]+1 if x[i]=y[j],
C[i,j]= max(c[i,j-1],c[i-1,j]) otherwise
LCS长度算法 (LCS Length Algorithm)
1. m←length[X]
2. n←length[y]
3. let b[1…m,1…n]and c[0…m,0…n] be new tables
4. for i←1 to m
5. c[i,j]←0
6. do c[0,j]←0
7. for j← 0 to n
8. do c[0,j]←0
9. for i←1 to m
10. for j←1 to n
11. if xi=yi
12. then c[i,j]←c[i-1,j-1]+1
13. b[i,j]←”↖”
14. else if c[i-1,j]>=c[i,j-1]
15. then c[i,j]←c[i-1,j]
16. b[i,j]←”↑”
17. else c[i,j]←c[i,j-1]
18. b[I,j]←”←”
19. return c and b
LCS程序 (Procedure of LCS)
-
X and Y defined as a given X and Y set.
X和Y定义为给定的X和Y集。
-
First design a, c and b table where Y set values will be present in column side and X set values will be present in row side.
首先设计a , c和b表,其中Y设置值将出现在列侧,而X设置值将出现在行侧。
-
Line 1 assigns the length of X into variable m, and in step 2 the length of Y will be assigned to variable n.
第1行将X的长度分配给变量m ,在步骤2中, Y的长度将分配给变量n 。
-
For loop present in line 3 and 4 is used to initialize all the first column value to zero.
第3和第4行中存在的for循环用于将所有第一列值初始化为零。
-
Another for loop line 5-6 is used to initialize all the row values to zero.
另一个for循环行5-6用于将所有行值初始化为零。
-
In the given algorithm three arrows are used which are vertical, horizontal and semi-vertical. By using these arrows we have to find out the LCS element.
在给定的算法中,使用了三个箭头,分别是垂直,水平和半垂直。 通过使用这些箭头,我们必须找出LCS元素。
-
The loop present in line 7-16 is used to insert the numeric values with specific arrows in c and a table.
本管线7-16环是用来插入在C和A表特定箭头数值。
复杂: (Complexity:)
Complexity of longest common subsequence is O(mn). Where, m and n are length of two strings.
最长公共子序列的复杂度为O(mn) 。 其中, m和n是两个字符串的长度。
翻译自: https://www.includehelp.com/algorithms/algorithm-and-procedure-to-solve-a-longest-common-subsequence-problem.aspx
最长公共子序列算法