我正在努力研究“查找超序列”算法。
输入用于一组字符串
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
结果将是正确对齐的字符串集(下一步应该是合并)
String E = "ca ag cca cc ta cat c a";
String F = "c gag ccat ccgtaaa g tt g";
String G = " aga acc tgc taaatgc t a ga";
感谢您的任何建议(我在这项任务上呆了一天多)
合并后超字符串将是
cagagaccatgccgtaaatgcattacga
“这种情况”中超序列的定义类似于
当且仅当字符串 R 中的所有字符按照它们在输入序列 R 中出现的顺序出现在超级序列 S 中时,字符串 R 才包含在超级序列 S 中。
我尝试过的“解决方案”(这又是错误的做法)是:
public class Solution4
{
static boolean[][] map = null;
static int size = 0;
public static void main(String[] args)
{
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
Stack data = new Stack();
data.push(A);
data.push(B);
data.push(C);
Stack clone1 = data.clone();
Stack clone2 = data.clone();
int length = 26;
size = max_size(data);
System.out.println(size+" "+length);
map = new boolean[26][size];
char[] result = new char[size];
HashSet<String> chunks = new HashSet<String>();
while(!clone1.isEmpty())
{
String a = clone1.pop();
char[] residue = make_residue(a);
System.out.println("---");
System.out.println("OLD : "+a);
System.out.println("RESIDUE : "+String.valueOf(residue));
String[] r = String.valueOf(residue).split(" ");
for(int i=0; i<r.length; i++)
{
if(r[i].equals(" ")) continue;
//chunks.add(spaces.substring(0,i)+r[i]);
chunks.add(r[i]);
}
}
for(String chunk : chunks)
{
System.out.println("CHUNK : "+chunk);
}
}
static char[] make_residue(String candidate)
{
char[] result = new char[size];
for(int i=0; i<candidate.length(); i++)
{
int pos = find_position_for(candidate.charAt(i),i);
for(int j=i; j<pos; j++) result[j]=' ';
if(pos==-1) result[candidate.length()-1] = candidate.charAt(i);
else result[pos] = candidate.charAt(i);
}
return result;
}
static int find_position_for(char character, int offset)
{
character-=((int)'a');
for(int i=offset; i<size; i++)
{
// System.out.println("checking "+String.valueOf((char)(character+((int)'a')))+" at "+i);
if(!map[character][i])
{
map[character][i]=true;
return i;
}
}
return -1;
}
static String move_right(String a, int from)
{
return a.substring(0, from)+" "+a.substring(from);
}
static boolean taken(int character, int position)
{ return map[character][position]; }
static void take(char character, int position)
{
//System.out.println("taking "+String.valueOf(character)+" at "+position+" (char_index-"+(character-((int)'a'))+")");
map[character-((int)'a')][position]=true;
}
static int max_size(Stack stack)
{
int max=0;
while(!stack.isEmpty())
{
String s = stack.pop();
if(s.length()>max) max=s.length();
}
return max;
}
}
找到任何公共超序列并不是一件困难的任务:
在您的示例中,可能的解决方案类似于:
公共类 SuperSequenceTest {
public static void main(String[] args) {
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
int iA = 0;
int iB = 0;
int iC = 0;
char[] a = A.toCharArray();
char[] b = B.toCharArray();
char[] c = C.toCharArray();
StringBuilder sb = new StringBuilder();
while (iA < a.length || iB < b.length || iC < c.length) {
if (iA < a.length && iB < b.length && iC < c.length && (a[iA] == b[iB]) && (a[iA] == c[iC])) {
sb.append(a[iA]);
iA++;
iB++;
iC++;
}
else if (iA < a.length && iB < b.length && a[iA] == b[iB]) {
sb.append(a[iA]);
iA++;
iB++;
}
else if (iA < a.length && iC < c.length && a[iA] == c[iC]) {
sb.append(a[iA]);
iA++;
iC++;
}
else if (iB < b.length && iC < c.length && b[iB] == c[iC]) {
sb.append(b[iB]);
iB++;
iC++;
} else {
if (iC < c.length) {
sb.append(c[iC]);
iC++;
}
else if (iB < b.length) {
sb.append(b[iB]);
iB++;
} else if (iA < a.length) {
sb.append(a[iA]);
iA++;
}
}
}
System.out.println("SUPERSEQUENCE " + sb.toString());
}
}
然而,真正要解决的问题是找到已知问题的解决方案最短公共超序列 http://en.wikipedia.org/wiki/Shortest_common_supersequence http://en.wikipedia.org/wiki/Shortest_common_supersequence,
这并不容易。
有很多与该主题相关的研究。
例如参见:
http://www.csd.uwo.ca/~lila/pdfs/Towards%20a%20DNA%20solution%20to%20the%20Shortest%20Common%20Superstring%20Problem.pdf http://www.csd.uwo.ca/~lila/pdfs/Towards%20a%20DNA%20solution%20to%20the%20Shortest%20Common%20Superstring%20Problem.pdf
http://www.ncbi.nlm.nih.gov/pubmed/14534185 http://www.ncbi.nlm.nih.gov/pubmed/14534185
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)