Ignore Deletions with Edit Distances (String Editing)

Marcel Dorer on 22 Apr 2016
Closed: MATLAB Answer Bot on 20 Aug 2021
Hi, I'm trying to compare two strings with a function based on Miguel Castro's EditDist.m. The function works pretty well, but in my case I need to ignore some of the deletions, namely all those at the beginning and the end of the string.
For example, when I compare the two strings 'XXXXMatlabXXXX' and 'YYMatlabYY', the first two 'X' and the last two 'X', which would be deletions, shouldn't count towards the edit distance (which should be 4 in this case). Basically, one of the two strings is surrounded by a random number of random characters that should be ignored. Deletions after the first insertion, replacement, or match should be counted normally, at least until only a tail of deletions is left.
Help would be really appreciated!
Here is the relevant part of the function I'm using:
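% D(i+1,j+1) holds the cost of converting s1(1:i) into s2(1:j)
for i = 1:n1
    D(i+1,1) = D(i,1) + DelCost;    % delete s1(1:i)
end
for j = 1:n2
    D(1,j+1) = D(1,j) + InsCost;    % insert s2(1:j)
end
for i = 1:n1
    for j = 1:n2
        if s1(i) == s2(j)
            Repl = 0;
        else
            Repl = ReplCost;
        end
        % D(i,j+1) is the row above (delete s1(i));
        % D(i+1,j) is the column to the left (insert s2(j))
        D(i+1,j+1) = min([D(i,j)+Repl D(i,j+1)+DelCost D(i+1,j)+InsCost]);
    end
end
d = D(n1+1,n2+1);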

Answers (1)

Arnab Sen on 26 Apr 2016
Edited: Arnab Sen on 27 Apr 2016
Hello Marcel,
I am assuming that, of the two strings s1 and s2, s1 is known to be the one wrapped in redundant characters.
Now, let's dig into what D(i,j) means in the script. It is the cost of converting the first i characters of s1 into the first j characters of s2 (and vice versa, as long as InsCost equals DelCost). The script stores this value at D(i+1,j+1), because row 1 and column 1 stand for the empty prefixes. Now, let's assume that all characters of s1 after index k are redundant. Then
D(n1,n2)=D(k,n2)+(n1-k)*DelCost.
So now the task is simple: we need to find the value of k. The following code snippet should do that:
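i = n1;    % number of leading s1 characters still kept
% D(i+1,n2+1) is the cost of converting s1(1:i) into all of s2
while i >= 1 && D(i+1,n2+1) - D(i,n2+1) == DelCost
    i = i - 1;    % s1(i) can be deleted as part of the tail
end
k = i;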
So, the last (n1-k) chars are redundant in s1.
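As a minimal, self-contained sketch of this step: the function below builds the table and then walks back up its last column. The name countTailDeletions and its argument list are made up for illustration and are not part of EditDist.m. It assumes that s1 and s2 are row character vectors, that rows of D follow s1 and columns follow s2, and that the costs are integer-valued (the == comparison is not reliable for fractional costs).

```matlab
function nTail = countTailDeletions(s1, s2, DelCost, InsCost, ReplCost)
% Number of characters at the end of s1 that an optimal alignment deletes.
% Hypothetical helper for illustration; not part of EditDist.m.
    n1 = numel(s1);
    n2 = numel(s2);

    % D(i+1,j+1) = cost of converting s1(1:i) into s2(1:j)
    D = zeros(n1+1, n2+1);
    for i = 1:n1
        D(i+1,1) = D(i,1) + DelCost;      % delete s1(1:i)
    end
    for j = 1:n2
        D(1,j+1) = D(1,j) + InsCost;      % insert s2(1:j)
    end
    for i = 1:n1
        for j = 1:n2
            if s1(i) == s2(j)
                Repl = 0;
            else
                Repl = ReplCost;
            end
            D(i+1,j+1) = min([D(i,j)   + Repl, ...
                              D(i,j+1) + DelCost, ...
                              D(i+1,j) + InsCost]);
        end
    end

    % Walk up the last column while each step is explained by one deletion.
    i = n1;
    while i >= 1 && D(i+1,n2+1) - D(i,n2+1) == DelCost
        i = i - 1;
    end
    nTail = n1 - i;                       % k = i, so n1-k characters are redundant
end
```

With unit costs, countTailDeletions('XXXXMatlabXXXX','YYMatlabYY',1,1,1) returns 2, so the last two 'X' are treated as redundant.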
Now we need to find the redundant characters at the front of s1. For this we can create another table (say X), where
X(i,j) = the cost of converting s1(i:n1) into s2(j:n2), and adopt a similar approach.
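A possible sketch of such a table, filled from the bottom-right corner instead of the top-left. The name countHeadDeletions is hypothetical, row n1+1 and column n2+1 of X stand for the empty suffixes, and the same assumptions apply as before (row character vectors, integer-valued costs).

```matlab
function nHead = countHeadDeletions(s1, s2, DelCost, InsCost, ReplCost)
% Number of characters at the front of s1 that an optimal alignment deletes.
% Hypothetical helper for illustration; not part of EditDist.m.
    n1 = numel(s1);
    n2 = numel(s2);

    % X(i,j) = cost of converting s1(i:n1) into s2(j:n2)
    X = zeros(n1+1, n2+1);
    for i = n1:-1:1
        X(i,n2+1) = X(i+1,n2+1) + DelCost;    % delete s1(i:n1)
    end
    for j = n2:-1:1
        X(n1+1,j) = X(n1+1,j+1) + InsCost;    % insert s2(j:n2)
    end
    for i = n1:-1:1
        for j = n2:-1:1
            if s1(i) == s2(j)
                Repl = 0;
            else
                Repl = ReplCost;
            end
            X(i,j) = min([X(i+1,j+1) + Repl, ...
                          X(i+1,j)   + DelCost, ...
                          X(i,j+1)   + InsCost]);
        end
    end

    % Walk down the first column while each step is explained by one deletion.
    i = 1;
    while i <= n1 && X(i,1) - X(i+1,1) == DelCost
        i = i + 1;
    end
    nHead = i - 1;
end
```

With unit costs, countHeadDeletions('XXXXMatlabXXXX','YYMatlabYY',1,1,1) returns 2, so the first two 'X' are treated as redundant.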
A simpler approach is to reverse both strings s1 and s2 (call them s1' and s2'), call the edit distance function again, and perform the same workflow. The redundant characters at the end of s1' are the redundant characters at the front of the original string s1.
Finally, subtract DelCost*(total number of redundant characters in s1) from the original output.
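The whole workflow could be sketched roughly as follows. All function names here (trimmedEditDist, tailDeletions, editDistMatrix) are hypothetical and not part of EditDist.m, and the inputs are assumed to be row character vectors with integer-valued costs. One assumption goes beyond the description: the reversed pass is run on s1 with its redundant tail already cut off, so that the front and back counts cannot overlap. For s1 = 'aXa' and s2 = 'a', counting both ends on the full string would subtract more than the original distance.

```matlab
function d = trimmedEditDist(s1, s2, DelCost, InsCost, ReplCost)
% Edit distance from s1 to s2 that does not charge for deletions at the
% front and back of s1. Hypothetical sketch; not part of EditDist.m.
    D = editDistMatrix(s1, s2, DelCost, InsCost, ReplCost);
    nTail = tailDeletions(D, DelCost);

    % Reverse both strings: the tail of the reversed s1 is the front of s1.
    % The tail found above is removed first (assumption, see lead-in).
    s1r = fliplr(s1(1:end-nTail));
    Dr = editDistMatrix(s1r, fliplr(s2), DelCost, InsCost, ReplCost);
    nHead = tailDeletions(Dr, DelCost);

    % Subtract the cost of all redundant characters from the plain distance.
    d = D(end,end) - DelCost*(nTail + nHead);
end

function n = tailDeletions(D, DelCost)
% Count trailing s1 characters that an optimal alignment deletes.
    n1 = size(D,1) - 1;
    i = n1;
    while i >= 1 && D(i+1,end) - D(i,end) == DelCost
        i = i - 1;
    end
    n = n1 - i;
end

function D = editDistMatrix(s1, s2, DelCost, InsCost, ReplCost)
% D(i+1,j+1) = cost of converting s1(1:i) into s2(1:j)
    n1 = numel(s1);
    n2 = numel(s2);
    D = zeros(n1+1, n2+1);
    for i = 1:n1
        D(i+1,1) = D(i,1) + DelCost;      % delete s1(1:i)
    end
    for j = 1:n2
        D(1,j+1) = D(1,j) + InsCost;      % insert s2(1:j)
    end
    for i = 1:n1
        for j = 1:n2
            if s1(i) == s2(j)
                Repl = 0;
            else
                Repl = ReplCost;
            end
            D(i+1,j+1) = min([D(i,j)   + Repl, ...
                              D(i,j+1) + DelCost, ...
                              D(i+1,j) + InsCost]);
        end
    end
end
```

With unit costs, trimmedEditDist('XXXXMatlabXXXX','YYMatlabYY',1,1,1) gives 4, the value asked for in the question.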
2 Comments
Marcel Dorer on 26 Apr 2016
Thanks a lot for the answer, it was pretty helpful and I understand the principle. There is only 1 thing I fail to understand:
{
i--;
}
I'm no MATLAB expert and I have to admit that I've never seen an expression like that. If I try to use that part in MATLAB, a bracket error occurs. I'd really appreciate it if you could explain this a little more!
Arnab Sen on 26 Apr 2016
Edited: Arnab Sen on 26 Apr 2016
Hi,
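You are correct. MATLAB does not recognize i--, which is common in languages like C, C++, and Java, and it does not use curly braces to delimit a loop body either. Please read the loop body as
i = i - 1;
and close the while loop with end.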
I have edited the original answer as well accordingly. Thanks for pointing this out.
Please accept the answer if this helps.
