How to compare letters to see if they are the same

Hello
I have a problem with my code. I would like to compare two files containing letters/names, and find which name in file 2 corresponds to each name in file 1, based on name similarity.
That is, I would like to match the names in one file to the names in the other file.
Which command should I use?
Thank you in advance

11 Comments

dpb on 19 Jan 2021
Give us some examples. For exact matches, use strfind and friends, or the new(ish) string functions such as contains.
Pattern matching would use regular expressions which is a whole new world...
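To make the three options above concrete, here is a minimal sketch. The file names are made up for illustration (they are not from the question), and only plain ASCII names are covered; accented letters need extra care.

```matlab
% Hypothetical file names, used only for illustration.
names = {'report_Athens.txt', 'REPORT_ATHENS.TXT', 'summary_Patra.txt'};

% Exact comparison of two whole names, ignoring case.
sameName = strcmpi(names{1}, names{2});

% Substring test on every name at once, ignoring case.
hasAthens = contains(names, 'athens', 'IgnoreCase', true);

% Regular expression: which names start with "report_" (case-insensitive)?
% regexpi with 'once' returns the start index of the match, or [] if none.
isReport = ~cellfun(@isempty, regexpi(names, '^report_', 'once'));
```

Here sameName should be true, and both hasAthens and isReport should be [true true false].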
Rik on 19 Jan 2021
Should those be detected as the same? Maybe the unique function will even do the trick (if you put the file names in a cellstr)
Ivan Mich on 19 Jan 2021
Excuse me, but I do not understand. What do you mean by "the unique function will even do the trick (if you put the file names in a cellstr)"?
Ivan Mich on 19 Jan 2021
Which command should I use?
Walter Roberson on 19 Jan 2021
MATLAB would say that upper('Αθήνα') is ΑΘΉΝΑ not ΑΘΗΝΑ
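A small sketch of what this means in practice, using the same two spellings. The strrep call is only a hand-made illustration for this one word, not a general way to remove accents.

```matlab
a = 'Αθήνα';   % mixed case, with an accent on the eta
b = 'ΑΘΗΝΑ';   % upper case, no accent

% Case-insensitive comparison still sees the accented eta as a different letter.
sameIgnoringCase = strcmpi(a, b);

% Illustration only: replace the accented eta by a plain eta for this word.
a2 = strrep(a, 'ή', 'η');
sameAfterStrip = strcmpi(a2, b);
```

sameIgnoringCase should come out false and sameAfterStrip true, so case folding alone is not enough here.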
Ivan Mich on 19 Jan 2021
So is there a way to treat lower-case and upper-case letters as the same? (Regardless of case, I want to match the same words between these two files.)
Stephen23 on 19 Jan 2021
Edited: Stephen23 on 19 Jan 2021
Disregarding case, do the names have to match exactly, or just "depending on the name similarity" as you wrote?
If you need matching based on similarity (as you wrote), then you could use some edit distance metric (e.g. the Levenshtein distance).
How should diacritics be treated: as different characters, or ignored?
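For reference, a minimal sketch of the Levenshtein distance as a plain MATLAB function. The name levenshtein is a hypothetical helper, not a built-in; if you have the Text Analytics Toolbox, I believe its editDistance function covers the same ground. Case and diacritics are not handled here, so fold the names first if those should be ignored.

```matlab
function d = levenshtein(a, b)
% LEVENSHTEIN  Edit distance between char vectors a and b.
% Counts the minimum number of single-character insertions,
% deletions and substitutions needed to turn a into b.
    m = numel(a);
    n = numel(b);
    D = zeros(m + 1, n + 1);
    D(:, 1) = 0:m;              % cost of deleting the first i characters of a
    D(1, :) = 0:n;              % cost of inserting the first j characters of b
    for i = 1:m
        for j = 1:n
            cost = double(a(i) ~= b(j));           % 0 if equal, 1 if substituted
            D(i+1, j+1) = min([D(i, j+1) + 1, ...  % deletion
                               D(i+1, j) + 1, ...  % insertion
                               D(i, j) + cost]);   % match or substitution
        end
    end
    d = D(m + 1, n + 1);
end
```

To match names, you could compute this distance from each name in file 1 to every name in file 2 (for example with cellfun) and take the one with the smallest value via min.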
Ivan Mich on 19 Jan 2021
I would like diacritics to be ignored, if possible.
Walter Roberson on 19 Jan 2021
Which letters need to be supported? Do you need to support removing diacritics from letters other than U+0386 to U+0390 ? https://www.compart.com/en/unicode/block/U+0370 ? There are a lot of them... https://en.wikipedia.org/wiki/Greek_script_in_Unicode
Rik on 21 Jan 2021
@Ivan Mich You know this forum. After 89 questions you should know we don't like people deleting parts of the question. Please don't do that. You are only giving people work to recover what you deleted, making your deletion pointless as well.


Answers (2)

Walter Roberson on 20 Jan 2021

0 votes

Typically the easiest way to handle situations like this, which are not plain upper / lower case (use upper(), lower() or strcmpi() for those), is to create a mapping table:
%I only partly filled this out; see
%https://en.wikipedia.org/wiki/Greek_script_in_Unicode
map = char(0x001:0x03ff);
%https://www.unicode.org/charts/PDF/U0370.pdf
map(0x0391:0x03a9) = 0x03b1:0x03c9; %alpha to omega upper to lower ΑΩ αω
map(0x0386) = 0x03b1; % Ά
map(0x0388) = 0x03b5; % Έ
map(0x0389) = 0x03b7; % Ή
map(0x038A) = 0x03b9; % Ί
map(0x038C) = 0x03bf; % Ό -> omicron
map(0x038E) = 0x03c5; % Ύ
map(0x038F) = 0x03c9; % Ώ
map(0x03AC) = 0x03b1; % ά
map(0x03AD) = 0x03b5; % έ
map(0x03AE) = 0x03b7; % ή
map(0x03AF) = 0x03b9; % ί
%https://www.unicode.org/charts/PDF/U0080.pdf
map(0x00b5) = 0x03bc; %mu
%https://en.wikipedia.org/wiki/Greek_Extended
map(0x1f00:0x1f0f) = 0x03b1; %alpha extended
map(0x1f10:0x1f1f) = 0x03b5; %epsilon extended
map(0x1f20:0x1f2f) = 0x03b7; %eta extended
%and more
After which you index the map with the text to get the folded form:
map('Αθήνα')
ans = 'αθηνα'
map('ΑΘΗΝΑ')
ans = 'αθηνα'
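To tie this back to the original question, here is a rough sketch of how the folded names could be used to match two lists. The name lists are made up (they stand in for the contents of the two files), the map is deliberately cut down to the characters needed here, and the final-sigma rule is my own assumption about what should count as the same letter. hex2dec is used so that it also runs on releases without 0x literals.

```matlab
% Hypothetical name lists standing in for the contents of the two files.
names1 = {'Αθήνα', 'Βόλος'};
names2 = {'ΒΟΛΟΣ', 'ΑΘΗΝΑ', 'ΠΑΤΡΑ'};

H = @hex2dec;
map = char(1:H('03ff'));                               % identity map, U+0001..U+03FF
map(H('0391'):H('03a9')) = char(H('03b1'):H('03c9'));  % upper to lower, alpha to omega
map(H('03ac')) = char(H('03b1'));   % ά -> α
map(H('03ad')) = char(H('03b5'));   % έ -> ε
map(H('03ae')) = char(H('03b7'));   % ή -> η
map(H('03af')) = char(H('03b9'));   % ί -> ι
map(H('03cc')) = char(H('03bf'));   % ό -> ο
map(H('03c2')) = char(H('03c3'));   % ς -> σ (assumption: treat final sigma as sigma)

% Fold every name to lower case without accents.
fold = @(s) map(double(s));
key1 = cellfun(fold, names1, 'UniformOutput', false);
key2 = cellfun(fold, names2, 'UniformOutput', false);

% For each name in list 1, find the position of the same folded name in list 2.
[found, idx] = ismember(key1, key2);
```

With these lists, found should be [true true] and idx should be [2 1], so names2(idx(found)) gives the partner of each matched name. Names containing characters above U+03FF would need a longer map.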

5 Comments

Ivan Mich on 20 Jan 2021
Thank you, but the command window shows me:
Error: File: greek_letters.m Line: 26 Column: 13
Invalid expression. Check for missing multiplication operator, missing or unbalanced
delimiters, or other syntax error. To construct matrices, use brackets instead of
parentheses.
Which MATLAB release are you using?
In your file, is line 26 the one that is
map(0x1f20:0x1f2f) = 0x03b7; %eta extended
?
The code was tested in R2020b.
Ivan Mich on 20 Jan 2021
I am using MATLAB 2019a.
Line 26 is map = char(0x001:0x03ff);
Your release does not have hexadecimal input yet (0x literals were introduced in R2019b).
In each place that I coded 0x followed by digits, convert that to a call to hex2dec() with the digits in quotes. You might need to remove the 0x part. For example
map = char(hex2dec('0001'):hex2dec('03ff'));
You can skip the leading 0, such as hex2dec('3ff') but using the leading 0 helps to emphasize that you are using Unicode code points, which by convention are given in 4 digit hex until 0x10000
%I only partly filled this out; see
%https://en.wikipedia.org/wiki/Greek_script_in_Unicode
H = @hex2dec;
map = char(H('0001'):H('03ff'));
%https://www.unicode.org/charts/PDF/U0370.pdf
map(H('0391'):H('03a9')) = H('03b1'):H('03c9'); %alpha to omega upper to lower ΑΩ αω
map(H('0386')) = H('03b1'); % Ά
map(H('0388')) = H('03b5'); % Έ
map(H('0389')) = H('03b7'); % Ή
map(H('038A')) = H('03b9'); % Ί
map(H('038C')) = H('03bf'); % Ό -> omicron
map(H('038E')) = H('03c5'); % Ύ
map(H('038F')) = H('03c9'); % Ώ
map(H('03AC')) = H('03b1'); % ά
map(H('03AD')) = H('03b5'); % έ
map(H('03AE')) = H('03b7'); % ή
map(H('03AF')) = H('03b9'); % ί
%https://www.unicode.org/charts/PDF/U0080.pdf
map(H('00b5')) = H('03bc'); %mu
%https://en.wikipedia.org/wiki/Greek_Extended
map(H('1f00'):H('1f0f')) = H('03b1'); %alpha extended
map(H('1f10'):H('1f1f')) = H('03b5'); %epsilon extended
map(H('1f20'):H('1f2f')) = H('03b7'); %eta extended
%and more


Stephen23 on 20 Jan 2021
Edited: Stephen23 on 20 Jan 2021

0 votes

Rather than building maps by hand, I would get Python to do the heavy lifting, e.g.:
baz = @(v)char(v(1)); % only need the first decomposed character.
fun = @(c)baz(py.unicodedata.normalize('NFKD',c)); % to remove diacritics.
in1 = 'Αθήνα';
in2 = 'ΑΘΗΝΑ';
st1 = arrayfun(fun,in1) % remove diacritics
st1 = 'Αθηνα'
st2 = arrayfun(fun,in2) % remove diacritics
st2 = 'ΑΘΗΝΑ'
strcmpi(st1,st2) % case-insensitive comparison
ans = logical
1

Asked: 19 Jan 2021

Commented: Rik on 21 Jan 2021
