How to use Unicode numeric values in regexprep?

Vlad Atanasiu on 28 Mar 2024
Answered: Stephen23 on 28 Mar 2024
How can "Häagen-Dasz" be converted to "Haagen-Dasz" using Unicode numeric values? For example,
regexprep('Häagen-Dasz','ä','A')
works fine, but
regexprep('Häagen-Dasz','\x{C4}','a')
does not. Here, the hexadecimal \x{C4} stands for [latin capital letter a] with diaeresis, i.e. [ä].

Accepted Answer

Yash on 28 Mar 2024
Edited: Yash on 28 Mar 2024
Hi Vlad,
'\x{C4}' represents the Unicode character Ä (Latin Capital Letter A with Diaeresis) in hexadecimal notation.
If you want to replace ä (Latin Small Letter A with Diaeresis), you should use \x{E4}, which is its Unicode hexadecimal representation.
Since you want to replace ä with a, use the Unicode numeric value of ä in the pattern and a plain a as the replacement:
regexprep('Häagen-Dasz','\x{E4}','a')
ans = 'Haagen-Dasz'
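To confirm which hexadecimal value belongs to which letter, it can help to look up the code points directly. The sketch below does this in Python rather than MATLAB, purely as an illustration; the variable names are made up for this example. Note that Python's `re` module writes the escape as `\xe4`, not in the `\x{E4}` form that regexprep accepts.

```python
import re

# Illustrative names, not part of the original thread.
small = "\u00e4"    # ä, LATIN SMALL LETTER A WITH DIAERESIS
capital = "\u00c4"  # Ä, LATIN CAPITAL LETTER A WITH DIAERESIS

# Print the code points in the usual U+XXXX notation.
print(f"U+{ord(small):04X}")    # U+00E4
print(f"U+{ord(capital):04X}")  # U+00C4

# Same replacement as above, using the hexadecimal escape in the pattern.
result = re.sub(r"\xe4", "a", "H\u00e4agen-Dasz")
print(result)  # Haagen-Dasz
```

The capital and small letters differ by 0x20, which is why \x{C4} in the pattern never matches the lowercase ä in the input.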
Hope this helps!

More Answers (2)

Stephen23 on 28 Mar 2024
inp = 'Häagen-Dasz';
baz = @(v)char(v(1)); % only need the first decomposed character.
out = arrayfun(@(c)baz(py.unicodedata.normalize('NFKD',c)),inp) % remove diacritics.
out = 'Haagen-Dasz'
Read more:
https://docs.python.org/3/library/unicodedata.html
https://stackoverflow.com/questions/16467479/normalizing-unicode
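For readers who want to see what the normalization step does on its own, here is a minimal sketch in plain Python, under the assumption that dropping all combining marks is acceptable for the text at hand. The helper name `strip_diacritics` is hypothetical. Unlike the one-liner above, which keeps only the first character of each decomposition, this variant filters out combining marks explicitly.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns 0 for base characters and a nonzero
    # combining class for marks such as U+0308 (COMBINING DIAERESIS).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# NFKD splits ä (U+00E4) into a base letter plus a combining mark.
parts = [hex(ord(ch)) for ch in unicodedata.normalize("NFKD", "\u00e4")]
print(parts)  # ['0x61', '0x308']

print(strip_diacritics("H\u00e4agen-Dasz"))  # Haagen-Dasz
```

Characters that have no decomposition, such as ø or ß, pass through unchanged under NFKD, so this approach is not a general transliteration.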

VBBV on 28 Mar 2024
regexprep('Häagen-Dasz','ä','A')
ans = 'HAagen-Dasz'
regexprep('Häagen-Dasz','ä','\x{C4}')
ans = 'HÄagen-Dasz'
2 Comments
VBBV on 28 Mar 2024
Moved: VBBV on 28 Mar 2024
regexprep('Häagen-Dasz','\x{e4}','a')
ans = 'Haagen-Dasz'
VBBV on 28 Mar 2024
The Unicode value for the small letter ä (a with diaeresis) is \x{e4}; \x{C4} is the capital Ä.
