How to remove punctuation from an Arabic text file

Fateme Jalali on 31 Jul 2016
Edited: Thorsten on 1 Aug 2016
Hello, I have an Arabic string and want to discard all punctuation, keeping only the text and the white space between words. For example, this is my string: str='سلام. دوست خوب من!'. Can I change the code below to do it?
str= fileread('D:/docc111.txt');
str1 = regexprep(str,'\s+',' '); % collapse line breaks and runs of whitespace into one space
% or: str1 = regexprep(str,'[\n\r]+',' ');
% str1 = 'Hello, I need 1 MATLAB code to discard all punctuation, and signs from 9 text files.'
Lstr1 = length(str1);
% Patterns for the characters to keep (Latin only, so Arabic letters are not matched)
str_space = '\s';    % whitespace
str_caps  = '[A-Z]'; % uppercase Latin letters
str_ch    = '[a-z]'; % lowercase Latin letters
str_nums  = '[0-9]'; % ASCII digits
% Indices of the characters that match each pattern
ind_space = regexp(str1,str_space);
ind_caps  = regexp(str1,str_caps);
ind_chrs  = regexp(str1,str_ch);
ind_nums  = regexp(str1,str_nums);
mask = [ind_space ind_caps ind_chrs ind_nums]; % indices to keep
num_str2 = 1:Lstr1;
num_str2(mask) = []; % the remaining indices are the ones to delete
str3 = str1;
str3(num_str2) = []; % delete every character that is not kept
chars = str3;
% add a space before the first and after the last character
charsWithWhitespace = [' ', chars, ' '];
newTest = sprintf(strrep(charsWithWhitespace, '\n', ' ')); % not used below
fid = fopen('myySE1.txt','w');
fprintf(fid, '%s',charsWithWhitespace);
fclose(fid);
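
The code above keeps only Latin letters, ASCII digits and white space, so every Arabic letter ends up deleted. A minimal sketch of one possible adaptation follows. It is not taken from the thread; it assumes the text uses the basic Arabic Unicode block (U+0600 to U+06FF) and that the file is stored as UTF-8. The variable names `arabicPunct`, `keep` and `cleaned` are made up for illustration. Arabic-Indic digits and diacritics fall inside that block and are kept, while characters outside it (for example the zero-width non-joiner common in Persian text) are dropped.

```matlab
% Example string from the question. To read it from a file instead,
% assuming UTF-8 encoding:
%   fid = fopen('D:/docc111.txt', 'r', 'n', 'UTF-8');
%   str = fread(fid, '*char')';
%   fclose(fid);
str = 'سلام. دوست خوب من!';

codes = double(str);   % Unicode code point of each character

% Characters inside the basic Arabic block
isArabic = codes >= hex2dec('0600') & codes <= hex2dec('06FF');

% Punctuation marks that live inside that block (assumed list, extend as needed):
% Arabic comma, Arabic semicolon, Arabic question mark, Arabic full stop
arabicPunct = [hex2dec('060C') hex2dec('061B') hex2dec('061F') hex2dec('06D4')];

% Keep Arabic-block characters that are not punctuation, plus white space
keep = (isArabic & ~ismember(codes, arabicPunct)) | isspace(str);
cleaned = str(keep);

% Collapse runs of white space and trim both ends
cleaned = strtrim(regexprep(cleaned, '\s+', ' '));

% Write the result as UTF-8 (same output file name as in the question)
fid = fopen('myySE1.txt', 'w', 'n', 'UTF-8');
fprintf(fid, '%s', cleaned);
fclose(fid);
```

For the example string this leaves the four words separated by single spaces, with the full stop and the exclamation mark removed. Working on code points rather than regular-expression character classes avoids relying on how the regular-expression engine classifies non-Latin letters.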

Answers (1)

Walter Roberson on 31 Jul 2016
