Cannot read emojis correctly from an imported json file
I have a JSON file from an exported Telegram chat. I wrote some code to import and process it so that the result is a 2-column cell array, where the first column is the sender and the second column is the message. The problem is that the whole message shows up fine except for the emojis. They come up either as a weird character followed by a bunch of squares or as just a single square, depending on how I import the JSON. The following code is the importing section:
fname = 'result.json'; 
fid = fopen(fname); 
raw = fread(fid,inf); 
% read this way (raw bytes), an emoji is shown as ð followed by several squares
% fid = fopen(fname,'r','n','UTF-8'); 
% raw = fread(fid,'*char'); 
% read this way (decoded as UTF-8), an emoji is shown as a single square
str = char(raw'); 
fclose(fid); 
val = jsondecode(str);
data=cell(size(val.messages,1),2);
for i=1:size(val.messages,1)
    if strcmp(val.messages{i,1}.type,'message')  % strcmp, since == on char arrays errors when the lengths differ
        data{i,1}=val.messages{i,1}.from;
        data{i,2}=val.messages{i,1}.text;
    else
        data{i,1}='not message';
        data{i,2}='not message';
    end
end
I need some help to figure out how to show emojis properly. Or at least to have a way to distinguish them (like some ID string/code or something), since I need to do some data analysis down the line. How could I find a solution to this? Can I use some different importing step?
I also have HTML files of the chat and I found a way to import them successfully (even though the processing is much more difficult). The emojis in the files show fine but are not shown in MATLAB. This might be a second step in case I can't solve the json problem. Any help is appreciated, thank you.
Edit: I've seen some ready-made WhatsApp parsers that automatically organize the data in tables. If someone has something similar for raw Telegram data, that would be nice. I'd love to solve this problem directly to learn more, but if that isn't possible, I'd welcome an alternative solution.
Answers (1)
  Poorna
 on 7 Apr 2024
Hi Paye,
I understand that you want to extract emojis from your exported Telegram chat. If you have access to the HTML files of the chat, you can use the "extractHTMLText" function to extract the text from them. In most cases this also keeps the emojis in the text. You can then pass the extracted text to the "tokenizedDocument" function, which automatically detects emojis and assigns them the token type "emoji".
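The two steps above can be sketched roughly as follows. This is only a minimal illustration, assuming Text Analytics Toolbox is installed; the HTML snippet is made-up sample data standing in for one message of the exported chat, and the variable names are placeholders.

```matlab
% Minimal sketch, assuming Text Analytics Toolbox is available.
% The HTML below is made-up sample data, not real Telegram export markup.
smiley = char([55357 56832]);   % U+1F600 as a UTF-16 surrogate pair
html = "<html><body><div>Good morning " + smiley + "</div></body></html>";

% Step 1: strip the markup and keep the visible text.
str = extractHTMLText(html);

% Step 2: tokenize; detected emojis get the token type "emoji".
doc = tokenizedDocument(str);
tdetails = tokenDetails(doc);

% Keep only the tokens that were classified as emojis.
emojiTokens = tdetails.Token(tdetails.Type == "emoji");
```

For a real export, the HTML code would come from the chat file instead of the inline string; the rest of the steps stay the same.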
To learn more about these functions and about analyzing emojis in MATLAB, refer to the Text Analytics Toolbox documentation.
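If you would rather stay with the JSON export, one possible approach is to decode the file bytes explicitly as UTF-8 and then work with Unicode code points instead of relying on how the emoji is displayed. The sketch below is an assumption about how this could be done, not a tested recipe for Telegram exports: it writes a tiny made-up JSON sample to a temporary file so that it is self-contained, and it assumes the text is well-formed UTF-16 after decoding (every high surrogate is followed by a low surrogate).

```matlab
% Made-up sample data written as UTF-8 bytes to a temporary file.
smiley = char([55357 56832]);   % U+1F600 as a UTF-16 surrogate pair
sample = ['{"messages":[{"type":"message","from":"A","text":"hi ' smiley '"}]}'];
fname = [tempname '.json'];
fid = fopen(fname,'w');
fwrite(fid, unicode2native(sample,'UTF-8'), 'uint8');
fclose(fid);

% Read the raw bytes and decode them explicitly as UTF-8.
fid = fopen(fname,'r');
bytes = fread(fid,[1 Inf],'*uint8');
fclose(fid);
str = native2unicode(bytes,'UTF-8');
val = jsondecode(str);
txt = val.messages.text;        % one message, so messages is a 1x1 struct

% Characters outside the Basic Multilingual Plane (most emojis) are stored
% as surrogate pairs; combine each pair into a single code point.
u = double(txt);
hiIdx = find(u >= 55296 & u <= 56319);              % high surrogates D800-DBFF
cp = 65536 + (u(hiIdx) - 55296)*1024 + (u(hiIdx+1) - 56320);
ids = cellstr(dec2hex(cp));                         % e.g. usable as ID strings
delete(fname);
```

The hexadecimal code points can serve as stable identifiers for later analysis even when the Command Window font cannot render the emoji itself.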
Hope this helps!