How to extract text from .json files and combine them?
5 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Susan
am 28 Mär. 2020
Kommentiert: Ameer Hamza
am 31 Mär. 2020
Hello everyone,
I've got some questions and any inputs would be greatly appreciated. I have bunch of .json files, say 1000. To read each files I run the following code
fname = 'C:\Users\...\d90f3c62681e.json';
val = jsondecode(fileread(fname));
the output is as follows. For each file the paper_id, the size of abstract, and the size of body_text changes. I am interested in the text data in the "abstract" and the "body text". How can I extract text file in the abstract and body_text, and combine all these .json files into one file?
val =
struct with fields:
paper_id: 'd90f3c62681e'
metadata: [1×1 struct]
abstract: [1×1 struct]
body_text: [4×1 struct]
bib_entries: [1×1 struct]
ref_entries: [1×1 struct]
back_matter: []
val.abstract =
struct with fields:
text: '300 words)
cite_spans: []
ref_spans: []
section: 'Abstract'
val.body_text =
4×1 struct array with fields:
text
cite_spans
ref_spans
section
4 Kommentare
Walter Roberson
am 28 Mär. 2020
Which release are you using? When I try in R2020a, I get
paper_id: '0a43046c154d0e521a6c425df215d90f3c62681e'
>> val.abstract
struct with fields:
text: '300 words) 33 Quantification of aerosolized influenza virus [and a bunch more]
Akzeptierte Antwort
Ameer Hamza
am 31 Mär. 2020
Bearbeitet: Ameer Hamza
am 31 Mär. 2020
As I answered in the comment on your other question, the following code will create a struct by combining the fields from individual files. It will then create a combined JSON file
files = dir('JSON files/*.json');
s = struct('abstract', [], 'body_text', []);
for i=1:numel(files)
filename = fullfile(files(i).folder, files(i).name);
data = jsondecode(fileread(filename));
if ~isempty(data.abstract)
s.abstract = [s.abstract; cell2struct({data.abstract.text}, 'text', 1)];
end
if ~isempty(data.body_text)
s.body_text = [s.body_text; cell2struct({data.body_text.text}, 'text', 1)];
end
end
str = jsonencode(s);
f = fopen('filename.json', 'w');
fprintf(f, '%s', str);
fclose(f);
2 Kommentare
Weitere Antworten (1)
Mohammad Sami
am 30 Mär. 2020
You can import your data into cell arrays
filelist = {};
vals = cell(length(filelist),1);
haveabstract = false(length(filelist),1);
havebody = false(length(filelist),1);
data = cell(length(filelist),3);
% first col paper_id, second_col abstract, third col body
for i=1:length(filelist)
vals{i} = jsondecode(fileread(filelist{i}));
haveabstract(i) = ~isempty(vals{i}.abstract);
havebody(i) = ~isempty(vals{i}.body_text);
data{i,1} = vals{i}.paper_id;
if haveabstract(i)
data{i,2} = vals{i}.abstract;
end
if havebody(i)
data{i,3} = vals{i}.body_text
end
end
Siehe auch
Kategorien
Mehr zu JSON Format finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!