Failed to read xml error when using xmlread

Sarah Immanuel on 13 Aug 2020
Commented: Walter Roberson on 17 Aug 2020
I am trying to read several XML files in a loop using xmlread. The error 'Failed to read xml file' occurs. On examining a file, I noticed that if I change ISO8859-1 to ISO-8859-1 in the first line, <?xml version="1.0" encoding="ISO8859-1"?>, xmlread works. Is there an automated way to correct this, or any other way to read the files in bulk without having to manually correct the header in each file?

Answers (3)

dpb on 13 Aug 2020
...
try
    DOMnode = xmlread(filename(i)); % try to read the file
catch ME % catch the failure; fix up the header
    fidi = fopen(filename(i),'r'); % open the original file for reading
    fido = fopen('tmp','w'); % open a scratch temp file for writing
    while ~feof(fidi)
        l = fgetl(fidi); % read one line, without its line terminator
        if ~isempty(strfind(l,'ISO8859'))
            l = strrep(l,'ISO8859','ISO-8859'); % fix up the record
        end
        fprintf(fido,'%s\n',l); % write the line as data and restore the newline
    end
    fclose(fidi);
    fclose(fido);
    copyfile('tmp',filename(i)) % and copy over the original
    DOMnode = xmlread(filename(i)); % and try again with corrected file...
end
2 Comments
Sarah Immanuel on 14 Aug 2020
Thanks a lot for your help.
Just a clarification: does the fprintf call inside the loop write only the content of 'l' to the tmp file? How do we get back all the other remaining content of the original file, please?
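Walter Roberson on 14 Aug 2020
It is within the loop, so eventually the entire content is written.
However, the line should not be passed to fprintf as the format argument, because any % or \ characters in it would be interpreted as format specifiers. Use
fwrite(fido, l)
or
fprintf(fido, '%s\n', l)
The second form also restores the newline that fgetl removes.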
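
The difference between passing a line as the format and passing it as data can be seen with sprintf, which follows the same format rules as fprintf. This is only a small sketch; the sample line and the variable names asFormat and asData are made up for illustration.

```matlab
% Hypothetical sample line containing backslashes (not taken from the thread).
l = 'C:\temp\new';

% Used as the format argument: \t and \n are treated as escape sequences,
% so the result no longer matches the original text.
asFormat = sprintf(l);

% Used as data with an explicit %s format: the text is kept literally.
asData = sprintf('%s', l);

disp(strcmp(asFormat, l))
disp(strcmp(asData, l))
```

The same applies to % characters, which is why writing with fwrite or with an explicit '%s' format is the safer choice for arbitrary file content.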



Walter Roberson on 14 Aug 2020
Edited: Walter Roberson on 14 Aug 2020
filename = 'InputFileName.xml';
S = fileread(filename);
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');
if strcmp(S, SS)
    remove = false; % optimization: do not write a new file if not needed
    tname = filename;
else
    tname = tempname();
    fid = fopen(tname, 'w');
    fwrite(fid, SS); % write the corrected content, not the file name
    fclose(fid);
    remove = true;
end
DOMnode = xmlread(tname);
if remove; delete(tname); end
This code is deliberate in narrowing down to encoding= and only doing the first instance, so as to avoid accidentally changing any ISO8859 that might happen to be part of the data.
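
As a quick illustration of that narrowing, here is a small sketch on an in-memory string. The sample XML content is hypothetical and only serves to show that a later ISO8859 in the data is left alone.

```matlab
% Hypothetical sample content: the header has the bad encoding name,
% and the same text also appears inside the data.
S = ['<?xml version="1.0" encoding="ISO8859-1"?>' newline ...
     '<root><note>ISO8859-1 mentioned in the data</note></root>'];

% Same replacement as in the answer: anchored on encoding=", first match only.
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');

disp(SS)
```

Only the declaration changes; the text inside the note element does not match the pattern because it is not preceded by encoding=".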
3 Comments
Walter Roberson on 14 Aug 2020
tname is set to filename when strcmp is true, not when it is false.
The comparison is true when the two strings S and SS are exactly the same, which happens if regexprep did not make a change, for example for a file that already has the right pattern or that has a different encoding. In this situation the original file name is used directly for the later xmlread.
When strcmp is false, the original and regexprep versions are different, which means that regexprep produced a new string. In that situation, a temporary file name is fetched, the file is opened, the new content is written, and the temporary file is closed. It is this temporary file whose name is passed to xmlread. After reading, the temporary file is deleted.
Walter Roberson on 14 Aug 2020
See also https://www.mathworks.com/matlabcentral/answers/101632-how-can-i-use-a-function-such-as-xmlread-to-parse-xml-data-from-a-string-instead-of-from-a-file-i#comment_972999 which shows a Java-related method. To use it, you would call fileread() and regexprep(), wrap the result in java.io.StringBufferInputStream(), and pass that to xmlread().
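
A minimal sketch of how those steps might be combined for a whole folder, without writing any temporary files. It assumes that xmlread accepts a java.io.InputStream object, as the linked thread describes. The folder location and the variable names files and docs are hypothetical.

```matlab
% Hypothetical folder holding the XML files; adjust to your own location.
folder = pwd;
files = dir(fullfile(folder, '*.xml'));
docs = cell(numel(files), 1);

for k = 1:numel(files)
    % Read the whole file into memory as text.
    S = fileread(fullfile(files(k).folder, files(k).name));

    % Repair only the first encoding declaration, as in the answer above.
    S = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');

    % Assumption: xmlread can parse from a Java input stream.
    docs{k} = xmlread(java.io.StringBufferInputStream(S));
end
```

Because nothing is written back to disk, the original files stay untouched. How non-ASCII characters come through depends on how fileread decodes each file, so it is worth checking one such file by hand first.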



Sarah Immanuel on 14 Aug 2020
Thanks a lot Walter, yes, that makes sense. One last question, I hope! I am using MATLAB R2020a. The command tempfile() doesn't seem to work?
3 Comments
Sarah Immanuel on 17 Aug 2020
Hi Walter, thanks. tempname() creates a temp file, but xmlread does not handle it; it shows an error again. Can you help?
Walter Roberson on 17 Aug 2020
Maybe
tname = [tempname() '.xml'];

