detecting Excel files without Excel extension
I need to be able to identify an Excel file without using xlsfinfo. It is a VERY slow function. Without it, my algorithm runs faster. Any ideas?
Background: Our test setup outputs text files that are named with the ".xls" extension. In Excel 2003 and earlier, if you gave a tab-delimited text file the ".xls" extension, Excel would open it without complaint and interpret it properly. Now it is causing me headaches when importing data into MATLAB. MATLAB uses the ".xls" extension to check for wrong input to "textscan" and "importdata". Here is where it gets interesting: when you open a text file with the ".xls" extension in Excel and then save it, it becomes a real Excel file and is no longer a tab-delimited text file. Basically, I need to be able to tell the difference between text and Excel files without xlsfinfo, because xlsfinfo is very slow.
Accepted Answer
Iain
on 23 May 2013
You can try something like
fid = fopen(filename,'r');
fifty_bytes = fread(fid,50,'uint8'); fclose(fid);
% Letters, digits, punctuation, tab (9), line feed (10) and carriage return (13);
% add any other valid characters you expect in your text files.
Valid_characters = ['A':'Z' 'a':'z' ',./\ 0123456789' 9 10 13 '!"£$%^&*()[]{}-=_+'];
% A byte outside the valid set means the file is probably not plain text.
excel_file = any(~ismember(fifty_bytes, double(Valid_characters)));
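To see how this kind of byte check behaves, here is a minimal sketch that builds two throwaway files and applies the test to each. The file names, the sample data, and the `allowed` set (printable ASCII plus tab, LF and CR) are assumptions made for illustration; real text files from your setup may need a wider set.

```matlab
% Build two small test files (hypothetical names) in the temp folder.
textFile = [tempname '.xls'];
binFile  = [tempname '.xls'];

% A tab-delimited text file that merely carries the .xls extension.
fid = fopen(textFile, 'w');
fprintf(fid, 'time\tvoltage\r\n0.1\t3.3\r\n');
fclose(fid);

% A file that starts with binary bytes, as a real Excel file would.
fid = fopen(binFile, 'w');
fwrite(fid, hex2dec({'D0','CF','11','E0','A1','B1','1A','E1'}), 'uint8');
fclose(fid);

% Assumption: genuine text files contain only printable ASCII, tab, LF, CR.
allowed = [9 10 13 32:126];

files  = {textFile, binFile};
isText = false(size(files));
for k = 1:numel(files)
    fid = fopen(files{k}, 'r');
    if fid == -1
        error('Could not open %s', files{k});
    end
    head = fread(fid, 50, 'uint8');   % shorter files simply return fewer bytes
    fclose(fid);
    isText(k) = all(ismember(head, allowed));
end

disp(isText)
delete(textFile);
delete(binFile);
```

Only the first 50 bytes are read, so the check costs about the same regardless of file size.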
More Answers (2)
José-Luis
on 23 May 2013
Edited: José-Luis
on 23 May 2013
You could always check the file signature. You would need to read the file as a binary file and look into specific locations, using hexadecimal values.
Here is a snippet that will give you a string of the hexadecimal values:
fid = fopen('/path/to/test.xls','r');
uid = fread(fid,4,'ushort','b');
fseek(fid,520,'bof');
ftype = fread(fid,4,'ushort','b');
fclose(fid);
% '%04X' pads each 16-bit value to four hex digits so leading zeros are not lost
s1 = sprintf('%04X',uid)
s2 = sprintf('%04X',ftype)
If the file is an Office document, s1 should be:
D0 CF 11 E0 A1 B1 1A E1 without the spaces
If it is an Excel spreadsheet, then s2 should begin with:
FD FF FF FF nn xx where the last two values (nn and xx) can change within a certain range.
It would be a matter of using strcmp() (or strncmp() when only a prefix has to match). You could also compare hexadecimal values directly with fi(). For your purpose, it sounds like it would be enough to check if it is an Office document.
Text files carry no file identifier.
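As a variation on the snippet above, the signature can also be compared byte by byte, which sidesteps hex strings and byte order altogether. This is only a sketch: `demoFile` is a hypothetical file written here so the example runs on its own, and in practice you would point `fopen` at your own file.

```matlab
% Signature bytes of an Office (OLE2) document, as listed above.
oleSig = uint8(hex2dec({'D0','CF','11','E0','A1','B1','1A','E1'}))';

% Hypothetical demo file: the signature followed by a few padding bytes.
demoFile = [tempname '.xls'];
fid = fopen(demoFile, 'w');
fwrite(fid, [oleSig zeros(1, 8, 'uint8')], 'uint8');
fclose(fid);

% Read the first 8 bytes; '*uint8' keeps them as uint8 instead of double.
fid = fopen(demoFile, 'r');
head = fread(fid, 8, '*uint8')';
fclose(fid);

% A file shorter than 8 bytes yields fewer elements, so isequal returns false.
isOfficeDoc = isequal(head, oleSig);
disp(isOfficeDoc)
delete(demoFile);
```

A tab-delimited text file renamed to ".xls" would fail this comparison, which is the distinction the question asks for.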
4 Comments
José-Luis
on 24 May 2013
What are you getting? Maybe it is a little-endian system? You could try with:
uid = fread(fid,4,'ushort','l');
Same thing for ftype.
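To illustrate what the 'b' and 'l' flags change, the sketch below writes the eight signature bytes to a hypothetical temporary file and reads them back both ways. As far as I understand it, the flag tells fread how to interpret the byte order in the file, so the two hex strings differ only in the order of the bytes within each 16-bit value.

```matlab
% Hypothetical temp file holding the 8 signature bytes.
demoFile = tempname;
fid = fopen(demoFile, 'w');
fwrite(fid, hex2dec({'D0','CF','11','E0','A1','B1','1A','E1'}), 'uint8');
fclose(fid);

fid = fopen(demoFile, 'r');
uidBig = fread(fid, 4, 'ushort', 'b');      % big-endian interpretation
frewind(fid);                               % go back to the start of the file
uidLittle = fread(fid, 4, 'ushort', 'l');   % little-endian interpretation
fclose(fid);
delete(demoFile);

sBig    = sprintf('%04X', uidBig)      % D0CF11E0A1B11AE1
sLittle = sprintf('%04X', uidLittle)   % CFD0E011B1A1E11A
```

If your output looks like the second string, the bytes in the file are the same and only the read order differs.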