How To Load Multiple Text Files (specific context)

Question

1903395U1_04Jun19_152636_0001.Raw8.txt

Hello MATLAB community, I would like to load many text files (all with the same number of rows and columns) contained in the same folder and collect the 2nd column of each file into one matrix.

Here's an example: for 30 text files, the resulting matrix would have 30 columns and as many rows as the files contain (specifically, they'd all have 2048 rows).

But here's the catch: there's a multi-line header (something like 8 lines) before the data, and the data is separated by semicolons (';').

One of the text files is attached as an example.

Also, the names of the text files do NOT follow a pattern; they are quite random. I've already asked a very similar question here, but I wasn't considering the header. One helpful guy wrote the script below and I'd like to tweak it a little bit to include the right parameters for textscan().

% Set input folder
input_folder = 'C:\Users\Cotet\Downloads';
% Read all *.txt files from input folder
% NOTE: This creates a MATLAB struct with a bunch of info about each text file
% in the folder you specified. 
files = dir(fullfile(input_folder, '*.txt'));
% Get full path names for each text file
file_paths = fullfile({files.folder}, {files.name});
% Read data from files, keep second column
for i = 1 : numel(file_paths)
    
    
    % Read data from ith file. 
    % NOTE: If your file has a text header, missing data, or 
    % uses non white-space delimiters, you should check out the
    % documentation for textread to determine which options to use.
    data = textscan(file_paths{i}, '');
    
    % Save second data column to matrix
    % NOTE: Your data files all need to have the same number of rows for this to work
    A(:, i) = data(:, 2);
    
end

The part with which I'm concerned is this note :

% NOTE: If your file has a text header, missing data, or
% uses non white-space delimiters, you should check out the
% documentation for textread to determine which options to use.

I've tried many things, but was ultimately unsuccessful.

Thank you so much in advance.

dpb on 7 Jun 2019

Edited: dpb on 7 Jun 2019

What's the issue in following the other poster's sage advice? Here's the beginning of the attached file (I inserted the line numbers)...

 1
 2 Integration time [ms]:   0.030
 3 Averaging Nr. [scans]: 1
 4 Smoothing Nr. [pixels]: 0
 5 Data measured with spectrometer [name]: 1903395U1
 6 Wave   ;Sample   ;Dark     ;Reference;Scope
 7 [nm]   ;[counts] ;[counts] ;[counts] 
 8
 9 189.95;  383.425;    0.000;    0.000
10 190.09;  416.425;    0.000;    0.000
11 190.24;  439.425;    0.000;    0.000
 ....

from which it's pretty easy to see there are 8 headerlines. As noted the delimiter is a semicolon so what's the problem with

data = textscan(file_paths{i}, '','headerlines',8,'delimiter',';');

Seems pretty straightforward.

If the number of header lines varies between files, you may have a more difficult issue, but detectImportOptions will easily parse a file as regular as this one, after which you can use readtable instead. importdata will likely also have no trouble, with an even simpler interface.

The question of how the files are named is something else entirely. You'll need some way to build a wildcard string that matches the subset you want, to build the list manually, or to do the selection on a case-by-case basis. MATLAB is smart, but it's not prescient enough to discern automagically which files are wanted and which are not. As that other respondent noted, his solution works: move the wanted files into their own subdirectory.
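One detail worth making explicit about the textscan call above: textscan expects a file identifier returned by fopen (or the text itself), not a file path. The following is a minimal sketch under stated assumptions, namely the layout shown in the excerpt (8 header lines, 4 semicolon-separated numeric columns) and an explicit '%f%f%f%f' format instead of the empty one. The sample file written to tempdir, its name, and the variable names are hypothetical and exist only to make the example self-contained.

```matlab
% Write a small stand-in file with the same layout as the excerpt
% (hypothetical file name; lines 1 and 8 are blank as in the excerpt).
sample_file = fullfile(tempdir, 'textscan_demo.Raw8.txt');
file_lines = { ...
    '', ...
    'Integration time [ms]:   0.030', ...
    'Averaging Nr. [scans]: 1', ...
    'Smoothing Nr. [pixels]: 0', ...
    'Data measured with spectrometer [name]: 1903395U1', ...
    'Wave   ;Sample   ;Dark     ;Reference;Scope', ...
    '[nm]   ;[counts] ;[counts] ;[counts] ', ...
    '', ...
    '189.95;  383.425;    0.000;    0.000', ...
    '190.09;  416.425;    0.000;    0.000', ...
    '190.24;  439.425;    0.000;    0.000'};
fid = fopen(sample_file, 'w');
fprintf(fid, '%s\n', strjoin(file_lines, newline));
fclose(fid);

% Open the file first, then hand the file identifier to textscan.
fid = fopen(sample_file, 'r');
data = textscan(fid, '%f%f%f%f', 'HeaderLines', 8, 'Delimiter', ';');
fclose(fid);

% textscan returns a cell array with one cell per column;
% braces extract the numeric values of the second column.
sample_counts = data{2};
```

In the loop from the question, the same pattern would apply to each file_paths{i}: fopen, textscan, fclose, then data{2}.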

Thomas Côté on 10 Jun 2019

Thanks for the reply, really appreciated. Though, when I run the following code with your addition, I get the error below :

% Set input folder
input_folder = 'C:\Users\Cotet\Desktop\Calendrier de Travail\06 - Juin\4 juin\Thomas - 4200';
% Read all *.txt files from input folder
% NOTE: This creates a MATLAB struct with a bunch of info about each text file
% in the folder you specified. 
files = dir(fullfile(input_folder, '*.txt'));
% Get full path names for each text file
file_paths = fullfile({files.folder}, {files.name});
% Read data from files, keep second column
for i = 1 : numel(file_paths)
    
    
    % Read data from ith file. 
    % NOTE: If your file has a text header, missing data, or 
    % uses non white-space delimiters, you should check out the
    % documentation for textread to determine which options to use.
    data = textscan(file_paths{i}, '', 'headerlines', 8, 'delimiter', ';'));
    
    % Save second data column to matrix
    % NOTE: Your data files all need to have the same number of rows for this to work
    A(:, i) = data(:, 2);
    
end
% Calculate the average of the rows (second dimension) of A:
avg = mean(A, 2);

Error: File: MatLab - Conseil.m Line: 18 Column: 31

Invalid expression. When calling a function or indexing a variable, use parentheses. Otherwise, check for mismatched delimiters.

What's the problem? I thought at first it might be the curly braces in "file_paths{i}", but I've tried both parentheses () and square brackets []. Same error. There's still something wrong with how I'm calling the textscan() function. Also, all the text files are identical in the sense that only the values change.

I've noticed something though in the text file. We use the semicolon as the delimiter, but as for the 4th column, there's no semicolon at its end. Is it still ok?

Thanks!

Thomas Côté on 10 Jun 2019

Thanks Bob for the response. Unfortunately, when I use the formatspec and your code, it returns one row of empty cells (there should be 2048 rows and 4 columns). However, I think you're onto something. The FOR loop runs until the end, which is a good thing, and by that I mean this :

 data = textscan(file_paths{i}, format, 'headerlines', 8);
    
    % Save second data column to matrix
    % NOTE: Your data files all need to have the same number of rows for this to work
    A(:, i) = data(:, 2);
    

The data has 2048 rows and 4 columns. Then, I ask MATLAB to store only the 2nd column. After that, the FOR loop does the same with my other 30 files. So, in the end, because I have 31 files in total, I should end up with a matrix containing 31 columns (representing the 2nd column of each file) and 2048 rows (all the values of each of those 2nd columns).

Now, I have 31 columns as desired, but only 1 row with empty values. How could we fix this?
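One possible explanation for the single row of empty cells, offered as a sketch rather than a diagnosis of the exact code that was run: when the first argument of textscan is a character vector, textscan parses that text itself (here, the file path) instead of opening a file. In addition, textscan returns a 1-by-N cell array, so data(:, 2) selects a cell rather than a numeric column. The example below uses in-memory text so it runs without any files; the variable names and the path are illustrative only.

```matlab
% Three data rows held in memory (values taken from the excerpt above).
txt = sprintf(['189.95;  383.425;    0.000;    0.000\n', ...
               '190.09;  416.425;    0.000;    0.000\n', ...
               '190.24;  439.425;    0.000;    0.000\n']);
data = textscan(txt, '%f%f%f%f', 'Delimiter', ';');

as_cell   = data(:, 2);   % parentheses: still a 1-by-1 cell
as_column = data{2};      % braces: the numeric 3-by-1 column

% A file path passed as text is parsed as if it were the data.
% It contains no numbers in the expected layout, so every cell comes back empty.
from_path = textscan('C:\some\folder\file.txt', '%f%f%f%f', 'Delimiter', ';');
all_empty = all(cellfun(@isempty, from_path));
```

If this is what happened, opening each file with fopen and indexing with data{2} should give the expected 2048 rows per column.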

Thomas Côté on 10 Jun 2019

Edited: Thomas Côté on 10 Jun 2019

1903395U1_04Jun19_152636_0001.Raw8.txt

Oh, also, haha! The number '9 189.95' is actually only 189.95 (on the 9th row). The reason you see "9 189.95" is that the commenter "dpb" copied and pasted my text file into a MATLAB script. The numbers should read like this (text file attached):

189.95; 424.600; 0.000; 0.000
190.09; 421.600; 0.000; 0.000
190.24; 427.600; 0.000; 0.000
190.38; 450.600; 0.000; 0.000
190.53; 421.600; 0.000; 0.000
190.68; 398.600; 0.000; 0.000

Answer 1

Guillaume on 10 Jun 2019

As dpb suggested, use one of the modern file import functions such as readtable or readmatrix instead of the old textscan. These can figure out the format of your file on their own or, if they're struggling a bit, have plenty of easy-to-understand options to help them along. They're also a lot more configurable, particularly if you use detectImportOptions.

For example, your text file is easily decoded with:

spectrum = readtable('1903395U1_04Jun19_154040_0001.Raw8.txt', 'HeaderLines', 8)

or for a neater table:

opts = detectImportOptions('1903395U1_04Jun19_154040_0001.Raw8.txt', 'ExpectedNumVariables', 4);  %only needed once for all the files that follow the same format
spectrum = readtable('1903395U1_04Jun19_154040_0001.Raw8.txt', opts)

detectImportOptions automatically figures out that the header is 8 lines long, that the delimiter is ; and that the column names are on the 6th row. I've told it that there are only 4 variables despite the header having 5 names (why is there a 'Scope'?).
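If automatic detection ever guesses wrong on a file, the same settings can be stated explicitly. This is a minimal sketch, assuming the layout described above (data starting on line 9, semicolon delimiter, 4 numeric columns) and using delimitedTextImportOptions in place of detectImportOptions. The temporary file and its name are made up so the example is self-contained.

```matlab
% Write a small stand-in file (hypothetical name, same layout as the attachment excerpt).
sample_file = fullfile(tempdir, 'importopts_demo.Raw8.txt');
file_lines = { ...
    '', ...
    'Integration time [ms]:   0.030', ...
    'Averaging Nr. [scans]: 1', ...
    'Smoothing Nr. [pixels]: 0', ...
    'Data measured with spectrometer [name]: 1903395U1', ...
    'Wave   ;Sample   ;Dark     ;Reference;Scope', ...
    '[nm]   ;[counts] ;[counts] ;[counts] ', ...
    '', ...
    '189.95;  383.425;    0.000;    0.000', ...
    '190.09;  416.425;    0.000;    0.000', ...
    '190.24;  439.425;    0.000;    0.000'};
fid = fopen(sample_file, 'w');
fprintf(fid, '%s\n', strjoin(file_lines, newline));
fclose(fid);

% State the file format explicitly instead of relying on detection.
opts = delimitedTextImportOptions('NumVariables', 4);
opts.Delimiter     = ';';
opts.DataLines     = [9 Inf];   % data runs from line 9 to the end of the file
opts.VariableNames = {'Wave', 'Sample', 'Dark', 'Reference'};
opts.VariableTypes = {'double', 'double', 'double', 'double'};

spectrum = readtable(sample_file, opts);
```

The resulting opts object can be reused for every file that follows the same format, just like the one returned by detectImportOptions.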

You can easily wrap that in a loop over all the files. The detectImportOptions call is only needed once if all the files follow the same format. You can store the table from each file in a cell array, but if your aim is to run statistics across the files then you'd be better off storing it all as one flat table with an additional variable indicating which file the data comes from. After that you can use groupsummary or similar to compute your statistics all at once.

So the code would be something like:

%Get list of files. You haven't explained how these can be obtained.
filelist = dir('C:\somefolder\*.txt');
%Loop to read all files:
spectra = cell(size(filelist));  %stored in a cell array at first
opts = detectImportOptions(fullfile(filelist(1).folder, filelist(1).name), 'ExpectedNumVariables', 4);
for fileidx = 1:numel(filelist)
    spectrum= readtable(fullfile(filelist(fileidx).folder, filelist(fileidx).name), opts);  %read file
    spectrum.Source = repmat({filelist(fileidx).name}, height(spectrum), 1);  %add a variable indicating the source. Maybe you want to use only part of the filename
    spectra{fileidx} = spectrum;
end
%flatten it all into one table
spectra = vertcat(spectra{:});
%compute some stats, e.g. mean and standard deviation of spectra at each wavelength across the files
groupsummary(spectra, 'Wave', {'mean', 'std'}, {'Sample', 'Dark', 'Reference'})

Code untested. There might be typos. Read the error messages carefully. Note that I'm using meaningful variable names instead of the utterly useless A.
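To make the last step concrete, here is a small sketch of groupsummary (available from R2018a) on a hand-built flat table. The table stands in for two files with two wavelengths each; the file names are hypothetical and the numbers are borrowed from the excerpts earlier in the thread purely for illustration.

```matlab
% A flat table as it would look after vertcat: one row per wavelength per file.
Wave   = [189.95; 190.09; 189.95; 190.09];
Sample = [383.425; 416.425; 424.600; 421.600];
Source = {'file1.txt'; 'file1.txt'; 'file2.txt'; 'file2.txt'};
spectra = table(Wave, Sample, Source);

% One output row per distinct wavelength, with the mean and standard
% deviation of Sample across the files that share that wavelength.
stats = groupsummary(spectra, 'Wave', {'mean', 'std'}, 'Sample');
```

The result contains the grouping variable Wave, a GroupCount column, and one column per statistic (mean_Sample and std_Sample here).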

Thomas Côté on 10 Jun 2019

Thanks/Merci Guillaume, I really appreciate your help and it worked! I made little modifications and here's the working script I'll use :

%Get list of files. You haven't explained how these can be obtained. God drops the files here!
filelist = dir('C:\Users\Cotet\Desktop\Calendrier de Travail\06 - Juin\4 juin\Thomas - 4200\*.txt');
%Loop to read all files:
spectra = cell(size(filelist));  %stored in a file array at first
opts = detectImportOptions(fullfile(filelist(1).folder, filelist(1).name), 'ExpectedNumVariables', 4);
for fileidx = 1:numel(filelist)
    spectrum= readtable(fullfile(filelist(fileidx).folder, filelist(fileidx).name), opts);  %read file
    spectrum.Source = repmat({filelist(fileidx).name}, height(spectrum), 1);  %add a variable indicating the source. Maybe you want to use only part of the filename
    spectra{fileidx} = spectrum;
    AllSpec(:, fileidx) = spectrum(:, 2);
end
utterly_useless_A = table2array(AllSpec);
% Calculate the average of the rows (second dimension) of utterly_useless_A:
avg = mean(utterly_useless_A, 2);
Spectrum_Avg = [table2array(spectrum(:,1)) avg];

I hope you don't mind the "utterly useless A". I really dig that name haha!

Have a great day!
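For readers who only need the numeric matrix of second columns, a shorter variant may be possible with readmatrix (available from R2019a). This is a sketch under assumptions, not the script used above: it writes two small made-up files into a temporary folder so it can run on its own, it assumes every file has the same number of rows, and all folder, file, and variable names are hypothetical.

```matlab
% Create two stand-in files with 8 header lines and 2 data rows each.
demo_folder = fullfile(tempdir, 'raw8_demo');
if ~exist(demo_folder, 'dir')
    mkdir(demo_folder);
end
header = { ...
    '', ...
    'Integration time [ms]:   0.030', ...
    'Averaging Nr. [scans]: 1', ...
    'Smoothing Nr. [pixels]: 0', ...
    'Data measured with spectrometer [name]: 1903395U1', ...
    'Wave   ;Sample   ;Dark     ;Reference;Scope', ...
    '[nm]   ;[counts] ;[counts] ;[counts] ', ...
    ''};
rows = { ...
    {'189.95;  383.425;    0.000;    0.000', '190.09;  416.425;    0.000;    0.000'}, ...
    {'189.95;  424.600;    0.000;    0.000', '190.09;  421.600;    0.000;    0.000'}};
for k = 1:numel(rows)
    fid = fopen(fullfile(demo_folder, sprintf('demo_%d.txt', k)), 'w');
    fprintf(fid, '%s\n', strjoin([header, rows{k}], newline));
    fclose(fid);
end

% Read the first file once to learn the row count, then preallocate.
filelist = dir(fullfile(demo_folder, 'demo_*.txt'));
first = readmatrix(fullfile(filelist(1).folder, filelist(1).name), ...
    'NumHeaderLines', 8, 'Delimiter', ';');
all_samples = zeros(size(first, 1), numel(filelist));

% Keep only the second column (Sample) of each file.
for fileidx = 1:numel(filelist)
    values = readmatrix(fullfile(filelist(fileidx).folder, filelist(fileidx).name), ...
        'NumHeaderLines', 8, 'Delimiter', ';');
    all_samples(:, fileidx) = values(:, 2);
end

% Average across files for each row (wavelength).
sample_avg = mean(all_samples, 2);
```

Preallocating all_samples avoids growing the matrix inside the loop and makes a file with a different row count fail loudly at the assignment.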
