Importing .dat data and creating a matrix

49 Ansichten (letzte 30 Tage)
OcDrive
OcDrive am 15 Aug. 2023
Bearbeitet: dpb am 15 Aug. 2023
Hello
I got slightly more than one hundred .dat files. I'm attaching one of them here (converted into .txt because it didn't allow me to upload .dat). I'm trying to import only one at a time at this moment.
Generally the format is a row of date: yyyy mm dd and below are rows and columns of data. What I'm trying to do is import that data and limit it by four points - creating a matrix - as follows:
(11,5).....(21,5)
(11,13)....(21,13)
I want to create this matrix x 365 days - so import the data and assign it to its date somehow. I've tried the built in Import Tool and the readtable function but I can't get it to work for me at all. Does someone know a good way to do that?
Thanks
  3 Kommentare
Star Strider
Star Strider am 15 Aug. 2023
To use readtable with a ‘.dat’ file, use the name-value pair 'FileType','text' listed under Text Files (no direct link to it), since they appear to be text files.
To upload one or more of them here (as .dat files), use the zip function and then upload the .zip file.
OcDrive
OcDrive am 15 Aug. 2023
Thanks Star Strider
function matrix = data(filename);
start_row = 5;
end_row = 13;
start_col = 11;
end_col = 21;
rows_per_iteration = 17;
cols_per_iteration = end_col - start_col + 1;
iterations = 365;
matrix = zeros(end_row - start_row + 1, cols_per_iteration, iterations);
fid = fopen(filename, 'r');
for i = 1:iterations
data = textscan(fid, '%f', rows_per_iteration * cols_per_iteration, 'HeaderLines', 1);
data = reshape(data{1}, cols_per_iteration, rows_per_iteration)';
matrix(:, :, i) = data(start_row:end_row, :);
fseek(fid, (rows_per_iteration - 1) * cols_per_iteration * 8, 'cof');
if feof(fid)
break;
end
end
fclose(fid);
end
I generally got this function and it works, but it seems that what I see as a column - Matlab doesn't. The matrix is shifted two rows up and to columns to the left, I used HeaderLines one as in the data file. Do you know what could be a reason? Should I rather than columns/rows look at numbers delimited by space somehow?

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Chetan Bhavsar
Chetan Bhavsar am 15 Aug. 2023
Bearbeitet: Chetan Bhavsar am 15 Aug. 2023
% Constants
START_ROW_OFFSET = 5;
END_ROW_OFFSET = 13;
START_COL = 11;
END_COL = 21;
% Read data
data = readmatrix('data.dat.txt');
% Generate list of row numbers for the dates
date_rows = 1:17:size(data, 1);
% Initialize a structure array
data_struct = struct('date', [], 'data', []);
for i = 1:length(date_rows)
% Extract the year, month, and day
year = data(date_rows(i), 2);
month = data(date_rows(i), 4);
day = data(date_rows(i), 6);
% Format the date into a string
date_str = sprintf('%d-%02d-%02d', year, month, day);
% Extract data using the defined constants
data_values = data(date_rows(i) + START_ROW_OFFSET - 1 : date_rows(i) + END_ROW_OFFSET - 1, START_COL:END_COL);
% Store in the structure
data_struct(i).date = date_str;
data_struct(i).data = data_values;
end
for i = 1:3
disp(['Data for Date ' data_struct(i).date]);
disp(data_struct(i).data);
disp('------------------------');
end
Data for Date 1998-01-01
98387 98255 98074 98449 99182 99990 100568 100603 100861 101187 101268 99143 98458 98084 97581 98146 99136 98331 98211 99009 100364 101132 99868 98982 98199 96934 97451 96596 96487 97043 98757 99610 101667 100436 99509 98451 96733 96407 96485 96967 97942 99548 100202 101319 101411 100395 99021 96397 96544 96479 97728 99471 100483 100873 101257 102271 101268 99276 98579 98819 99015 99753 100783 101214 101496 101861 102418 101469 100199 100567 100900 101206 101697 101766 101614 102069 102288 102822 101443 100880 101504 102320 102882 102920 102210 101270 101661 102370 102709 101528 101785 102358 103194 103465 103313 102328 101469 101613 102223
------------------------
Data for Date 1998-01-02
98906 98972 98380 98217 98842 99546 99519 99260 99767 100432 101085 99665 98991 98480 97615 97962 98400 97461 98072 99090 99916 100528 99877 99638 98673 97949 97381 96417 95482 96163 98124 99604 101165 100458 100387 99563 98177 97258 96071 94445 95254 96965 99862 100964 100822 101035 100596 98528 98348 97551 96629 96736 97662 100071 101103 100406 101573 101312 100127 99707 99911 99026 98385 99167 100816 101463 100773 102167 102293 100930 99727 101196 101033 100732 101222 101760 102051 102435 103077 102660 100998 101513 102468 102370 102239 101933 101709 102329 103652 103375 102700 102086 102704 103134 103063 102764 102374 101744 102164
------------------------
Data for Date 1998-01-03
99060 99188 98855 98879 100021 100708 100690 100448 100443 100703 101048 100286 99463 98927 98155 98326 99091 98606 98514 99155 99888 100364 100341 100054 99178 97895 96485 95441 95335 96129 98116 99414 100873 100811 100534 100063 98318 96488 95740 96029 96606 97972 99462 100171 101344 101091 101063 99210 97270 97448 96868 96177 98802 99198 100207 100776 101510 101608 100997 99999 99546 98730 98706 100198 100342 100657 101326 101944 102279 101227 101605 101279 100648 101005 101499 100696 101115 102635 102808 102796 102296 102589 102264 102210 102440 101918 101056 101513 103306 103410 103556 103299 103031 102809 103021 103007 102561 101775 101435
------------------------
  2 Kommentare
OcDrive
OcDrive am 15 Aug. 2023
Thank you!
dpb
dpb am 15 Aug. 2023
year = data(date_rows(i), 2);
month = data(date_rows(i), 4);
day = data(date_rows(i), 6);
% Format the date into a string
date_str = sprintf('%d-%02d-%02d', year, month, day);
NOTA BENE: the above has the same assumption I made initially and overlooked didn't get right answer when got past a single-digit month. See my later follow-up for fixup code.
Also, once you do have the correct numeric values, there's no reason to convert to text to then convert back again, use the (y,m,d) vector form of input to datetime instead.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

dpb
dpb am 15 Aug. 2023
Bearbeitet: dpb am 15 Aug. 2023
"...that data and limit it by four points - creating a matrix - as follows:
(11,5).....(21,5)
(11,13)....(21,13)"
I don't follow what the above is intended to represent? You mean you only want to keep four elements out of each day's worth of data given by those four indices or the array data(11:21,5:13) from each?
Either is relatively trivial, just need to know what, specifically, is intended.
Also, the file shows up with an extra linefeed or two in the first rows, it appears at least in my browser; the day of y,m,d is split at only one character on first record. One presumes/hopes that isn't real...
data=readlines('data.dat.txt'); % see what content is as string
data(1:5,:)
ans = 5×1 string array
"→1998→→1→→1→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→" "101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808→101808" "102996→102977→102896→102785→102654→102472→102248→102010→101747→101532→101380→101194→101017→100896→100878→100917→101015→101135→101270→101413→101476→101455→101387→101273→101153→101125→101191→101338→101581→101867→102158→102406→102612→102770→102885→102962" "104070→103923→103874→103595→103095→102595→102004→101110→100397→99380→98439→99194→98031→98212→98506→99048→100229→100992→101076→101122→101088→101302→101170→100862→100682→100750→100998→101614→102014→102698→102890→103075→103468→103751→103877→104087" "104112→104181→104175→103977→103452→102535→101486→100363→99643→99100→98387→98255→98074→98449→99182→99990→100568→100603→100861→101187→101268→100942→100482→100612→101210→101530→101868→102262→102526→102416→102559→102830→103105→103452→103770→103999"
Ah, so, looks to be tab-delimited...
numel(double(data{1})==9)
ans = 41
numel(double(data{2})==9)
ans = 251
But, they're not same number each...readmatrix may not work as desired, let's see about that...
data=readmatrix('data.dat.txt');
whos data
Name Size Bytes Class Attributes data 6205x36 1787040 double
data(1:5,1:7), data(1:5, end-4:end)
ans = 5×7
NaN 1998 NaN 1 NaN 1 NaN 101808 101808 101808 101808 101808 101808 101808 102996 102977 102896 102785 102654 102472 102248 104070 103923 103874 103595 103095 102595 102004 104112 104181 104175 103977 103452 102535 101486
ans = 5×5
NaN NaN NaN NaN NaN 101808 101808 101808 101808 101808 102406 102612 102770 102885 102962 103075 103468 103751 103877 104087 102830 103105 103452 103770 103999
data(end-4:end,1:10)
ans = 5×10
100857 99603 99985 101161 102023 102036 101166 100970 100976 101920 101559 100723 100590 101469 102019 102140 101555 101044 101167 102053 101721 101269 100984 101553 101835 101885 101546 101167 100786 101819 101391 101223 101272 101516 101579 101558 101425 101292 101184 101563 101149 101149 101245 101287 101364 101305 101282 101301 101242 101376
Well, that does seem able to handle, let's see about finding the dates using that first tab---
ixDate=isnan(data(:,1)); % logical vector records starting with a NaN
nnz(ixDate) % how many are there?
ans = 365
Aha! The number we would have expected for each day of a non-leapyear year...that's most excellent!
We can then get the dates easily enough; one presumes the data size is consistent for each...but let's check that out...
dataSize=diff(find(ixDate)); % the distance between date records
N=unique(dataSize) % easy way to see if all the same and what is
N = 17
And, they are all same size, with 17 data lines between...but that isn't commensurate with what would appear to be 21 rows requested above?
But, it's simple enough then to reshape the file however wanted...
dates=datetime(data(ixDate,[2 4 6])); % convert y,m, d to datetime
dates([1:3 end-2:end]) % and see if got it right
ans = 6×1 datetime array
01-Jan-1998 02-Jan-1998 03-Jan-1998 09-Dec-1998 30-Nov-1998 01-Dec-1998
Looks ok; goes from first to last day of year.
ERRATUM: LOOK MORE CLOSELY, THE LAST THREE AREN'T RIGHT!!! There's a fuller explanation at bottom, the quick fixup corrections are..
dateData=data(ixDate,[2 4:6]); % columns of y, m, 10s,1s day
dateData(isnan(dateData))=0; % fixup the initial 10s day column
dates=datetime(dateData(:,1),dateData(:,2),10*dateData(:,3)+dateData(:,4)); % fix day and convert
dates([1:3 end-2:end])
ans = 6×1 datetime array
01-Jan-1998 02-Jan-1998 03-Jan-1998 29-Dec-1998 30-Dec-1998 31-Dec-1998
OK, so now that does look more better...@OcDrive, you'll need to verify this behavior in the real file(s); would be a good thing to fix at the source if possible.
So, now, get rid of the date records and convert to 3D by the number records/day...
data=data(~ixDate,:); % remove the dates (keep not date)
whos data
Name Size Bytes Class Attributes data 5840x36 1681920 double
data=mat2cell(data,repmat(N-1,1,numel(dates)),size(data,2)); % by records, width to a cell array
whos data
Name Size Bytes Class Attributes data 365x1 1719880 cell
data=cat(3,data{:});
whos data
Name Size Bytes Class Attributes data 16x36x365 1681920 double
  3 Kommentare
dpb
dpb am 15 Aug. 2023
I couldn't interpret that desire so left as "exercise for Student" to select whatever range is desired...that would be some sort of colon indexing operation.
Your last presumption would simply be
r1= 5; r2=13;
c1=11; c2=21;
data=data(r1:r2,c1:c2,:);
presuming the wish is intended as inclusive.
dpb
dpb am 15 Aug. 2023
Bearbeitet: dpb am 15 Aug. 2023
Actually, the dates above aren't correct for the end...looks like maybe there isn't always a blank column there after some point? Mayhaps have to do something more there...
data=readmatrix('data.dat.txt'); % see what content is as string
ixDate=isnan(data(:,1)); % logical vector records starting with a NaN
nnz(ixDate) % how many are there?
ans = 365
dateData=data(ixDate,:); % extract those records to inspect more carefully
dateData([1:5 end-4:end],:)
ans = 10×36
NaN 1998 NaN 1 NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 1 NaN 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 1 NaN 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 1 NaN 4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 1 NaN 5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 12 2 7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 12 2 8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 12 2 9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 12 3 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1998 NaN 12 3 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Yeah, well that sucks...the format does change somewhere from beginning to end.
Oh! Actually, the day is in two fields, the original NaN should be zeros in column 5 and then column 5:6 should be interpreted as one instead. Not sure if that is real or a figment of having uploaded the file; OP will have to determine what's going on there for sure.
To fix this as it was interpreted would be something like
dateData=data(ixDate,[2 4:6]); % pick out the date data columns
dateData(isnan(dateData))=0; % convert the initial nan-->0
dateData(:,3)=10*dateData(:,3)+dateData(:,4); % fixup the day number from the two columns
dates=datetime(dateData(:,1:3)); % convert y,m, d to datetime
dates([1:3 end-2:end]) % and see if got it right
ans = 6×1 datetime array
01-Jan-1998 02-Jan-1998 03-Jan-1998 29-Dec-1998 30-Dec-1998 31-Dec-1998

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Data Import and Export finden Sie in Help Center und File Exchange

Produkte


Version

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by