Data is not saving to the workspace
I have a large text file composed of a single row of 52480000 numbers separated by semicolons. I'm attempting to organize the data into 51250 rows of 1024 numbers and then separate this into distinct blocks of 1025 x 1024. The numbers need to stay in the same order they were in the original file (with every 1025th number being the start of a new row). I have tried using a while loop with an if statement.
R = 51250;
C = 1024;
fid = fopen( 'TEST_A.asc');
k = 0;
while ~feof(fid)
    z = textscan( fid, '%d', R*C, 'EndOfLine', ';');
    if ~isempty(z{1})
        k = k + 1;
        s = fprintf( 'TEST_A.asc', ';');
        dlmwrite( s, reshape( z{1}, 1025, []), ';')
    end
end
fclose(fid);
This code does not create an initial cell of 52480000 numbers, so none of the subsequent variables (s and z) appear in the workspace. The problem is that if I textscan the data into MATLAB before formatting it, I get an out-of-memory error. Does anyone see something I have missed in this code, or have any pointers?
26 Comments
What is the size of that file? If the numbers had been stored in a binary file in double precision, that would still be more than 400 MB. A text file is bound to be much larger, and despite impressive progress, gigabyte-sized files are a pain to process.
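The 400 MB figure can be checked with quick arithmetic. A minimal sketch, assuming the 52480000 values stated in the question and 8 bytes per double:

```matlab
% Assumption: value count taken from the question; a double occupies 8 bytes.
nValues     = 51250 * 1024;            % 52480000 values
bytesPerDbl = 8;
sizeBytes   = nValues * bytesPerDbl;   % raw binary size in bytes
fprintf('%d values -> %.1f MB (%.1f MiB)\n', nValues, sizeBytes/1e6, sizeBytes/2^20);
% prints: 52480000 values -> 419.8 MB (400.4 MiB)
```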
There are several ways of tackling this. Off the top of my head:
- Don't use a text file, go binary
- Split your text file in manageable chunks beforehand.
- Use a database instead
There are other ways but I can't be more specific without knowing what you are trying to achieve.
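As an illustration of the "go binary" suggestion, here is a minimal sketch that converts a semicolon-separated text file to int32 binary one block at a time and then reads a single block back. The file names and the tiny sizes (3 x 4 instead of 1025 x 1024) are assumptions chosen so the example runs quickly; it also assumes the values are integers and uses fscanf rather than the textscan call from the question.

```matlab
% Sketch only: file names and the tiny sizes are assumptions for illustration.
R = 3;  C = 4;                          % stand-ins for 1025 and 1024
txtName = fullfile(tempdir, 'demo_values.asc');
binName = fullfile(tempdir, 'demo_values.bin');

% Create a small semicolon-separated text file: 1;2;...;24;
fid = fopen(txtName, 'wt');
fprintf(fid, '%d;', 1:2*R*C);
fclose(fid);

% Convert text to binary one block at a time, so memory use stays small.
fin  = fopen(txtName, 'rt');
fout = fopen(binName, 'w');
while true
    v = fscanf(fin, '%d;', R*C);        % at most R*C values per pass
    if isempty(v), break; end
    fwrite(fout, v, 'int32');           % 4 bytes per value on disk
end
fclose(fin);
fclose(fout);

% Read block k straight from the binary file, without parsing any text.
k   = 2;
fid = fopen(binName, 'r');
fseek(fid, (k-1)*R*C*4, 'bof');         % skip the earlier blocks
blk = fread(fid, [C, R], 'int32').';    % R rows of C values, in file order
fclose(fid);
disp(blk)
```

Once the data are in binary form, any block can be reached with one fseek and one fread, so the text never has to be parsed again.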
See earlier question:
"I'm attempting to organize the data into 51250 rows of 1024 numbers and then separate this into distinct blocks of 1025 x 1024"
Why do you need this intermediate step?
My answer showed you how to simply process exactly those blocks of 1025*1024, avoiding that intermediate matrix entirely. What do you gain by creating that huge matrix that you don't even want? My code shows how you can go directly to the smaller matrices (which seems to be your aim) without having to read the whole file into MATLAB and without the intermediate step of rearranging all of the data into one pointlessly huge matrix.
Why not just read the blocks you need (1025*1024) instead of wasting time and memory with that huge matrix?
"The numbers need to stay in the same order they were in the original file (with every 1025th number being the start of a new row)"
Yes, and that is what my answer does. Change R = 51250; back to R = 1025; and this code will work too.
Aaron Smith
on 10 Feb 2017
Aaron Smith
on 10 Feb 2017
@Aaron Smith: when my code works properly then the contents of Z will be empty at the end of all iterations. What were you expecting?
>> size(Z)
ans =
1 1
>> size(Z{1})
ans =
0 1
Much more interesting would be the value of k: please tell me what value k has.
Aaron Smith
on 10 Feb 2017
"z is 1" z is actually a cell array, so it cannot be equal to one. What do you really mean?
textscan is not reading the data file. Possibly the format is not as expected. Do the numbers have decimal digits, or exponent notation? Please run this and tell me exactly what values out has (it will be slow):
fid = fopen('file.txt','rt');
out = [];
while ~feof(fid)
    tmp = unique(fgets(fid,1e5));   % distinct characters in the next chunk
    out = union(out,double(tmp));   % accumulate their character codes
end
fclose(fid);
disp(out)
And also show exactly what this displays:
fid = fopen('file.txt','rt');
str = fgets(fid,60)
fclose(fid);
Aaron Smith
on 10 Feb 2017
@Aaron Smith: the file contains newline characters (char 10), which means your original description of the file format "I have a very large text file composed of, in essence one row of numbers." is incorrect. Also your original question had code where you used textscan with semicolon delimiter. But there is not one single semicolon in the whole file.
As a result, that code tells textscan to read a file with a particular format, but that is not the format your file actually has, because I wrote the code based on what you told me.
You can either experiment with textscan's options (e.g. EndOfLine, Delimiter, etc) yourself, or you can tell us exactly what format the file really has. If you want help then please upload a sample text file (the first two thousand numbers or so) in a new comment.
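A quick way to experiment with these options is to run textscan on short strings instead of the full file. The two sample strings below are hypothetical, since the real file's format is not known at this point in the thread:

```matlab
% Hypothetical sample strings; the real file's layout is still unknown here.
s1 = '1;2;3;4;5;6';                          % semicolon-separated, one line
z1 = textscan(s1, '%d', 'Delimiter', ';');

s2 = sprintf('1 2 3\n4 5 6\n');              % whitespace and newlines
z2 = textscan(s2, '%d');                     % default delimiter handling

disp(z1{1}.')
disp(z2{1}.')
```

Both calls return the six values as an int32 column in a 1x1 cell. If one of them fails on a snippet copied from the real file, that points to which option does not match the format.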
Image Analyst
on 10 Feb 2017
Aaron: Please attach 'TEST_A.asc' for further help.
Aaron Smith
on 10 Feb 2017
@Aaron Smith: in your last comment you forgot to show us what values tmp and out have.
Try something like this:
opt = {'EndOfLine',';', 'CollectOutput',true};
...
z = textscan( fid, '%d', R*C, opt{:});
Aaron Smith
on 10 Feb 2017
@Aaron Smith: What is k's value when you get that error?
You have been asked twice to upload a sample file. It will be difficult to help you further without it.
I know my code works: I tested it. I even gave you the code that I used to generate the fake data file. If there is any problem then it is because your data file does not match the expected format somehow. So we need to see it.
Could it be that the number of values in the file is not divisible by 50*1025? If so, then you might need a special case to handle the last matrix. Again, knowing the value of k and a sample file would be helpful.
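One way to check for leftover values is simple arithmetic on the value count. A minimal sketch, assuming blocks of 1025 x 1024 values and the count of 52480000 stated in the question (the real file may differ):

```matlab
% Assumption: N is the count stated in the question; replace with the real count.
N         = 52480000;
blockSize = 1025 * 1024;               % values per 1025 x 1024 block
nFull     = floor(N / blockSize);      % number of complete blocks
leftover  = rem(N, blockSize);         % values that do not fill a block
fprintf('%d full blocks, %d values left over\n', nFull, leftover);
% prints: 50 full blocks, 0 values left over
```

A non-zero leftover would mean the final block is incomplete and needs the special case mentioned above.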
@Aaron Smith: Try this. It saves each block of 1025x1024 values in its own file, and if there are any values left over at the end it saves them in one row in a new file:
sbd = 'tempDir';   % subdirectory holding the input and output files
R = 1025;          % rows per block
C = 1024;          % columns per block
opt = {'EndOfLine',';', 'CollectOutput',true};
fid = fopen(fullfile(sbd,'temp0.txt'),'rt');
k = 0;
while ~feof(fid)
    k = k+1;
    Z = textscan(fid,'%d', R*C, opt{:});             % read at most one block
    S = fullfile(sbd,sprintf('temp0_%02d.txt',k));   % output file for block k
    if rem(numel(Z{1}),R)==0
        dlmwrite(S,reshape(Z{1},[],R).',';')         % R rows, original order
    else
        dlmwrite(S,Z{1}.',';')                       % leftovers as one row
    end
end
fclose(fid);
Note that I also added a transpose to get the data in the correct order.
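The need for the transpose follows from reshape filling columns first. A small sketch with stand-in sizes (3 x 4 instead of 1025 x 1024) shows the difference:

```matlab
v = 1:12;                       % stand-in for the values in file order
R = 3;  C = 4;                  % stand-ins for 1025 rows and 1024 columns
wrong = reshape(v, R, C);       % fills columns first: first row is 1 4 7 10
right = reshape(v, C, R).';     % each row holds C consecutive values
disp(wrong)
disp(right)
```

Here `right` has the rows 1 2 3 4, 5 6 7 8 and 9 10 11 12, which matches the order of the values in the file.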
Aaron Smith
on 13 Feb 2017
Edited: Aaron Smith
on 13 Feb 2017
Walter Roberson
on 13 Feb 2017
sbd is the name of the subdirectory to save the individual files into. You can set it to '' if you do not want to use a subdirectory to store them.
sbd = 'tempDir';
is a subdirectory of the current directory. I put all of the files into this subdirectory because I did not want them cluttering up my current directory. You can make the subdir '' if you want to use the current directory, or (even better) learn to use directory paths and put your data in its own subdirectory.
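A short sketch of how sbd combines with fullfile, and of creating the subdirectory if it does not exist yet. The mkdir step is an addition for illustration and is not part of the code above:

```matlab
sbd = 'tempDir';                         % or '' to use the current directory
if ~isempty(sbd) && ~exist(sbd, 'dir')
    mkdir(sbd);                          % create the subdirectory on first use
end
p1 = fullfile(sbd, 'temp0.txt');         % tempDir plus file separator plus name
p2 = fullfile('',  'temp0.txt');         % empty folder part: just the file name
disp(p1)
disp(p2)
```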
Aaron Smith
on 13 Feb 2017
Stephen23
on 13 Feb 2017
@Aaron Smith: get the second output from fopen:
[fileID,errmsg] = fopen(...)
and read the error message. It always turns out to be a spelling mistake, folder permissions, or the file not being in the location that you are looking in.
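A minimal sketch of that check, assuming the same relative path as in the code above:

```matlab
fname = fullfile('tempDir', 'temp0.txt');    % path assumed from the code above
[fid, errmsg] = fopen(fname, 'rt');
if fid < 0
    % fopen returns -1 on failure and errmsg explains why.
    fprintf('Could not open "%s": %s\n', fname, errmsg);
else
    % ... read from fid here ...
    fclose(fid);
end
```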
Aaron Smith
on 14 Feb 2017
Edited: Aaron Smith
on 14 Feb 2017
Stephen23
on 14 Feb 2017
@Aaron Smith: just get rid of the fullfile if you don't want it.
However, I would recommend learning to use file paths to access data files, as it makes your code faster and more reliable (e.g. compared to cd or other buggy ideas). Note that the file path I used is relative to the current directory, and that this may be different for the command window and the code that is being called: that path needs to exist relative to where the code runs from. One simple resolution is to always specify an absolute path. The internet is full of help on understanding relative/absolute paths, but you might as well start here:
"Is there a way to put the fullfile(sbd, ...) part in a separate line" Sure, it is just a function, you can put it wherever you want to.
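For example, the path can be built once on its own line and reused. In this sketch the folder is resolved to an absolute path at the start, assuming the data live in a folder named tempDir under the directory that is current at that moment:

```matlab
% Assumption: tempDir sits under the directory that is current right now.
dataDir = fullfile(pwd, 'tempDir');          % absolute path, fixed from here on
inFile  = fullfile(dataDir, 'temp0.txt');    % built once, on its own line
if exist(inFile, 'file') ~= 2
    fprintf('Not found: %s\n', inFile);
end
```

Because dataDir is absolute, a later change of the current directory does not alter where inFile points.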
Aaron Smith
on 15 Feb 2017
You could register with Dropbox, MediaFire, Google Drive, or one of the many other file sharing websites, and send me the link to the file (via my profile page: please also include a link to this thread, otherwise the email will get deleted automatically).