Jump to a specific line in a large text file
Hello,
I have a program that grabs a few data values from each page of long text files (~1500 pages). Each page has exactly 70 lines, and the data I'm interested in is on lines 9-16 of each page. Because of the nature of the export process (which I don't have control over), I've found that the best method for extracting the data is to jump to a specific line and then skip a certain number of space-delimited fields on that line until I get to the field I want to extract. (See code below.)
The problem is that this method is extremely slow, because the only way I know of to jump to a particular line (say the 9th line of page 2, which would be line 79 of the entire document) is to go back to the beginning of the document every time and then count lines all the way down to the one I want. This gets very slow towards the end of large documents.
One more note: I don't want to just jump 62 lines at the end of each page to get to the first field of the next page, because the data fields are sometimes inconsistent and textscan may end up on an incorrect line when there aren't enough fields. If that happens, every page after that would be off by one line. For this reason I need a method that jumps to an absolute position within the document, regardless of how many bytes each line occupies and independent of any previous relative jump.
What I need is a way to jump to a specific line in the document without having to individually count from the beginning every time, as is currently the case. Any ideas?
Thanks in advance.
tic; %start the timer that toc reports at the end
file=fopen(filename);
p=1; %starting page
pages=1500; %total number of pages
skip=8; %number of header rows to skip on the first page
start=zeros(1,pages); %number of lines to skip from the top of the file to reach each page's data
header=cell(pages,7); %data storage array
for n=1:pages %this populates the 'start' array for every page
    start(n) = skip;
    skip=skip+70; %70 lines per page
end
while p<=pages
    frewind(file); %jump to the top of the file
    skip=start(p); %grabs the line count for the current page from the 'start' array
    grab1 = textscan(file, '%*s %*s %s', 1,'headerlines',skip);
    grab2 = textscan(file, '%*s %*s %s', 1,'headerlines',1);
    grab3 = textscan(file, '%*s %*s %*s %s', 1,'headerlines',1);
    grab4 = textscan(file, '%*s %*s %s', 1,'headerlines',1);
    grab5 = textscan(file, '%*s %*s %*s %s', 1,'headerlines',3);
    grab6 = textscan(file, '%*s %*s %*s %s', 1,'headerlines',1);
    %misc. code that saves the data...
    p=p+1; %increments the page for the while loop
end
fclose(file);
toc
Answers (1)
Walter Roberson
on 3 Dec 2012
Unfortunately, operating systems themselves have no mechanism for positioning by line; a file can only be positioned by byte offset.
As you always read the same number of lines each time, you could jump forward from where you are instead of rewinding: the number of lines to skip is (next line to start at) minus (current line to start at) minus (number of lines you just read).
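A minimal sketch of that forward-only idea, assuming the 70-line pages and data lines 9-16 described in the question. It reads whole lines with fgetl, so a line with too few fields cannot shift the position. The function name readPageLines and the output raw are hypothetical, chosen only for illustration.

```matlab
function raw = readPageLines(filename, pages)
% Sketch: collect lines 9-16 of every 70-line page in one forward pass.
% The function name and the 'raw' output are hypothetical.
linesPerPage = 70;                  % from the question
firstLine = 9;                      % first data line on each page
lastLine  = 16;                     % last data line on each page
raw = cell(pages, lastLine - firstLine + 1);
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
done = false;
for p = 1:pages
    for k = 1:linesPerPage
        txt = fgetl(fid);           % consumes exactly one line, whatever its fields
        if ~ischar(txt)             % fgetl returns -1 at end of file
            done = true;
            break;
        end
        if k >= firstLine && k <= lastLine
            raw{p, k - firstLine + 1} = txt;   % keep the unparsed line
        end
    end
    if done, break; end
end
fclose(fid);
end
```

Each stored line can then be split into fields afterwards, for example with fields = regexp(strtrim(raw{2,1}), '\s+', 'split'), where fields{3} would be the third space-delimited field on line 9 of page 2.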
There is a technique that is useful if you need to move around more in a text file, but I don't think it is worthwhile for your case.
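The answer does not say which technique is meant. One common approach of this kind, shown here as an assumption rather than as what the answer had in mind, is to scan the file once, record the byte offset at which each line starts, and then use fseek to jump straight to any line. The function name indexLineStarts is hypothetical.

```matlab
function offsets = indexLineStarts(filename)
% Sketch: one pass that records the byte offset at which every line starts.
% The function name is hypothetical; it is not part of MATLAB.
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
offsets = zeros(1, 1024);           % preallocated, grown in chunks below
n = 0;
while true
    pos = ftell(fid);               % byte offset of the line about to be read
    txt = fgetl(fid);
    if ~ischar(txt), break; end     % -1 signals end of file
    n = n + 1;
    if n > numel(offsets)
        offsets(2 * numel(offsets)) = 0;   % double the capacity
    end
    offsets(n) = pos;
end
fclose(fid);
offsets = offsets(1:n);             % trim unused capacity
end
```

With the index built, line 9 of page 2 (line 79 overall) could be reached with fseek(fid, offsets(79), 'bof') followed by fgetl(fid), without counting lines from the top again. The index costs one full pass over the file, so it pays off mainly when many jumps are made in arbitrary order.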