- How can we identify the start of the substring: does it always consist of exactly the same letters (e.g. MOD), or is it always preceded by some recognizable pattern of characters?
- How can we identify the end of the substring: is it always exactly the same file extension that you need to locate?
extract part of a string with an extension
Andrea
on 3 Dec 2014
Edited: per isakson
on 4 Dec 2014
Hi, I have a long string and I want to extract just the names that have "hdf" as an extension:
I want just to get "MOD11C1.A2013001.005.2013015221704.hdf"
My string is:
U.S. GOVERNMENT COMPUTER
This US Government computer is for authorized users only. By accessing this
system you are consenting to complete monitoring with no expectation of privacy.
Unauthorized access or use may subject you to disciplinary action and criminal
prosecution.
********************************************************************************
</pre>
<pre><img src="/icons/blank.gif" alt="Icon "> Name Last modified Size Description<hr><img src="/icons/back.gif" alt="[DIR]"> Parent Directory -
<img src="/icons/image2.gif" alt="[IMG]"> BROWSE.MOD11C1.A2013001.005.2013015221704.1.jpg 15-Jan-2013 16:29 3.2M
<img src="/icons/image2.gif" alt="[IMG]"> BROWSE.MOD11C1.A2013001.005.2013015221704.2.jpg 15-Jan-2013 16:29 3.3M
<img src="/icons/unknown.gif" alt="[ ]"> MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M
<img src="/icons/unknown.gif" alt="[ ]"> MOD11C1.A2013001.005.2013015221704.hdf.xml 16-Jan-2013 02:15 32K
<hr></pre>
</body></html>
Thanks,
Zeinab
Accepted Answer
per isakson
on 3 Dec 2014
Edited: per isakson
on 4 Dec 2014
Here is a solution(?) based on regexp
>> cac = cssm;
>> cac{:}
ans =
MOD11C1.A2013001.005.2013015221704.hdf
ans =
MOD11C1.A2013001.005.2013015221704.hdf
>>
where
function cac = cssm()
str = fileread( 'cssm.txt' );
name_xpr = '[\w\.]+\.hdf';
cac = regexp( str, name_xpr, 'match' );
end
and cssm.txt contains the text of your question. Getting two identical names seems to be correct. You might want to apply unique to the result.
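As a minimal sketch of that last suggestion, the two identical matches shown in the output above can be collapsed with unique. The variable name `names` is only illustrative and not part of the original answer.

```matlab
% The two identical matches returned by the first expression
cac = {'MOD11C1.A2013001.005.2013015221704.hdf', ...
       'MOD11C1.A2013001.005.2013015221704.hdf'};

% unique removes duplicate entries from a cell array of character vectors
% and returns the remaining entries in sorted order
names = unique(cac);

disp(names{1})
```

Note that unique also sorts the result, so the original order of the matches is not preserved by default.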
 
In response to comments:
My mistake illustrates a problem with regular expressions: an expression often matches unexpected strings. I missed the case in which ".hdf" is part of the base name rather than the extension. I have now added the condition that ".hdf" must be followed by "\s, Any white-space character; equivalent to [ \f\n\r\t\v]". That white-space character is not included in the output, however.
>> cssm
ans =
'MOD11C1.A2013001.005.2013015221704.hdf'
function cac = cssm()
str = fileread( 'cssm.txt' );
name_xpr = '[\w\.]+\.hdf(?=\s)'; % <<<<<<< modified
cac = regexp( str, name_xpr, 'match' );
end
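To see what the lookahead changes, the following sketch runs both expressions on the two relevant lines of the listing. The string is typed in directly here instead of being read from cssm.txt, and the variable names `loose` and `strict` are only illustrative.

```matlab
% Two lines from the directory listing, separated by newlines
str = sprintf('%s\n%s\n', ...
    'MOD11C1.A2013001.005.2013015221704.hdf 15-Jan-2013 16:29 46M', ...
    'MOD11C1.A2013001.005.2013015221704.hdf.xml 16-Jan-2013 02:15 32K');

% Without the lookahead, the '.hdf' inside '.hdf.xml' is matched as well
loose  = regexp(str, '[\w\.]+\.hdf', 'match');

% With the lookahead, '.hdf' must be followed by a white-space character,
% which is checked but not included in the match
strict = regexp(str, '[\w\.]+\.hdf(?=\s)', 'match');

fprintf('without lookahead: %d match(es)\n', numel(loose));
fprintf('with lookahead:    %d match(es)\n', numel(strict));
```

Because the lookahead requires a white-space character after the name, a file name at the very end of the text with no trailing space or newline would not be matched.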
 
Stephen Cobeldick already proposed this modification to the expression. I like Stephen's list, which helps to pinpoint the unique characteristics of the string. It triggers thinking. Does the filename always start with "MOD"? Could "MOD" appear in the middle of the name? It's risky to deduce rules from small samples. If the name always starts with "MOD", then
name_xpr = '(?<=\s)MOD[\w\.]+\.hdf(?=\s)';
is a better expression.
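A short sketch of why the leading lookbehind matters when "MOD" can also appear in the middle of a name. The listing below is hypothetical: the names are shortened, and the BROWSE entry is given an .hdf extension purely for illustration (in the question the BROWSE files are .jpg).

```matlab
% Hypothetical listing: one name with 'MOD' in the middle, one starting with 'MOD'
str = sprintf(' BROWSE.MOD11C1.A2013001.hdf 1M\n MOD11C1.A2013001.hdf 46M\n');

% Without the lookbehind, the tail of the BROWSE name is matched as well
no_anchor = regexp(str, 'MOD[\w\.]+\.hdf(?=\s)', 'match');

% With the lookbehind, 'MOD' must be directly preceded by white space
anchored  = regexp(str, '(?<=\s)MOD[\w\.]+\.hdf(?=\s)', 'match');

fprintf('without lookbehind: %d match(es)\n', numel(no_anchor));
fprintf('with lookbehind:    %d match(es)\n', numel(anchored));
```

The lookbehind requires a white-space character before the name, so a name at the very start of the text would not be matched.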
More Answers (1)
Stephen23
on 3 Dec 2014
Edited: Stephen23
on 3 Dec 2014
Why not all on one line?
str = fileread('temp.txt');
C = regexp(str,'MOD[\w\.]+\.hdf(?=\s)','match');
C =
'MOD11C1.A2013001.005.2013015221704.hdf'
This matches all substrings that meet the following conditions:
- starts with 'MOD'
- ends with '.hdf'
- contains any combination of alphanumeric characters, underscores, and periods in between
- is followed by a white-space character (i.e. excludes '....hdf.xml')
As suggested by per isakson, you might also want to apply unique to the output.