regexp to filter file names

chlor thanks on 5 Jul 2016
Commented: Image Analyst on 5 Jul 2016
I have files such as the following:
s =
'HI_B2_TTTT9_Default452_07052016.xlsx'
'HI_H2G_TTTT7_Default259_070516.xlsx'
'HI_B2C_TTTT9_Default1482_070516.xlsx'
'HI_A1C_TTTT4_468_070516.xlsx'
'HI_G1C_TTTT8_862_07052016.xlsx'
'HI_KA6_TTTT4_148_07052016.xlsx'
'HI_8C_TTTT7_279_Potato_07052016.xlsx'
I only want to process the first six files and filter out the last one, which has a different format. Note that even though some of the names do not contain "Default", they still count as default files because they do not mention "Potato" or any other keyword.
I would rather not filter on the keyword "Potato", because files added to this cell array in the future may contain other keywords such as "Carrot" or "Bacon" (I don't know yet what they will be), and those would then not be filtered out as I want.
Actually, I think I figured out the code after looking at your answers.
I used find(cell2mat(regexp(s,'HI_\w+_TTTT\d_(Default)?\d+_\d+')))
Thank y'all for all the inspiration!!
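One caveat with the find(cell2mat(...)) form: regexp returns an empty entry for every name that does not match, and cell2mat does not preserve those entries as placeholders, so the positions reported by find may not line up with s once a rejected name sits anywhere but at the end. The sketch below keeps a logical mask aligned with s instead. It also anchors the pattern, under two assumptions that are not stated in the question: the operation code never contains an underscore, and the date is always 6 or 8 digits followed by .xlsx.

```matlab
% Sample names from the question; the last one must be rejected.
s = {'HI_B2_TTTT9_Default452_07052016.xlsx'
     'HI_H2G_TTTT7_Default259_070516.xlsx'
     'HI_B2C_TTTT9_Default1482_070516.xlsx'
     'HI_A1C_TTTT4_468_070516.xlsx'
     'HI_G1C_TTTT8_862_07052016.xlsx'
     'HI_KA6_TTTT4_148_07052016.xlsx'
     'HI_8C_TTTT7_279_Potato_07052016.xlsx'};

% Assumed layout: HI_<operation>_TTTT<digit>_[Default]<task>_<date>.xlsx
% ^ and $ force the whole name to fit, so extra keywords cannot slip through.
pat = '^HI_[^_]+_TTTT\d_(Default)?\d+_(\d{6}|\d{8})\.xlsx$';

% With 'once', regexp returns a start index (or empty) for each name.
isWanted = ~cellfun('isempty', regexp(s, pat, 'once'));

idx  = find(isWanted);   % indices into s, if indices are really needed
kept = s(isWanted);      % the names to process
```

Indexing with isWanted directly is usually enough; find is only needed when the numeric positions matter.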

Accepted Answer

Azzi Abdelmalek on 5 Jul 2016
s = {'HI_A1C_TTTT4_468_07052016.xlsx'
     'HI_B2_TTTT9_Default452_070516.xlsx'
     'HI_GA1C_TTTT8_862_07052016.xlsx'
     'HI_HB2C_TTTT7_Default259_070516.xlsx'
     'HI_KA6_TTTT4_148_07052016.xlsx'
     'HI_B2C_TTTT9_Default1482_070516.xlsx'
     'HI_8C_TTTT7_279_Potato.xlsx'};
% Names that do not match come back as empty strings in out
out = regexp(s,'\w+_\w+_\w+_(Default)?\d+_\d+','match','once')
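To turn out into an actual selection, one option is to compare each match with the file name minus its extension, so that a name is kept only when the pattern covers all of it. This is a sketch rather than part of the answer above; the fourth name is hypothetical and is there to show a case where the unanchored pattern matches a prefix of an unwanted name.

```matlab
% Two names from the answer, the rejected one, and a hypothetical name
% whose first part happens to fit the pattern.
s = {'HI_A1C_TTTT4_468_07052016.xlsx'
     'HI_B2_TTTT9_Default452_070516.xlsx'
     'HI_8C_TTTT7_279_Potato.xlsx'
     'HI_8C_TTTT7_279_12_Carrot.xlsx'};   % hypothetical

% Same pattern as above: one matched substring (or '') per name.
out = regexp(s, '\w+_\w+_\w+_(Default)?\d+_\d+', 'match', 'once');

% Strip the extension, then keep a name only if the match is the whole stem.
stem  = regexprep(s, '\.xlsx$', '');
keep  = strcmp(out, stem);
files = s(keep);
```

Anchoring the pattern with ^ and $ would achieve the same thing inside the expression; the comparison is shown here because it leaves the original pattern untouched.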

More Answers (1)

Image Analyst on 5 Jul 2016
What's unique about the filenames you want to keep? Do they all end in 16, like in your small sample? If so, use
fileStruct = dir('*16.xlsx');
Now, just use fileStruct(k).name in your loop or wherever you need to reference the filename.
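The two approaches can also be combined: dir does a coarse wildcard selection and regexp then narrows it down. The sketch below is only an illustration. It builds a throwaway folder with empty files so that it runs anywhere; the folder, the demo names, and the anchored pattern are assumptions made for the example.

```matlab
% Hypothetical scratch folder with empty demo files, so the sketch is self-contained.
folder = tempname;
mkdir(folder);
demoNames = {'HI_A1C_TTTT4_468_070516.xlsx', ...
             'HI_8C_TTTT7_279_Potato_07052016.xlsx', ...
             'notes.txt'};
for k = 1:numel(demoNames)
    fid = fopen(fullfile(folder, demoNames{k}), 'w');
    fclose(fid);
end

% Coarse filter: let dir pick up every .xlsx file in the folder.
fileStruct = dir(fullfile(folder, '*.xlsx'));

% Fine filter: keep only the names that fit the assumed layout.
names = {fileStruct.name};
pat = '^HI_[^_]+_TTTT\d_(Default)?\d+_\d+\.xlsx$';
isWanted = ~cellfun('isempty', regexp(names, pat, 'once'));
fileStruct = fileStruct(isWanted);

% Reference each remaining file by fileStruct(k).name.
for k = 1:numel(fileStruct)
    fullName = fullfile(folder, fileStruct(k).name);
    fprintf('Would process %s\n', fullName);
end

% Clean up the demo files and the scratch folder.
delete(fullfile(folder, '*'));
rmdir(folder);
```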
2 Comments
chlor thanks on 5 Jul 2016
Thank you for providing another insight to do this!
However, it will not work very well in my particular case. (I fixed this little bug in my updated question. I made the question up so that I can rewrite the code later by myself.)
The filenames are structured as follows, taking 'HI_A1C_TTTT4_468_07052016.xlsx' as an example:
HI may stand for a particular program name
A1C may stand for a particular operation within it
TTTT4 stands for who performed this operation
468 stands for the task number
07052016 stands for the date the file was made (you will notice that it is sometimes 070516 and sometimes 07052016, depending on how the person felt when they saved the file...)
So the purpose of this regexp is to extract these files from the hundreds of other files that I have. I will later parse this information using "split", but that's a different story...
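As a rough sketch of that later parsing step, named tokens in regexp can pull the fields out in the same pass that validates the name, as an alternative to splitting on underscores. The field names (program, operation, operator, task, date) are made up here to mirror the description above, and the pattern assumes that none of the fields contains an underscore.

```matlab
name = 'HI_A1C_TTTT4_468_07052016.xlsx';

% One named token per field; the optional "Default" prefix is skipped
% with a non-capturing group so that task holds only the number.
pat = ['^(?<program>[^_]+)_(?<operation>[^_]+)_(?<operator>[^_]+)_', ...
       '(?:Default)?(?<task>\d+)_(?<date>\d+)\.xlsx$'];

% info is a struct with one char field per named token.
info = regexp(name, pat, 'names', 'once');

taskNumber = str2double(info.task);
```

The date is left as text on purpose, since the 6-digit and 8-digit forms would need separate handling before conversion.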
Image Analyst on 5 Jul 2016
OK, though I'm still not sure what makes a filename good or bad. If it's just the presence of some keyword from a list defined in advance, you might look at ismember to identify which strings in a cell array of filenames have any of the keywords in them.
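A possible way to apply that idea: ismember compares whole strings, so each name is first split on underscores and the pieces are then tested against the list. The keyword list below is hypothetical, and the approach only helps when the keywords really are known in advance.

```matlab
keywords = {'Potato', 'Carrot', 'Bacon'};   % hypothetical keyword list

s = {'HI_A1C_TTTT4_468_070516.xlsx'
     'HI_B2_TTTT9_Default452_07052016.xlsx'
     'HI_8C_TTTT7_279_Potato_07052016.xlsx'};

hasKeyword = false(size(s));
for k = 1:numel(s)
    % Drop the extension, then split the stem into its underscore-separated parts.
    stem  = regexprep(s{k}, '\.xlsx$', '');
    parts = strsplit(stem, '_');
    % ismember flags every part that equals one of the keywords exactly.
    hasKeyword(k) = any(ismember(parts, keywords));
end

kept = s(~hasKeyword);
```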
