Parsing XML output from urlread with regexp
I need to extract the PubMed IDs from the XML below (returned by urlread), but I am not very fluent with regexp.
How can I extract the IDs and store them in a vector?
I'm guessing there is some way to say "take whatever is between '<Id>' and '</Id>' and store it in a vector".
<?xml version="1.0" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">
<eSearchResult><Count>8</Count><RetMax>8</RetMax><RetStart>0</RetStart><IdList>
<Id>16123227</Id>
<Id>9561342</Id>
<Id>8429296</Id>
<Id>1408722</Id>
<Id>2152845</Id>
<Id>2894889</Id>
<Id>2860133</Id>
<Id>6145799</Id>
</IdList><TranslationSet/><TranslationStack>
<TermSet><Term>"ulcerative colitis"[All Fields]</Term><Field>All Fields</Field><Count>33249</Count><Explode>N</Explode></TermSet>
<TermSet><Term>"Clonidine"[All Fields]</Term><Field>All Fields</Field><Count>16458</Count><Explode>N</Explode></TermSet>
<OP>AND</OP>
</TranslationStack><QueryTranslation>"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</QueryTranslation></eSearchResult>
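The "take whatever is between '<Id>' and '</Id>'" idea maps directly onto regexp lookaround: a lookbehind (?<=<Id>) and a lookahead (?=</Id>) are zero-width, so the 'match' output contains only the digits. This is a minimal sketch, assuming the <Id> tags arrive intact in the string urlread returns; the variable names and the shortened sample string are hypothetical.

```matlab
% Hypothetical shortened sample of the eSearch XML (IDs taken from the question)
xml = ['<eSearchResult><IdList>' ...
       '<Id>16123227</Id><Id>9561342</Id><Id>8429296</Id>' ...
       '</IdList></eSearchResult>'];

% Lookbehind/lookahead are zero-width: each match is just the digits between the tags
idStrings = regexp(xml, '(?<=<Id>)\d+(?=</Id>)', 'match');

% Convert the cell array of digit strings into a numeric column vector
ids = str2double(idStrings(:));
disp(ids)
```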
Accepted Answer
Tom
on 24 Jun 2013
str = 'version="1.0" ? eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd" eSearchResult880<IdList> Id>16123227</Id Id>9561342</Id Id>8429296</Id Id>1408722</Id Id>2152845</Id Id>2894889</Id Id>2860133</Id Id>6145799</Id /IdList<TranslationSet/><TranslationStack> TermSet Term"ulcerative colitis"[All Fields]</Term> Fields</Field Count>33249</Count Explode>N</Explode /TermSet TermSet Term"Clonidine"[All Fields]</Term> Fields</Field Count>16458</Count Explode>N</Explode /TermSet OP>AND</OP /TranslationStack"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</eSearchResult>';
%isolate the ID list string
IDList = regexp(str,'(?<=IdList>).*(?=/IdList)','match');
disp(IDList{1})
%get the ID numbers from the string
IDno = textscan(IDList{1},'Id>%d</Id');
disp(IDno{1})
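The format string 'Id>%d</Id' above targets the text as it was pasted in this thread, where the angle brackets appear to have been mangled by the forum; against XML with intact <Id> tags it may not match. Below is a minimal sketch of the same two steps (isolate the <IdList> block, then pull out the IDs) using capturing tokens instead of textscan, assuming intact tags. The variable names and the sample string are hypothetical.

```matlab
% Hypothetical sample of intact eSearch XML, with newlines as urlread might return them
xml = sprintf(['<eSearchResult><IdList>\n' ...
               '<Id>16123227</Id>\n<Id>9561342</Id>\n' ...
               '<Id>8429296</Id>\n<Id>1408722</Id>\n' ...
               '</IdList></eSearchResult>']);

% Step 1: isolate the text between <IdList> and </IdList>.
% 'dotall' lets . match newlines; 'once' returns a single 1x1 cell of tokens.
block = regexp(xml, '<IdList>(.*?)</IdList>', 'tokens', 'once', 'dotall');

ids = zeros(0, 1);  % default: empty column if nothing is found
if ~isempty(block)
    % Step 2: capture the digits inside each <Id>...</Id> pair.
    % tok is a 1xN cell array; each element is a 1x1 cell holding a char row.
    tok = regexp(block{1}, '<Id>(\d+)</Id>', 'tokens');
    if ~isempty(tok)
        % Flatten to a 1xN cell of char, convert to double, make it a column
        ids = str2double([tok{:}]).';
    end
end
disp(ids)
```

Note that str2double returns doubles, whereas textscan with %d returns int32; either type holds PubMed IDs of this size exactly.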