Parsing or regexp HTML output from urlread

1 Ansicht (letzte 30 Tage)
Philip  Spratt
Philip Spratt am 24 Jun. 2013
I need to extract the PubMed IDs from the below HTML, but I am not too fluent in the use of regexp.
Can anyone help with how I would extract the IDs from the below HTML, and store them in a vector?
I'm guessing there is some way to say: what is between '<Id>' and '</Id>' store in...

Akzeptierte Antwort

Tom
Tom am 24 Jun. 2013
str = 'version="1.0" ? eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd" eSearchResult880<IdList> Id>16123227</Id Id>9561342</Id Id>8429296</Id Id>1408722</Id Id>2152845</Id Id>2894889</Id Id>2860133</Id Id>6145799</Id /IdList<TranslationSet/><TranslationStack> TermSet Term"ulcerative colitis"[All Fields]</Term> Fields</Field Count>33249</Count Explode>N</Explode /TermSet TermSet Term"Clonidine"[All Fields]</Term> Fields</Field Count>16458</Count Explode>N</Explode /TermSet OP>AND</OP /TranslationStack"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</eSearchResult>';
%isolate the ID list string
IDList = regexp(str,'(?<=IdList>).*(?=/IdList)','match');
disp(IDList{1})
%get the ID numbers from the string
IDno = textscan(IDList{1},'Id>%d</Id');
disp(IDno{1})

Weitere Antworten (1)

Sean de Wolski
Sean de Wolski am 24 Jun. 2013
  1 Kommentar
Philip  Spratt
Philip Spratt am 24 Jun. 2013
Must apologise, the output was HTML, hence the xml2struct didn't work.

Melden Sie sich an, um zu kommentieren.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by