How can I read from the UniProt REST API with MATLAB when the data is returned in chunks?
I tried reading the data with the following MATLAB commands:
taxonID = 9606
url = ['https://rest.uniprot.org/uniprotkb/search?query=taxonomy_id:' num2str(taxonID) '&fields=accession%2Cgene_oln%2Cgene_primary%2Cec%2Cxref_geneid%2Cxref_refseq%2Cmass%2Csequence&format=tsv&size=500&sort=protein_name%20asc'];
uniprotDL = webread(url);
The problem is that I only get the first 500 lines. The size parameter cannot be set higher than 500; according to UniProt, larger result sets have to be retrieved using paging. Their Python example is as follows (not exactly the same query, but very similar):
import re
import requests
from requests.adapters import HTTPAdapter, Retry

# Matches the URL of the next page in the "Link" response header
re_next_link = re.compile(r'<(.+)>; rel="next"')
retries = Retry(total=5, backoff_factor=0.25, status_forcelist=[500, 502, 503, 504])
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))

def get_next_link(headers):
    # Return the next-page URL, or None when there is no further page
    if "Link" in headers:
        match = re_next_link.match(headers["Link"])
        if match:
            return match.group(1)

def get_batch(batch_url):
    # Yield one response per page until no next link is returned
    while batch_url:
        response = session.get(batch_url)
        response.raise_for_status()
        total = response.headers["x-total-results"]
        yield response, total
        batch_url = get_next_link(response.headers)

url = 'https://rest.uniprot.org/uniprotkb/search?fields=accession%2Ccc_interaction&format=tsv&query=Insulin%20AND%20%28reviewed%3Atrue%29&size=500'
interactions = {}
for batch, total in get_batch(url):
    # Skip the TSV header line of each page
    for line in batch.text.splitlines()[1:]:
        primaryAccession, interactsWith = line.split('\t')
        interactions[primaryAccession] = len(interactsWith.split(';')) if interactsWith else 0
    print(f'{len(interactions)} / {total}')
As you can see, they download the data in chunks. However, we want to do this in MATLAB, since it is part of another package that we are building. Is it possible to do this in MATLAB?
Answers (1)
Chetan
on 7 Sep 2023
I understand that you are trying to access a paged API and retrieve all of the available data.
In MATLAB, you can implement pagination to download data from the UniProt REST API in chunks.
A basic while loop combined with the "webread" function is enough to request one page after another.
After each request, check the length of the result and stop the loop as soon as an empty result comes back.
The following example shows this approach with an API that uses page numbers:
baseURL = 'https://api.punkapi.com/v2/beers';
url = [baseURL '?per_page=10'];
beers = batchReadAPI(url);
numBeers = numel(beers);
disp(['Downloaded ' num2str(numBeers) ' beers']);

function data = batchReadAPI(baseURL)
    % Request page 1, 2, 3, ... until the API returns an empty page.
    options = weboptions('ContentType', 'json', 'CharacterEncoding', 'UTF-8');
    data = [];
    page = 1;
    while true
        url = [baseURL '&page=' num2str(page)];
        response = webread(url, options);
        if isempty(response)
            break;  % no more results
        end
        data = [data; response]; %#ok<AGROW>
        page = page + 1;
    end
end
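The example above relies on a page number in the URL. The Python snippet in the question instead follows the URL given in the "Link" response header, and "webread" does not return response headers. A minimal sketch of that variant using the "matlab.net.http" interface could look like the following. It is untested against the live service; the helper names readAllPages and nextLinkFromHeader are made up for illustration, the field list is a shortened version of the one in the question, and the code assumes that the TSV body arrives decoded as text.

```matlab
% Sketch only: follow the rel="next" URL in the Link header until none is returned.
% readAllPages and nextLinkFromHeader are hypothetical helper names.
url = ['https://rest.uniprot.org/uniprotkb/search?query=taxonomy_id:9606' ...
       '&fields=accession%2Cgene_primary%2Cmass&format=tsv&size=500'];
rows = readAllPages(url);
disp(['Downloaded ' num2str(numel(rows) - 1) ' entries']);  % minus the header row

function rows = readAllPages(url)
    request = matlab.net.http.RequestMessage('GET');
    rows = strings(0, 1);
    while ~isempty(url)
        response = send(request, url);
        if response.StatusCode ~= matlab.net.http.StatusCode.OK
            error('uniprot:requestFailed', 'Request failed: %s', ...
                  response.StatusLine.ReasonPhrase);
        end
        % Assumption: the TSV body is decoded as text (char or string).
        lines = splitlines(string(response.Body.Data));
        lines(lines == "") = [];               % drop empty trailing lines
        if ~isempty(rows) && ~isempty(lines)
            lines(1) = [];                     % keep the column header only once
        end
        rows = [rows; lines]; %#ok<AGROW>
        % The next page, if any, is advertised in the Link header.
        linkField = response.getFields('Link');
        if isempty(linkField)
            url = '';
        else
            url = nextLinkFromHeader(linkField(1).Value);
        end
    end
end

function nextUrl = nextLinkFromHeader(headerValue)
    % Extract the URL from a header value such as <https://...>; rel="next"
    token = regexp(char(headerValue), '<([^>]+)>;\s*rel="next"', 'tokens', 'once');
    if isempty(token)
        nextUrl = '';
    else
        nextUrl = token{1};
    end
end
```

Each element of rows is one tab-separated line, so a row can be split into its fields with split(rows(k), sprintf('\t')). For a large query such as a whole taxonomy, expect many requests of 500 entries each.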
Refer to the MATLAB documentation for "webread" and "weboptions" for more details, such as changing the content type and the other available options.
I hope these suggestions help you resolve the issue you are facing.
Best regards,
Chetan Verma