Importing table/array from website to Matlab

Hello. I am trying to import the central table from the page 'https://tennisabstract.com/reports/atp_elo_ratings.html' into MATLAB as an array or table, from the player name column through the "Grass" column. I want to read it directly from the web because the page is updated every week, but I do not know how to approach this and would appreciate your help.
Thank you, Erik

Accepted Answer

Guillaume on 3 Jul 2019


First, note that importing data from HTML is always going to be very iffy. HTML is a presentation format designed to display things to humans; it's not designed for data transfer, and you're going to have to remove all the presentation cruft to get at your data.
So, the first thing you should try is contacting the website to see if they have a direct interface to the underlying database.
Bearing this in mind, the following will import your data with the current format of the website. Any change to the format of that page, even a minor one, may break the code.
%definition of patterns used to locate the required information within a table row:
intpattern = '<td[^>]*>(\d+)</td>';
linkpattern = '<td[^>]*><a[^>]+>([^<]+)</a></td>';
numberpattern = '<td[^>]*>(\d+(\.\d+)?)</td>';
anypattern = '<td[^>]*>([^<]+)</td>';
emptypattern = '<td[^>]*></td>';
%read and parse html
html = webread('https://tennisabstract.com/reports/atp_elo_ratings.html');
tabledata = regexp(html, ['<tr[^>]*>', ...
intpattern, ...
linkpattern, ...
numberpattern, ...
numberpattern, ...
emptypattern, ...
numberpattern, ...
numberpattern, ...
numberpattern, ...
emptypattern, ...
anypattern, ...
numberpattern, ...
numberpattern], 'tokens');
assert(~isempty(tabledata), 'Failed to parse html according to pattern. The format of the page may have changed');
tabledata = cell2table(vertcat(tabledata{:}), 'VariableNames', {'Rank', 'Player', 'Age', 'ELO', 'Hard', 'Clay', 'Grass', 'Peak_Match', 'Peak_Age', 'Peak_ELO'});
tabledata = convertvars(tabledata, [1, 3:7, 9, 10], @str2double);
tabledata.Player = strrep(tabledata.Player, '&nbsp;', ' ')
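To see how the token extraction above behaves, here is a minimal sketch run on a single made-up row instead of the live page. The HTML string, the player name and the rating are invented for illustration and are not taken from the website; only the three pattern definitions are copied from the code above.

```matlab
% Hypothetical three-cell table row, invented for illustration only
row = '<tr><td class="r">1</td><td><a href="x.html">Some&nbsp;Player</a></td><td>2100.5</td></tr>';

% Same pattern definitions as in the answer
intpattern = '<td[^>]*>(\d+)</td>';
linkpattern = '<td[^>]*><a[^>]+>([^<]+)</a></td>';
numberpattern = '<td[^>]*>(\d+(\.\d+)?)</td>';

% One match per table row; each match holds one token per outer capture group
tokens = regexp(row, ['<tr[^>]*>', intpattern, linkpattern, numberpattern], 'tokens');
fields = tokens{1};                           % tokens of the first (only) match

rank = str2double(fields{1});                 % text of the first cell as a number
player = strrep(fields{2}, '&nbsp;', ' ');    % link text with the HTML entity replaced
rating = str2double(fields{3});               % text of the third cell as a number
```

Every cell comes back as text, which is why the answer converts the numeric columns with str2double afterwards.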

5 Comments

Erik's comment moved here:
Thank you very much for the help Guillaume! I understand what you are saying about html and data transfer, and the code works perfectly. I have the following function that calculates the probability/odds for a player to beat another player by using its ELO rating.
function w = elo(Ra,Rb)
Qa = 10^(Ra/400);
Qb = 10^(Rb/400);
%The odds for the player with the higher ELO to win
favOdds = 1/(Qa/(Qa+Qb))
%The odds for the player with the lower ELO to win
dogOdds = 1/(Qb/(Qb+Qa))
If you have time, is there a good way to use this function along with my new table, so that odds can be calculated by just typing in two players' names instead? I'm thinking that the program should know which column values to calculate from if I type a string or value that is located in the same row. Perhaps I have to convert the table to an array?
Thank you!
Your function doesn't assign to the output w. I'm taking a guess here as to what w, Ra and Rb are.
function winodds = elo(playertable, player1, player2)
%playertable: a table with at least 2 variables: Player and ELO.
%player1: exact name (case sensitive) of the 1st player
%player2: exact name (case sensitive) of the 2nd player
%winodds: 2x1 vector of the odds of [first player; 2nd player] winning the match
[found, row] = ismember({player1, player2}, playertable.Player);
assert(found(1), 'Player 1 not found in table');
assert(found(2), 'Player 2 not found in table');
ELO = playertable.ELO(row);
Q = 10.^(ELO/400);
winodds = Q / sum(Q); %I don't think your formula was correct. This is the correct formula according to Wikipedia
end
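For reference, the quantity computed by winodds can be written out as follows. The symbols R_A, R_B (the two ratings) and Q_A, Q_B are introduced here only for illustration; they correspond to the entries of ELO and Q in the function above.

```latex
\[
Q_A = 10^{R_A/400}, \qquad Q_B = 10^{R_B/400}
\]
\[
P(A \text{ wins}) = \frac{Q_A}{Q_A + Q_B} = \frac{1}{1 + 10^{(R_B - R_A)/400}},
\qquad
P(B \text{ wins}) = \frac{Q_B}{Q_A + Q_B} = 1 - P(A \text{ wins})
\]
```

For example, a 200-point rating advantage gives 1/(1 + 10^(-0.5)), or about 0.76. The two entries of winodds always sum to 1. The reciprocal 1/P, which is what the earlier favOdds and dogOdds expressions evaluate, can be read as fair decimal odds rather than a probability; that reading is an assumption, not something stated in the thread.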
Erik's comment, incorrectly posted as an answer, moved here. Please use "Comment on this Answer". Don't start new answers.
If I assign c as the playertable, consisting of columns 2 and 7 of tabledata with all rows (the player names and their ELOs on grass), and call the following:
elony(c, 'Ivan Nedelko', 'Kevin King')
I get an error on line 9. Am I calling the function the wrong way? If so, could you please give an example?
I appreciate your answers very much.
Don't use c as a variable name. It's meaningless and doesn't say anything about what it contains.
Assuming you've imported the data as tabledata:
>> elo(tabledata, 'Ivan Nedelko', 'Kevin King')
ans =
0.230209216637309
0.769790783362691
Note: you could calculate the odds for matches of every player against any player with:
Q = 10 .^ (tabledata.ELO / 400);
odds = Q ./ (Q + Q.');
odds(r, c) is then the odds of tabledata.Player{r} winning against tabledata.Player{c}.
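A minimal sketch of the all-pairs calculation on a small stand-in data set, with a lookup by name. The player names and ratings are invented for illustration; with the real data you would use tabledata.Player and tabledata.ELO instead. The expression Q + Q.' relies on implicit expansion, so this assumes R2016b or later.

```matlab
% Hypothetical players and ratings, invented for illustration only
players = {'Player A'; 'Player B'; 'Player C'};
ELO = [2000; 1800; 1600];

Q = 10 .^ (ELO / 400);
odds = Q ./ (Q + Q.');    % odds(r, c): probability that player r beats player c

% Look up the row and column indices from the names (exact, case-sensitive match)
[~, r] = ismember('Player A', players);
[~, c] = ismember('Player C', players);
p = odds(r, c);           % probability of Player A beating Player C
```

By construction odds(r, c) + odds(c, r) equals 1, and the diagonal (a player against themselves) is 0.5.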


More Answers (1)

Erik Börgesson on 3 Jul 2019


Thank you so much. I understand and it works just like I wanted it to.
Erik
