I have written a script that reads in data from an excel file into a 4x27 matrix and then performs all the relevant analysis I need and displays results in the command window. I am happy with the script, but I have large amounts of data to run through.
All the data is in the identical form (a 4x27 matrix) but it has to be copied and pasted from a webpage into the excel sheet. At the moment I am running the script for seperate excel sheets, but would like to copy and paste multiple data sets into the excel sheet and run the script once to get all the data.
i.e. the script currently reads in rows like:
data1
data1
data1
data1
then I have to manually create another excel file/overwrite the existing file to create more data. I want it to deal with:
data1
data1
data1
data1
data2
data2
data2
data2
etc......
I feel that since I have already read the script to deal with each dataset as i desire, this shouldn't be a hard task to perform if I use loops well. I just have an almost non-existent knowledge of how to use loops.
I was hoping people could point me in the right direction.
Any help would be great thanks.
---------------------EDIT 1----------------------------------
The code I have written has now been added in the edit, as has an example of a dataset that the code reads in (i uploaded it as .xls whilst on my computer i use .xlsx)
I want the improved script to do the same job, but on an excel sheet containing multiple datasets.
------------------------------EDIT 2------------------------------------
I have now uploaded a xls file of the form I wish the script to tackle. The spacing between datasets is arbitrary, and can be changed depending if it makes the job easier.

17 Kommentare

dpb
dpb am 24 Jun. 2017
Probably be simpler to start with the script you have to mung on and show a small subset of the data.
Is the analysis the same just on a larger dataset or is each set to be analyzed separately, just automated?
And, you can probably automate the download process as well...but let's work on this part first.
Joshua
Joshua am 24 Jun. 2017
MATLAB has two types of loops, 'for' loops and 'while' loops.
For loops run for a pre-specified number of iterations so if you know how many rows you want to read in each time, I would recommend a for loop.
While loops run while a certain condition is true. You could for example make a loop that reads in data until it reaches an empty cell.
I don't have a specific answer to your problem, but here are two links that might help. Also, if you post the code you have written so far I, or someone else, might be able to help you figure out how to implement it for multiple data series. Good luck!
https://www.mathworks.com/help/matlab/matlab_prog/loop-control-statements.html
https://www.mathworks.com/matlabcentral/answers/286103-reading-data-with-loop
Adam Cook
Adam Cook am 24 Jun. 2017
dpb, the latter is true, I want a spreadsheet full of many datasets, each to be analaysed seperately.
and Joshua I will upload the code shortly.
Thanks
Walter Roberson
Walter Roberson am 24 Jun. 2017
xlsread() automatically trims off any trailing rows of NaN, so you can find out how many rows were read by using size() without having to go looking for an empty cell.
Image Analyst
Image Analyst am 24 Jun. 2017
You say "have large amounts of data to run through" does that mean that you have many, many Excel workbooks? Or that you have many, many web pages, perhaps with each web page having a workbook that you need to download and read from?
Then you say "copied and pasted from a webpage into the excel sheet" so does that mean that from all your numerous input workbooks or web pages, you want to create one output workbook where all this separate data is consolidated onto one sheet? If so, how and where do you want to place all the small 4x27 matrices?
Then, why do they need to all be in one workbook before you run your script? Are the data somehow connected, like they're different time segments of one larger signal like temperatures or rainfall or something? For example each input is stock pricees from a month but you want to process many years worth of those months? Why can't you just do it one at a time where you read in one workbook or web page, process it, and write out its results into the output workbook?
Adam Cook
Adam Cook am 25 Jun. 2017
Lots to answer there image analyst; I'll try my best, hopefully the code and example dataset will help with the answer.
I am using the script to analyse odds which I gain from odds comparison websites. I copy and paste the odds from the website into an excel sheet and use this script to tell me which bets are "good bets". (I am not actually betting using the maths in this script, just using it to get to grips with the required programming techniques).
I want to be able to go to a website and copy and paste all the odds for relevant events into an excel workbook all in one go, then run the script which outputs "good bets". At the moment I have to copy and paste the odds and run the program 1 by 1, which is both time consuming and tediously frustrating.
ps. for some reason when copying and pasting data from the websites, the second line of the 4x27 matrix always appears blank, hence the start of this code removing this line.
Image Analyst
Image Analyst am 25 Jun. 2017
So you want to, or perhaps are already doing, use webread() to scrape the web page, and then using xlswrite() or ActiveX to paste the scraped content into a workbook, and you want to do that for many web pages to build up the complete workbook. Is that correct?
Then, once you have the complete workbook, you want to run some more MATLAB code on the collected data that is read back in from this workbook you created. Is that correct?
Adam Cook
Adam Cook am 25 Jun. 2017
Well that certainly is one thing I wish to do, but this query is simpler. I am happy at this point to create an excel spreadsheet myself manually (i.e. highlight with a mouse and copy and paste into spreadsheet) with all the data I need to analyse, as i am often only interested in certain aspects of the webpage, which may vary upon the day I visit.
Here I am mainly looking for advice on how to modify my code to perform the same function it does on one set of data in an excel spreadsheet to then perform on many (identically structured datasets) which are all in one excel sheet.
Does this make my problem any clearer?
dpb
dpb am 25 Jun. 2017
Well, not really... :)
What is the definition of a dataset? I downloaded your spreadsheet and it seems to be badly formed in that the data for what I presume is the first team are in the first row but the team name is in the second. Why don't you correct that when building the file, first? (I guess this is the reason there are four rows and you're looking at NaN values? If you'd build the file correctly initially, that issue would go away.)
Anyway, the question is, are these not independent calculations on a row-by-row basis and it doesn't matter how many rows there are or is the algorithm actually limited to comparing sets of three at at time? Looks to me like you've just hamstrung yourself by writing an explicit expression for three variables instead of generalizing it to compute for however many there are.
I'd think all you'd need to to if that assumption is true is simply append all the data into a block and compute it--"Look, Ma! No loops!!"
Adam Cook
Adam Cook am 25 Jun. 2017
This is the first script I have actually written in matlab (or any language), so It seems obvious that my lack of knowledge of correct terminology is having a detrimental effect on clearly stating my intentions and issues, I thank you all for your patience.
Yes the form of the dataset is annoying, this is the form it takes for some reason when copied and pasted from the website (oddschecker.com for reference) and I cannot seem to change this in the pasting process.
I don't want to amend this in the spreadsheet as the time taken to do that manually would allow me to copy and paste about 4 more datasets. So ideally having the script fix this annoyance (as mine does for 1 "dataset") would theoritcally increase my productivity by a factor of 3 or 4.
What I mean by a dataset is the 3 outcomes on one event. I only wish to calculate the relevant maths on a case-by-case basis for each dataset. They need not be compared to one another. My current script does everything I want for one dataset in a spreadsheet. I will now upload a file of the form I am talking about, containing many datasets.
Thanks again.
dpb
dpb am 25 Jun. 2017
Bearbeitet: dpb am 25 Jun. 2017
See the recent Answer; it recasts the computation into a function you can call over however many elements you have in an array. The cleanup can occur wherever you want it; I think if you can't fix it earlier that doing it when reading the spreadsheet is best and better than burying it into the function (who knows, you may well figure out how to make that go away at some point, why have to edit the calculation when do?).
Also, selecting the input file and processing it is more easily accomplished at higher level code that can then call the processing and eventually the output routines. (I didn't incorporate much there because there were undefined variables check sub N and didn't try to divine their meaning but again the naming indicates should be a general routine to compute them, not indices-specific versions for each.
I don't see why you need to limit to three events, however; doesn't that just create more work in the end to then cross compare those multiple sets? Why not compute the "whole shebang" then you can select the top N, say from that.
Using variables with either sequential names like X1, X2, ..., Xn or individual elements in an array such as D(1), D(2), D(3) is an almost guaranteed case of not using Matlab syntax optimally and eliminating those kinds of idioms is key to learning to use Matlab.
I'd that code and put it in a top level script that calls the function. If you're
[odds,team]=xlsread(...; % read the file
team(cellfun(@isempty,team))=[]; % clean up names
odds=odds(isfinite(odds(:,1)),:); % ditto odds array
THEN you can call the function to compute the statistic either on the group or any subset desired...if just determined to do it only on three at a time, :) then
for i=1:3:length(t)
[stat index]=bestbet(odds(i:i+2,:));
% process that group here
end
or compute all the stats at once and then select/choose from the result vector by rankings overall or whatever criter[ion/a] choose.
[stat index]=bestbet(odds);
Adam Cook
Adam Cook am 26 Jun. 2017
I understand completely the first two parts of this code, yet I don't know how to define the function in matlab.
Also for what I am doing the maths is only relevant for three outcomes of an event. i.e. if arsenal are playing chelsea, and leicester are playing liverpool comparing the odds of leicester and aresenal to win is utterly meaningless for what I am doing.
Your answer looks like it will fully remedy my problem, I am just unsure as to how to define a function in matlab, as I have never done this before.
p.s. Before your answer was posted I tried using my code on different datasheets as suggested by image analyst and it worked well.
dpb
dpb am 26 Jun. 2017
Bearbeitet: dpb am 26 Jun. 2017
" don't know how to define the function in matlab."
The function in Answer is fully functional; it will work on the size of the array passed. Simply copy it and put it in a file named bestbet.m and call it from the script as above. (Or, of course, you can name it something else if you want to reserve that for the script. You can also now actually define functions in script m-files; just need to keep names distinct for the script and the function. See the documentation section on the function keyword and the background section on concepts in the Programming chapter of Getting Started for tutorial info.
I was assuming the odds being download would be mutually exclusive, yes. If the sets are such that you're always looking at only three, then the loop as outlined above will let you put all the data in a single sheet. The two cleanup statements seem to handle the blank lines ok on your sample multiple sheet. That may be somewhat more convenient than using multiple sheets.
Adam Cook
Adam Cook am 26 Jun. 2017
When I copy and paste the entire function into a .m file, the function is underlined in read and says "FUNCTION keyword is invalid here"
dpb
dpb am 26 Jun. 2017
Would have to see text of actual file...it works here.
>> [r,t,raw]=xlsread('bestbet2.xls'); % read the 3-record file
>> r=r(isfinite(r(:,1)),:); % clean up the data
>> bestbet(r) % call the function
ans =
-0.0122
0.0006
0.0013
>>
That matches your previous code results for those three cases.
The function is
>> type bestbet.m
function [check,index]= bestbet(odds)
C = nanmean(odds,2); %computes averages of 3 rows of numerics
[D,index] = max(odds,[],2); %finds max value of rows
E = 1/sum(1./D); %ACE
F = 1/sum(1./C); %Mean ACE
G = C./F; %True odds
check=zeros(size(D)); % preallocate the statistic
indx=1:length(D); % index array 1:N observations
for i=1:length(D)
check(i,:) = 1/sum([1/D(i); 1./G(~(indx==i))])-1; % compute for each
end
>>
I changed a couple of variable names in the Answer but don't think that should make any difference---oh, I see. the header line is out of order; the keyword function has to be first on the line; my bad. I must've not had cursor where thought it was when I did a cut 'n paste operation.
Adam Cook
Adam Cook am 26 Jun. 2017
Cheers mate! Learnt a ridiculous amount during this. I can't believe I tried leaning matlab before without being part of this community, its insane!
Thanks for your patience and your time!
dpb
dpb am 26 Jun. 2017
No problem, glad to help...trying to teach and stretch queries beyond the bare minimum required is part of what I tend to try to do in order to hopefully push folks down the path a little quicker than they might wish to go if left to own devices... :)

Melden Sie sich an, um zu kommentieren.

 Akzeptierte Antwort

dpb
dpb am 25 Jun. 2017
Bearbeitet: dpb am 26 Jun. 2017

0 Stimmen

Rewriting the above to compute for any number of bets would seem simplest--
function [stat,mxIdx] = bestbets(odds)
% return betting odds prediction statistic and max row given odds array
C = nanmean(odds,2); %computes averages of rows
[D,mxIdx] = max(odds,[],2); %finds max value of rows
E = 1/sum(1./D); %ACE
F = 1/sum(1./C); % Mean ACE
G = C./F; % True odds
stat=zeros(size(D)); % preallocate the statistic
indx=1:length(D); % index array 1:N observations
for i=1:length(D)
stat(i,:) = 1/sum([1/D(i); 1./G(~(ix==i))])-1; % compute for each
end
Above reproduces your check array values for the given dataset.
NB: Above presumes the input odds array has been cleaned up before being passed, the problem about generating a suitable file format is a different discussion than than the algorithm implementation.
The loop could be eliminated with accumarray but left so could more easily get the implementation.
The "trick" is in building the logical addressing vector ~(indx==i) to select the elements from the G array with the exception of the one element being computed.
PS: While the computer doesn't care, as a general best practice I'd suggest using descriptive names for variables rather than just C, D, etc. It'll be bound to help in the long run as you develop into using Matlab and other programming languages more. I'll give high marks for brevity; that is also in my (nsh :) ) opinion an objective as exceedingly long variable names tend to just add clutter more than clarify, but at least some sort of a mnemonic to relate to the purpose is important for legibility.
ERRATUM Corrected order of function line to get function keyword first...

Weitere Antworten (0)

Kategorien

Mehr zu Loops and Conditional Statements finden Sie in Hilfe-Center und File Exchange

Gefragt:

am 24 Jun. 2017

Kommentiert:

dpb
am 26 Jun. 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by