Compare average monthly stock return data

I would like to compare the average of the monthly stock return data listed in the 41x1 cell array 'DS_10T' with the data listed in the cell array 'DS_10B'.
As there are some NaNs within my data I hope that one can somehow not factor them into the averages.
I attached a short version of the relevant data as my original dataset would need significantly more memory, because it includes a lot more stocks.
Even tough every cell array contains 41 tables with data from different timeframes I would only need a comparison of the overall average monthly return out of all tables in 'DS_10T' versus the overall average monthly return of all tables of 'DS_10B' in the following format:
Where "Sell" depicts the average monthly returns of the 'DS_10B' array, "Buy" the average monthly returns of the 'DS_10T' array and Buy-Sell the difference between the two values. The numbers in parentheses can be ignored for now.

13 Kommentare

dpb
dpb am 29 Mär. 2021
Bearbeitet: dpb am 29 Mär. 2021
"I would only need a comparison of the overall average monthly return out of all tables in 'DS_10T' ..."
OK, define what this means, precisely...when I look at the data, I see an inconsistent set of dates in each; the first has all of 2010, the second the first 3Q of 2010 but then Q1 of 2011.
But, I also note that there are different numbers of data columns with different column names in each one -- so what can one average over?
Me no follow, sorry... :)
Fabian Niederreiter
Fabian Niederreiter am 29 Mär. 2021
Sorry for the misunderstanding. I will try to clarify:
For simplicity reasons I didn't mention why I have the data sorted excatly like it is. To summarize quickly, i sorted the stocks based on their past returns for different timeframes. The cell array 'DS_10T' includes the top 10% performing stocks from the past for different timeframes and the cell array 'DS_10B' inludes the bottom 10% performing stocks from the past for each time frame.
Right now I simply want to compare the overall average monthly return of all stocks in 'DS_10T' to the overall average monthly return of all stocks in 'DS_10B'. Hereby we do not longer need to pay attention to the dates (lines) or stock IDs (columns). So basicly just take every stock return variable that appears in 'DS_10T' as well as 'DS_10B' and calculate the averages for it. These number imo can simply be reference with DS_10T{1,k}(:,4:end). Unfortunatelly I am pretty new to Matlab and don't excatly know how to come up with a fitting "for loop" for it...
I apologize for still having so much unnecessary information in the tables and understand that this was confusing.
dpb
dpb am 29 Mär. 2021
"... want to compare the overall average monthly return of all stocks in 'DS_10T' to the overall average monthly return of all stocks in 'DS_10B'. Hereby we do not longer need to pay attention to the dates (lines)..."
If you don't pay attention to dates, then you'll have a global average of all months for all years??? That is, that would be one number for each variable. How many results do you expect?
You can average over issues/stocks ok (although I don't know as I can see what that could possibly mean (so to speak) once have done?)
Fabian Niederreiter
Fabian Niederreiter am 29 Mär. 2021
„That is, that would be one number for each variable. How many results do you expect?“ Yes, that‘s exactly what I want. As shown in the little picture in my original post just one number for DS_10T and one number for DS_10B and the difference between the two numbers.
Fabian Niederreiter
Fabian Niederreiter am 30 Mär. 2021
Hello again dpb,
Is there a chance that you have a proposition of code for my little problem?
Would appreciate that a lot :)
dpb
dpb am 30 Mär. 2021
Sorry, I've been tied up getting ready for annual meeting tomorrow -- I'll try to get a chance to take a look this evening, maybe...I think I understand the Q? now altho it'll take some effort to handle the various forms in each.
You don't have the whole thing in one great big table from which you created these cell arrays by any chance, do you? If the data exist in the whole first, instead of this, it's probably going to be a lot simpler to work there instead is one additional thought/question...
Fabian Niederreiter
Fabian Niederreiter am 30 Mär. 2021
Thanks for your quick answer. I really appreciate your effort despite your upcoming annual meeting. Good luck with that! :)
Although I do understand your concerns regarding the various forms in each cell, I want to point out that the data is always located in column 4 onwards for all of the tables. Nevertheless I am not an expert in how easy it is to run an average over all of the 41 tables of one cell.
I actually do have an overall table with all of the stocks' monthly data. It is called 'G' and also part of the attached dataset. Problem is that 'G' contains lot more stocks, monthly data and a lot more timeframes than the ones relevant for my goal. To be precise 'G' contains a lot of information I specifically don't want to analyse. That's why I created 'DS_10B' and 'DS_10T' in the previous steps of my code. So if you want to use 'G' in order to come up with the desired averages it is really important that we only use the exact stocks and time frames depicted in 'DS_10B' and 'DS_10T'.
As it may help you I now attached the current version of my code as well. You can run it from line 68 onwards.
dpb
dpb am 31 Mär. 2021
We just need a list of which IDs are wanted to be analyzed...
Annual meeting is over (whew!) but I need a little time to decompress...
IDs and timeframes though. Because we have 10 years data for most of the stocks and just want to analyse a fraction of the months.
dpb
dpb am 1 Apr. 2021
Your former response was throwing away the time factor and averaging across all months/years to get a single average???
We have to differentiate there. I very well looked at the time factor when creating the two analysing portfolios (cell arrays) 'DS_10B' and 'DS_10T'.
Now that I have the specific data I want to analyse stored in the tables of these cell arrays (41 tables each for 'DS_10B' and 'DS_10T') , we can average across all stocks/months within the cell array in order to get only two "overall averages" as an output.
Hey dpb,
Did you find time to look into it yet? :)
Would be really happy about your advise.
Nevermind I just managed to do it by myself :)
Attached my solution for the case that you are interested:
%% Mean Bottom Portfolio
meanDS_10B = NaN(1,numel(DS_10B));
for k =1:numel(DS_10B)
t = DS_10B{k,:};
% Extract stock data columns of every table
lastColumn = t{:, 4:end}; % Get all columns with stock return data of table as column vectors of doubles
%reshape into one single column in order to calculate mean
lastCo = reshape(lastColumn,[],1);
% mean the last column but use omitnan
mean_column = mean(lastCo,'omitnan');
meanDS_10B(k) = mean_column;
end
% Average all Means
avgRET_Bottom = mean(meanDS_10B,'omitnan');

Melden Sie sich an, um zu kommentieren.

Antworten (1)

dpb
dpb am 29 Mär. 2021

0 Stimmen

See the optional 'nanflag' argument to mean--
M = mean(___,nanflag) specifies whether to include or omit NaN values from the calculation for any of the previous syntaxes. mean(A,'includenan') includes all NaN values in the calculation while mean(A,'omitnan') ignores them.
cellfun and an anonymous function should let you do this in one line, methinks. :)
I didn't explore the big table in depth, but I'd guess you could probably also do all the analyses directly with the table itself with the use of grouping variables and rowfun and perhaps varfun

Kategorien

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by