Reshape a column vector containing several stocks returns into a matrix format

4 Ansichten (letzte 30 Tage)
Hello everyone,
I have a column vector containing dates and the respective stock prices for a large set of stocks with different starting and ending dates (as some stocks were only listed after the start of the sample or were delisted during the sample period). Hence, the first 100 rows of the vector might containg the dates and stock returns of stock 1, rows 101 to 350 the dates and prices of stock 2, rows 351 to 400 the same data of stock 3 and so on. I obtained the data set from the Wharton CRSP database.
Now, i would like to transform this column vector into a matrix which contains the dates for the largest available time period in the a first column and the respective stock prices in the subsequent columns such that column 2 contains the stock prices of stock 1, column 3 the stock prices of stock 2 and so on.
I already tried several merging or reshaping commands without any success. Therefore, I would be very thankful for any advice.
Kind regards, Maximilian
  5 Kommentare
Cedric
Cedric am 26 Jun. 2014
Bearbeitet: Cedric am 26 Jun. 2014
Hi Maximilian, I am traveling and I will come back on your problem tomorrow. It is not too complicated actually.
There is no such thing as an empty element/content in numeric arrays (there would be one in cell arrays, but the latter are not optimal for storing large arrays of numbers). How will you use this new array of PERMCOs restructured in column afterwards? Would it be ok/usable if we filled it with NaNs where there is no value?
Maximilian
Maximilian am 26 Jun. 2014
Hi Cedric, thanks a lot for your help
with empty elements I was refering to NaNs. As I already said, I am still rather new to Matlab =)
What i want to do afterwards is to sort the stocks into decile portfolios depending on their return on a given date (Portfolio Nr. 1 for example would contain the stocks with the highest return on the given date whereas portfolio Nr.10 contains the stocks with the lowest return on that date). Then i want to compute the average returns of these portfolios. If i understand this correctly, Matlab just ignores NaN's in its calculations. Hence, if i would calculate the average return on a given date for a portfolio of 10 stocks and I have a NaN for one stock at that date, Matlab would just calculate the average return of the 9 stocks for which prices are available.
Having the PERMCO as the name of the different stocks would be nice as it would allow me to identify the stocks contained in a given portfolio and to characterise it (Does it only contain stocks with small market capitalization, low book-tomarket ratio and so on).
Just to make this clear once again. The matrix I need afterwards should optimally have a dateline in column 1 and the stock prices of all the different stocks in the subsequent columns (Column 2 should contain the stock prices of stock 1, column 3 the stock prices of stock 2 and so on) Additionally it would be nice to have the PERMCO as the name of the different stock columns.
Thanks again for your kind help.
Kind regards, Maximilian

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Cedric
Cedric am 27 Jun. 2014
Bearbeitet: Cedric am 27 Jun. 2014
Here is one way to process your data. You'll have to understand and check that it is correct (at least, check a dozen numbers in the array of prices/returns). I attached the M file, but I copy/pasted the code below as well, to facilitate the discussion. The output of block 3 is the plot below
Note 1 - you may want to update the terminology. What I named "prices" initially is the "returns" that you mention in your last comment.
Note 2 - base functions in MATLAB don't compute with NaNs:
>> sum(prices(1,:))
ans =
NaN
If you have the Financial toolbox, you can use special function for that purpose, e.g.
>> nansum(prices(1,:))
ans =
326.5000
If not, you can extract non-nan elements and aply base operations on these
>> id = ~isnan( prices(1,:) ) ;
>> sum( prices(1,id) )
ans =
326.5000
----[ CODE ]---------------------------------------------------------------------------------------------------
%%--- 1. Read data.
fprintf( 'Reading data.. ' ) ;
tic ;
data = xlsread( 'demo_data.xlsx' ) ;
fprintf( ': done in %.2es\n', toc ) ;
%%--- 2. Process data.
fprintf( 'Processing data.. ' ) ;
tic ;
% - 2a. Build list of unique PERMCOs. This corresponds to the horizontal
% header of the table of prices defined below.
permcos = unique( data(:,1) ).' ;
% - 2b. Build list of unique dates. This corresponds the the vertical
% header of the table of prices defined below.
dates = unique( data(:,2) ) ;
minDate = min( dates ) ;
% - 2c. This is the complicated part. Let's start with an example. We have
% a vector of dates
% 1] 19260102
% 2] 19260104
% 3] 19260105
% 4] 19260106
% .. ...
% 23283] 20131231
%
% where the number behind ] is the index of the element that follows. We
% will need a table which makes the inverse correspondance, which means
% making the association 19260102 -> 1, 19260104 -> 2, etc. We could build
% a giant array where element 19260102 is 1, etc.
%
% 19260102] 1
% 19260104] 2
% 19260105] 3
% 19260106] 4
% ... ..
% 20131231] 23283
%
% but it would be quite large in memory (~162MB). Instead, we build a
% mechanism which makes the correspondance between (date - min(dates) + 1)
% and indices
% 1] 1
% 3] 2 % Observe the discont. (*)
% 4] 3
% 5] 4
% .. ...
% 871129] 23283
%
% (*) which corresponds to the gap between 19260102 and 19260104. This
% array takes less than 7MB RAM, which is reasonable. We call that a lookup
% table; it associates a series of numbers/indices to another series of
% numbers.
shiftedDates = dates - minDate + 1 ;
dates_LT = zeros( max(shiftedDates), 1 ) ;
dates_LT(shiftedDates) = 1 : numel( shiftedDates ) ;
% If you display dates_LT(1:10), you'll see that it worked: element 1 is 1,
% element 3 is 2, etc.
% - 2c. Build array of prices per date/PERMCO.
% Prealloc an array of NaNs.
prices = nan( numel(dates), numel(permcos) ) ;
% Iterate through PERMCOs and define corresponding columns.
for pId = 1 : numel(permcos)
% Identify (build a logical index) rows of the data table associated
% with current PERMCO. The logical index will then be used to extract
% relevant dates/prices.
lId = data(:,1) == permcos(pId) ;
% Extract relevant dates.
currentDates = data(lId,2) ;
% Build the index, in the table of prices, of entries corresponding to
% these dates: shift them and then use lookup table.
dId = dates_LT(currentDates - minDate + 1) ;
% Store current prices at relevant "places" in the array of prices. The
% column is defined by the PERMCO ID (pId) and rows are defined by date
% IDs that we just computed.
prices(dId,pId) = data(lId,3) ;
end
fprintf( ': done in %.2es\n', toc ) ;
%%--- 3. Plot time series.
% - 3a. Plot, grid, ylabel.
plot( dates, nansum(prices, 2), 'Color', 0.85*[1,1,1], 'LineWidth', 2 ) ;
hold on ; grid on ;
plot( dates, prices ) ;
ylabel( 'Return [$]' ) ;
% - 3b. Legend.
legenItems = arrayfun( @(pId) sprintf( 'PERMCO %d', pId), permcos, ...
'UniformOutput', false ) ;
legend( [{'Sum portfolio (?)'}, legenItems{:}], 2 ) ;

Weitere Antworten (1)

Maximilian
Maximilian am 27 Jun. 2014
Bearbeitet: Maximilian am 27 Jun. 2014
Amazing !!!
Thank you so much.. I really appreciate it a lot..
I tried it and it works perfectly.
  1 Kommentar
Cedric
Cedric am 27 Jun. 2014
Bearbeitet: Cedric am 27 Jun. 2014
Great, you've been lucky, I got stuck at the airport and I spent 25 minutes on this to change my mind! Note that there is a more concise but more complex approach which consists in building both permco and date IDs, and build the prices array in one shot instead of the loop. I thought that the loop would be simpler to grasp though.
Cheers,
Cedric

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Price and Analyze Financial Instruments finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by