Reshape a column vector containing several stocks returns into a matrix format

Question

Maximilian am 23 Jun. 2014

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/137160-reshape-a-column-vector-containing-several-stocks-returns-into-a-matrix-format

Kommentiert: Emily Read am 26 Jun. 2019

Hello everyone,

I have a column vector containing dates and the respective stock prices for a large set of stocks with different starting and ending dates (as some stocks were only listed after the start of the sample or were delisted during the sample period). Hence, the first 100 rows of the vector might containg the dates and stock returns of stock 1, rows 101 to 350 the dates and prices of stock 2, rows 351 to 400 the same data of stock 3 and so on. I obtained the data set from the Wharton CRSP database.

Now, i would like to transform this column vector into a matrix which contains the dates for the largest available time period in the a first column and the respective stock prices in the subsequent columns such that column 2 contains the stock prices of stock 1, column 3 the stock prices of stock 2 and so on.

I already tried several merging or reshaping commands without any success. Therefore, I would be very thankful for any advice.

Kind regards, Maximilian

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Maximilian am 25 Jun. 2014

demo_data.xlsx

Hey,

first of all thank you for your immediate replies.

I hope that the following example gives a more precise picture of the problem i am facing. I am still rather new to Matlab.

The data i am using is downloaded from the CRSP database and I obtain it in the following format.

The array has 3 columns, where the first column specifies the date (the date is given in the following format yyyymmdd, but i transformed it using the datenum command in my script), the second column contains a unique identifier for each stock called PERMNO or PERMCO in CRSP (Hence, i think i could use this identifier to determine when a run of a new stock beginns) and the third column gives the stock prices.

If I now specify in CRSP that I want the data for all stocks listed in a given index in a given period (lets say from 01.01.1980 to 31.12.2013), CRSP then gives me an excel file (or text file) which consists of the 3 aforementioned columns but lists all the different stocks in the rows one after another. Hence, the first 2000 rows might represent the data for stock 1 (date, identifier, stock price in the different columns), wheres the next 1500 rows give me the same data for stock 2. Due to the fact that some of the stocks were only listed after the start of the period (01.01.1980) or were delisted before the end of the period (31.12.2013) the number of rows for each stock differs.

What i now want is a matrix whichs first column represents the trading dates of the entire period from 01.01.1980 to 31.12.2013 and the subsequent columns should represent the prices for the different stocks (each column should reperesent one specific stock and the stock identifier should be the name of the column if that is possible). Hence, column 2 of the matrix should give me the prices for the given dates (in column 1) of stock 1, column 3 the prices of stock 2 and so on. If the stock price is not available for a given date, I just want a missing value.

The appendix contains an exemplary excel file for three stocks over the considered time period. (Notice that the observations for stock 2 start later than the observations for stock 1 and 3)

Kind regards and thank you for your help.

Maximilian

Cedric am 26 Jun. 2014

Bearbeitet: Cedric am 26 Jun. 2014

Hi Maximilian, I am traveling and I will come back on your problem tomorrow. It is not too complicated actually.

There is no such thing as an empty element/content in numeric arrays (there would be one in cell arrays, but the latter are not optimal for storing large arrays of numbers). How will you use this new array of PERMCOs restructured in column afterwards? Would it be ok/usable if we filled it with NaNs where there is no value?

Maximilian am 26 Jun. 2014

Hi Cedric, thanks a lot for your help

with empty elements I was refering to NaNs. As I already said, I am still rather new to Matlab =)

What i want to do afterwards is to sort the stocks into decile portfolios depending on their return on a given date (Portfolio Nr. 1 for example would contain the stocks with the highest return on the given date whereas portfolio Nr.10 contains the stocks with the lowest return on that date). Then i want to compute the average returns of these portfolios. If i understand this correctly, Matlab just ignores NaN's in its calculations. Hence, if i would calculate the average return on a given date for a portfolio of 10 stocks and I have a NaN for one stock at that date, Matlab would just calculate the average return of the 9 stocks for which prices are available.

Having the PERMCO as the name of the different stocks would be nice as it would allow me to identify the stocks contained in a given portfolio and to characterise it (Does it only contain stocks with small market capitalization, low book-tomarket ratio and so on).

Just to make this clear once again. The matrix I need afterwards should optimally have a dateline in column 1 and the stock prices of all the different stocks in the subsequent columns (Column 2 should contain the stock prices of stock 1, column 3 the stock prices of stock 2 and so on) Additionally it would be nice to have the PERMCO as the name of the different stock columns.

Thanks again for your kind help.

Kind regards, Maximilian

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Cedric am 27 Jun. 2014

3
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/137160-reshape-a-column-vector-containing-several-stocks-returns-into-a-matrix-format#answer_142697

Bearbeitet: Cedric am 27 Jun. 2014

In MATLAB Online öffnen

maximilian_1.m

Here is one way to process your data. You'll have to understand and check that it is correct (at least, check a dozen numbers in the array of prices/returns). I attached the M file, but I copy/pasted the code below as well, to facilitate the discussion. The output of block 3 is the plot below

Note 1 - you may want to update the terminology. What I named "prices" initially is the "returns" that you mention in your last comment.

Note 2 - base functions in MATLAB don't compute with NaNs:

 >> sum(prices(1,:))
 ans =
   NaN

If you have the Financial toolbox, you can use special function for that purpose, e.g.

 >> nansum(prices(1,:))
 ans =
  326.5000

If not, you can extract non-nan elements and aply base operations on these

 >> id = ~isnan( prices(1,:) ) ;
 >> sum( prices(1,id) )
 ans =
  326.5000

----[ CODE ]---------------------------------------------------------------------------------------------------

 %%--- 1. Read data.
 fprintf( 'Reading data..    ' ) ;
 tic ;
 data = xlsread( 'demo_data.xlsx' ) ;
 fprintf( ': done in %.2es\n', toc ) ;
 %%--- 2. Process data.
 fprintf( 'Processing data.. ' ) ;
 tic ;
 % - 2a. Build list of unique PERMCOs. This corresponds to the horizontal
 %  header of the table of prices defined below.
 permcos = unique( data(:,1) ).' ;
 % - 2b. Build list of unique dates. This corresponds the the vertical
 %  header of the table of prices defined below.
 dates   = unique( data(:,2) ) ;
 minDate = min( dates ) ;
 % - 2c. This is the complicated part. Let's start with an example. We have
 % a vector of dates
 %                    1]  19260102
 %                    2]  19260104
 %                    3]  19260105
 %                    4]  19260106
 %                    ..  ...
 %                23283]  20131231
 %
 % where the number behind ] is the index of the element that follows. We
 % will need a table which makes the inverse correspondance, which means
 % making the association 19260102 -> 1, 19260104 -> 2, etc. We could build
 % a giant array where element 19260102 is 1, etc.
 %
 %                    19260102]  1
 %                    19260104]  2
 %                    19260105]  3
 %                    19260106]  4
 %                          ...  ..
 %                    20131231]  23283
 %
 % but it would be quite large in memory (~162MB). Instead, we build a
 % mechanism which makes the correspondance between (date - min(dates) + 1)
 % and indices
 %                    1]  1
 %                    3]  2            % Observe the discont. (*)
 %                    4]  3
 %                    5]  4
 %                    ..  ...
 %               871129]  23283
 %
 % (*) which corresponds to the gap between 19260102 and 19260104. This
 % array takes less than 7MB RAM, which is reasonable. We call that a lookup
 % table; it associates a series of numbers/indices to another series of
 % numbers.
 shiftedDates = dates - minDate + 1 ;
 dates_LT = zeros( max(shiftedDates), 1 ) ;
 dates_LT(shiftedDates) = 1 : numel( shiftedDates ) ;
 % If you display dates_LT(1:10), you'll see that it worked: element 1 is 1,
 % element 3 is 2, etc.
 % - 2c. Build array of prices per date/PERMCO.
 % Prealloc an array of NaNs.
 prices = nan( numel(dates), numel(permcos) ) ;
 % Iterate through PERMCOs and define corresponding columns.
 for pId = 1 : numel(permcos)
    % Identify (build a logical index) rows of the data table associated
    % with current PERMCO. The logical index will then be used to extract
    % relevant dates/prices.
    lId = data(:,1) == permcos(pId) ;
    % Extract relevant dates.
    currentDates = data(lId,2) ;
    % Build the index, in the table of prices, of entries corresponding to
    % these dates: shift them and then use lookup table.
    dId = dates_LT(currentDates - minDate + 1) ;
    % Store current prices at relevant "places" in the array of prices. The
    % column is defined by the PERMCO ID (pId) and rows are defined by date
    % IDs that we just computed.
    prices(dId,pId) = data(lId,3) ;
 end
 fprintf( ': done in %.2es\n', toc ) ;
 %%--- 3. Plot time series.
 % - 3a. Plot, grid, ylabel.
 plot( dates, nansum(prices, 2), 'Color', 0.85*[1,1,1], 'LineWidth', 2 ) ;
 hold on ;  grid on ;
 plot( dates, prices ) ;
 ylabel( 'Return [$]' ) ;
 % - 3b. Legend. 
 legenItems = arrayfun( @(pId) sprintf( 'PERMCO %d', pId), permcos, ...
    'UniformOutput', false ) ;
 legend( [{'Sum portfolio (?)'}, legenItems{:}], 2 ) ;

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Emily Read am 26 Jun. 2019

Cannot thank you enough for your help Mr Wannaz- you have saved me a lot of time!

Melden Sie sich an, um zu kommentieren.

Answer 2

Maximilian am 27 Jun. 2014

1
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/137160-reshape-a-column-vector-containing-several-stocks-returns-into-a-matrix-format#answer_142724

Bearbeitet: Maximilian am 27 Jun. 2014

Amazing !!!

Thank you so much.. I really appreciate it a lot..

I tried it and it works perfectly.

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Cedric am 27 Jun. 2014

Bearbeitet: Cedric am 27 Jun. 2014

Great, you've been lucky, I got stuck at the airport and I spent 25 minutes on this to change my mind! Note that there is a more concise but more complex approach which consists in building both permco and date IDs, and build the prices array in one shot instead of the loop. I thought that the loop would be simpler to grasp though.

Cheers,

Cedric

Melden Sie sich an, um zu kommentieren.

Reshape a column vector containing several stocks returns into a matrix format

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Akzeptierte Antwort

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Weitere Antworten (1)

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

Reshape a column vector containing several stocks returns into a matrix format

5 Kommentare 3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

Akzeptierte Antwort

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Weitere Antworten (1)

1 Kommentar -1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

5 Kommentare
3 ältere Kommentare anzeigen3 ältere Kommentare ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen-1 ältere Kommentare ausblenden