MATLAB Answers

Merge or join two data sets

4 views (last 30 days)
Frank Olsen
Frank Olsen on 17 Apr 2020
Answered: Peter Perkins on 27 Apr 2020
I'm working on a project where I have to conduct an event study. The study is based on two timeseries data set:
1) sentiments from news posts (negative, neutral and positive), and
2) futures returns on specific asset and a relevant index.
In the news data I have isolated the date and sentiment in a table, with a huge number of rows since there're several news posts each day (also weekends). The data is further split up to negative, neutral and positive to conduct the event study and examine them seperately. The pic below is the negative.
In the returns data set I have price developments in returns for the asset and its index in a timetable. One price each trading day (not weekends and holidays).
To perform the event study I need to find a way to remove all rows that do not match a trading day and duplicate dates, so I'm left with one row for each trading day. Is that possible?
Can it be done by merging/joining the two data set, since the return data only consists of observations on trading days
Note: The return data goes back further in time since I need to compute the "normal return" based on returns from -205 days to -5 days prior to the news sentiment.
Thanks a lot for the help! I really appreciate any suggestions!
Frank

  2 Comments

Mohammad Sami
Mohammad Sami on 17 Apr 2020
you can use groupsummary to summarise your sentiment data by day first. For example it will count how many negative news item appeared on that day. Then you can merge it with your returns table.
Star Strider
Star Strider on 17 Apr 2020
Explore the join function and its friends.

Sign in to comment.

Answers (1)

Peter Perkins
Peter Perkins on 27 Apr 2020
You have an irregular intraday table (sentiments) and a daily timetable (returns). It sounds like you want to turn the sentiment data into a daily series? I'm gonna guess that maybe you want, on each trading day, a count of negative, neutral, and positive posts. Posts on weekends I guess you want to ignore.
Let's say you have this:
posts =
10×1 timetable
time sentiment
____________________ _________
27-Apr-2020 02:28:13 Negative
27-Apr-2020 06:01:41 Neutral
27-Apr-2020 09:56:51 Neutral
27-Apr-2020 13:57:48 Negative
27-Apr-2020 21:09:31 Positive
28-Apr-2020 00:31:11 Neutral
28-Apr-2020 02:26:17 Negative
28-Apr-2020 09:59:27 Neutral
28-Apr-2020 18:51:41 Negative
28-Apr-2020 19:01:19 Neutral
There are a bunch of ways to get the sentiment counts as separate daily count variables; here I'll show groupcountsto first get counts. In less recent releases of MATLAB, you can use groupsummary, or varfun. groupcounts has a nice way to compute daily counts, but here you want daily-by-sentiment counts, so discretize the times to dates before calling groupcounts.
>> posts.time = dateshift(posts.time,'start','day');
>> posts2 = groupcounts(posts,["time" "sentiment"])
posts2 =
5×3 table
time sentiment GroupCount
___________ _________ __________
27-Apr-2020 Negative 2
27-Apr-2020 Neutral 2
27-Apr-2020 Positive 1
28-Apr-2020 Negative 2
28-Apr-2020 Neutral 3
Now you need to make separate variables for each sentiment; that's unstack.
>> posts3 = unstack(posts2,'GroupCount','sentiment')
posts3 =
2×4 table
time Negative Neutral Positive
___________ ________ _______ ________
27-Apr-2020 2 2 1
28-Apr-2020 2 3 NaN
The NaN is a bit annoying; it's because unstack uses sum by default for aggregation. In the R2020a version of MATLAB that's just out, you can work around that by specifying @numel, but it's also easy to use replacemissing.
>> posts4 = fillmissing(posts3,'constant',0,'DataVariables',["Negative" "Neutral" "Positive"])
posts4 =
2×4 table
time Negative Neutral Positive
___________ ________ _______ ________
27-Apr-2020 2 2 1
28-Apr-2020 2 3 0
Now you are in business. Figure out the weekdays during the period you care about, create a daily datetime vector, and synchronize your posts and your returns timetables to that time vector.

  0 Comments

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by