# How to find streaks in a categorical vector?

buhmatlab on 30 Apr 2020
Answered: Peter Perkins on 5 May 2020
Hi,

Is there a way to identify streaks (runs of consecutive identical values) in a categorical vector? For numeric vectors, I've found that the functions find and diff can be useful, but they do not work on a categorical vector. Is there a workaround, or do I need to convert my categorical vector into a numeric vector?
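The table below illustrates what I want: SerVec counts the position of each element within its current streak, restarting at 1 whenever the category changes.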

| CatVec | SerVec |
|--------|--------|
| A | 1 |
| C | 1 |
| A | 1 |
| B | 1 |
| B | 2 |
| C | 1 |
| A | 1 |
| A | 2 |
| A | 3 |
| B | 1 |
| C | 1 |
| C | 2 |

Image Analyst on 30 Apr 2020
Did you look at the findgroups() function?
Ameer Hamza on 30 Apr 2020
This problem is much easier to solve with a double array than with a categorical one. You can simply use double() to convert the categorical array to a double array, where each element becomes the index of its category.
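A minimal sketch of the conversion approach, using the data from the question. It assumes c is a column vector. double returns each element's category index (A = 1, B = 2, C = 3 here, because the categories are sorted alphabetically), so a nonzero diff marks a change of category. The variable names starts and runLengths are illustrative.

```matlab
% Data from the question, as a column categorical vector
c = categorical({'A';'C';'A';'B';'B';'C';'A';'A';'A';'B';'C';'C'});

% Convert to category indices (A = 1, B = 2, C = 3)
x = double(c);

% A run starts at element 1 and wherever the index changes
starts = find([true; diff(x) ~= 0]);

% Run lengths: distance from each start to the next start (or past the end)
runLengths = diff([starts; numel(x) + 1]);
```

Here starts is [1 2 3 4 6 7 10 11]' and runLengths is [1 1 1 2 1 3 1 2]'. Note that <undefined> elements convert to NaN, and since NaN ~= NaN, each of them would be treated as its own run of length 1.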

Peter Perkins on 5 May 2020
diff is a numeric function. categorical is not numeric. Ordinal categoricals have a mathematical ordering, but they don't have a "distance", so diff makes no sense for them.
But diff(x) is just x(2:end) - x(1:end-1). Again, categorical isn't numeric, but remember, the "trick" is not just diff(x), it's diff(x) ~= 0, so what you want is
d = c(2:end) ~= c(1:end-1)
At that point, you are exactly where you'd be if you had started with a numeric vector: d(k) is true where element k+1 differs from element k. For a column vector c, pad d with true at both ends, d = [true; d; true]; your runs (including singleton runs) then begin at find(d(1:end-1)) and the run lengths are diff(find(d)).
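The following sketch applies the comparison directly to the categorical vector from the question, with no conversion, and then builds SerVec, the position of each element within its streak. It assumes c is a column vector; the helper names isStart, runId and startIdx are illustrative, not part of any MATLAB API.

```matlab
% Data from the question, as a column categorical vector
c = categorical({'A';'C';'A';'B';'B';'C';'A';'A';'A';'B';'C';'C'});

% true where a streak starts: element 1, plus every change of category
isStart = [true; c(2:end) ~= c(1:end-1)];

% Streak number of each element (1, 2, 3, ...)
runId = cumsum(isStart);

% Index of the first element of each streak
startIdx = find(isStart);

% Position within the streak: offset from that streak's first element
SerVec = (1:numel(c))' - startIdx(runId) + 1;

% Optional: length of each streak (count of elements per streak number)
runLengths = accumarray(runId, 1);

% Reproduce the table from the question
T = table(c, SerVec, 'VariableNames', {'CatVec', 'SerVec'});
```

The cumsum over the start flags gives every element the number of the streak it belongs to, so subtracting that streak's first index yields the running counter without a loop.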

Vimal Rathod on 5 May 2020
You can use the findgroups function to group categorical data. It accepts categorical input directly, so you don't need to convert your data.
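Note that findgroups assigns the same group number to every occurrence of a category, not to each contiguous streak, so on the data from the question it returns three groups rather than eight streaks. A minimal sketch of combining it with run detection, assuming c is a column vector (G(:) only guards against orientation); the names streakId and numStreaks are illustrative.

```matlab
% Data from the question, as a column categorical vector
c = categorical({'A';'C';'A';'B';'B';'C';'A';'A';'A';'B';'C';'C'});

% One group number per category (all A's share a number, and so on)
G = findgroups(c);
G = G(:);
numGroups = max(G);          % 3 groups: A, B, C

% Streak number: increments whenever the group number changes
streakId = cumsum([true; diff(G) ~= 0]);
numStreaks = max(streakId);  % 8 streaks
```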