Comparing lists of years for similarity

Question

James Ryan on 14 Dec 2016

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/316793-comparing-lists-of-years-for-similarity

Commented: Guillaume on 15 Dec 2016

My problem involves calibrating a numerical model which predicts some event which happens or not in each year. It could be economic events, coral bleaching, or many other things. I want to compare the similarity of results from different model versions, or with real-world historical data.

The models are expected to miss quite often, so looking for exact matches won't do. The size of the error matters, so the Wilcoxon rank-sum test won't do either. The lists will often differ in length, and they could be quite a bit longer than my examples below.

Examples of what is subjectively "good" and "bad".

A = [1968 1972 1991 1993 2001 2010]
B = [1968 1972 1993 2001 2010]
C = [1969 1973 1991 1995 2001 2011]
D = [1950 1960 1991 1993 2001 2050]
E = [1968 1972 1991 1993 2001 2010 2050]

Consider A to be "correct"

B is missing one year entirely, but this is not disastrous.
C has only two matching values, but the others are close. I'd call this better than B.
D has three exact matches, but the others are way off. I'd consider this the worst.
E matches every year in A exactly but adds one really bad extra point. Again, not disastrous.

Of course I don't expect an algorithm to match my subjective evaluation all the time. I just want it to take the things I have mentioned into account.

If I were to make up an algorithm off the cuff, I'd probably look for points with near neighbors in the other list and score their distances root-mean-square style, with some maximum value counted against any point left with no neighbor. This is really crude, and there must be a better way.
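For reference, the off-the-cuff idea above could be sketched roughly as follows. This is only one possible reading of it: the cap `maxPenalty`, the helper names `nearest` and `score`, and the choice to average over both directions are assumptions made for illustration, not something specified in the question.

```matlab
% Sketch: symmetric nearest-neighbour RMS distance with a cap.
% maxPenalty, nearest and score are hypothetical names and choices.
A = [1968 1972 1991 1993 2001 2010];
C = [1969 1973 1991 1995 2001 2011];
D = [1950 1960 1991 1993 2001 2050];
maxPenalty = 10;   % assumed cap (in years) for points with no close neighbour

% Distance from each year in x to its nearest year in y, capped at cap.
% bsxfun builds the full pairwise difference matrix.
nearest = @(x, y, cap) min(min(abs(bsxfun(@minus, x(:), y(:).')), [], 2), cap);

% RMS of the capped distances, taken in both directions so that missing
% years and spurious years are both penalised.
score = @(x, y, cap) sqrt(mean([nearest(x, y, cap); nearest(y, x, cap)].^2));

scoreC = score(A, C, maxPenalty);   % small: every year in C is close to one in A
scoreD = score(A, D, maxPenalty);   % larger: three years in D are far off
```

Lower is better. With these settings C scores better than D, but B would still score better than C, because the year missing from B happens to sit two years from a neighbour; a score of this kind only sees distances, not one-to-one pairings.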

Suggestions, please!


Answer 1

Guillaume on 14 Dec 2016

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/316793-comparing-lists-of-years-for-similarity#answer_247124

It sounds like you need some sort of edit distance calculation. A pure edit distance algorithm would rank B as better than C (1 deletion vs. 4 substitutions), but you can weight deletions more heavily than substitutions, and weight each substitution by how far the new value is from the original.

There is an edit distance function on the File Exchange. No idea of its quality.
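A minimal sketch of such a weighted edit distance, written as a plain dynamic-programming loop rather than taken from the File Exchange submission, might look like this. The value of `gapCost` and the rule that a substitution costs the year difference (capped at one deletion plus one insertion) are assumptions chosen for illustration.

```matlab
% Sketch: weighted edit distance between sorted lists of years.
% gapCost and the substitution rule are assumed, not from the thread.
A = [1968 1972 1991 1993 2001 2010];          % reference list
B = [1968 1972 1993 2001 2010];
C = [1969 1973 1991 1995 2001 2011];
D = [1950 1960 1991 1993 2001 2050];
E = [1968 1972 1991 1993 2001 2010 2050];
gapCost = 6;   % hypothetical cost of one missing or one extra year

candidates = {B, C, D, E};
dist = zeros(1, numel(candidates));
for k = 1:numel(candidates)
    b = candidates{k};
    n = numel(A);  m = numel(b);
    cost = zeros(n + 1, m + 1);
    cost(:, 1) = (0:n).' * gapCost;   % delete the first i years of A
    cost(1, :) = (0:m) * gapCost;     % insert the first j years of b
    for i = 1:n
        for j = 1:m
            % A substitution costs the year difference, but never more
            % than deleting one year and inserting another.
            sub = min(abs(A(i) - b(j)), 2 * gapCost);
            cost(i + 1, j + 1) = min([cost(i, j) + sub, ...
                                      cost(i, j + 1) + gapCost, ...
                                      cost(i + 1, j) + gapCost]);
        end
    end
    dist(k) = cost(end, end);   % total cost of the cheapest alignment
end
% dist holds the distances of B, C, D and E from A: [6 5 32 6]
```

With this particular `gapCost`, C comes out slightly ahead of B and E, and D is clearly worst, but the ordering of B, C and E depends entirely on how the gap cost compares with typical year differences.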


James Ryan on 14 Dec 2016

Thanks. This definitely moves me closer to a solution. The only difference is that in that algorithm (designed for strings), replacing one letter with another has the same "cost" regardless of the letter. With dates, the size of the replacement matters. Maybe I can tweak it to work.

Guillaume on 15 Dec 2016

Yes, as I said you can modify the standard algorithm to give different weight to substitutions depending on how far they are from the original value.

What you are trying to do is definitely an edit distance in concept, so I'm sure you can find an algorithm already developed somewhere.


Answer 2

KSSV on 14 Dec 2016

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/316793-comparing-lists-of-years-for-similarity#answer_247120

You can try ismembertol: https://in.mathworks.com/help/matlab/ref/ismembertol.html. Given a tolerance, it reports which elements of one set lie within that tolerance of some element of the other set. By adjusting the tolerance you decide how close two years must be to count as a match.
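As a small illustration of that suggestion, the call below checks which years of C fall within a fixed number of years of some year in A. The tolerance value `tolYears` is an assumption; note that `ismembertol` uses a relative tolerance by default, so `'DataScale', 1` is passed to make it absolute.

```matlab
% Sketch: near-match test with an absolute tolerance in years.
A = [1968 1972 1991 1993 2001 2010];
C = [1969 1973 1991 1995 2001 2011];
tolYears = 1.5;   % hypothetical tolerance: off by one year still counts

% 'DataScale', 1 turns the default relative tolerance into an absolute one,
% so an element matches when abs(C(i) - A(j)) <= tolYears for some j.
nearInA = ismembertol(C, A, tolYears, 'DataScale', 1);

nNear = nnz(nearInA);          % years in C with a close partner in A
unmatched = C(~nearInA);       % years in C with no partner (here 1995)
```

This only gives a yes/no answer per year, so it does not by itself distinguish an error of 3 years from an error of 40.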


James Ryan on 14 Dec 2016

Another step in the right direction. Perhaps I could count exact matches, then near matches, and then count years which don't have a near match. Each count could be weighted differently to create a "nearness" score. Thanks.
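The counting scheme described in this comment could be sketched along these lines. The threshold `nearTol`, the weights `wNear` and `wMiss`, and the decision to count unmatched years on both sides are all assumptions made for the example; exact matches are given zero penalty.

```matlab
% Sketch: weighted count of near matches and unmatched years (lower is better).
% nearTol, wNear and wMiss are hypothetical tuning values.
A = [1968 1972 1991 1993 2001 2010];          % reference list
B = [1968 1972 1993 2001 2010];
C = [1969 1973 1991 1995 2001 2011];
D = [1950 1960 1991 1993 2001 2050];
E = [1968 1972 1991 1993 2001 2010 2050];
nearTol = 2;   % years; differences up to this count as "near"
wNear = 1;     % penalty per near (but not exact) match
wMiss = 5;     % penalty per year with no near match

% Distance from each year in x to the closest year in y.
nearestDist = @(x, y) min(abs(bsxfun(@minus, x(:), y(:).')), [], 2);

candidates = {B, C, D, E};
score = zeros(1, numel(candidates));
for k = 1:numel(candidates)
    cand = candidates{k};
    dCand = nearestDist(cand, A);   % predicted year -> nearest reference year
    dRef  = nearestDist(A, cand);   % reference year -> nearest predicted year
    nNear = nnz(dCand > 0 & dCand <= nearTol);
    nMiss = nnz(dCand > nearTol) + nnz(dRef > nearTol);
    score(k) = wNear * nNear + wMiss * nMiss;
end
% score holds the penalties for B, C, D and E: [0 4 30 5]
```

D is clearly penalised most, but B gets no penalty at all here because its missing year 1991 lies within `nearTol` of 1993, so the threshold and weights would need tuning against more examples.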


Answer 3

Image Analyst on 15 Dec 2016

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/316793-comparing-lists-of-years-for-similarity#answer_247224

What about ismember() and/or setdiff()? You don't need ismembertol() if all your numbers (years) are integers. setdiff() tells you which numbers appear in one vector but not in the other, and ismember() tells you which numbers appear in both. Neither one cares about position, but I don't think that matters to you: you only care whether the number(s) is/are present in the array.
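For exact matching only, the two functions could be used as in this short sketch; the variable names `hits`, `falseAlarms` and `missed` are just illustrative labels.

```matlab
% Sketch: exact-match bookkeeping with ismember and setdiff.
A = [1968 1972 1991 1993 2001 2010];   % reference list
D = [1950 1960 1991 1993 2001 2050];   % model output

hits        = D(ismember(D, A));   % years in D that also occur in A
falseAlarms = setdiff(D, A);       % years in D that are absent from A
missed      = setdiff(A, D);       % years in A that D did not produce
```

Because only exact equality is tested, a year that is off by one is treated the same as a year that is off by forty, which is the limitation the question raises about exact matching.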
