How to cluster similar strings?

Serbring on 26 Jan 2020
Commented: Serbring on 29 Jan 2020
Hi all,
I have long lists of strings that I collected automatically with a brute-force web-scraping routine. Many of the strings are very similar, and I would like to shorten the list by showing only the names that are really different. Is there any way to cluster the strings together? Below is a sample of the list.
Thank you so much.
Best regards.
{'microbiologia agraria' }
{'microbiologia forestale e ambientale' }
{'microbiologia generale' }
{'microbiologia agraria' }
{'microbiologia generale e ambientale' }
{'microbiologia del suolo e del sottosuolo' }
{'nutrition and health: the functional foods'}
{'microbiologia generale e ambientale' }
{'microbial biotechnologies in agroforestry' }
{'microbiologia generale ed ambientale' }
{'microbiologia agraria e forestale' }

Answers (1)

Image Analyst on 26 Jan 2020
  1 Comment
Serbring on 29 Jan 2020
Thanks for your reply. I already knew about those distances; the real problem is what to do with the resulting numbers. I will try to be more specific, so that you can understand the basic idea of the algorithm I have developed.
Let's assume I have three strings A, B and C. I computed the pairwise distances between the strings (A-B, A-C and B-C), and then, for each string, I summed its distances to the other two (A-B and A-C for A). Beyond that, I have no idea how to use those numbers. Any suggestion is appreciated.
Cheers
Michele
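
One possible way to turn pairwise distances like these into groups is hierarchical clustering. What follows is only a minimal sketch, not a method confirmed anywhere in this thread. It assumes the Statistics and Machine Learning Toolbox is available for linkage, squareform and cluster. The edit (Levenshtein) distance, the normalisation by string length, the average linkage and the 0.3 cutoff are all illustrative assumptions that would need tuning on the real list. The summed distance described above is reused at the end: within each cluster, the string with the smallest summed distance to the other members (the medoid) is kept as the representative name.

```matlab
% Sample names copied from the list in the question.
names = {'microbiologia agraria'
         'microbiologia forestale e ambientale'
         'microbiologia generale'
         'microbiologia agraria'
         'microbiologia generale e ambientale'
         'microbiologia del suolo e del sottosuolo'
         'nutrition and health: the functional foods'
         'microbiologia generale e ambientale'
         'microbial biotechnologies in agroforestry'
         'microbiologia generale ed ambientale'
         'microbiologia agraria e forestale'};
names = unique(names);      % exact duplicates need no clustering
n = numel(names);

% Pairwise Levenshtein distance, divided by the length of the longer
% string so that 0 means identical and 1 means nothing in common.
% (Normalising this way is an assumption, not something from the thread.)
D = zeros(n);
for i = 1:n-1
    for j = i+1:n
        a = names{i};
        b = names{j};
        prev = 0:numel(b);                  % DP row for the empty prefix of a
        for p = 1:numel(a)
            curr = [p, zeros(1, numel(b))];
            for q = 1:numel(b)
                cost = double(a(p) ~= b(q));
                curr(q+1) = min([prev(q+1) + 1, ...   % deletion
                                 curr(q) + 1, ...     % insertion
                                 prev(q) + cost]);    % substitution
            end
            prev = curr;
        end
        D(i,j) = prev(end) / max(numel(a), numel(b));
        D(j,i) = D(i,j);
    end
end

% Agglomerative clustering on the precomputed distances.
Z = linkage(squareform(D), 'average');
cutoff = 0.3;               % assumed threshold: lower means more, tighter groups
labels = cluster(Z, 'Cutoff', cutoff, 'Criterion', 'distance');

% One representative per cluster: the member with the smallest summed
% distance to the other members of the same cluster (the medoid).
k = max(labels);
representatives = cell(k, 1);
for c = 1:k
    idx = find(labels == c);
    [~, best] = min(sum(D(idx, idx), 2));
    representatives{c} = names{idx(best)};
end
disp(representatives)
```

Calling dendrogram(Z) on the same linkage output shows the merge heights, which can help in choosing the cutoff. If the Text Analytics Toolbox is installed, its editDistance function could replace the inner loops.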
