Finding duplicates without using the unique function

I'm struggling to write a user-defined function that detects duplicates within a matrix. This is what I have so far:
function bmatch = testing(data)
    edges = min(data):max(data);
    [counts,values] = histcounts(data, edges);
    if values(counts>=2)
        bmatch = 1;
    else
        bmatch = 0;
    end
However, this doesn't detect duplicates or report the number of duplicates in a given matrix. I don't understand why.

4 Comments

Please describe what you mean by 'doesn't work'. How do you know it doesn't work? Is there an error message?
It can't detect duplicates within a matrix.
Give an example of what you want to catch. You can always convert a matrix into a vector, so the input being a matrix is irrelevant.
Something like this:
A = [ 1 2 3 4 5 6 7 8 9 10]
match_a = testing(A) % should return bmatch as 0 and make a matrix [0]
B = [ 1 1 2 3 4 5 6 5 9]
match_b = testing(B) % should return bmatch as 1 and make a matrix [1 1 5 5]


Answers (1)

Walter Roberson on 2 May 2023
Moved: Walter Roberson on 2 May 2023
Watch carefully:
data = [ 1 2 3 4 5 6 7 8 9 10]
data = 1×10
1 2 3 4 5 6 7 8 9 10
edges = min(data):max(data)
edges = 1×10
1 2 3 4 5 6 7 8 9 10
[counts,values] = histcounts(data, edges)
counts = 1×9
1 1 1 1 1 1 1 1 2
values = 1×10
1 2 3 4 5 6 7 8 9 10
Notice that the final count is 2 and that the vector of counts is one element shorter than edges. Read carefully about what happens in the edge cases for histcounts: the last bin includes both of its edges.
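To make the edge case concrete, here is a minimal sketch that reproduces the run above and then tries one possible workaround. The extra edge (edges2) is an assumption for illustration, not something from the original code, and it only makes sense when every value is an integer.

```matlab
data  = [1 2 3 4 5 6 7 8 9 10];   % ten distinct values
edges = min(data):max(data);      % 10 edges define only 9 bins

counts = histcounts(data, edges);
% Every bin is [edges(k), edges(k+1)) except the last one, which is closed
% on both sides: [9, 10]. Both 9 and 10 therefore land in the final bin.
disp(counts)                      % 1 1 1 1 1 1 1 1 2

% Assumed workaround (integer data only): append one more edge so that the
% largest value gets a bin of its own.
edges2  = min(data):(max(data) + 1);
counts2 = histcounts(data, edges2);
disp(counts2)                     % 1 1 1 1 1 1 1 1 1 1
```

With the extra edge, each integer between the minimum and maximum has its own bin, so a count of 2 or more would indicate a genuine repeat.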
Your code also has problems if the values are not all integers, or if there are non-finite values -- or if one of the values is much larger than the others. For example your code should be able to handle testing([-1e40 1e40]) without difficulty, but your code will run out of memory.
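As a rough illustration of the non-integer problem, the sketch below feeds three distinct values through the same edge construction. The sample values are made up for this example.

```matlab
data  = [1.2 1.7 3];              % three distinct, non-integer-spaced values
edges = min(data):max(data);      % 1.2:3 steps by 1, giving [1.2 2.2]

counts = histcounts(data, edges);
% There is a single bin, [1.2, 2.2]. It holds both 1.2 and 1.7, and the
% value 3 falls outside every bin and is not counted at all.
disp(counts)                      % 2

% A test such as counts >= 2 would now report a duplicate that is not there.
falseAlarm = any(counts >= 2);
disp(falseAlarm)                  % 1
```

Unit-width bins only separate values that are at least 1 apart, which is why the approach cannot be trusted for arbitrary data.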

3 Comments

if values(counts>=2)
You create a logical index of the locations where the counts are >= 2. Then you use that logical index to select certain elements of values. You have if applied to that list of values.
There might be multiple places where counts>=2 so values(counts>=2) might be empty (no duplicates detected), or might be a scalar (single duplicate detected), or might be more than one element. So values(counts>=2) could be empty or scalar or vector.
When you apply if to an array, MATLAB considers the condition true only when the array is non-empty and every element is non-zero. The actual numeric values do not matter (except NaN, which cannot be converted to logical and raises an error): if every element is non-zero, the condition is true, and if the array is empty or contains even one zero, the condition is false.
So what you are testing with the if values(counts>=2) is whether the places that have duplicates are all non-zero. If there are no duplicates or if there is a duplicate where values == 0, then the condition will fail; if there is at least one duplicate and the duplicates skip where values == 0 then the condition is considered true.
I suspect that is not what you intended to test.
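The rule for if on arrays can be checked with a short experiment. The sample conditions and the variable names (conditions, taken, hasDuplicate) are made up for this sketch, and the any(...) line is only one assumed way to express the intended test.

```matlab
% Conditions to try: empty, non-zero scalar, all non-zero, contains a zero, zero.
conditions = {[], 5, [1 5], [0 5], 0};
taken = false(1, numel(conditions));

for k = 1:numel(conditions)
    if conditions{k}              % same form as: if values(counts>=2)
        taken(k) = true;
    end
end
disp(taken)                       % 0 1 1 0 0

% Assumed alternative: test the counts themselves and reduce the logical
% vector to a single true/false with any().
counts = [1 1 2 1 2];
hasDuplicate = any(counts >= 2);
disp(hasDuplicate)                % 1
```

Reducing with any (or all) makes the intent explicit and avoids depending on which values happen to sit at the flagged positions.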
You really need to be thinking more about what the code should do if there are elements that are not integers.
Hint: if you sort the elements, then when there are no duplicates no adjacent elements are equal, but when there are duplicates there will be places where adjacent elements are equal.
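One way the hint could be turned into code is sketched below, using the vector B from the question. This is an illustrative sketch rather than the answerer's own solution; the variable names (isRepeat, inGroup, dupValues) are invented here. It compares neighbours with == instead of relying on bin counts, so it does not require integer data, and NaN values are never reported because NaN is not equal to itself.

```matlab
B = [1 1 2 3 4 5 6 5 9];

s = sort(B(:));                            % column vector, ascending

% True where an element equals the one just before it in sorted order.
isRepeat = [false; s(2:end) == s(1:end-1)];

% Also flag the first copy of each repeated value, so that every member of
% a duplicate group is selected.
inGroup = isRepeat | [isRepeat(2:end); false];

bmatch    = any(isRepeat);                 % true when at least one repeat exists
dupValues = s(inGroup).';                  % row vector of all repeated entries

disp(bmatch)                               % 1
disp(dupValues)                            % 1 1 5 5
```

For an input with no repeats, such as A = 1:10, bmatch is false and dupValues is empty rather than [0].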


Release: R2023a

Asked: 1 May 2023
