How to identify same pattern with phase difference among rows?

Iro

15 Mai 2015

0 Antworten

Aktualisiert 18 Mai 2015

16 Ansichten (30 Tage)

Melden Sie sich an, um diese Frage zu beantworten.

Follow Question

Melden Sie sich an, um diese Frage zu beantworten.

Follow Question

Ältere Kommentare anzeigen

In MATLAB Online öffnen

0 Stimmen

Hi all!

I have constructed a matrix (254864x6) with the following logic: Create all possible combinations of six integers (representing time intervals) between 5 and 20 that sum up to 60. This actually represents the time interval between two consecutive discrete events (6 such events in total) that have a repetition period T=1hour.

I know that this matrix has "double rows" in the sense that the same pattern repeats (if the observation period is e.g. 6h) but with some phase difference. For example, the rows:

5 5 20 20 5 5

and

5 5 5 5 20 20

10 10 15 15 5 5

and

15 15 5 5 10 10

My question is, how can I identify these rows, in a robust way, so I can save some memory and calculation time (by discarding them) in what I am doing further with this script (which uses this matrix as an input to some other process)?

Thanks,

Iro

5 Kommentare
3 ältere Kommentare anzeigen 3 ältere Kommentare ausblenden

Walter Roberson am 18 Mai 2015

In MATLAB Online öffnen

Define a canonical form for each vector by arranging it so that the largest value is as far left as possible. If the vector has multiple copies of the maximum, then the one to move to the left is the one that is followed by the biggest value (this implies that if there two copies of the maximum in a row, they will move to the beginning.) It's almost 2:30 AM here and I can't quite think of an efficient way of doing this. The process is akin to shifting the order of digits in a number to obtain the largest possible integer.

Once you have that canonical number, subtract 4 from each digit, so that the range of values goes from 1 to 16. Now use

idx = sub2idx([16 16 16 16 16 16], TheVector)

Keep an Encountered logical() vector.

If idx > length(Encountered) then the combination has not been encountered before. Set Encountered(idx) = true to mark that you have seen it now. This will extend the array, writing false between the previous end and the new location.

Otherwise if Encountered(idx) is true then you have seen this pattern before, so discard it.

Otherwise, you have not encountered the pattern before. Set Encountered(idx) = true to mark that you have seen it now.

The vector that will lead to the largest index will, I think, be [20, 5, 5, 5, 6, 19], which would be element #14745616 of the Encountered vector. That's not a negligible amount of storage. It's about 10 times what you would use for your full 254864 x 6 matrix. It mostly has the advantage of being fast once the canonicalization is done.

I do not know at the moment how you arrived at your figure of 254864 x 6, as there are 16777216 possible entries for 6 numbers each in the range 5 to 20. I just ran a test and 701246 of the possibilities add up to 60. (Better remember to use uint8() for your data storage.) The Encounters vector I proposed would be about 5% occupied. But it's a linear search time. You could keep a canonicalized version of your present entries, which would save you from having to do circshifts on every existing entry to find matches, but even if you kept that sorted, searching it would be a 2*log2 process of the table size, repeated for each search. My calculation is that binary search would make the run about 35 times slower than linear. ismember() implements binary search on proper arguments, but with the number of searches you would need to do, you would want to drop the error checking and call the C routine ismembc2() directly.

Walter Roberson am 18 Mai 2015

By the way, the reason for going for the largest digits first is that the amount of memory required is determined by the final dimension, so you want the final dimension to be less than the maximum.

Iro am 18 Mai 2015

Bearbeitet: Iro am 18 Mai 2015

In MATLAB Online öffnen

Hi Walter, thanks a lot for your reply!

I am a bit confused when it comes to the use of this logical encounter and what is the purpose of comparing its length to idx, which is a vector, according to this implementation:

function find_phase_shift(A) % A has already shifted order
    subtract_value=(min(min(A)))-1; % =4
    A=A-subtract_value;
    sub_ind_value=max(max(A)); %16 (=20-4)
    sub_ind_vec=sub_ind_value*ones(1,length(A));
    for i=1:size(A,1)
        idx = sub2ind(sub_ind_vec, A(i,:)); % idx = sub2ind([16 16 16 16 16 16], TheVector)  
        % size_idx=1x6...
        %...logical encounter?
    end
    %...
end

Could you be a bit more detailed in that? Or am I getting something wrong here?

Also, I changed the precision to int8 and I still get 16777216 rows of possible combinations but only 254864 when I filter those which sum up to 60...

Thanks again!

Iro

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Follow Question