# Find pattern in vector while ignoring/skipping certain indices

Haider Ali on 10 Jun 2022
Commented: Haider Ali on 12 Jun 2022
Hello,
Is there an efficient way to search for a specific pattern in a vector while ignoring certain positions in the pattern?
For example, I need to search a vector for the 9-element pattern [0 4 X 0 6 Y 0 8 Z], where X, Y and Z can be any values.
I currently have a loop-based approach, but is there a faster, vectorized approach?
Thank you.
##### 3 Comments
Haider Ali on 11 Jun 2022
@dpb, can regexp be used for uint16 mat vectors?
Haider Ali on 12 Jun 2022
I am afraid I did not describe the complete scenario in my question.
The data vector vec contains ID pairs and their associated data in the format [ID1 ID2 data ID1 ID2 data ...]. The goal is to find the data associated with each ID pair. The data in vec is expected to contain both noise and missing values, so after an ID pair (e.g. [0 6]) is found, the previous ([0 4]) and next ([0 8]) ID pairs are also checked to reliably obtain the data value for each ID pair.
pattern = the ID pairs to search for, stored as [ID1 ID2 ID1 ID2 ...]
vec = the data to be searched
out = columns 1 and 2 hold the ID pairs, column 3 the associated data
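For illustration only: if vec were perfectly clean (no noise, no missing values), the [ID1 ID2 data ...] layout could be reshaped directly into the three-column form of out. A minimal sketch with made-up sample values, assuming numel(vec) is a multiple of 3:

```matlab
% Hypothetical clean sample data: three records of [ID1 ID2 data]
vec = [0 4 11, 0 6 12, 0 8 13];
% reshape puts one record in each column; transpose to get one record per row
out = reshape(vec, 3, []).';
% out(:,1:2) now holds the ID pairs and out(:,3) the associated data
disp(out)
```

With noise or missing values the record boundaries shift, which is why the neighbouring ID pairs have to be searched instead.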
I have tried all of your methods, but the following seems to be the fastest. Could you have a look at the code below and suggest whether it can run any faster?
no_of_IDs_to_search = 1000;   % must be at most numel(pattern)/2 - 1 (pattern(j*2+2) is read)
tic
for j = 2:no_of_IDs_to_search % skip the ID pair at the 1st location (it has no previous pair)
    % firstly, find all indices of a single ID pair, e.g. [0 2] or [0 6]
    ind = strfind(vec, pattern(j*2-1:j*2));
    % remove indices whose previous (ind-3) or next (ind+4) ID pair would fall outside vec
    ind(ind < 4 | ind > numel(vec) - 4) = [];
    % search through all indices and check whether the previous and next IDs match
    for k = 1:length(ind)
        if isequal(vec(ind(k)-3:ind(k)-2), pattern((j-1)*2-1:(j-1)*2)) && ...
           isequal(vec(ind(k)+3:ind(k)+4), pattern((j+1)*2-1:(j+1)*2))
            out(j,3) = vec(ind(k)+2);   % store the data value for this ID pair
            break;                      % previous, current and next IDs all match
        end
    end
end
toc
I have attached the data files.
Thank you.
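One way to speed up the inner `k` loop is to test all candidate positions of the current ID pair at once. A minimal sketch for a single ID pair `j`, assuming the [ID1 ID2 data ...] layout described above; the sample values and the names `prevID`, `curID`, `nextID` and `value` are hypothetical, and the candidates are located with element-wise comparisons instead of `strfind`. It has not been benchmarked against the loop above.

```matlab
% Hypothetical sample data in the [ID1 ID2 data ...] layout
vec = [0 2 10, 0 4 20, 0 6 30, 0 8 40];
pattern = [0 2 0 4 0 6 0 8];   % ID pairs, two elements each
j = 3;                         % look up the data for the 3rd ID pair, [0 6]

prevID = pattern(2*j-3 : 2*j-2);
curID  = pattern(2*j-1 : 2*j);
nextID = pattern(2*j+1 : 2*j+2);

% Candidate start positions of the current ID pair
n = numel(vec);
ind = find(vec(1:n-1) == curID(1) & vec(2:n) == curID(2));
% Keep only positions where vec(ind-3) and vec(ind+4) exist
ind = ind(ind >= 4 & ind <= n - 4);

% Check the previous and next ID pairs for all candidates in one step
ok = vec(ind-3) == prevID(1) & vec(ind-2) == prevID(2) & ...
     vec(ind+3) == nextID(1) & vec(ind+4) == nextID(2);
first = ind(find(ok, 1));      % first candidate that passes all checks
if ~isempty(first)
    value = vec(first + 2);    % data element that follows the ID pair
else
    value = NaN;               % no reliable match for this ID pair
end
disp(value)
```

The outer loop over `j` is unchanged; only the per-candidate `isequal` calls are replaced by logical vector operations.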


### Answers (4)

Image Analyst on 11 Jun 2022
I think this should work, but for your given pattern and a vector of 100 million random values, I never saw a match, even after running it several times. So hopefully you have reason to believe there should be a match and you're not just using random integers like I did. (Note: randi(8, ...) returns integers from 1 to 8, so this sample data never contains 0 and cannot match a pattern that includes 0.)
% Create sample data.
vec = randi(8, 1, 100000000);
% Define the pattern. NaN = "don't care".
pattern = [0 4 nan 0 6 nan 0 8 nan]
% Define a mask for what values we want to check.
mask = ~isnan(pattern);
% Last start position at which the whole window still fits in vec.
lastIndex = length(vec) - length(pattern) + 1;
% Scan along the vector looking for matches.
for k = 1 : lastIndex
    % Print out progress every 100 thousand window locations.
    if mod(k, 100000) == 0
        fprintf('k = %d of %d (%.1f%%)\n', k, lastIndex, 100*k/lastIndex);
    end
    % Extract the window.
    thisWindow = vec(k : k+length(pattern)-1);
    % Compare this window to our pattern but only at the mask = true locations.
    if all(thisWindow(mask) == pattern(mask))
        % Found a match. Report where it was.
        fprintf('Match at k = %d where vec = [%d, %d, %d, %d, %d, %d, %d, %d, %d]\n', k, thisWindow)
    end
end
fprintf('Done!\n');
##### 1 Comment
Image Analyst on 11 Jun 2022
If there is a match, it will find it quickly, just like the other solutions, since it's basically the same algorithm.


Matt J on 11 Jun 2022
Edited: Matt J on 11 Jun 2022
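vec = [0 4 1 0 6 5 0 8 7, 3 3 3, 0 4 2 0 6 4 0 8 6]; % patterns start at i=1 and i=13
pat = [0 4 nan 0 6 nan 0 8 nan];
pat = pat(:); vec = vec(:)';            % pattern as a column, data as a row
m = numel(vec); n = numel(pat);
include = find(~isnan(pat));            % pattern positions that must match
idx = 0:m-n;                            % offsets of all window start positions
% row r of sequences holds element include(r) of every window
sequences = cell2mat(arrayfun(@(i) vec(i+idx), include, 'uni', 0));
matchlocations = find(all(sequences == pat(include), 1))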
matchlocations = 1×2
1 13
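For very long vectors, the `sequences` matrix above holds one row per non-NaN pattern element and one column per window, which can use a lot of memory. A sketch of the same idea that loops over the (few) pattern positions instead and keeps only one logical vector, assuming `vec` is a row vector; the names `nStarts` and `hits` are illustrative:

```matlab
% Same example data and pattern as above (NaN = "don't care")
vec = [0 4 1 0 6 5 0 8 7, 3 3 3, 0 4 2 0 6 4 0 8 6];
pat = [0 4 NaN 0 6 NaN 0 8 NaN];

m = numel(vec);
n = numel(pat);
nStarts = m - n + 1;            % number of possible window start positions
hits = true(1, nStarts);        % every start position is a candidate at first
for p = find(~isnan(pat))       % visit only the positions that must match
    % compare element p of every window with pat(p) in one vectorized step
    hits = hits & (vec(p : p + nStarts - 1) == pat(p));
end
matchlocations = find(hits)
```

The loop runs only numel(pat) times at most, so its overhead is small, while the peak extra memory is a few logical vectors of length nStarts.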

per isakson on 11 Jun 2022
I assume it's a vector of integers.
Steve Amphlett showed this trick in the comp.soft-sys.matlab newsgroup some twenty years ago.
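Written out, the trick works as follows (a sketch of the reasoning, not part of the original post). Let $v$ be `vec`, $p$ the pattern with its NaN entries set to 0 (as `pat(not(msk)) = 0` does in the code), and $n$ its length. Entry $k$ of the full convolution of $v$ with the reversed pattern is a sliding dot product:

$$ z_k = \sum_{i=1}^{n} v_{k-n+i}\, p_i $$

If the window starting at position $s$ matches, then $v_{s+i-1} = p_i$ at every non-wildcard position, while $p_i = 0$ at the wildcards, so at $k = s+n-1$:

$$ z_{s+n-1} = \sum_{i=1}^{n} v_{s+i-1}\, p_i = \sum_{i=1}^{n} p_i^2 $$

This explains both the comparison with `sum(pat.^2)` and the shift by `numel(pat)-1`. The converse does not hold: pattern elements equal to 0 contribute nothing to $z$, and different values can produce the same sum. For this pattern any window with $4v_{s+1} + 6v_{s+4} + 8v_{s+7} = 116$ passes, for example $v_{s+1}=5$, $v_{s+4}=8$, $v_{s+7}=6$ ($20+48+48=116$), hence the exact check of each candidate.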
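%% Create sample data
pat = [0,4,nan,0,6,nan,0,8,nan];
msk = true(1,numel(pat));
msk(isnan(pat)) = false;          % false = "don't care"
pat(not(msk)) = 0;                % wildcards contribute nothing to the convolution
vec = randi([-8,8],1,1e6);
vec(101:109) = [0,4,11,0,6,12,0,8,13];
vec(701:709) = [0,4,14,0,6,15,0,8,16];
%
%% Search matches
tic
z = conv(vec,pat(end:-1:1));      % sliding dot product of vec with pat
hit = find(z==sum(pat.^2))-numel(pat)+1;
% keep only start positions whose window lies entirely inside vec
hit = hit(hit >= 1 & hit <= numel(vec)-numel(pat)+1);
%%
% hit may contain false hits, so check each candidate exactly.
for ix = hit
    v9 = vec(ix:ix+numel(pat)-1);
    if all( v9(msk) == pat(msk) )
        disp(ix)
    end
end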
101 701
toc
Elapsed time is 0.044468 seconds.
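A variant of the same convolution trick, sketched under the assumption that `vec` holds small integers (so the floating-point sums are exact). With the `'valid'` shape option of `conv`, z(s) corresponds directly to the window starting at s, so no offset correction or bounds filtering is needed. The names `p`, `cand`, `keep` and `hits` are illustrative; the timing has not been compared with the version above.

```matlab
pat = [0 4 NaN 0 6 NaN 0 8 NaN];
msk = ~isnan(pat);          % true where the pattern must match
p = pat;
p(~msk) = 0;                % wildcards contribute nothing to the convolution

% Sample data with two planted matches
vec = randi([-8 8], 1, 1e4);
vec(101:109) = [0 4 11 0 6 12 0 8 13];
vec(701:709) = [0 4 14 0 6 15 0 8 16];

n = numel(p);
% 'valid' keeps only fully overlapping positions: z(s) belongs to the window starting at s
z = conv(vec, fliplr(p), 'valid');
cand = find(z == sum(p.^2));        % candidate starts (may include false hits)

% Verify the candidates exactly against the masked pattern
keep = false(size(cand));
for c = 1:numel(cand)
    w = vec(cand(c) : cand(c) + n - 1);
    keep(c) = all(w(msk) == p(msk));
end
hits = cand(keep)
```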

Voss on 11 Jun 2022
Edited: Voss on 11 Jun 2022
% the pattern:
pat = [0 4 NaN 0 6 NaN 0 8 NaN];
% create some data containing the pattern:
data = randn(1,10000);
idx = find(~isnan(pat));
for ii = 100:100:9900
    data(ii+idx-1) = pat(idx);
end
% find the pattern in the data:
idx = find(~isnan(pat));
% each row of the index matrix addresses the non-NaN elements of one window
result = find(all(data((0:numel(data)-numel(pat)).'+idx) == pat(idx),2));
% display the result:
disp(result);
100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000 4100 4200 4300 4400 4500 4600 4700 4800 4900 5000 5100 5200 5300 5400 5500 5600 5700 5800 5900 6000 6100 6200 6300 6400 6500 6600 6700 6800 6900 7000 7100 7200 7300 7400 7500 7600 7700 7800 7900 8000 8100 8200 8300 8400 8500 8600 8700 8800 8900 9000 9100 9200 9300 9400 9500 9600 9700 9800 9900
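The index matrix in this answer has one row per window start, so for the 100-million-element vector used in another answer the index matrix alone would need roughly 4.8 GB (1e8 rows × 6 columns × 8 bytes). A sketch that applies the same expression to blocks of window starts; `chunkSize` is an arbitrary tuning parameter, not part of the original answer:

```matlab
% Same sample data as above
pat = [0 4 NaN 0 6 NaN 0 8 NaN];
data = randn(1, 10000);
idx = find(~isnan(pat));
for ii = 100:100:9900
    data(ii+idx-1) = pat(idx);
end

chunkSize = 1000;                      % window starts handled per block (tuning parameter)
nStarts = numel(data) - numel(pat) + 1;
result = zeros(0, 1);
for first = 1:chunkSize:nStarts
    last = min(first + chunkSize - 1, nStarts);
    starts = (first:last).';           % column of window start positions
    % rows = windows in this block, columns = non-NaN pattern positions
    block = data(starts - 1 + idx);
    matched = starts(all(block == pat(idx), 2));
    result = [result; matched(:)];     % matched(:) keeps empty results 0-by-1
end
disp(result.')
```

Peak memory now scales with `chunkSize` rather than with the length of `data`.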

Release: R2018b