Speed up search for matching strings

Randy Hessel
Randy Hessel on 5 Jun 2021
Commented: Randy Hessel on 5 Jun 2021
The following code works, but I would like to make it faster. Here, bowl_nodes and tmp_nodes are 1D arrays of the same size. Assume size(bowl_nodes) = size(tmp_nodes) = 1000. node_string is a string array. Each string indexed by bowl_nodes matches one and only one string indexed by tmp_nodes. For example, the string in node_string( bowl_nodes(1) ) could be the same as the string in node_string( tmp_nodes(5) ).
When matching strings are found, node_match stores the matching array indices and the element in tmp_nodes is removed. This way, every time the for-loop over m executes, it runs over fewer elements. When the code is done executing, tmp_nodes is empty.
Can anyone recommend a way to speed up this code? The code's purpose is to fill the node_match array, which links elements of bowl_nodes to elements of tmp_nodes and vice versa. (Note: tmp_nodes is a copy of a different array, so even though the elements of tmp_nodes are removed by this code, the original array is still intact and is used later in the program.) Thanks.
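for n = 1:numel(bowl_nodes)        % numel also works for row vectors
    for m = 1:numel(tmp_nodes)
        % Compare the strings of the two candidate nodes
        if node_string(bowl_nodes(n)) == node_string(tmp_nodes(m))
            % Record the match in both directions
            node_match(bowl_nodes(n)) = tmp_nodes(m);
            node_match(tmp_nodes(m)) = bowl_nodes(n);
            tmp_nodes(m) = [];     % drop the matched node from later searches
            break;
        end
    end
end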

Accepted Answer

dpb
dpb on 5 Jun 2021
You only need one loop, and I'd guess the code as written above would be faster if you didn't bother to remove the found elements. I suspect the memory reallocation is far more expensive than the extra looping/searching, but I didn't try timing it to see.
The code snippet shown also doesn't preallocate the index array, so it is being reallocated while going through the loop as well, unless the preallocation is in the actual code and just not shown in the posting.
Try something like
node_match=arrayfun(@(s)find(matches(tmp_nodes,s)),bowl_nodes);
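As posted, the one-liner hands the node numbers themselves to matches, which expects text, and it returns positions within tmp_nodes rather than node numbers. A minimal sketch of how the same idea might be adapted is below. It assumes node_string is a string array indexed by node number and that node_match is preallocated to that length; the toy data are hypothetical and not from the original post.

```matlab
% Hypothetical toy data: six nodes, each string occurs once per side
node_string = ["a" "b" "c" "b" "c" "a"];
bowl_nodes  = [1 2 3];     % node numbers on one side
tmp_nodes   = [4 5 6];     % node numbers on the other side

% Preallocate the result so it is not regrown during assignment
node_match = zeros(1, numel(node_string));

% For each bowl string, find its position within the tmp strings
tmp_strings = node_string(tmp_nodes);
pos = arrayfun(@(s) find(matches(tmp_strings, s)), node_string(bowl_nodes));

% Convert positions back to node numbers and store both directions
node_match(bowl_nodes)     = tmp_nodes(pos);
node_match(tmp_nodes(pos)) = bowl_nodes;
```

This relies on every bowl string having exactly one partner. With zero or several matches, find would not return a scalar and arrayfun would raise an error unless 'UniformOutput' is set to false.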
  1 Comment
Randy Hessel
Randy Hessel on 5 Jun 2021
dpb, thank you for your quick response. Interestingly, removing the found elements cut the run time approximately in half, when compared to not removing them. After some rethinking, this code was tried:
% Sorting both sets of strings puts matching strings at the same rank
[~, bowl_indices]    = sort( node_string( bowl_nodes) );
[~, opposed_indices] = sort( node_string( opposed_nodes) );
% Pair the nodes rank by rank, in both directions
node_match( bowl_nodes( bowl_indices) )       = opposed_nodes( opposed_indices);
node_match( opposed_nodes( opposed_indices) ) = bowl_nodes( bowl_indices);
This code requires no loops and takes a fraction of a second to run. There is also no longer any need for the intermediate 'tmp_nodes'; the original array 'opposed_nodes' can be used directly. But thanks again for your response, it is much appreciated.
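To illustrate why the sort-based pairing works, here is a small self-contained sketch on hypothetical toy data (the strings and node numbers are made up for illustration). It assumes, as stated in the question, that every string on one side has exactly one partner on the other. The cross-check with ismember is not part of the code above; it is only one possible way to confirm the result.

```matlab
% Hypothetical toy data: nodes 1-3 on one side, nodes 4-6 on the other
node_string   = ["n3" "n1" "n2" "n2" "n3" "n1"];
bowl_nodes    = [1 2 3];
opposed_nodes = [4 5 6];
node_match    = zeros(1, numel(node_string));   % preallocate

% Sort each side's strings; equal strings end up at the same rank
[~, bowl_indices]    = sort(node_string(bowl_nodes));
[~, opposed_indices] = sort(node_string(opposed_nodes));

% Pair nodes rank by rank, in both directions
node_match(bowl_nodes(bowl_indices))       = opposed_nodes(opposed_indices);
node_match(opposed_nodes(opposed_indices)) = bowl_nodes(bowl_indices);

% Independent cross-check: look up each bowl string among the opposed strings
[found, loc] = ismember(node_string(bowl_nodes), node_string(opposed_nodes));
assert(all(found));
assert(isequal(node_match(bowl_nodes), opposed_nodes(loc)));
```

If the one-to-one assumption did not hold, the sorted lists would fall out of step and nodes with different strings would be paired silently, so a check like the one at the end may be worth keeping during development.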

Release: R2021a
