fast_sorted_mask

fast masking of ordered vectors based on binary search
17 Downloads
Aktualisiert 2 Dez 2021

# fast_sorted_mask
**Bryce M. Henson**
Matlab code for fast masking/selection of ordered vectors based on binary search.
**Status:** This Code **is ready for use in other projects**. Testing is implemented and passing.

Selecting the elements of a vector that lie between some limits (herein *masking*, sometimes known as 'gating') is a widely used analytical tool (eg. particle physics).
The common approach to masking data involves comparing each element (n) to the upper and lower limit (herein *Brute compare*) has complexity O(n). The novel contribution of this code is a demonstration of a relatively simple approach that uses a binary search algorithm ( O(log(n)) ) on an ordered vector to achieve superior performance in two use cases.
1. Data that is already sorted ( O(log(n)) cf. brute O(n) ).
2. When there is a requirement to repeatedly (m times) mask the same data such that the inial cost of the sort is offset by the increased speed of the mask operation. (O(n·log(n)+m·log(n)) cf. O(n·m) )
Note that if m is small and you check that the data is ordered (eg issorted(data)) you have probably lost most of any potential speedup already.

There are two things that the user may want from this masking operation:
1. Returning the number of data points(or counts) in this mask (herein *return mask count*).
* This is where the search based algorithm really shines compared to the brute mask (as it just subtracts the upper and lower index while the brute compare must count up the logical vector, see details).
2. Return the values of the data that makes it through this mask. (herein *retun mask values*)
* This has a smaller speedup because copying a subset of a vector (even a contiguous block) is a substantially slower than the search.

The code here demonstrates the algorithm for both types (counting and subset) in native matlab and provides a number of tests in order to compare the performance.
For taking small slices of large (>1e6 elements) sorted vectors, a speedup of 1000x is demonstrated.

## Usage Examples
The brute compare implementation is very easy by using a logical mask vector:
```
mask=data>lower_lim & data<upper_lim
```
(If you are not familaiar with Logical indexing [read more here](https://blogs.mathworks.com/loren/2013/02/20/logical-indexing-multiple-conditions/) )
the number of counts (integer) may be extracted as
```
num_mask=sum(mask)
```
or the subset of data points (vector)
```
subset_mask=data(mask)
```
The equivelent (but faster) operation using fast_sorted_mask on ordered data is:
```
mask_idx=fast_sorted_mask(data,lower_lim,upper_lim);
num_mask=mask_idx(2)-mask_idx(1)+1;
subset_mask=data(mask_idx(1):mask_idx(2));
```
WARNING: the data vector MUST be sorted. See figures above for when it is still advantagous to sort unordered data and then use this approach.

Zitieren als

Bryce Henson (2024). fast_sorted_mask (https://github.com/brycehenson/fast_sorted_mask), GitHub. Abgerufen .

Kompatibilität der MATLAB-Version
Erstellt mit R2019a
Kompatibel mit allen Versionen
Plattform-Kompatibilität
Windows macOS Linux

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Versionen, die den GitHub-Standardzweig verwenden, können nicht heruntergeladen werden

Version Veröffentlicht Versionshinweise
1.2.4

Fix logo labels

1.2.3

corrected description

1.2.2

added logo

1.2.1

include acknowledgment "Smart"/Silent Figure by Daniel Eaton

1.2

Um Probleme in diesem GitHub Add-On anzuzeigen oder zu melden, besuchen Sie das GitHub Repository.
Um Probleme in diesem GitHub Add-On anzuzeigen oder zu melden, besuchen Sie das GitHub Repository.