extractEmbeddings

Extract feature embeddings from Segment Anything Model (SAM) encoder

Since R2024a

Description

embeddings = extractEmbeddings(sam,I) extracts the feature embeddings of the input image I from the encoder of a Segment Anything Model (SAM), sam, by running a forward pass on the neural network.

Note

This functionality requires Deep Learning Toolbox™, Computer Vision Toolbox™, and the Image Processing Toolbox™ Model for Segment Anything Model. You can install the Image Processing Toolbox Model for Segment Anything Model from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

Examples

Create a Segment Anything Model (SAM) object.

sam = segmentAnythingModel;

Load an image that contains objects to segment into the workspace.

I = imread("pears.png");

Extract the feature embeddings from the image.

embeddings = extractEmbeddings(sam,I);
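
Embeddings are usually computed once and then reused across several prompts. The following is a minimal sketch of that pattern. It assumes that the companion function segmentObjectsFromEmbeddings from the same add-on is available with the syntax shown. The foreground point is an arbitrary value chosen for illustration and does not come from this page.

```matlab
% Create the model and compute the embeddings once.
sam = segmentAnythingModel;
I = imread("pears.png");
embeddings = extractEmbeddings(sam,I);

% Assumption: segmentObjectsFromEmbeddings takes the embeddings, the size
% of the original image, and a point prompt given as [x y] coordinates.
imageSize = size(I);
pointPrompt = [350 250];   % hypothetical point, for illustration only
masks = segmentObjectsFromEmbeddings(sam,embeddings,imageSize, ...
    ForegroundPoints=pointPrompt);

% Inspect the size of the returned mask.
disp(size(masks))
```

Because the encoder forward pass is the expensive step, keeping embeddings in a variable lets you try additional prompts without running the encoder again.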

Input Arguments

sam

Segment Anything Model for semantic segmentation, specified as a segmentAnythingModel object.

I

Image or batch of images from which to extract feature embeddings, specified as a 3-D or 4-D numeric array, depending on the number of RGB images.

Number of Images         Data Format
Single RGB image         3-D numeric array of size H-by-W-by-3
Batch of B RGB images    4-D numeric array of size H-by-W-by-3-by-B

H and W are the height and width, respectively, of the input image I.

Tip

For best model performance, use an image with a data range of [0, 255], such as one with a uint8 data type. If your input image has a larger data range, rescale the range of pixel values using the rescale function:

I = 255.*rescale(I)
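
As a small illustration of this tip, the sketch below applies the same expression to a synthetic 16-bit array. The array A and its size are made up for the example and stand in for a real image with a wide data range.

```matlab
% Synthetic 16-bit RGB data; values span well beyond [0, 255].
A = randi([0 65535],[32 32 3],"uint16");

% rescale maps the data to [0, 1] and returns a floating-point array,
% so multiplying by 255 gives values in [0, 255].
I = 255.*rescale(A);

fprintf("class: %s, min: %g, max: %g\n",class(I),min(I(:)),max(I(:)))
```

Note that rescale stretches the minimum and maximum of the array itself to the ends of the range, so the result depends on the content of each image.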

Output Arguments

embeddings

Feature embeddings extracted from the Segment Anything Model encoder, returned as a 64-by-64-by-256 or 64-by-64-by-256-by-B array, depending on the number of input images.

Number of Input Images    Embeddings Format
Single RGB image          64-by-64-by-256 array
Batch of B RGB images     64-by-64-by-256-by-B array
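
The relationship between input and output sizes can be checked with a short sketch such as the one below. It builds a batch of two images by stacking the sample image with a mirrored copy of itself. The variable names and the mirrored copy are choices made for this example only.

```matlab
sam = segmentAnythingModel;
I = imread("pears.png");

% Build a batch of B = 2 images of identical size by stacking along the
% fourth dimension; the second image is a mirrored copy of the first.
batch = cat(4,I,flip(I,2));

% Extract embeddings for the single image and for the batch.
embSingle = extractEmbeddings(sam,I);
embBatch = extractEmbeddings(sam,batch);

% Per the table above, these should be 64-by-64-by-256 and
% 64-by-64-by-256-by-2, respectively.
disp(size(embSingle))
disp(size(embBatch))
```

All images in a batch must share the same height and width, because they are stored in a single H-by-W-by-3-by-B array.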

Version History

Introduced in R2024a