svdsketch

Compute SVD of low-rank matrix sketch

Description

[U,S,V] = svdsketch(A) returns the singular value decomposition (SVD) of a low-rank matrix sketch of input matrix A. The matrix sketch is a low-rank approximation that only reflects the most important features of A (up to a tolerance), which enables faster calculation of a partial SVD of large matrices compared to using svds.

[U,S,V] = svdsketch(A,tol) specifies a tolerance for the matrix sketch. svdsketch uses tol to adaptively determine the rank of the matrix sketch approximation. As the tolerance gets larger, fewer features of A are used in the matrix sketch.

[U,S,V] = svdsketch(A,tol,Name,Value) specifies additional options with one or more Name,Value pair arguments. For example, you can specify 'MaxIterations' and a scalar to adjust the number of iterations used to form the matrix sketch.

[U,S,V,apxErr] = svdsketch(___) additionally returns a vector apxErr that contains the relative approximation error in each iteration, norm(U*S*V'-A,'fro')/norm(A,'fro'). The final entry in the vector apxErr(end) is the relative approximation error of the output returned by svdsketch.

Examples

Use svdsketch to compute the SVD factors of a low-rank matrix approximation.

Use gallery to create a 200-by-200 random matrix with geometrically distributed singular values.

A = gallery('randsvd',200);

Use svdsketch to calculate the SVD of a low-rank approximation of A.

[U,S,V] = svdsketch(A);

Check the size of the outputs.

size(S)
ans = 1×2

120   120

The results indicate that the low-rank matrix approximation of A has a rank of 120.

Specify a tolerance with svdsketch to compute the SVD factors of a low-rank matrix approximation. svdsketch adaptively determines the appropriate rank of the matrix sketch based on the specified tolerance.

Use gallery to create a 200-by-200 random matrix with geometrically distributed singular values.

A = gallery('randsvd',200);

Use svdsketch to calculate the SVD of a low-rank approximation of A. Specify a tolerance of 1e-2, and find the size of the output S to determine the rank svdsketch uses for the matrix sketch.

tol = 1e-2;
[U,S,V] = svdsketch(A,tol);
size(S)
ans = 1×2

60    60

The results indicate that the low-rank matrix approximation of A has a rank of 60.

Examine how well the matrix sketch approximates A by comparing A to U*S*V'.

norm(U*S*V' - A,'fro')/norm(A,'fro')
ans = 0.0048

The result indicates that the matrix sketch approximates A within the specified tolerance of 1e-2.

Use svdsketch with the MaxSubspaceDimension option on a matrix that has slowly decaying singular values. You can use this option to force svdsketch to use a subset of the features of A in the matrix sketch.

Create a 5000-by-5000 matrix with values drawn from the standard normal distribution. View the distribution of matrix singular values.

A = randn(5000);
semilogy(svd(A),'-o')

Since the singular values of A decay slowly, A has many important features and does not lend itself to low-rank approximation. To form a matrix sketch that reasonably approximates A, most or nearly all of the features need to be preserved.

Use svdsketch on the matrix with a tolerance of 1e-5. Specify four outputs to return the SVD factors as well as the relative approximation error in each iteration.

tol = 1e-5;
[U1,S1,V1,apxError1] = svdsketch(A,tol);
size(S1)
ans = 1×2

5000        5000

The size of S1 indicates that to satisfy the tolerance, svdsketch needs to preserve all of the features of A. For large sparse input matrices, this can present memory issues, since the low-rank approximation of A formed by svdsketch is roughly the same size as A and thus might not fit in memory as a dense matrix.

Check the approximation error of the outputs. Since svdsketch preserves everything in A, the computed answer is accurate, but the calculation was just an expensive way to compute svd(A).

apxError1(end)
ans = 1.9075e-08

Now, do the same calculation but specify MaxSubspaceDimension as 650 to limit the size of the subspace used to sketch A. This is useful to force svdsketch to use only a subset of the features in A to form the matrix sketch, reducing the size of the outputs.

[U2,S2,V2,apxError2] = svdsketch(A,tol,'MaxSubspaceDimension',650);
size(S2)
ans = 1×2

650   650

The outputs now have a smaller size.

Check the approximation error of the new outputs. The tradeoff for forcing the outputs to be smaller is that many important features of A must be omitted from the matrix sketch, and the resulting rank-650 approximation of A does not meet the specified tolerance.

apxError2(end)
ans = 0.8214

Input Arguments

Input matrix, specified as a sparse or full matrix. A is typically, but not always, a large and sparse matrix. svdsketch is best suited to operate on rank-deficient matrices that have a relatively small number of features.

Data Types: single | double
Complex Number Support: Yes

Matrix sketch tolerance, specified as a real numeric scalar in the range sqrt(eps(class(A))) <= tol < 1.

svdsketch uses the value of tol to adaptively determine which features of A to use in the low-rank approximation (matrix sketch) of A. As the value of tol increases, svdsketch uses fewer features of A to form the matrix sketch.

Example: [U,S,V] = svdsketch(A,1e-4)

Data Types: single | double
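As an illustration (a small sketch using gallery to build a test matrix; the specific tolerance values are arbitrary), you can see how the sketch rank shrinks as the tolerance grows:

```matlab
% Test matrix with geometrically decaying singular values.
A = gallery('randsvd',200);

% Larger tolerances let svdsketch keep fewer features, so S shrinks.
for tol = [1e-4 1e-3 1e-2 1e-1]
    [~,S,~] = svdsketch(A,tol);
    fprintf('tol = %g  ->  sketch rank %d\n', tol, size(S,1))
end
```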

Name-Value Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: [U,S,V] = svdsketch(A,1e-10,'MaxIterations',100) uses 100 iterations in the svdsketch algorithm.

MaxSubspaceDimension

Maximum subspace dimension, specified as a positive integer scalar. The subspace dimension controls the memory consumption of the svdsketch algorithm. If the algorithm runs out of memory for a specified tolerance, then you can specify a smaller value for MaxSubspaceDimension so that the algorithm stays within memory. For example, adjusting this parameter can be useful when A has singular values that decay slowly.

When you specify the MaxSubspaceDimension option, you set a maximum value on the rank of the matrix sketch used by svdsketch. Therefore, if svdsketch cannot satisfy the specified tolerance with a rank smaller than MaxSubspaceDimension, it uses the maximum allowed rank, and the resulting outputs might not satisfy the specified tolerance.

Example: [U,S,V] = svdsketch(A,1e-7,'MaxSubspaceDimension',150)

Data Types: single | double

BlockSize

Initial algorithm block size, specified as a positive integer scalar. The block size is the number by which the rank of the matrix sketch increases in each iteration. A larger block size reduces the number of needed iterations by doing more work per iteration, but might also add more information to the calculation than is necessary to achieve convergence.

As the algorithm proceeds, svdsketch might adjust the block size from the initial value to speed up convergence if the relative approximation error apxErr is not decaying fast enough.

If you specify BlockSize, the value should be smaller than MaxSubspaceDimension.

Example: [U,S,V] = svdsketch(A,1e-7,'BlockSize',10)

Data Types: single | double

MaxIterations

Maximum number of algorithm iterations, specified as a positive integer scalar. More iterations can produce a higher-quality matrix sketch at the cost of more execution time and higher memory consumption.

Example: [U,S,V] = svdsketch(A,1e-7,'MaxIterations',25)

Data Types: single | double

NumPowerIterations

Number of power iterations, specified as a nonnegative integer scalar. Power iterations improve the orthogonality of the U and V outputs. You should generally select the number of power iterations to be 0, 1, or 2, as larger values can adversely contribute to round-off error.

Example: [U,S,V] = svdsketch(A,1e-7,'NumPowerIterations',2)

Data Types: single | double

Output Arguments

Left singular vectors of matrix sketch, returned as a matrix. The columns of U are orthonormal, and they form a set of basis vectors for the range of the matrix sketch of A.

The size of U depends on the value of tol. As tol gets larger, svdsketch uses fewer features of the input matrix to form the matrix sketch, so U and V also have fewer columns.

Different machines and releases of MATLAB® can produce different singular vectors that are still numerically accurate. Corresponding columns in U and V can flip their signs, since this does not affect the value of the expression A = U*S*V'.

Singular values of matrix sketch, returned as a square diagonal matrix. The diagonal elements of S are the strictly positive singular values of the matrix sketch in decreasing order.

The size of S depends on the value of tol. As the tolerance increases, svdsketch uses fewer features of the input matrix to form the matrix sketch, so S has correspondingly fewer rows and columns.

Right singular vectors of matrix sketch, returned as a matrix. The columns of V are orthonormal, and they form a set of basis vectors for the row space of the matrix sketch of A.

The size of V depends on the value of tol. As tol gets larger, svdsketch uses fewer features of the input matrix to form the matrix sketch, so U and V also have fewer columns.

Different machines and releases of MATLAB can produce different singular vectors that are still numerically accurate. Corresponding columns in U and V can flip their signs, since this does not affect the value of the expression A = U*S*V'.

Relative approximation error in each iteration, returned as a vector. The length of apxErr is equal to the number of iterations used in the svdsketch algorithm. Use MaxIterations to adjust the number of iterations.

The entries of apxErr are the relative approximation errors in each iteration, norm(U*S*V'-A,'fro')/norm(A,'fro'). The final entry in the vector apxErr(end) is the relative approximation error of the output returned by svdsketch.
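For example (a minimal sketch; the test matrix and tolerance are arbitrary choices), plotting apxErr on a log scale shows how the error decays toward the tolerance across iterations:

```matlab
% Track the per-iteration relative approximation error.
A = gallery('randsvd',200);
tol = 1e-4;
[U,S,V,apxErr] = svdsketch(A,tol);

% Plot the error decay; the last point is the error of the returned factors.
semilogy(apxErr,'-o')
xlabel('Iteration')
ylabel('Relative approximation error')
yline(tol,'--','tol');   % convergence target
```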

Tips

• Use svdsketch when you do not know ahead of time what rank to specify with svds, but you know what tolerance the approximation of the SVD should satisfy.

• svds computes the best possible rank-k approximation of A (using the default "largest" method). svdsketch does not guarantee that its rank-k approximation is the best one, which accounts for its speed advantage over svds.
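As a sketch of that workflow (assuming you know the tolerance but not the rank; the test matrix is an arbitrary choice), you can let svdsketch choose the rank and then, if the optimal rank-k factors matter, refine with svds:

```matlab
A = gallery('randsvd',200);      % test matrix with decaying singular values
tol = 1e-3;

% svdsketch adaptively picks the rank needed to meet the tolerance.
[U,S,V] = svdsketch(A,tol);
k = size(S,1);                   % rank chosen by svdsketch

% Optionally compute the best possible rank-k approximation with svds.
[U2,S2,V2] = svds(A,k);
```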

Algorithms

svdsketch applies a tolerance to form a low-rank matrix approximation $A\approx QB$ of the input matrix A. This low-rank approximation is called a matrix sketch. The matrix sketch only preserves important features from A, filtering unnecessary information out. The relative approximation error apxErr of the matrix sketch aims to satisfy the specified tolerance tol:

$e_{\text{sketch}} = \frac{\|QB - A\|_F}{\|A\|_F} \le \text{tol}$

The process svdsketch follows to form the matrix sketch is:

• svdsketch forms the matrix sketch iteratively, with each iteration adding new columns to Q and new rows to B. The new columns and rows are created by extracting features from A using a random sample matrix. You can control the number of columns and rows added in each iteration with the BlockSize name-value pair.

• During each iteration, svdsketch uses power iterations to improve the orthogonality of the new columns in Q. You can adjust the number of power iterations with the NumPowerIterations name-value pair.

• The iterations to form the matrix sketch stop when: the number of columns in Q and rows in B reach the specified value for MaxSubspaceDimension, the number of iterations reaches MaxIterations, or the relative approximation error converges (apxErr <= tol).

• To improve convergence speed, svdsketch might increase the specified initial value for BlockSize from iteration to iteration if the decay in apxErr is not sufficient.

After the matrix sketch $A\approx QB$ is formed, svdsketch computes the singular value decomposition (SVD) of the matrix sketch via [U1,S,V] = svd(B,'econ'), such that

$A\approx QB=\left(Q{U}_{1}\right)S{V}^{H}=US{V}^{H}.$

If svdsketch is able to filter out some features of A based on the specified tolerance, then the resulting SVD factors contain fewer singular values and singular vectors than if a full SVD was performed on A.
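The steps above can be sketched in simplified form (a minimal illustration, not the built-in implementation; power iterations and adaptive block sizing are omitted, and the function name qbSketchSVD is made up):

```matlab
function [U,S,V,apxErr] = qbSketchSVD(A,tol,b,maxit)
% Simplified randomized QB iteration: grow Q and B blockwise until
% norm(A - Q*B,'fro')/norm(A,'fro') meets the tolerance.
n = size(A,2);
Q = []; B = [];
nrmA = norm(A,'fro');
apxErr = zeros(maxit,1);
for it = 1:maxit
    Omega = randn(n,b);                 % random sample matrix
    Y = A*Omega;                        % extract new features of A
    if ~isempty(Q)
        Y = Y - Q*(B*Omega);            % remove what Q already captures
    end
    [Qi,~] = qr(Y,0);                   % orthonormalize the new block
    if ~isempty(Q)
        Qi = Qi - Q*(Q'*Qi);            % re-orthogonalize against Q
        [Qi,~] = qr(Qi,0);
    end
    Bi = Qi'*A;                         % new rows of B
    Q = [Q Qi];  B = [B; Bi];
    apxErr(it) = norm(A - Q*B,'fro')/nrmA;
    if apxErr(it) <= tol
        apxErr = apxErr(1:it);
        break
    end
end
% SVD of the small factor B gives the SVD of the sketch Q*B.
[U1,S,V] = svd(B,'econ');
U = Q*U1;
end
```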

References

Yu, Wenjian, Yu Gu, and Yaohang Li. “Efficient Randomized Algorithms for the Fixed-Precision Low-Rank Matrix Approximation.” SIAM Journal on Matrix Analysis and Applications 39, no. 3 (August 2018): 1339–1359. https://doi.org/10.1137/17M1141977.