Implement Hardware-Efficient QR Decomposition Using CORDIC in a Systolic Array

This example shows how to compute the QR decomposition of matrices using hardware-efficient MATLAB® code in Simulink®.

To solve a system of equations or compute a least-squares solution to the matrix equation AX = B using the QR decomposition, compute R and Q'B, where QR = A and RX = Q'B. R is an upper triangular matrix and Q is an orthogonal matrix. If you just want Q and R, then set B to be the identity matrix.

In this example, R is computed from matrix A by applying Givens transformations using the CORDIC algorithm. C = Q'B is computed from matrix B by applying the same Givens transformations. The algorithm uses only iterative shifts and additions to perform these computations.

Open Model

Open the fxpdemo_real_4x4_systolic_array_QR_model model. In this example, the number of rows and columns of A must be 4 and matrix B must have 4 rows and any number of columns. The systolic array can be extended to any size by following the same pattern. This model supports fixed-point, double-precision, and single-precision data types. This implementation works only with real input matrices.

open_system('fxpdemo_real_4x4_systolic_array_QR_model')

To enter your own input matrices A and B, open the block parameters of the corresponding enabled subsystem blocks to on the left side of the model. After simulation, the model returns the computed output matrices C = Q'B and R to the workspace. You can specify the number of CORDIC iterations in the block parameters of the transpose(Q)*B, R 4x4 Real CORDIC Systolic Array subsystem. If the inputs are fixed point, then the number of CORDIC iterations must be less than the word length. The accuracy of the computation improves one bit for each iteration, up to the word length of the inputs.

To see how the algorithm performs the factorization, look under the mask of the transpose(Q)*B, R 4x4 Real CORDIC Systolic Array subsystem. The annotations indicate which rows of the matrix are being operated on to zero-out the sub-diagonal elements.

To see the MATLAB code in the MATLAB Function block that performs the Givens transformations using CORDIC, continue to look under the block masks.

Use QR to Solve Matrix Equation `Ax = B`

The first step in solving the matrix equation AX = B is to compute RX = Q'B, where R is upper-triangular, Q is orthogonal, and Q*R = A.

These inputs are double-precision floating-point types. Set the number of iterations to be 52, which is the number of bits in the mantissa of double.

NumberOfCORDICIterations = 52;
A = 2*rand(4,4)-1;
B = 2*rand(4,4)-1;

To compute R and C = Q'B, simulate the model.

sim fxpdemo_real_4x4_systolic_array_QR_model
R

R = 4×4

    1.5149   -0.0519    1.7292   -0.3224
         0    0.9593   -0.0259   -0.0879
         0         0    0.2565    1.0888
         0         0         0   -0.6429

C = 4×4

    0.5942   -0.2382    0.0676   -0.9370
   -0.8887    0.6146   -0.5758    0.3051
    0.1725    0.7339    0.5409    0.5374
    0.8540    1.1078   -0.2183   -0.5620

Verify that back-substituting with R and C = Q'B gives the same results as MATLAB.

X = R\C

X = 4×4

   -7.1245  -12.1131   -0.6637    1.4236
   -0.8779    0.7572   -0.5511    0.3545
    6.3113   10.1759    0.6673   -1.6155
   -1.3283   -1.7231    0.3396    0.8741

X_should_be = A\B

X_should_be = 4×4

   -7.1245  -12.1131   -0.6637    1.4236
   -0.8779    0.7572   -0.5511    0.3545
    6.3113   10.1759    0.6673   -1.6155
   -1.3283   -1.7231    0.3396    0.8741

The norm of the difference between built-in MATLAB and the CORDIC QR solution should be small.

norm(X - X_should_be)

ans = 3.2578e-14

Compte `Q` and `R`

To compute Q and R, set B equal to the identity matrix.

NumberOfCORDICIterations = 52;

A = 2*rand(4,4)-1;
B = eye(size(A,1),'like',A);

sim fxpdemo_real_4x4_systolic_array_QR_model

When B equals the identity matrix, then Q = C'.

Q = C';

The theoretical QR decomposition is QR==A, so the difference between the computed QR and A should be small.

norm(Q*R - A)

ans = 2.2861e-15

QR is Not Unique

The QR decomposition is only unique up to the signs of the rows of R and the columns of Q. You can make a unique QR decomposition by making the diagonal elements of R all positive.

D = diag(sign(diag(R)));
Qunique = Q*D

Qunique = 4×4

   -0.3086    0.1224   -0.1033   -0.9376
   -0.6277   -0.7636   -0.0952    0.1174
   -0.5573    0.3930    0.7146    0.1559
    0.4474   -0.4975    0.6852   -0.2877

Runique = D*R

Runique = 4×4

    1.4459   -0.8090    0.1547    0.3977
         0    1.1441    0.0809   -0.2494
         0         0    0.8193    0.1894
         0         0         0    0.4836

Compare the computed QR from the model to the MATLAB qr function.

[Q0,R0] = qr(A);
D0 = diag(sign(diag(R0)));
Q0 = Q0*D0

Q0 = 4×4

   -0.3086    0.1224   -0.1033   -0.9376
   -0.6277   -0.7636   -0.0952    0.1174
   -0.5573    0.3930    0.7146    0.1559
    0.4474   -0.4975    0.6852   -0.2877

R0 = D0*R0

R0 = 4×4

    1.4459   -0.8090    0.1547    0.3977
         0    1.1441    0.0809   -0.2494
         0         0    0.8193    0.1894
         0         0         0    0.4836

Use Fixed-Point for Hardware-Efficient Implementation

Use fixed-point input data types to produce efficient HDL code for ASIC and FPGA devices.

You can run many test inputs through the model by making A and B 3-dimensional arrays.

n_test_inputs = 100;

The random inputs for matrices A and B are scaled between -1 and 1. Set the fixed-point word length to 18 bits and fraction length to 14 bits to allow for growth in the QR factorization and intermediate computations in the CORDIC algorithm.

word_length = 18;
fraction_length = 14;

The best-precision number of CORDIC iterations is the word length minus one. If the number of CORDIC iterations is set to smaller than word_length - 1, then the latency and clock ticks to next ready signal will be shorter, but it will be less accurate. The number of CORDIC iterations should not be set any larger because the generated code does not support shifts greater than the word length of a fixed-point type.

NumberOfCORDICIterations = word_length - 1

NumberOfCORDICIterations = 17

The random test inputs are concatenated so that at time k, the inputs are A(:,:,k) and B(:,:,k). Each element of A and B is a uniform random variable between -1 and +1.

A = 2*rand(4,4,n_test_inputs)-1;

Choose B to be the identity matrix so Q=C'.

B = eye(4);
B = repmat(B,1,1,n_test_inputs);

Cast A to fixed point, and cast B like A.

A = fi(A,1,word_length,fraction_length);
B = cast(B,'like',A);

Simulate Model

Simulate the model with fixed-point inputs.

sim fxpdemo_real_4x4_systolic_array_QR_model

Calculate and Plot Errors

Calculate the errors.

norm_error = zeros(1,size(R,3));
for k = 1:size(R,3)
    Q_times_R_minus_A = double(C(:,:,k))'*double(R(:,:,k)) - double(A(:,:,k));
    norm_error(k) = norm(Q_times_R_minus_A);
end

Plot the errors. The errors should be on the order of 10^-3.

clf
plot(norm_error,'o-')
grid on
title('norm(QR - A)')

%#ok<*NASGU,*NOPTS>