Fixed-Point Precision Rules for Avoiding Overflow in FIR Filters

Fixed-point FIR filters are commonly implemented on digital signal processors, FPGAs, and ASICs. A fixed-point filter uses fixed-point arithmetic and is represented by an equation with fixed-point coefficients. If the accumulator and output of the FIR filter do not have enough bits to represent their data, overflow occurs and distorts the signal. Use the two rules described here, the full-precision accumulator rule and the output-same-word-length-as-input rule, to determine FIR filter precision settings automatically. The aim is to minimize resource utilization (memory/storage and processing elements) while avoiding overflow. Because the rules depend on the input precision, the coefficient precision, and the coefficient values, the FIR filter must have nontunable coefficients, that is, coefficients that are known and fixed at design time.

The precision rules define the minimum and the maximum values of the FIR filter output. To determine these values, perform min/max analysis on the FIR filter coefficients.

Output Limits for FIR Filters

An FIR filter is defined by:

$y [n] = \sum_{k = 0}^{N - 1} h_{k} x [n - k]$

x[n] is the input signal.
y[n] is the output signal.
h_k is the k^th filter coefficient.
N is the length of the filter.

Output Limits for FIR Filters with Real Input and Real Coefficients

Let the minimum value of the input signal be X_min, where X_min ≤ 0, and the maximum value be X_max, where X_max ≥ 0. The minimum output occurs when you multiply the positive coefficients by X_min and the negative coefficients by X_max. Similarly, the maximum output occurs when you multiply the positive coefficients by X_max and the negative coefficients by X_min.

If the sum of all the positive coefficients is denoted as

$G^{+} = \sum_{k = 0, h_{k} > 0}^{N - 1} h_{k}$

and the sum of all the negative coefficients is denoted as

$G^{-} = \sum_{k = 0, h_{k} < 0}^{N - 1} h_{k}$

then you can express the minimum output of the filter as

$Y_{\min} = G^{+} X_{\min} + G^{-} X_{\max}$

and the maximum output of the filter as

$Y_{\max} = G^{+} X_{\max} + G^{-} X_{\min}$

Therefore, the output of the filter lies in the interval [Y_min, Y_max].
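The min/max analysis above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `real_output_limits` and the example coefficient and input values are assumptions chosen for the demonstration.

```python
def real_output_limits(h, x_min, x_max):
    """Return (Y_min, Y_max) for a real FIR filter with coefficients h.

    Assumes x_min <= 0 <= x_max, as in the analysis above.
    """
    g_pos = sum(c for c in h if c > 0)   # G+: sum of positive coefficients
    g_neg = sum(c for c in h if c < 0)   # G-: sum of negative coefficients
    y_min = g_pos * x_min + g_neg * x_max
    y_max = g_pos * x_max + g_neg * x_min
    return y_min, y_max


# Hypothetical example: 5-tap filter, input in [-1, 127/128]
# (the range of a signed 8-bit input with 7 fraction bits).
h = [-0.25, 0.5, 0.75, 0.5, -0.25]
y_min, y_max = real_output_limits(h, -1.0, 127 / 128)
print(y_min, y_max)  # -2.24609375 2.236328125
```

Here G⁺ = 1.75 and G⁻ = -0.5, so the output lies in [-2.24609375, 2.236328125].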

Complex Filter Convolution Equations

You can define a complex filter (complex inputs and complex coefficients) in terms of the real and imaginary parts of its signals and coefficients:

$\begin{array}{l} Re (y [n]) = \sum_{k = 0}^{N - 1} Re (h_{k}) Re (x [n - k]) - \sum_{k = 0}^{N - 1} Im (h_{k}) Im (x [n - k]) \\ Im (y [n]) = \sum_{k = 0}^{N - 1} Re (h_{k}) Im (x [n - k]) + \sum_{k = 0}^{N - 1} Im (h_{k}) Re (x [n - k]) \end{array}$

The complex filter is decomposed into four real filters as depicted in the signal flow diagram. Each signal is annotated with an interval denoting its range.

Output Limits for FIR Filters with Complex Input and Complex Coefficients

You can extend the real filter min/max analysis to complex filters. Assume that both the real and imaginary parts of the input signal lie in the interval [X_min, X_max].

The complex filter contains two instances of the filter Re(h_k). Both filters have the same input range and therefore the same output range in the interval [V^re_min, V^re_max]. Similarly, the complex filter contains two instances of the filter Im(h_k). Both filters have the same output range in the interval [V^im_min, V^im_max].

Based on the min/max analysis of real filters, you can express V^re_min, V^re_max, V^im_min, and V^im_max as:

$\begin{array}{l} V_{\min}^{r e} = G_{r e}^{+} X_{\min} + G_{r e}^{-} X_{\max} \\ V_{\max}^{r e} = G_{r e}^{+} X_{\max} + G_{r e}^{-} X_{\min} \\ V_{\min}^{i m} = G_{i m}^{+} X_{\min} + G_{i m}^{-} X_{\max} \\ V_{\max}^{i m} = G_{i m}^{+} X_{\max} + G_{i m}^{-} X_{\min} \end{array}$

G^+_re is the sum of the positive real parts of h_k, given by

$G_{r e}^{+} = \sum_{k = 0, Re (h_{k}) > 0}^{N - 1} Re (h_{k})$

G^-_re is the sum of the negative real parts of h_k, given by

$G_{r e}^{-} = \sum_{k = 0, Re (h_{k}) < 0}^{N - 1} Re (h_{k})$

G^+_im is the sum of the positive imaginary parts of h_k, given by

$G_{i m}^{+} = \sum_{k = 0, Im (h_{k}) > 0}^{N - 1} Im (h_{k})$

G^-_im is the sum of the negative imaginary parts of h_k, given by

$G_{i m}^{-} = \sum_{k = 0, Im (h_{k}) < 0}^{N - 1} Im (h_{k})$

The minimum and maximum values of the real and imaginary parts of the output are:

$\begin{array}{l} Y_{\min}^{r e} = V_{\min}^{r e} - V_{\max}^{i m} \\ Y_{\max}^{r e} = V_{\max}^{r e} - V_{\min}^{i m} \\ Y_{\min}^{i m} = V_{\min}^{r e} + V_{\min}^{i m} \\ Y_{\max}^{i m} = V_{\max}^{r e} + V_{\max}^{i m} \end{array}$

The worst-case minimum and maximum over the real and imaginary parts of the output are given by

$\begin{array}{l} Y_{\min} = \min (Y_{\min}^{r e}, Y_{\min}^{i m}) \\ Y_{\max} = \max (Y_{\max}^{r e}, Y_{\max}^{i m}) \end{array}$
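The complex-filter analysis can be sketched the same way. The following Python snippet is an illustrative sketch only: the helper names and the example values are assumptions, and it assumes, as above, that both the real and imaginary parts of the input lie in [X_min, X_max] with X_min ≤ 0 ≤ X_max.

```python
def real_output_limits(h, x_min, x_max):
    """(min, max) output of a real FIR filter with coefficients h."""
    g_pos = sum(c for c in h if c > 0)
    g_neg = sum(c for c in h if c < 0)
    return g_pos * x_min + g_neg * x_max, g_pos * x_max + g_neg * x_min


def complex_output_limits(h, x_min, x_max):
    """Worst-case (Y_min, Y_max) over the real and imaginary output parts."""
    # Ranges of the two real sub-filters Re(h_k) and Im(h_k).
    v_re_min, v_re_max = real_output_limits([c.real for c in h], x_min, x_max)
    v_im_min, v_im_max = real_output_limits([c.imag for c in h], x_min, x_max)

    # Real part is a difference of sub-filter outputs, imaginary part a sum.
    y_re_min = v_re_min - v_im_max
    y_re_max = v_re_max - v_im_min
    y_im_min = v_re_min + v_im_min
    y_im_max = v_re_max + v_im_max

    return min(y_re_min, y_im_min), max(y_re_max, y_im_max)


# Hypothetical example: two complex taps, input parts in [-1, 0.5].
h = [0.5 + 0.25j, -0.25 + 0.5j]
print(complex_output_limits(h, -1.0, 0.5))  # (-1.375, 1.25)
```

In this example the minimum comes from the imaginary part (-1.375) and the maximum from the real part (1.25), which shows why both parts must be checked.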

Fixed-Point Precision Rules

The fixed-point precision rules define the output word length and fraction length of the filter in terms of the accumulator word length and fraction length.

Full-Precision Accumulator Rule

Assume that the input is a signed or unsigned fixed-point signal with word length W_x and fraction length F_x. Also assume that the coefficients are signed or unsigned fixed-point values with fraction length F_h. You can now define full precision as the fixed-point settings that minimize the word length of the accumulator while avoiding overflow or any loss of precision.

The accumulator fraction length is equal to the product fraction length, which is the sum of the input and coefficient fraction lengths.

$F_{a} = F_{x} + F_{h}$

If Y_min = 0, then the accumulator is unsigned with word length

$W_{a} = ⌈ \log_{2} (Y_{\max} 2^{F_{a}} + 1) ⌉$

If Y_min < 0, then the accumulator is signed with word length

$W_{a} = ⌈ \log_{2} (\max (- Y_{\min} 2^{F_{a}}, Y_{\max} 2^{F_{a}} + 1)) ⌉ + 1$

The ceiling operator ⌈·⌉ rounds its argument up to the nearest integer (toward +∞).
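As a sketch of the full-precision accumulator rule, the Python function below evaluates the two word-length formulas directly. The function name, return convention, and example values are assumptions made for illustration. It uses floating-point `math.log2`, which is adequate for moderate word lengths but is not an exact integer computation.

```python
import math


def full_precision_accumulator(y_min, y_max, f_x, f_h):
    """Return (is_signed, W_a, F_a) for the full-precision accumulator."""
    f_a = f_x + f_h                  # product (and accumulator) fraction length
    scale = 2.0 ** f_a               # converts real-world values to stored integers
    if y_min > 0:
        # Not covered by the rule: the analysis assumes X_min <= 0 <= X_max.
        raise ValueError("expected y_min <= 0")
    if y_min == 0:
        # Unsigned accumulator: must hold integers 0 .. Y_max * 2^F_a.
        w_a = math.ceil(math.log2(y_max * scale + 1))
        return False, w_a, f_a
    # Signed accumulator: one extra bit for the sign.
    w_a = math.ceil(math.log2(max(-y_min * scale, y_max * scale + 1))) + 1
    return True, w_a, f_a


# Hypothetical example: F_x = 7, F_h = 2, output range [-2.24609375, 2.236328125].
print(full_precision_accumulator(-2.24609375, 2.236328125, 7, 2))  # (True, 12, 9)
```

In this example the scaled limits are -1150 and 1145, so 11 magnitude bits plus a sign bit are needed. A signed 12-bit accumulator with 9 fraction bits covers [-4, 4), whereas 11 bits would cover only [-2, 2).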

Output Same Word Length as Input Rule

This rule sets the output word length to be the same as the input word length. Then, it adjusts the fraction length to avoid overflow. W_q is the output word length and F_q is the output fraction length.

Truncate the accumulator, discarding its W_a − W_x least significant bits, to make the output word length the same as the input word length.

$W_{q} = W_{x}$

Set the output fraction length F_q to

$F_{q} = F_{a} - (W_{a} - W_{x})$
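A short Python sketch of this rule follows. The function names and example numbers are illustrative assumptions. The helper `signed_range` is included only to check that the resulting output type still covers the filter output range.

```python
def output_same_word_length(w_x, w_a, f_a):
    """Return (W_q, F_q): keep the input word length, drop the extra LSBs."""
    w_q = w_x
    f_q = f_a - (w_a - w_x)
    return w_q, f_q


def signed_range(w, f):
    """Representable (min, max) of a signed fixed-point type (hypothetical helper)."""
    return -(2 ** (w - 1)) / 2 ** f, (2 ** (w - 1) - 1) / 2 ** f


# Hypothetical example: 8-bit input, 12-bit accumulator with 9 fraction bits.
w_q, f_q = output_same_word_length(8, 12, 9)
print(w_q, f_q)                # 8 5
print(signed_range(w_q, f_q))  # (-4.0, 3.96875)
```

The four dropped bits come off the fraction length, so the integer range of the accumulator is preserved and the output cannot overflow. The cost is reduced output resolution.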

Polyphase Interpolators and Decimators

You can extend these rules to polyphase FIR interpolators and decimators.

FIR Interpolators

Treat each polyphase branch of the FIR interpolator as a separate FIR filter. The output data type of the FIR interpolator is the worst-case data type of all the polyphase branches.

FIR Decimators

For decimators, the polyphase branches add up at the output. Hence, the output data type is computed as if it were a single FIR filter with all the coefficients of all the polyphase branches.
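The difference between the two cases can be sketched in Python. This sketch makes two assumptions that the text above does not state: branch p of an interpolator with factor L takes the coefficients h[p], h[p+L], h[p+2L], ... (the usual polyphase decomposition), and the worst-case data type is derived from the worst-case output range across branches. All function names are hypothetical.

```python
def real_output_limits(h, x_min, x_max):
    """(min, max) output of a real FIR filter with coefficients h."""
    g_pos = sum(c for c in h if c > 0)
    g_neg = sum(c for c in h if c < 0)
    return g_pos * x_min + g_neg * x_max, g_pos * x_max + g_neg * x_min


def interpolator_output_limits(h, factor, x_min, x_max):
    """Worst-case limits over the polyphase branches, each analyzed separately."""
    branches = [h[p::factor] for p in range(factor)]   # assumed decomposition
    limits = [real_output_limits(b, x_min, x_max) for b in branches]
    return min(lo for lo, _ in limits), max(hi for _, hi in limits)


def decimator_output_limits(h, x_min, x_max):
    """Branches add at the output, so analyze all coefficients as one filter."""
    return real_output_limits(h, x_min, x_max)


# Hypothetical example: 4-tap prototype filter, factor 2, input in [-1, 1].
h = [0.25, 0.5, -0.25, 0.75]
print(interpolator_output_limits(h, 2, -1.0, 1.0))  # (-1.25, 1.25)
print(decimator_output_limits(h, -1.0, 1.0))        # (-1.75, 1.75)
```

For the same coefficients, the interpolator range is never wider than the decimator range, because each interpolator branch uses only a subset of the coefficients.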
