Texas Instruments | Radar Hardware Accelerator - Part 2 (Rev. A) | User Guides | Texas Instruments Radar Hardware Accelerator - Part 2 (Rev. A) User guides

Texas Instruments Radar Hardware Accelerator - Part 2 (Rev. A) User guides
User's Guide
SWRU527A – May 2017 – Revised March 2018
Radar Hardware Accelerator - Part 2
The radar hardware accelerator user's guide is presented in two parts: Radar Hardware Accelerator User's
Guide - Part 1 and Radar Hardware Accelerator User's Guide - Part 2. It describes the radar hardware
accelerator architecture, features, operation of various blocks and their register descriptions. The purpose
is to enable the user to understand the capabilities offered by the radar hardware accelerator and to
program it appropriately to achieve the desired functionality.
This user's guide is split into two parts:
• The first part (a separate document) provides an overview of the overall architecture and features
available in the radar hardware accelerator. The main features, such as, windowing, FFT and logmagnitude are covered in the first part.
• The second part of this document covers additional features like CFAR-CA, FFT stitching, complex
multiplication and other advanced usage possibilities. This part of the user's guide assumes that the
user has already read and understood Part 1 (see Radar Hardware Accelerator User's Guide - Part 1.
This document is organized as follows:
• Section 1 covers some additional features of the core computational unit related to pre-FFT
processing.
• Section 2 covers details of CFAR-CA detection feature.
• Section 3 covers other miscellaneous capabilities such as statistics computation are discussed.
• Section 4 covers the specific use-case of FFT stitching is described using an example.
1
2
3
4
5
Contents
Core Computational Unit – Pre-Processing .............................................................................. 2
Core Computational Unit - CFAR Engine................................................................................. 9
Core Computational Unit – Statistics ................................................................................... 17
FFT Stitching Use-Case Example........................................................................................ 19
References .................................................................................................................. 24
1
Core Computational Unit.................................................................................................... 2
2
Complex Multiplication Capability in Pre-Processing Block ............................................................ 6
3
BPM Removal Capability ................................................................................................... 7
4
CFAR Engine ................................................................................................................. 9
5
CFAR Engine Block Diagram ............................................................................................. 10
6
CFAR-CA: Cells Used for Surrounding Noise Average
List of Figures
7
8
9
10
11
12
13
14
15
..............................................................
CFAR Engine Output Format .............................................................................................
Handling of Samples Near the Edge in Non-Cyclic Mode ...........................................................
Handling of Samples Near the Edge in Cyclic Mode .................................................................
Input Formatter Sample Streaming for the Cyclic CFAR Example ..................................................
Statistics Block .............................................................................................................
Layout of Samples in Source Memory for 1TX, 1RX, 4K Complex FFT ............................................
Linear Interpolation of Window RAM Coefficients for 4K FFT .......................................................
Layout of Samples in Destination Memory (after Step 1) .............................................................
1024 4-Point FFTs in Step 2 (FFT Stitching) ..........................................................................
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
11
12
13
13
14
17
20
21
22
22
1
Core Computational Unit – Pre-Processing
16
www.ti.com
Layout of Final FFT Output in Correct Bin Order
......................................................................
24
List of Tables
1
Pre-Processing Block Registers ........................................................................................... 7
2
CFAR Modes and Register Settings ..................................................................................... 10
3
CFAR Output Modes and Register Settings ............................................................................ 12
4
Configuration Example for CFAR Cyclic Mode ......................................................................... 14
5
CFAR Engine Registers ................................................................................................... 14
6
Statistics Output Modes ................................................................................................... 18
7
Statistics Block Registers ................................................................................................. 19
8
Parameter-Set #0: Used for Step #1
9
Parameter-Set #1: Used for Step #2
....................................................................................
....................................................................................
23
23
Trademarks
All trademarks are the property of their respective owners.
1
Core Computational Unit – Pre-Processing
This section provides an overview of the pre-processing block inside the core computational unit.
Specifically, the features of interference zero-ing out and complex multiplication.
Core Computational Unit
FFT Engine (ACCEL_MODE = 00b)
Windowing+FFT
Log-Magnitude Post- processing
Pre-processing
24
24
Complex
Mult
24
24
Mag
24
JP L
x
LOG2EN
Log2
FFT _OUTP UT_
MODE
16
24
A pprox.
FFT
18
Mag
JP L
A pprox.
ABS EN
FFT_EN
WINDOW_EN
Zero Out Interf
Samples
Sin ,
Cos
LUT
>Th?
CMULT_MODE
Window
RAM
Statistics
(Sum, Max)
INTERFTHRESH
CFAR -CA Detector
CFAR Engine (ACCEL_MODE = 01b)
Local
Max
Check
Log-Magnitude Pre-processing
Compare
24
Mag
Squared
Mag
JP L
24
RAM
(left buffer)
G
U
A
R
D
Cell
Under
Test
G
U
A
R
D
RAM
(right
buffer)
24
Log2
S liding
S um
ACCEL_MODE
24
S liding
S um
Threshold
Mult /
Add
16
A pprox.
S um or
GO or
SO
Figure 1. Core Computational Unit
2
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Core Computational Unit – Pre-Processing
www.ti.com
1.1
Pre-Processing Block
The pre-processing block (see Figure 1) provides the ability to perform some simple pre-FFT
manipulations of the samples. As shown in Figure 1, the pre-processing block, the windowing block, FFT
block and log-magnitude post-processing block are connected to each other, so that the output from one
block can be fed into the next block, if required. One or more of these blocks can be enabled in each
parameter-set to perform the desired operations.
The pre-processing block comprises provision for interference zero-out and complex multiplication. These
sub-blocks are described in the following sections.
1.1.1
Interference Zero-Out
In an FMCW radar transceiver, interference from another radar typically manifests itself as a time-domain
spike in a few samples. This spike corresponds to the time duration when the chirping frequency of both
radars overlap with each other. Such a time-domain spike caused by interference can lead to degradation
in the noise floor at the FFT output, causing loss of detection.
In order to mitigate the impact of interference, one common technique is to zero-out any large glitches in
the time-domain samples. The pre-processing block provides capability to perform this operation.
Specifically, the input samples are fed through a magnitude calculation (based on JPL approximation),
which computes a 24-bit magnitude of the 24-bit input complex sample. For definition of this
approximation, see Radar Hardware Accelerator User's Guide - Part 1. Any sample whose magnitude
exceeds a programmable threshold is zero-ed out, both in real and imaginary part. The programmable
threshold is a 24-bit register named INTERFTHRESH. This threshold register is not part of the parameterset and is a common register. Note that a value of 0xFFFFFF can be programmed to disable the zero-out
feature, since the magnitude can never exceed this value anyway. Alternately, a separate
INTERFTHRESH_EN register is provided as part of the parameter-set to control when the interference
zero-ing out should be enabled.
Note the limitation that the threshold is a static configuration and there is no specific provision in the
accelerator to automatically calculate the threshold based on the RMS of the current set of data samples.
However, there are possible ways of accomplishing this indirectly, by performing a first-pass of the input
samples to compute the signal statistics (with a bit-shift scaling on the statistic) and then using the main
processor or DMA to copy this scaled statistic value to the interference threshold register, so that in a
subsequent second pass, the interference zeroing out can be accomplished. On a separate note, if the
user is interested just to know the indices of the time-domain spikes, then it is possible to use the CFAR
engine path to look for spikes in the samples and record the indices of the spikes. The CFAR engine is
covered in Section 2.
1.1.2
Complex Multiplication
In addition to interference zero-out, the pre-processing block contains a complex multiplication sub-block.
The purpose of this sub-block is to enable several assorted capabilities that require complex multiplication
of the input samples. The CMULT_MODE register is used to enable and configure the complex
multiplication functionality. The complex multiplication sub-block can be disabled (bypassed) by the setting
cmult_mode to 000b. any other value of this register will enable the complex multiplication sub-block and
configure it to perform specific operation as described in the next few paragraphs.
There are seven modes of the complex multiplier supported, as follows. They are frequency shifter mode,
frequency shifter with auto-increment mode (a slow DFT mode), FFT stitching mode, magnitude squared
mode, scalar multiplication mode, vector multiplication modes 1 and 2.
• Frequency shifter mode: If the register value is CMULT_MODE = 001b, then the complex multiplier
functions as a frequency shifter, which can be used to de-rotate the input samples by a certain
frequency. This de-rotation is accomplished using cos, sin values from a twiddle factor look-up table
(LUT). This LUT contains the (compressed) equivalent of the cos, sin values corresponding to the
16384 long sequence exp(j*2*pi*(0:16383)/16384). Another register (TWIDINCR) is used to specify the
de-rotation frequency, by specifying how much the phase should change for each successive input
sample (that register controls how much the LUT read index increments every sample). In effect, the
input samples x(n) for n = 0 to SRCACNT-1 are multiplied by the sequence,
exp(j*2*pi*TWIDINCR*(0:SRCACNT-1)/16384).
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
3
Core Computational Unit – Pre-Processing
•
•
•
•
4
www.ti.com
Frequency shifter with auto-increment mode (a slow DFT mode): If the register value is
CMULT_MODE = 010b, then the complex multiplier functions in a mode which enables Discrete
Fourier Transform (DFT) computation. In this case, the complex multiplier performs a function that is
very similar to frequency shifter mode, except that, at the end of each iteration, the de-rotation
frequency is automatically incremented for the next iteration. Note that DFT computation for a given set
of input samples involves de-rotating the samples by one frequency at a time, and computing a sum of
the de-rotated samples for each such frequency. To achieve DFT computation, the Input Formatter
should be configured to send the same set of input samples to the complex multiplier for multiple
iterations (as many as the number of DFT bins required) and the complex multiplier de-rotates the
samples by one frequency at a time and auto-increments to the next frequency for the next iteration.
Also, the statistics block (explained in a later section) is used to compute the sum of the de-rotated
samples corresponding to each iteration, which then becomes the final DFT value.
The DFT computation is ‘slow’ in the sense that in each 200 MHz clock cycle, only one complex
multiplication is performed. For example, for a 512-point input sample set, it would take 512 clock
cycles per DFT bin. However, since the DFT mode is typically only used for FFT peak interpolation
(very few bins), it is acceptable. The starting frequency for the DFT computation is specified in the
TWIDINCR register (similar to the frequency shifter mode). The increment value by which the
frequency increments every iteration is obtained from FFTSIZE register – Note that the DFT mode
cannot be used simultaneously with FFT enabled, hence the FFTSIZE register has been over-loaded
for providing the increment value in this mode. The increment value is calculated as 2^(14 – FFTSIZE)
and hence the DFT resolution is 16384/2^(14 – FFTSIZE) = 2^FFTSIZE. As an example, if FFTSIZE =
1011b, then the DFT resolution is 2048. This is equivalent to computing DFT points corresponding to
2K size FFT grid. The highest resolution for the DFT would be obtained when FFTSIZE = 1110b (max
allowed value), in which case the DFT resolution is 16384 (corresponding to 16K size FFT grid).
In effect, for the kth iteration (with k starting from 0), the input samples x(n) for n = 0 to SRCACNT-1 are
multiplied by the sequence, exp(j*2*pi*(TWIDINCR+2^(14-FFTSIZE)*k)*(0:SRCACNT-1)/16384).
FFT Stitching mode: If the register value is CMULT_MODE = 011b, then the complex multiplier
functions in FFT stitching mode. This mode is useful when large size FFTs (2K and 4K) are required.
Since the FFT block natively supports only up to 1024 size, for 2048 and 4096 point FFT, an FFT
Stitching procedure using two steps (two parameter-sets) can be used. As an example, when a 4K size
FFT is needed, it is achieved in two steps as follows. In the first step, every 4th input sample is passed
through a 1K size FFT (four 1K point FFTs are performed on decimated input samples). Then, in the
next step, the resulting 4x1024 FFT outputs are sent through four-point “stitching” FFTs (1024 fourpoint FFTs), with an additional pre-multiplication by the complex multiplier block to achieve FFT
stitching. This pre-multiplication uses the twiddle factor LUT in a specific pattern, for which additional
configuration information is available in TWIDINCR register (2 LSB bits). If the LSB two bits of
TWIDINCR register are 00b, then the twiddle factor pattern will correspond to what is required for 2K
(2x1024) size FFT stitching. If the LSB two bits are 01b, then the twiddle factor pattern will correspond
to what is required for 4K (4x1024) size FFT stitching. Values of 10b and 11b are reserved and should
not be used. Also, the unused 12 MSB bits of TWIDINCR register must be zero in this mode of
operation. The last section includes a more detailed explanation and configuration information for the
FFT stitching example for 2K and 4K FFT, including the use of WINDOW_INTERP_FRACTION
register for extending the window RAM using linear interpolation to more than 1024 coefficients.
Magnitude squared mode: If the register value is CULT_MODE = 100b, then the complex multiplier
functions in magnitude squared mode. In this case, the complex multiplier takes every complex input
and produce the magnitude squared as the output. This can be used together with the statistics block
(explained in Section 3) to compute the mean squared sum of the input samples.
Scalar multiplication mode: If the register value is CMULT_MODE = 101b, then the complex multiplier
functions in scalar multiplication mode. This feature is useful if the input samples need to be scaled by
some constant factor. In this case, the complex multiplier will multiply each input sample with a 21-bit
scalar complex number that is programmed in ICMULTSCALE and QCMULTSCALE registers (for I
and Q value, each having 21 bits). The ICMULTSCALE and QCMULTSCALE registers are common
registers and not part of parameter-set. Note that this feature cannot be used to multiply the input
samples for different iterations (channels) with different complex scalars.
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Core Computational Unit – Pre-Processing
www.ti.com
•
•
Vector multiplication mode 1: If the register value is CMULT_MODE = 110b, then the complex
multiplier functions in vector multiplication mode 1. The purpose of this mode is to enable element-wise
multiplication of two complex vectors, as well as dot-product capability (using statistics block to sum
the element-wise multiplication output). The samples from the Input Formatter block constitute one of
the two vectors, whereas the other vector is taken from a pre-loaded internal RAM inside the core
computational unit. (This internal RAM is a RAM that is normally used by the FFT block when
performing 1024-point FFT computation). This internal RAM can store 512-complex samples and
hence the vector multiplication can support a maximum of 512 elements of multiplication. The Vector
multiplication is not a highly parallelized operation, in the sense that only one complex multiplication is
done per 200MHz clock cycle.
It is important to note one important limitation: since the internal RAM is shared with the FFT,
performing a 1024-point FFT destroys the pre-loaded contents of the RAM. Therefore, performing
vector multiplication and 1024-point FFT back-to-back many times requires re-loading of the internal
RAM each time and will be inefficient. However, note that this limitation of having to re-load the internal
RAM does not apply when performing FFT of size 512 or less, which is often the case for second and
third dimension FFTs.
The operation of the vector multiplication mode 1 is as follows. The streaming set of samples from the
Input Formatter block coming at 200 MHz is element-wise multiplied with successive samples from the
internal RAM. The statistics block (described in a later section) can be used to compute the sum for
every iteration, which enables a dot-product implementation if desired. At the end of every iteration, the
addressing from the internal RAM is reset, so that for the next iteration, the samples are picked up
from the start of the internal RAM.
Vector multiplication mode 2: If the register value is CMULT_MODE = 111b, then the complex
multiplier functions in vector multiplication mode 2, which is slightly different from the earlier mode. The
only difference in this case is that at the end of every iteration, the addressing of the internal RAM is
not reset, so that for the next iteration, the samples from the internal RAM are picked up with an
address that continues from where it left off at the end of the previous iteration. This mode can be used
when a given set of input samples needs to be element-wise multiplied with multiple vectors. In this
case, the input formatter block can be configured to repeat the same set of samples for multiple
iterations, and the internal RAM can be loaded with all the vectors, such that for successive iterations,
the input samples are multiplied with successive vectors.
For loading the internal RAM used for the vector multiplication modes, the register bit STG1LUTSELWR is
used. The internal RAM for vector multiplication mode is mapped to the same address space as the
Window RAM. Therefore, this register bit is required to specify which of these two (Window RAM or
internal RAM) need to be selected, when loading the co-efficients via DMA or main processor. If the
register bit is 0, then the Window RAM is selected, else, the internal RAM for vector multiplication mode is
selected. Note that the other registers such as WINDOW_START, which pertains to windowing, are
always applicable only for the Window RAM. The STG1LUTSELWR register bit should in general be kept
as 0 (Window RAM selected). This allows the main processor or DMA to have access to the Window RAM
by default. Only when it is desired to load the internal RAM with coefficients for vector multiplication mode,
this register bit should be temporarily set to 1. After loading the coefficients, the register bit should be
made 0 again.
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
5
Core Computational Unit – Pre-Processing
www.ti.com
24x21 Complex
Multiplier
I
24
X
24
Q
Rounding
(remove 21 LSBs
with rounding)
24
I
24
Q
Magnitude -squared mode
Truncate 3
LSBs
Negate
Scalar Multiplication mode
ICMULTSCALE
Q_CMULT_SCAL
21
21
Frequency Shifter modes
TWID_INCR
Twiddle
Factor LUT
Truncate 3
LSBs
Vector Multiplication modes
Internal RAM
(shared with
FFT)
CMULT_MODE
Figure 2. Complex Multiplication Capability in Pre-Processing Block
Note that in all the above seven modes of the complex multiplier, only one complex multiplication is
performed every 200 MHz clock. So, the effective speed achieved for the multiply or multiply-accumulate
operation is 200 MHz.
1.1.3
BPM Removal
Although not explicitly shown in Figure 1, it is possible to multiply the input samples going from the input
formatter into the core computational unit with a +1/-1 programmable binary sequence (of length up to 64).
This feature is enabled by setting the register bit BPM_EN in the parameter-set.
This feature may be useful when Binary Phase Modulation (BPM) is used during transmission of chirps.
The BPM pattern is generally a pseudo-random sequence (chipping sequence) of 1’s and -1’s, which have
already been applied to the radar transmit signal. Therefore, the radar signal processing of the resultant
analog-to-digital converter (ADC) samples prior to FFT needs to undo the modulation. For instance, if
each chirp is transmitted with a +1 or -1 polarity, then it is necessary to undo this sequence prior to the
second dimension FFT processing across chirps. The BPM removal feature can be used to achieve this.
6
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Core Computational Unit – Pre-Processing
www.ti.com
NOTE: An alternate way to achieve this is to pre-multiply the window coefficients, which are signed
numbers, in the window RAM, so that the process of windowing prior to FFT takes care of
undoing the BPM sequence.
When BPM removal is enabled, each input sample is multiplied by a +1 or -1, based on the sequence
present in the 64-bit BPMPATTERNLSB and BPMPATTERNMSB register. The register BPMRATE is used
to control for how many consecutive samples the same BPM bit is applied. For example, if BPMRATE = 4,
then the same BPM bit is applied for 4 consecutive samples. Similarly, if BPMRATE = 1, then the BPM bit
is changed for every sample.
There is another register BPMPHASE that specifies the number of consecutive samples for which the first
BPM bit is applied. Note that this is applicable only for the first BPM bit. If BPMPHASE = 0, then the first
BPM bit is applied for BPMRATE number of samples. Otherwise, the first BPM bit is applied for
BPMRATE – BPMPHASE number of samples. For example, if BPMPHASE = 1 and BPMRATE = 4, then
the first BPM bit is applied for 4-1 = 3 samples, and then subsequent BPM bits are applied with periodicity
of 4 samples for each bit. This is shown in Figure 3.
Input samples
BPMRATE = 4
BPMPHASE = 1
BPMPATTERN
0
1
2
3
0
3
4
5
6
4
7
8
9
1
0
...
4
1
0
0
1
0
1
1
...
LSB bit
Figure 3. BPM Removal Capability
If multiple iterations (for example, four back-to-back FFTs in a single parameter-set using REG_BCNT=3)
are done, then the same BPM pattern gets applied to the input samples in each iteration.
Note the limitation that the BPM pattern register is 64 bits long, hence, the maximum BPM sequence
length that is supported is 64. For higher BPM sequence length, the alternate approach of pre-multiplying
the window coefficients stored in the window RAM may be considered.
1.1.4
Pre-Processing Block – Register Descriptions
Table 1 lists all the registers of the pre-processing block. As explained in Radar Hardware Accelerator
User's Guide - Part 1, some of the registers are common (common for all parameter-sets) registers,
whereas, some others are “part of each parameter-set”. For each register, this distinction is captured as
part of the register description in Table 1.
Table 1. Pre-Processing Block Registers
Register
INTERFTHRESH
Width
24
ParameterSet? (Y/N) Description
N
Interference threshold:
.
INTERFTHRESH_EN
This register is used to specify the threshold for zero-ing out samples affected by
interference. Any sample whose magnitude exceeds this threshold is zero-ed out, if the
feature is enabled using INTERFTHRESH_EN register bit
1
Y
Interference zero-out Enable/Disable:
This register bit controls the enable/disable for the interference zero-out feature. The
feature is enabled if this register bit is set in any given parameter-set.
CMULT_MODE
3
Y
Complex multiplication mode:
This register is used to configure the mode of the complex multiplication sub-block. A
value of 000b disables/bypasses the complex multiplication. Any other value chooses one
of 7 available modes of operation . Detailed description of the seven modes in the main
description section.
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
7
Core Computational Unit – Pre-Processing
www.ti.com
Table 1. Pre-Processing Block Registers (continued)
Register
TWIDINCR
Width
14
ParameterSet? (Y/N) Description
Y
Twiddle factor configuration:
When the complex multiplication sub-block is programmed in one of the frequency shifter
modes (CMULT_MODE = 001b or 010b), this register is used to indicate the amount of
frequency shift. Specifically, the input samples x(n) for n = 0 to SRCACNT-1 are multiplied
by the sequence: exp(j*2*pi*TWIDINCR*(0:SRCACNT-1)/16384).
When the complex multiplication sub-block is programmed in FFT stitching mode
(CMULT_MODE = 011b), the last two bits of this register specify whether it is 2K or 4K
FFT stitching. Specifically, if the last two bits are 00b, then it is 2K (2*1024-point) FFT
stitching and if the last two bits are 01b, then it is 4K (4*1024-point) FFT stitching. Values
of 10b and 11b are reserved. Also, the 12 MSB bits of this register must be zero.
In all other modes of the complex multiplication sub-block, this register must be kept as 0.
FFTSIZE
4
Y
FFT size (DFT mode):
For the register description during normal FFT operation, see Radar Hardware Accelerator
User's Guide - Part 1. This register also has a second purpose in DFT mode as described
below.
When FFT is disabled, and complex multiplication mode is set to be frequency shifter with
auto-increment (slow DFT mode), then this register is used to specify the DFT bin
resolution. The DFT bin resolution is given by 2^FFTSIZE. For example, if FFTSIZE =
1011b, then the DFT resolution is 2048 (2K size).
STG1LUTSELWR
1
N
Select Window RAM or internal RAM:
The internal RAM for vector multiplication mode is mapped to the same address space as
the Window RAM. Hence, this register bit is required to specify which of these two needs
to be selected when loading the co-efficients via DMA or R4F. If this register bit is 0, then
the Window RAM is selected, else, the internal RAM for vector multiplication mode is
selected. Keep this register bit as 0 always, except during the period when internal RAM
needs to be loaded.
BPM_EN
1
Y
Enable/Disable BPM removal:
This register bit specifies whether the BPM removal needs to be enabled or not. If this
register is set, then BPM removal is enabled prior to feeding samples from the input
formatter into the core computational unit.
BPMPATTERNLSB
and
BPMPATTERNMSB
64
BPMRATE
10
N
BPM pattern:
Specifies the BPM pattern to be used to multiply the input samples if BPM removal is
enabled.
N
BPM rate:
Specifies the number of input samples corresponding to each BPM bit. Minimum valid
value for this register is 1.
BPMPHASE
4
Y
BPM starting phase:
Specifies the starting phase of the BPM pattern periodicity. For more information, see the
detailed description in Section 1.1.3.
8
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Core Computational Unit - CFAR Engine
www.ti.com
2
Core Computational Unit - CFAR Engine
This section describes the CFAR engine block present in the core computational unit.
Core Computational Unit
FFT Engine (ACCEL_MODE = 00b)
Windowing+FFT
Log-Magnitude Post- processing
Pre-processing
Zero Out Interf
Samples
24
24
Mag
LOG2EN
24
JP L
x
Log2
FFT _OUTP UT_
MODE
16
24
A pprox.
FFT
18
Mag
JP L
A pprox.
24
24
Complex
Mult
ABS EN
FFT_EN
WINDOW_EN
Sin ,
Cos
LUT
>Th?
Window
RAM
CMULT_MODE
Statistics
(Sum, Max)
INTERFTHRESH
24
ACCEL_MODE
CFAR -CA Detector
CFAR Engine (ACCEL_MODE = 01b)
CFA R_GROUP ING_E N
Log-Magnitude Pre-Processing
CFA R_A V G_LE FT
CFA R_A V G_RIGHT
Compare
CFA R_INP _MODE
24
Mag
Squared
RAM
(left buffer)
G
U
A
R
D
Cell
Under
Test
G
U
A
R
D
RAM
(right
buffer)
24
CFA R_GUA RD
_INT
Mag
24
JP L
Log2
S liding
S um
16
Local
Max
Check
S liding
S um
CFA R_OUT _MODE
Threshold
Mult /
Add
CFA R_LOG_MODE
CFA R_THRE S H
CFA R_CY CLIC
A pprox.
CFA R_A B S_MODE
S um or
GO or
SO
CFA R_NOIS E A V G_MODE
Figure 4. CFAR Engine
2.1
CFAR Engine
The CFAR engine (see Figure 4) is a module that enables detection of objects, by identifying peaks in the
FFT output. Although there are several detection algorithms, the accelerator supports only the CFAR-CA
algorithm and a few variants of it. CFAR-CA stands for constant false alarm rate – Cell Averaging. The
other popular technique known as ordered-statistic CFAR, a.k.a CFAR-OS is not supported.
As shown in Figure 4, the CFAR engine path is selected by setting the accelerator mode ACCEL_MODE
= 01b. In this mode, the FFT path is not usable simultaneously and the input 24-bit samples from the input
formatter block will be routed into the CFAR engine. The CFAR engine has capability to perform CFARCA detection processing (both linear and logarithmic CFAR modes are available) and generate a peak list.
In cell-averaging CFAR, the processing steps involve computing a threshold for each sample under test
(cell under test) and deciding whether a peak is detected or not based on whether the cell under test
crosses that threshold. Additionally, peak grouping may be done, where a peak is declared only if the cell
under test is greater than its most immediate neighboring cells to its left and right. One thing to note here
is that for peak grouping, the left and right neighboring cells themselves are not required to be CFAR
qualified.
For each cell under test, the computation of threshold is done by averaging the magnitude (or magnitudesquared or log-magnitude) of a specified number of noise samples to the left and right of the cell under
test to determine a ‘surrounding noise average’ and then applying a scale factor (or addition factor in case
log-magnitude is used) on that surrounding noise average to determine the threshold. Thus, the CFAR-CA
detector takes one cell at a time, computes the threshold and decides whether a valid peak is present at
that cell.
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
9
Core Computational Unit - CFAR Engine
2.1.1
www.ti.com
CFAR Engine – Operation
The CFAR engine receives 24-bit input samples from the Input Formatter block. Typically, these are real
samples, representing the magnitude or magnitude-squared or log-magnitude of the FFT output. However,
the input to CFAR engine can instead be complex samples, in which case, either magnitude or magnitudesquared or log-magnitude of the complex samples can be computed inside the CFAR engine itself. This is
done by the log-magnitude pre-processing sub-block inside the CFAR engine (see block diagram). The
real unsigned result from this pre-processing operation is sent to CFAR-CA detection processing. The
registers CFAR_INP_MODE and CFAR_ABS_MODE are used to configure real vs. complex input, as well
as the nature of pre-processing required. The log-magnitude computation uses the same JPL
approximation for magnitude calculation and the same look-up table (LUT) approximation for log2
computation as described in Part 1 of the user guide for FFT engine post-processing.
CFAR Engine
Log-Magnitude Pre-processing
CFAR_AVG_LEFT
CFAR_INP_MODE
G
24
CFAR_AVG_RIGHT
JPL
Approx.
Local
Max
Check
G
CFAR_OUT_MODE
D
24
Threshold
Mult/Add
CFAR_GUARD
_INT
Mag
Compare
U Cell U
RAM
RAM
R Under R
(left buffer) A
(right buffer)
Test A
D
Mag
Squared
CFAR_GROUPING_EN
CFAR-CA Detector
Sliding
Sum
24
Log 2
16
CFAR_LOG_MODE
CFAR_THRESH
Sliding
Sum
CFAR_CYCLIC
Sum or
GO or
SO
CFAR_ABS_MODE
CFAR_CA_MODE
CFAR_ NOISE_DIV
Figure 5. CFAR Engine Block Diagram
As described earlier, the CFAR-CA detection processing involves finding a “surrounding noise average”
for each cell under test and then determining a threshold that is a function of the surrounding noise
average. The cell under test is compared against this threshold to decide whether a peak is present or not
in that cell. To calculate the threshold, the surrounding noise average is multiplied with (or added to) a
threshold scaling factor specified in CFAR_THRESH register.
There are two modes in which the CFAR detector can be used – in non-logarithmic mode (a.k.a linear
CFAR), the threshold scale factor is multiplied, and in logarithmic mode (a.k.a logarithmic CFAR), the
threshold scale factor is added. This is decided based on CFAR_LOG_MODE register.
The final detection threshold that is so obtained is used to compare against the cell under test to
determine whether a peak is detected in that cell.
Table 2 summarizes the register settings for the different CFAR modes of operation.
Table 2. CFAR Modes and Register Settings
Input Real or
Complex
Desired PreProcessing
Linear CFAR
Real
N/A
1
00
0
Complex
Magnitude
0
10
0
Mag-squared
0
00
0
Log2-Mag
0
11
0
Real
N/A
1
00
1
Complex
Log2-Mag
0
11
1
Log CFAR
10
Register Values to Use
Desired CFAR
Mode
Radar Hardware Accelerator - Part 2
CFAR_INP_MODE
CFAR_ABS_MODE CFAR_LOG_MODE
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Core Computational Unit - CFAR Engine
www.ti.com
Desired CFAR-CA Algorithm
CFAR_CA_MODE Register
Setting
CFAR-CA
00
CFAR-CAGO
01
CFAR-CASO
10
The surrounding noise average computation has multiple options – cell averaging (CFAR-CA), cell
averaging with greater-of selection (CFAR-CAGO) and cell averaging with smaller-of selection (CFARCASO). The register CFAR_CA_MODE is used to select between CFAR-CA, CFAR-CAGO and CFARCASO modes. In CFAR-CA, the noise samples on the left side and right side of the cell under test (after
ignoring some guard cells on either side) are simply averaged to determine the surrounding noise average
value. In CFAR-CAGO, the noise samples on the left side and right side are averaged independently and
the greater of the two is used to determine the threshold. In CFAR-CASO, the lesser of the two is used
The number of samples on the left side and right side used for computing the noise average is configured
using CFAR_AVG_LEFT and CFAR_AVG_RIGHT registers and the number of guard cells is configured
using CFAR_GUARD_INT register. The number of samples used for left side noise averaging is given by
2*CFAR_AVG_LEFT. The number of samples used for right side noise averaging is given by
2*CFAR_AVG_RIGHT. The number of guard cells that are ignored on each side of the cell under test is
given by CFAR_GUARD_INT. For example, if CFAR_AVG_LEFT = CFAR_AVG_RIGHT = 16, and
CFAR_GUARD_INT = 3, then it means that the most immediate three samples each to the left and right of
the cell under test are skipped and then, 32 samples on the left and 32 samples on the right side are used
for noise averaging. Note that even though the term noise averaging is used here, the actual
implementation simply adds the noise samples first and the “averaging” is done as a divide by a power-of2 as specified in a separate register, CFAR_NOISE_DIV. These registers are described in Table 5.
Cell under test
CFAR_GUARD_INT = 2
CFAR_AVG_LEFT = 3 (3*2 = 6 samples)
CFAR_AVG_RIGHT = 3 (3*2 = 6 samples)
Guard cells
}
}
Guard cells
««..
Cells used for Leftside noise avg.
Cells used for Rightside noise avg.
Figure 6. CFAR-CA: Cells Used for Surrounding Noise Average
As mentioned earlier, the CFAR_THRESH register specifies the threshold scaling factor. This is an 18-bit
register whose value is used to either multiply or add to the ‘surrounding noise average’ to determine the
threshold used for detection of the present cell under test. If logarithmic mode is disabled (in magnitude or
magnitude-squared mode), then the register value is multiplied with the surrounding noise average to
determine the threshold, else it is added to the surrounding noise average. In the former case, this 18-bit
register is interpreted as a 14.4 value and supports a range of values from 1/16 to 2^14-1. In the latter
case (logarithmic mode), the 18-bit register is interpreted as a 7.11 value.
The CFAR engine supports a few output formats that are described next.
2.1.2
CFAR Engine – Output Formats
The cells that exceed the threshold are noted and this ‘Detected Peaks list’ is sent to the destination
memory. Since the output format of the core computational unit is 24-bits I and 24-bits Q, the detected
peaks list is formatted into ‘I’ and ‘Q’ channels as shown in Figure 7. The 24-bit I channel contains the
index at which the peak is detected, with the MSB 12 bits containing the iteration number (corresponding
to REG_BCNT counter value) and the LSB 12 bits containing the sample index number (corresponding to
SRCACNT counter value). The 24-bit Q channel contains the surrounding noise value or the cell under
test value of that detected peak. This is chosen based on CFAR_OUT_MODE register setting. Instead of
‘Detected Peaks list’, it is also possible for the CFAR engine to send out the raw ‘surrounding noise
average’ value for each cell. This is called ‘Raw output mode’.
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
11
Core Computational Unit - CFAR Engine
www.ti.com
Figure 7 and Table 3 show the different output formats available.
2XWSXW IRUPDW RI &)$5 (QJLQH LQ µ'HWHFWHG 3HDNV OLVW¶ PRGH
24 bits (I) from Core Computational Unit
12 bits
Iteration number
(i.e., REG_BCNT counter value)
24 bits (Q) from Core Computational Unit
12 bits
Sample index
(i.e., cell under test index)
Surrounding noise average value
or Cell under test value
2XWSXW IRUPDW RI &)$5 (QJLQH LQ µ5DZ RXWSXW¶ PRGH
24 bits (I) from Core Computational Unit
24 bits (Q) from Core Computational Unit
Cell under test value
or Binary detection result flag
Surrounding noise average value
Figure 7. CFAR Engine Output Format
Table 3. CFAR Output Modes and Register Settings
CFAR Output Mode
Q Channel Output
Peak index
Surrounding noise average
value
10
Peak index
Cell under test value
11
Surrounding noise average
value
Cell under test value
00
Surrounding noise average
value
Binary detection result flag (0
or 1)
01
Detected peaks list mode (only
detected peaks are output)
Raw output mode (all cells are
output)
CFAR_OUT_MODE Register
Setting
I Channel Output
In detected peaks list mode, only the detected peaks are output to the destination memory. In this case,
the read-only register FFTPEAKCNT indicates how many peaks have been totally detected, so that the
main processor can read that many locations from the destination memory.
While detecting peaks, if ‘peak grouping’ is required, then it can be enabled using CFAR_GROUPING_EN
register. In this case, a peak is declared as detected only if it the cell under test exceeds the threshold, as
well as, if the cell under test exceeds the two neighboring cells to its immediate left and right (the peak is a
local maximum).
12
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Core Computational Unit - CFAR Engine
www.ti.com
2.1.3
CFAR Engine – Cyclic vs. Non-Cyclic
The register CFAR_CYCLIC specifies whether the CFAR-CA detector needs to work in cyclic mode or in
non-cyclic mode. These two modes are different in the way the samples at the edges are handled. In noncyclic mode, the left side noise average is unavailable for the first several cells under test, therefore, only
the right side noise average is used for those initial cells. Similarly, the right side noise average is
unavailable for the last several cells under test, and only the left side noise average is used for those cells.
For all other cells in the middle, both left and right side noise averaging is used as per the number of
samples programmed in CFAR_AVG_LEFT and CFAR_AVG_RIGHT registers.
Cell under test
}
Guard cells
}
Guard cells
CFAR_GUARD_INT = 2
CFAR_AVG_LEFT = 3 (3*2 = 6 samples)
CFAR_AVG_RIGHT = 3 (3*2 = 6 samples)
««..
Only right-side noise average
is used (scaled by 2 to mimic
left-side averaging as well)
Not enough cells
available on left-side
Figure 8. Handling of Samples Near the Edge in Non-Cyclic Mode
On the other hand, in cyclic mode, the CFAR-CA detector needs to wrap around the edges in a circular
manner. In other words, for a cell under test that is near the left extreme, the left side noise average
computation needs to use some samples from the right edge (circular wrap around the edge). Similarly,
for a cell under test that is near the right extreme, the right side noise average computation needs to use
some samples from the left edge (again, circular wrap around.
Cell under test
Guard cells
}
}
Guard cells
CFAR_GUARD_INT = 2
CFAR_AVG_LEFT = 3 (3*2 = 6 samples)
CFAR_AVG_RIGHT = 3 (3*2 = 6 samples)
««..
Cells used for leftside noise average
Cells used for rightside noise average
Figure 9. Handling of Samples Near the Edge in Cyclic Mode
This cyclic CFAR implementation is accomplished through a combination of a few register settings within
the CFAR engine, as well as in the input and output formatter blocks. Specifically, the input formatter is
configured to send additional samples (repeat samples) in a circular manner wrapping around the left and
right edges. This is achieved by using the circular shift (CIRCIRSHIFT) and circular wrap-around
(CIRCSHIFTWRAP) registers in the input formatter, such that the required number of extra samples at
both edges are streamed into the CFAR engine. The cyclic CFAR mode only works when the number of
cells under test is a power of 2.
For example, if the number of cells under test is 256, the average number of left and right noise samples
is 32 each and the number of guard cells is 3 on either side. Then, the registers need to be programmed
as shown in Table 4
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
13
Core Computational Unit - CFAR Engine
www.ti.com
Table 4. Configuration Example for CFAR Cyclic Mode
Module
Register Setting
Comments
CFAR Engine
CFAR_GUARD_INT = 3
3 guard cells on either side
CFAR_AVG_LEFT = 16
32 samples on left side and 32 samples on right side for noise averaging
CFAR_AVG_RIGHT = 16
Input Formatter
Output Formatter
SRCACNT = 325
255 + (32+3) + (32+3), where 255 is the usually configured value of
SRCACNT for a 256 sample vector, plus 32+3 additional samples for
circular repeat at either end
CIRCIRSHIFT = 221
256 – (32+3), which is the starting offset for the circular shift, so that
samples are streamed into CFAR engine start from this point
CIRCSHIFTWRAP = 8
The circular wrap-around happens when SRCACNT counter value reaches
2^CIRCSHIFTWRAPP = 256
REG_DST_SKIP_INIT = 0
No need to skip any samples at Output Formatter even though extra
samples are fed into CFAR engine, because CFAR engine automatically
strips out the extra samples
DSTACNT = 255
256 outputs corresponding to 256 cells
Start at sample index 221
(CIRCIRSHIFT )
End at sample index 34
0 1 2 3
...
3
4
««..
2
2
1
2 2 2
5 5 5
3 4 5
«
Wrap-around at 2^CIRCSHIFTWRAP (= 256)
Figure 10. Input Formatter Sample Streaming for the Cyclic CFAR Example
2.1.4
CFAR Engine – Register Descriptions
Table 5 lists all the registers of the CFAR engine block. A few registers belonging to input formatter block
that are related to cyclic CFAR mode of operation are also listed here.
Table 5. CFAR Engine Registers
Register
CFAR_AVG_LEFT
Width
6
ParameterSet? (Y/N) Description
Y
Number of samples for left-side noise averaging:
This register is used to specify the number of samples used for noise averaging to the left of
the cell under test. The number of samples used for noise averaging is equal to the value of
this register multiplied by 2. For example, if this register value is 15, then the number of leftside samples used for averaging is 30. The maximum averaging that is possible is 126. A
value of zero in this register means that the noise samples on the left side are not used for
averaging. A value of 1 is not supported (valid values are 0, 2, 3, … 63).
CFAR_AVG_RIGHT
6
Y
Number of samples for right-side noise averaging:
This register is very similar to the above, except that this register specifies the averaging to
the right of the cell under test. In most cases, it is expected that CFAR_AVG_RIGHT has the
same value as CFAR_AVG_LEFT.
CFAR_GUARD_INT
3
Y
Number of guard cells:
This register specifies the number of guard cells to ignore on either side of the cell under
test. If this register value is 3, then three guard cells on the left side and three guard cells on
the right side are ignored. Only the noise samples beyond this guard region are used for
calculating the surrounding noise average.
14
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Core Computational Unit - CFAR Engine
www.ti.com
Table 5. CFAR Engine Registers (continued)
Register
CFAR_THRESH
Width
18
ParameterSet? (Y/N) Description
N
Threshold scale factor:
This register is used to specify the threshold scale factor. This value is used to either
multiply or add to the ‘surrounding noise average’ to determine the threshold used for
detection of the present cell under test. If logarithmic CFAR mode is disabled (in magnitude
or magnitude-squared mode), then the register value is multiplied with the surrounding noise
average to determine the threshold, else it is added to the surrounding noise average. In the
former case, this 18-bit register is interpreted as a 14.4 value. In the latter case (logarithmic
mode), the 18-bit register is interpreted as a 7.11 value.
CFAR_LOG_MODE
1
Y
CFAR linear or logarithmic mode:
This register is one of the registers used to specify whether the CFAR detector operates in
linear or logarithmic mode. If this register bit is set, then the CFAR detector operates in
logarithmic mode, which means that the threshold scale factor is added to (instead of
multiplied with) the surrounding noise average value to determine the threshold. Note that
this mode is meaningful only when the input samples to the CFAR detector are logmagnitude samples (see CFAR_INP_MODE as well). If this register bit is 0, then the
logarithmic mode is disabled, in which case, the threshold scale factor is multiplied with
(instead of added to) the surrounding noise average to determine the threshold. This mode
is meaningful when magnitude or magnitude-squared samples are fed to the CFAR detector.
CFAR_INP_MODE
1
Y
CFAR engine input mode:
This register bit specifies whether the inputs to the CFAR engine are complex samples or
real values (the real values are already magnitude, magnitude-squared or log-magnitude
numbers that can be directly sent to CFAR detection process). If this register bit is 1, then
the input samples are real values and are directly sent to CFAR detection. If this register bit
is 0, then the inputs are complex samples and hence either magnitude or magnitudesquared or log-magnitude computation is required prior to CFAR detection. Which of the
three, viz., magnitude or magnitude-squared or log-magnitude is done, is selected by
CFAR_ABS_MODE register described below.
CFAR_ABS_MODE
2
Y
CFAR magnitude, mag-squared or log-mag mode:
This register is used to specify which of the three computations, namely Magnitude, Magsquared or Log-Magnitude, is enabled inside the CFAR engine prior to CFAR detection. This
register is only relevant when CFAR_INP_MODE is 0 (complex samples are fed to CFAR
engine).
00b – Magnitude-squared
01b – Not valid
10b – Magnitude (using JPL approx.)
11b – Log2-Magnitude (using LUT approx.)
CFAR_OUT_MODE
2
Y
CFAR engine output mode:
This register is used to select the output mode of the CFAR engine. The MSB bit of this
register selects whether the CFAR Engine outputs all the noise average values for all the
cells (‘Raw output’ mode), or whether the CFAR Engine outputs only the detected peaks
(‘Detected Peaks List’ mode). The LSB bit specifies the content of the 24-bit ‘I’ and ‘Q’
channel outputs logged in destination memory. Refer main description section for details.
CFAR_GROUPING_
EN
1
Y
CFAR peak grouping enable:
This register bit specifies whether peak grouping should be enabled. When this register bit is
0, peak grouping is disabled, which means that a peak is declared as detected as long as
the cell under test exceeds the threshold. On the other hand, if this register bit is 1, then a
peak is declared as detected only if it the cell under test exceeds the threshold, as well as, if
the cell under test exceeds the two neighboring cells to its immediate left and right (local
maximum).
CFAR_NOISE_DIV
4
Y
CFAR noise average division factor:
This register specifies the division factor with which the noise sum calculated from the left
and right noise windows are divided, in order to get the final surrounding noise average
value. The division factor is equal to 2^ CFAR_NOISE_DIV. Therefore, only powers-of-2
division are possible, even though the number of samples specified in CFAR_AVG_LEFT
and CFAR_AVG_RIGHT are not restricted to powers of 2. The surrounding noise average
value obtained after the division is multiplied or added with CFAR_THRESH to determine
the final threshold used to compare the cell under test for detection. The maximum allowed
value for this register is 8, which gives a division factor of 256.
CFAR_CA_MODE
2
Y
CFAR noise averaging mode:
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
15
Core Computational Unit - CFAR Engine
www.ti.com
Table 5. CFAR Engine Registers (continued)
Register
CFAR_CYCLIC
Width
1
ParameterSet? (Y/N) Description
Y
This register configures the noise averaging mode in the CFAR detector from one of three
options – CFAR-CA, CFAR-CAGO, CFAR-CASO.
00b – CFAR-CA
01b – CFAR-CAGO
10b – CFAR-CASO
11b – Not valid
CFAR cyclic vs. non-cyclic mode:
This register bit specifies whether the CFAR-CA detector needs to work in cyclic mode or in
non-cyclic mode. When this register bit is 0, the CFAR detector works in non-cyclic mode
and when it is 1, it works in cyclic mode. Refer main description section for details on how to
configure and use cyclic mode.
FFTPEAKCNT
(read-only)
12
N
CFAR detected peak count:
This is a read-only register that contains the number of detected peaks that are logged in
the destination memory, when CFAR Engine is configured in ‘Detected Peaks List’ mode. In
the Detected Peaks List mode, since only the detected peaks are logged in the destination
memory, this read-only register provides the number of detected peaks that are logged to
the main processor, so that the main processor can determine how many entries to read
from the destination memory.
CIRCIRSHIFT
12
Y
This register is part of input formatter block.
Source Circular Shift:
This register specifies the circular shift (offset in samples) that should be applied on the
sequence of input samples before feeding them to the core computational unit. This register,
together with CIRCSHIFTWRAP register, is useful when CFAR detection needs to be done
in cyclic mode. Refer main description section for details.
CIRCSHIFTWRAP
4
Y
This register is part of Input Formatter block
Source Circular Shift Wraparound:
This register indicates at what number (power-of-2) the sample counter value should
wraparound, when circular shift is needed. The counter wraps around at 2^
CIRCSHIFTWRAP. The sample counter starts counting from the programmed circular shift
value (CIRCIRSHIFT) and when the counter crosses (2^CIRCSHIFTWRAP-1), it wraps back
to zero to continue the count.
16
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Core Computational Unit – Statistics
www.ti.com
3
Core Computational Unit – Statistics
This section describes the statistics block present in the core computational unit.
Core Computational Unit
FFT Engine (ACCEL_MODE = 00b)
Windowing+FFT
Log-Magnitude Post-processing
Pre-processing
24
FFT
LOG2EN
24
JPL
Approx.
x
Log2
FFT_OUTPUT_
MODE
16
24
18
Mag
JPL
Approx.
Mag
24
24
24
Complex
Mult
ABS EN
FFT_EN
WINDOW_EN
Zero Out Interf
Samples
Sin,
Cos
LUT
>Th?
Window
RAM
CMULT_MODE
Statistics
(Sum, Max)
INTERFTHRESH
CFAR-CA Detector
CFAR Engine (ACCEL_MODE = 01b)
Log-Magnitude Pre-processing
Compare
G
24
Mag
Squared
Mag
24
JPL
Approx.
G
U Cell U
RAM
A Under A
(left buffer) R Test R
D
D
RAM
(right
buffer)
24
Sliding
Sum
Log2
Local
Max
Check
ACCEL_MODE
24
Sliding
Sum
Threshold
Mult /
Add
16
Sum or
GO or
SO
Figure 11. Statistics Block
3.1
Statistics Block
The core computational unit has a statistics computation block at the end of the FFT Engine path as
shown in Figure 11. This block can be used to compute a few statistics, such as sum and max of the
samples output by the core computational unit.
3.1.1
Statistics Block – Operation
The 24-bit I and 24-bit Q output of the core computational unit goes to a statistics computation block. The
purpose of this block is to find the maximum and sum (average) of the output samples
The sum and max statistics are computed on a ‘per-iteration’ basis (the sum and max values are logged at
the end of each iteration) and the computation is reset for the next iteration. The sum and max values are
logged in register-sets (see MAXn_VALUE, MAXn_INDEX, ISUMn, QSUMn register-sets), which can be
read by the main processor. However, only four such registers are provided for each statistic and
therefore, the sum and max values can be logged in these registers only for up to a maximum of four
iterations.
The max statistics register-set comprises four read-only registers of 24 bits each, named MAXn_VALUE,
for recording max values, and four read-only registers each 12 bits unsigned, named MAXn_INDEX, for
recording the max indices. The sum statistics register-set contains four registers of 36 bits each, named
ISUMn, for I-sum statistics, and 4 registers of 36 bits each, named QSUMn, for Q-sum statistics.
For larger number (>4) of iterations, either the sum or the max value can be sent to the destination
memory for each iteration, which allows the statistic to be available even for cases with more than four
iterations. The logging of the statistic into the destination memory is enabled using FFT_OUTPUT_MODE
register described below.
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
17
Core Computational Unit – Statistics
www.ti.com
The MSB bit of the FFT_OUTPUT_MODE register selects whether the default (main) output of the core
computational unit goes to the destination memory, or the statistics block output. If the MSB of this 2-bit
register is 0, then it selects the default mode of operation, where the main output (FFT or Log-Mag result)
is sent to the destination memory. If the MSB is 1, then it selects the statistics output mode, where either
the sum or max statistic is sent to the destination memory (one value per iteration). Whether the sum or
max is sent to memory is dependent on the LSB bit. If the LSB bit is 0, then the statistic value that is sent
is the max value (useful in conjunction with Log-Mag enabled to find the biggest peak and peak index per
iteration). Here, the I output is the maximum value itself and the Q output is the index (location) of the
maximum value. If the LSB bit is 1, then the statistic value that is sent is the sum value (useful for DFT
mode, as well as for mean squared or mean of absolute values computation).
Table 6. Statistics Output Modes
FFT_OUTPUT_MODE Register
00b – Default output mode
I Channel Output
Q Channel Output
Main output of core computational unit
10b – Max statistics output (One output per
iteration)
Max Value
Max Index
11b – Sum statistics output (One output per
iteration)
Sum of I values
Sum of Q values
The max statistic records the maximum value (and its index) of the magnitude or log-magnitude samples
corresponding to every iteration. The sum statistic records the sum of the magnitude or log-magnitude or
the complex output samples corresponding to each iteration. If the main output of the core computational
unit is the complex FFT output (ABS EN=0 and LOG2EN=0), then the sum statistics is the complex sum.
The complex sum statistics mode is useful when used in conjunction with the complex multiplier block in
DFT mode or vector multiplication mode. For example, the sum statistic computed here, together with the
DFT mode of the complex multiplier block, enable DFT computation for the desired number of bins
(iterations). When the desired number of bins is more than 4, the sum statistic can be sent to destination
memory (instead of the main data output that is normally sent to the destination memory).
Note that when the sum statistics is logged into the destination memory, it goes through the Output
Formatter block as only 24-bits each for I and Q (same bit-width as the primary FFT outputs). Hence, the
computed sum statistics value of 36-bits width, needs to be scaled down by right-shifting the appropriate
number of LSBs (using FFTSUMDIV register) before sending to output formatter. Thus, when logging the
statistics in destination memory, the sum statistics is to be used as an “average” value, rather than a
“sum” value itself.
The FFTSUMDIV register specifies the number of bits to right-shift the sum statistic before it is written to
destination memory. The internal sum statistic register is 36-bits wide (allowing 12 bits of MSB growth of
the 24-bit data path), but this statistics value needs to be scaled down to 24 bits to match the data path
width going to the Output Formatter. This register specifies how many LSBs to drop to convert the sum
statistics to 24-bit value. Note that only signed saturation is implemented (irrespective of whether
magnitude values are being summed or complex FFT output values are being summed). Therefore, it is
recommended that this register is configured to drop an appropriate number of LSBs such that incorrect
saturation in case of magnitude sum is avoided.
Note that in statistics output mode, the registers DSTACNT, DSTAINDX, DSTBINDX, DST16b32b and
DSTREAL are not meant to be used, since it is known that there is only one value to be written to
destination memory for every iteration in a specific format. It is recommended that in this mode, DSTACNT
be programmed to its maximum value of 4095, DSTAINDX and DSTBINDX are both programmed to a
value of 8 bytes, DST16b32b is set to 1 and DSTREAL is reset to 0. The statistics is then always logged
in the destination memory as consecutive 32-bit I and Q samples, irrespective of whether sum statistic or
max statistic is being logged.
18
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
FFT Stitching Use-Case Example
www.ti.com
3.1.2
Statistics block – Register Descriptions
Table 7 lists all the registers of the statistics block.
Table 7. Statistics Block Registers
Register
MAX1VALUE
Width
24
ParameterSet? (Y/N) Description
N
MAX2VALUE
Max value:
These registers contain the max value on a per-iteration basis. These registers are meaningful
only when Magnitude or Log-Magnitude is enabled. Only the max values for up to four
iterations are recorded in these registers. For larger number of iterations, use statistics output
mode (FFT_OUTPUT_MODE below).
MAX3VALUE
MAX4VALUE
MAX1INDEX
12
N
MAX2INDEX
Max index:
These registers contain the max index on a per-iteration basis, corresponding to each max
value in the MAXn_VALUE registers.
MAX3INDEX
MAX4INDEX
ISUM1LSB and
ISUM1MSB
Sum statistics:
ISUM2LSB,
ISUM2MSB
These registers contain the sum of the I outputs and Q outputs on a per-iteration basis. Only
the statistics for up to four iterations are recorded in these registers. For larger number of
iterations, use statistics output mode (FFT_OUTPUT_MODE below).
ISUM3LSB,
ISUM3MSB
ISUM4LSB,
ISUM4MSB
FFT_OUTPUT_
MODE
2
Y
FFT Path output mode:
This register specifies the output mode of the FFT path. Instead of the default mode where the
main output of the core computational unit is sent to the destination memory, this register can
be configured such that either the max or sum statistics can be sent to the destination
memory.
00b – Default mode (main output)
10b – Max statistics output mode
11b – Sum statistics output mode
FFTSUMDIV
5
N
Right-shifting for Sum statistic:
This register specifies the number of bits to right-shift the sum statistic before it is written to
destination memory. The internal sum statistic register is 36-bits wide (allowing 12 bits of MSB
growth of the 24-bit data path), but this statistics value needs to be scaled down to 24 bits to
match the data path width going to the Output Formatter. This register specifies how many
LSBs to drop to convert the sum statistics to 24-bit value.
4
FFT Stitching Use-Case Example
This section presents examples that illustrate how to configure and use the radar hardware accelerator for
the special use-case of FFT stitching.
4.1
FFT Stitching Use-Case
As described earlier, the radar hardware accelerator natively supports FFT sizes of up to 1024. However,
FFT of size 2048 and 4096 can also be accomplished using a two-step FFT stitching process. This
involves the use of two parameter-sets as shown in the example below.
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
19
FFT Stitching Use-Case Example
www.ti.com
Consider an industrial level-sensing use-case with 1 TX, 1 RX and 4096 complex samples per chirp.
During active chirp transmission, the Digital Front-end (DFE) writes ADC samples to the ADC buffer in
ping-pong manner. This example assumes that the ADC data is complex (the RF/analog is configured as
complex baseband (instead of real-only) chain). Since 4096 samples requires 4096*4 = 16384 bytes, the
DFE fills up the entire ping or pong ADC buffers (ACCEL_MEM0 or ACCEL_MEM1) for every chirp. The
DFE configuration details are outside the scope of this user's guide.
The FFT processing using the radar hardware accelerator can be done inline (as and when the ADC data
is available) by setting FFT1DEN = 1, such that the ADC buffer is shared with the accelerator input
memories and the DFE output is directly available to the accelerator for processing at the end of every
chirp (ping-pong switch). Alternately, it is possible to not use the radar hardware accelerator during ADC
data collection, and instead, simply transferring the ADC data into L3 memory using DMA at the end of
every ping-pong switch. In such a case, after all the chirps are collected, the radar hardware accelerator
can be used to perform FFT during inter-frame time. This is accomplished by setting FFT1DEN register bit
to 0, such that the ADC buffer is not shared with the accelerator input memories and therefore, the
accelerator input memories are directly accessible for DMA transfer to provide input data for the
accelerator. The use of inline FFT processing and inter-frame FFT processing is covered in the Radar
Hardware Accelerator - Use Case Example section of the Radar Hardware Accelerator User's Guide - Part
1.
In Table 8 and Table 9, the parameter-set configurations for performing one 4096-point FFT using FFT
stitching is shown. The input data is assumed to be in ACCEL_MEM0 as shown in the layout below. The
FFT stitching is achieved in two steps as follows.
1. The first step involves computing four 1024-point FFTs of input samples. For these four 1024-FFT
computation, input samples are sent in following order:
a. Iteration #0: x[0], x[4], x[8], x[12], …, x[4092]
b. Iteration #1: x[1], x[5], x[9], x[13], …, x[4093]
c. Iteration #2: x[2], x[6], x[10], x[14], …, x[4094]
d. Iteration #3: x[3], x[7], x[11], x[15], …, x[4095]
Where x[0], x[1], x[2], …., x[4095] are the original 4096-point input samples which are stored in
ACCEL_MEM0 in consecutive locations.
Iteration#0
I0
Q0
Iteration#1
I1
.
.
.
I4092
Q4092 I4093
Q1
Iteration#2
I2
Q2
ACCEL_MEM0
Q4093 I4094
Iteration#3
I3
Q3
.
.
.
Q4094 I4095
Q4095
128-bit wide accelerator local memory
Figure 12. Layout of Samples in Source Memory for 1TX, 1RX, 4K Complex FFT
20
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
FFT Stitching Use-Case Example
www.ti.com
Further, before computing the 1024-point FFT in each iteration, apply windowing to the incoming
input samples. Note that the window RAM can hold a maximum of only 1024 window coefficients.
When larger FFT (2K and 4K) is needed via stitching of multiple smaller-size FFTs, a downsampled set of window coefficients is stored in window RAM (1024 coefficients of original 4096
point window are stored by picking every 4th coefficient) and then, the accelerator is configured to
use a linearly interpolation between these 1024 window samples to get intermediate window
coefficients. The WINDOW_INTERP_FRACTION register, which is part of the parameter set, is
used to configure interpolation mode for the window coefficients. When this register is 01b, then the
window coefficients are applied as is from the window RAM for the first iteration, and then linearly
interpolated between successive coefficients with an interpolation fraction of 0.25, 0.5, 0.75 for the
second, third and fourth iterations respectively. This corresponds to the linear interpolation of
window coefficients as needed for a 4K size FFT. (When performing 2K size FFT, the
WINDOW_INTERP_FRACTION register should be programmed to 10b, in which case, the first
iteration uses the window RAM coefficients as is, and the second iteration linearly interpolates
between consecutive coefficients with interpolation fraction of 0.5). Note that when linear
interpolation for the window coefficients is used, the symmetric window mode (WINSYMM = 1)
cannot be used.
Figure 13. Linear Interpolation of Window RAM Coefficients for 4K FFT
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
21
FFT Stitching Use-Case Example
www.ti.com
Windowing output is fed to FFT engine for 1024-point FFT computation (see Step 1). At the end of this
step, the complex FFT output is stored in ACCEL_MEM2 in the same fashion as they are picked from
source memory as illustrated in Figure 14. Note that Ii,j represent real part of ith bin FFT output for jth
iteration. Similarly, Qi,j represent imaginary part of ith bin FFT output for jth iteration.
Output
Iteration#0
I0,0
Output
Iteration#1
Q0,0
Output
Iteration#2
Q0,1
I0,1
.
.
.
I0,2
Output
Iteration#3
Q0,2
Q0,3
I0,3
.
.
.
ACCEL_MEM2
I1023,0 Q1023,0 I1023,1 Q1023,1 I1023,2 Q1023,2 I1023,3 Q1023,3
128-bit wide accelerator local memory
Figure 14. Layout of Samples in Destination Memory (after Step 1)
2. In the second step, 1024 4-point FFTs in stitching mode are computed and the results written back to
ACCEL_MEM0, thus, over-writing the original input data. For 4-point FFT computation, the input
samples are sent through the FFT engine in a transpose manner as illustrated in Figure 15. In this
case, the complex multiplier in the pre-processing block needs to be configured to enable 4K FFT
stitching. After each 4-point FFT, the output samples correspond to FFT bins spaced apart by 1024.
For example, the first iteration produces outputs corresponding to bins 0, 1024, 2048 and 3072.
Similarly, the second iteration produces outputs corresponding to bins 1, 1025, 2049 and 3073. By
using appropriate DSTAINDX and DSTBINDX settings, the output samples can be arranged in the
correct bin order as desired.
Iteration#0 Input
Iteration#1 Input
Iteration#2 Input
I0,0
I1,0
I2,0
Q0,0
Q1,0
Q2,0
I0,1
I1,1
I2,1
.
.
.
Iteration#1023 Input
I1023,0
Q1023,0
Q0,1
Q1,1
Q2,1
I0,2
I1,2
I2,2
Q0,2
Q1,2
Q2,2
ACCEL_MEM2
I1023,1
Q1023,1
I1023,2
Q1023,2
I0,3
I1,3
I2,3
Q0,3
Q1,3
Q2,3
I1023,3
Q1023,3
.
.
.
Figure 15. 1024 4-Point FFTs in Step 2 (FFT Stitching)
22
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
FFT Stitching Use-Case Example
www.ti.com
The key register configurations to perform 4K size complex FFT using FFT stitching are tabulated in
Table 8.
Table 8. Parameter-Set #0: Used for Step #1
Register
Value
Comments
FFT_EN
1
Enable FFT computation
FFTSIZE
10
FFT size = 2^10 = 1024
WINDOW_EN
1
Enable windowing
WINDOW_INTERP_
FRACTION
1
Enable windowing interpolation for 4096-point FFT stitching
SRCACNT
1023
SRCAINDX
16
Adjacent samples spaced 16 bytes apart
REG_BCNT
3
Four back-to-back 1024-point FFT
SRCBINDX
4
Samples are 4 bytes apart for successive iterations
SRCADDR
0
Start at beginning of ACCEL_MEM0
DSTADDR
32KB
DSTAINDX
16
DSTBINDX
4
DSTACNT
1023
1024 valid samples
Destination memory is ACCEL_MEM2
SRC16b32b
0
FFT input samples are 16-bit word aligned
DST16b32b
0
FFT output samples are 16-bit word aligned
Table 9. Parameter-Set #1: Used for Step #2
Register
Value
Comments
FFT_EN
1
Enable FFT computation
FFTSIZE
2
FFT size = 2^2 = 4
CMULT_MODE
3
Enables FFT Stitching Mode of complex multiplier
TWIDINCR
1
4096-Point FFT Stitching
SRCACNT
3
Four valid samples (zero-based count)
SRCAINDX
4
Adjacent samples spaced 4 bytes apart
REG_BCNT
1023
SRCBINDX
16
SRCADDR
32KB
Input samples for this step are stored in ACCEL_MEM2
DSTADDR
0KB
Output of this step stored back in ACCEL_MEM0
DSTAINDX
1024*4
Adjacent output samples of each iteration are 4KB apart, so that at the end, 4096-point FFT
is output in linear bin order
DSTBINDX
4
First output sample of each iteration is 4 bytes apart to ensure final 4096-point FFT output is
in linear bin order
DSTACNT
3
4-point FFT output
SRC16b32b
0
FFT input samples are 16-bit word aligned
DST16b32b
0
FFT output samples are 16-bit word aligned
Need to compute 4-point FFT 1024 times
Each set for 4–point FFT are spaced 16 bytes apart
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Radar Hardware Accelerator - Part 2
23
References
www.ti.com
At the end of two parameter sets, the result of the 4096-point FFT is available in ACCEL_MEM0 in
correct bin order and can be transferred back to L3 memory via DMA. Alternately, log-magnitude
and/or CFAR detection processing can be done in the radar hardware accelerator itself.
I0
Q0
.
.
.
I1
Q1
I2
Q2
ACCEL_MEM0
.
.
.
I4095
Q4095
128-bit wide accelerator local memory
Figure 16. Layout of Final FFT Output in Correct Bin Order
5
References
Radar Hardware Accelerator User's Guide - Part 1
24
Radar Hardware Accelerator - Part 2
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Revision History
www.ti.com
Revision History
NOTE: Page numbers for previous revisions may differ from page numbers in the current version.
Changes from Original (May 2017) to A Revision ........................................................................................................... Page
•
Update was made in Section 2.1. ....................................................................................................... 9
SWRU527A – May 2017 – Revised March 2018
Submit Documentation Feedback
Copyright © 2017–2018, Texas Instruments Incorporated
Revision History
25
IMPORTANT NOTICE FOR TI DESIGN INFORMATION AND RESOURCES
Texas Instruments Incorporated (‘TI”) technical, application or other design advice, services or information, including, but not limited to,
reference designs and materials relating to evaluation modules, (collectively, “TI Resources”) are intended to assist designers who are
developing applications that incorporate TI products; by downloading, accessing or using any particular TI Resource in any way, you
(individually or, if you are acting on behalf of a company, your company) agree to use it solely for this purpose and subject to the terms of
this Notice.
TI’s provision of TI Resources does not expand or otherwise alter TI’s applicable published warranties or warranty disclaimers for TI
products, and no additional obligations or liabilities arise from TI providing such TI Resources. TI reserves the right to make corrections,
enhancements, improvements and other changes to its TI Resources.
You understand and agree that you remain responsible for using your independent analysis, evaluation and judgment in designing your
applications and that you have full and exclusive responsibility to assure the safety of your applications and compliance of your applications
(and of all TI products used in or for your applications) with all applicable regulations, laws and other applicable requirements. You
represent that, with respect to your applications, you have all the necessary expertise to create and implement safeguards that (1)
anticipate dangerous consequences of failures, (2) monitor failures and their consequences, and (3) lessen the likelihood of failures that
might cause harm and take appropriate actions. You agree that prior to using or distributing any applications that include TI products, you
will thoroughly test such applications and the functionality of such TI products as used in such applications. TI has not conducted any
testing other than that specifically described in the published documentation for a particular TI Resource.
You are authorized to use, copy and modify any individual TI Resource only in connection with the development of applications that include
the TI product(s) identified in such TI Resource. NO OTHER LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE TO
ANY OTHER TI INTELLECTUAL PROPERTY RIGHT, AND NO LICENSE TO ANY TECHNOLOGY OR INTELLECTUAL PROPERTY
RIGHT OF TI OR ANY THIRD PARTY IS GRANTED HEREIN, including but not limited to any patent right, copyright, mask work right, or
other intellectual property right relating to any combination, machine, or process in which TI products or services are used. Information
regarding or referencing third-party products or services does not constitute a license to use such products or services, or a warranty or
endorsement thereof. Use of TI Resources may require a license from a third party under the patents or other intellectual property of the
third party, or a license from TI under the patents or other intellectual property of TI.
TI RESOURCES ARE PROVIDED “AS IS” AND WITH ALL FAULTS. TI DISCLAIMS ALL OTHER WARRANTIES OR
REPRESENTATIONS, EXPRESS OR IMPLIED, REGARDING TI RESOURCES OR USE THEREOF, INCLUDING BUT NOT LIMITED TO
ACCURACY OR COMPLETENESS, TITLE, ANY EPIDEMIC FAILURE WARRANTY AND ANY IMPLIED WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT OF ANY THIRD PARTY INTELLECTUAL
PROPERTY RIGHTS.
TI SHALL NOT BE LIABLE FOR AND SHALL NOT DEFEND OR INDEMNIFY YOU AGAINST ANY CLAIM, INCLUDING BUT NOT
LIMITED TO ANY INFRINGEMENT CLAIM THAT RELATES TO OR IS BASED ON ANY COMBINATION OF PRODUCTS EVEN IF
DESCRIBED IN TI RESOURCES OR OTHERWISE. IN NO EVENT SHALL TI BE LIABLE FOR ANY ACTUAL, DIRECT, SPECIAL,
COLLATERAL, INDIRECT, PUNITIVE, INCIDENTAL, CONSEQUENTIAL OR EXEMPLARY DAMAGES IN CONNECTION WITH OR
ARISING OUT OF TI RESOURCES OR USE THEREOF, AND REGARDLESS OF WHETHER TI HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
You agree to fully indemnify TI and its representatives against any damages, costs, losses, and/or liabilities arising out of your noncompliance with the terms and provisions of this Notice.
This Notice applies to TI Resources. Additional terms apply to the use and purchase of certain types of materials, TI products and services.
These include; without limitation, TI’s standard terms for semiconductor products http://www.ti.com/sc/docs/stdterms.htm), evaluation
modules, and samples (http://www.ti.com/sc/docs/sampterms.htm).
Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265
Copyright © 2018, Texas Instruments Incorporated
Was this manual useful for you? yes no
Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Related manuals

Download PDF

advertising