Texas Instruments TDA2Px Performance (Rev. A) Application notes
Application Report
SPRACE3A – April 2018 – Revised October 2018
TDA2Px Performance
Piyali Goswami
ABSTRACT
This application report looks into the System-on-Chip (SoC) level performance characteristics of key
usecases targeted for TDA2Px. This document discusses the data path infrastructure and parameters that
manage the system level throughput. Different optimization techniques for optimum system performance
are also described.
Contents
1 SoC Overview
2 Camera Interface Subsystem (CAL)
3 Imaging Subsystem (ISS)
4 EMIF EDMA Performance
5 IVI Usecase Performance
6 ADAS Usecase Performance
7 References
List of Figures
1 TDA2Px Block Diagram
2 Camera Subsystem Overview
3 CAL Initiator Bandwidth Statcoll Measurement
4 CAL Initiator 4 Channel YUV422 BP Bandwidth Statcoll Measurement
5 VIP Initiator 1 Channel YUV420 Bandwidth Statcoll Measurement @ 239 MHz VP_CLK
6 Channel CAL + VIP Configuration for Surround View Applications
7 VIP Initiator 4 Channel YUV420 Bandwidth Statcoll Measurement @ 133 MHz VP_CLK
8 CAL Bandwidth Along With Other Initiators With Adaptive MFLAG Setting
9 ISS Overview
10 Single Pass ISP Sub Block Data Processing Flow
11 ISS 1 Channel ISP Single Pass WDR Bandwidth
12 Single Channel SIMCOP LDC Operation
13 Bilinear Interpolation Bandwidth Profile With a Valid Mesh Table at 532 MHz ISS Operation
14 Bi-Cubic Interpolation Bandwidth Profile With a Valid Mesh Table at 532 MHz ISS Operation
15 EDMA 2 TC ECC vs Non ECC Performance @ 532 MHz
16 UC7 (Integrated Cockpit + Navi + Media + Radio)
17 MPU (Standalone) OS Mimic Memory Copy Performance Bandwidth Plot
18 BB2D (Standalone) Graphics Mimic Performance Bandwidth Plot
19 DSS Standalone Bandwidth Profile for IVI Usecase Traffic
20 VPE Standalone Bandwidth Profile for IVI Usecase
21 IVA Standalone 1080p60 Decode Bandwidth Profile for IVI Usecase
22 TDA2Px Surround View System
23 DSS Standalone Bandwidth Profile for IVI Usecase Traffic
List of Tables
1 CAL PPI0 (4L) Capture Test Parameters
2 CAL PPI0 (4L) 4 Channel Capture Test Parameters
3 Initiator Average Bandwidth at Which CAL Overflows With DDR @ 532 MHz
4 CAL MFLAG Setting Behaviors
5 ISS ISP Performance and Efficiency at Different Operating Frequencies
6 Calculating the LDC Bandwidth Degradation Factor
7 LDC Parameters for Unity Mesh Table Based Performance Analysis
8 SIMCOP LDC Bi-Linear Interpolation Performance for Unity Mesh
9 SIMCOP LDC Bi-Cubic Interpolation Performance for Unity Mesh
10 SIMCOP LDC Performance With a Valid Mesh Table
11 ISS Multi-Initiator Bandwidth Analysis
12 Impact of System Traffic on ISP Single Pass WDR Performance
13 Impact of System Traffic on LDC Performance
14 EMIF FIFO Sizing Differences Between TDA2xx and TDA2Px
15 TDA2Px EMIF Performance Analysis @ 532 MHz and @ 666 MHz
16 TDA2Px EMIF Performance vs TDA2xx @ 532 MHz
17 IVI Usecases and Different Initiator Roles
18 Top Three Worst Case Bandwidth Requirements for IVI
19 BW Knobs to Make IVI UC7 Work on TDA2Px
20 Initiator Wise Break Down of IVI UC7 Validation
21 ADAS 6 Channel Surround View + CMS With ISP
22 BW Knobs to Make ADAS 6Ch SRV + ISP Work on TDA2Px
23 Initiator Wise Break Down of ADAS 6Ch SRV + ISP Validation
24 ADAS 4 Channel SRV + ISP Expected Bandwidth Analysis
25 Initiator Wise Break Down of ADAS 4Ch SRV + ISP Validation
Trademarks
OMAP is a trademark of Texas Instruments.
All other trademarks are the property of their respective owners.
Copyright © 2018, Texas Instruments Incorporated
1 SoC Overview
TDA2Px is a high-performance, automotive vision application device based on enhanced OMAP™
architecture integrated on a 28-nm technology.
The device block diagram is shown in Figure 1.
[Figure 1: block diagram of the TDA2Px device. Blocks shown:
Processing and imaging: MPU (dual ARM Cortex-A15); IVA HD (1080p video co-processor); GPU (dual SGX544 3D); BB2D (GC320 2D); EVE x2 (analytic processors); DSP (2x C66x co-processor); IPU1 and IPU2 (dual Cortex-M4 each); Imaging Subsystem (ISP, SIMCOP, CAL); CAL with CSI2 x2; Display Subsystem (1x GFX pipeline, 3x video pipeline, overlay, LCD1, LCD2, LCD3, HDMI 1.4a); VIP x2; VPE; EDMA; sDMA; MMU x2; all attached to the high-speed interconnect.
System: Spinlock, Timers x16, PWM SS x3, Mailbox x13, WDT, RTC SS, GPIO x8.
Connectivity: USB 3.0 (dual role FS/HS/SS with PHYs), dual role FS/HS USB (with PHY x1, with ULPI x2), PCIe SS x2, GMAC AVB.
Serial interface: UART x10, QSPI, McSPI x4, McASP x8, DCAN x2, I2C x5, MCAN (CAN FD).
Program / data storage: MMC / SD x4, SATA, DMM, OCMC_RAM (up to 2.4 MiB, with ECC), GPMC / ELM (NAND/NOR/Async), EMIF x2 (DDR2/DDR3, 32-bit with ECC; ECC supported on EMIF1 only).
Some blocks are marked "High Performance Package Only" or "Pin Compatible Package Only" in the diagram.]
Figure 1. TDA2Px Block Diagram
For more information regarding the TDA2Px device, see the TDA2Px SoC for Advanced Driver Assistance
Systems (ADAS) Silicon Revision 1.0 Technical Reference Manual.
The following sections discuss the performance aspects of various new IP features added in the TDA2Px
device versus TDA2x and critical usecase performance entitlement that the device achieves.
2 Camera Interface Subsystem (CAL)
This section looks at the performance aspects of the Camera Interface subsystem, which comprises:
• Camera Adapter Layer (CAL)
• CAL interfaces:
– Two PPI interfaces to CSI-2 PHY
Figure 2 shows the Camera subsystem block diagram.
[Figure 2: block diagram. Two MIPI CSI-2 image sensors connect to CAL through CSI2_PHY1 (lane pairs csi2_0_dx0/dy0 to csi2_0_dx4/dy4) and CSI2_PHY2 (lane pairs csi2_1_dx0/dy0 to csi2_1_dx2/dy2), each over a PPI interface with an SCP control port. CAL has a 128-bit data master port on the L3_MAIN interconnect, a 32-bit data slave port on the L4_PER3 interconnect, and four video ports (data, PCLK, syncs) to VIP. PRCM supplies CTRLCLK, CAM_RST, CAL_FCLK, CAL_ICLK and CAL_RST (derived from LVDSRX_96M_GFCLK and VIP3_GCLK through :2 dividers). CAL_IRQ is routed through IRQ_CROSSBAR (IRQ_CROSSBAR_353) to the device INTCs.]
Figure 2. Camera Subsystem Overview
The following enhancements are present in the CAL subsystem in TDA2Px:
• Video Port
– Extend VP1 to replicate as VP2, VP3 and VP4 (16 Bits per Cycle mode on VPORTs)
– Extend the programming model of VP1 to map to the 3 new VPs without any new register addition
– Automatically map VP1 -> CPORT1, VP2 -> CPORT2, VP3 -> CPORT3, VP4 -> CPORT4
– Same Programming parameters applied on all 4 VP
– New input EN_BASELINE_MODE (1 bit): baseline compatibility mode, driven via a Control Module MMR
– EN_BASELINE_MODE = 1 -> CAL in baseline mode (0 = four new VP mode)
• DMA
– Support of a new mode that bifurcates YUV422 (16b format) input to 2 DMA output planes for Y
and UV each (termed as YUV422 Bi-Planar mode).
– The DMA needs to allocate 2x channels for x input streams.
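The YUV422 Bi-Planar mode described above can be illustrated with a small software model. This is only a sketch of the data layout, not the CAL hardware implementation: the function name `yuv422_split_line` is hypothetical, and the input byte order (Y0 U0 Y1 V0) is an assumption, since this report does not state the packing order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Software model of a YUV422 bi-planar split for one line.
 * Assumption: the 16-bit-per-pixel input is packed as Y0 U0 Y1 V0.
 * Each input line of width_pixels * 2 bytes becomes one Y line and one
 * interleaved UV line, each width_pixels bytes long.
 */
void yuv422_split_line(const uint8_t *src, size_t width_pixels,
                       uint8_t *y_plane, uint8_t *uv_plane)
{
    assert(src != NULL && y_plane != NULL && uv_plane != NULL);
    assert(width_pixels % 2 == 0); /* chroma is shared by pixel pairs */

    for (size_t i = 0; i < width_pixels; i++) {
        y_plane[i]  = src[2 * i];     /* even bytes carry luma */
        uv_plane[i] = src[2 * i + 1]; /* odd bytes carry U and V alternately */
    }
}
```

Because every input stream produces two output planes, x input streams occupy 2x write DMA channels, which matches the allocation rule stated in the list.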
2.1 CAL Standalone Performance
In the following subsections, performance aspects of different modes of operation of the CAL IP in
TDA2Px are discussed. In all the experiments the following IP frequencies are applied:
• CAL: 266 MHz
• CSI PHY Control Clock: 96 MHz
• L3 Interconnect : 266 MHz
• EMIF Controller: 266 MHz
• DDR3 Clock : 666 MHz
2.1.1 CAL PPI0 (4L) Capture
In this mode, the CAL IP is configured for capture of a single channel of 1920x1080 frame @ 30 FPS.
Pixels are configured to be 16 bits per pixel. The CAL Write DMA is configured for constant addressing
mode and linear write pattern.
The CAL test is configured with the parameters shown in Table 1.
Table 1. CAL PPI0 (4L) Capture Test Parameters

Test Parameter               Value
Capture Width (in Pixels)    1920
Capture Height (in Pixels)   1080
Bits per pixel               16
Horizontal Blanking          10
Vertical Blanking            15
Data format                  0x2A (Raw 8)
Number of Lanes              4 Lanes
Expected Bandwidth = 1920 x 1080 x 30 FPS x 2 Bytes per pixel = 124.416 MBps
Measured Average Bandwidth = 128.439 MBps
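The expected bandwidth figures in this section all follow the same arithmetic, which can be sketched as a helper. The function name `capture_bytes_per_second` is hypothetical; MBps is taken to mean 10^6 bytes per second, which is the convention that reproduces the 124.416 MBps figure.

```c
#include <assert.h>
#include <stdint.h>

/* Payload bytes per second for a capture stream; blanking is not counted. */
uint64_t capture_bytes_per_second(uint32_t width, uint32_t height,
                                  uint32_t fps, uint32_t bytes_per_pixel,
                                  uint32_t channels)
{
    assert(width > 0 && height > 0 && fps > 0);
    /* Promote to 64 bits first so multi-channel results cannot overflow. */
    return (uint64_t)width * height * fps * bytes_per_pixel * channels;
}
```

For one 1920x1080 channel at 30 FPS and 2 bytes per pixel this gives 124,416,000 bytes per second. The measured average of 128.439 MBps is about 3% above that payload-only estimate.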
[Plot: CAL bandwidth (MBps) versus time (ms), 0 to 100 ms; annotated interval of 31.8 ms.]
Figure 3. CAL Initiator Bandwidth Statcoll Measurement
2.1.2 4 Channel CAL PPI0 (4L) Capture With YUV422 to YUV422BP
In this mode, the CAL IP is configured to capture 4 channels of 1920x1080 frames @ 30 FPS. Pixels
are configured to be 16 bits per pixel. The CAL Write DMA is configured for constant addressing mode
and linear write pattern. Write DMA Bi-planar mode is enabled to split the Y and UV planes.
Table 2. CAL PPI0 (4L) 4 Channel Capture Test Parameters

Test Parameter               Value
Capture Width (in Pixels)    1920
Capture Height (in Pixels)   1080
Bits per pixel               8
Horizontal Blanking          10
Vertical Blanking            15
Data format                  0x2A (Raw 8)
Number of Lanes              4 Lanes
Number of Virtual Channels   4
The pixel processing extraction is configured to 8 bits and the pixel packing is configured to 16 bits.
All the pixel processing contexts are utilized and all the write DMA contexts are utilized to generate 8
streams (4 channels each having Y and UV data).
Expected Bandwidth = 1920 x 1080 x 30 FPS x 2 Bytes per pixel x 4 channels = 497.664 MBps
Measured Average Bandwidth = 520.976 MBps
Without the YUV422 to YUV422 BP conversion, 4 WR_DMA contexts are utilized, with a similar bandwidth
profile.
[Plot: CAL bandwidth (MBps) versus time (ms), 0 to 100 ms; annotated interval of 31.2 ms.]
Figure 4. CAL Initiator 4 Channel YUV422 BP Bandwidth Statcoll Measurement
2.1.3 CAL to VIP 1 Channel Baseline Mode
When baseline mode is enabled, only one VPORT between CAL and VIP is enabled.
In this mode, VIP 1A has been configured to capture the input frames using active video signaling and
convert the incoming YUV422 frames to YUV420. CAL upon capturing the frames from PPI0 will forward
the frames to the first VPORT. In order for VIP to capture correctly, it is important to set the
CAL_CSI2_CTXx.LINES to the exact number of lines one wants to capture via VIP.
If erroneous frames are received because CSI packets were dropped due to transmission errors,
the CAL to VIP behavior for capturing or dropping the received frames is as follows:
• Small Frame Reception Case: If the expected frame size is 65x65 (set CAL_CSI2_CTXx.LINES = 65)
and the received input frame is 64x64, no data is written out by VIP.
• Correct Frame Reception Case: If the expected frame size is 64x64 (set CAL_CSI2_CTXx.LINES =
64) and the received input frame is 64x64, 64 lines are written out. This corresponds to the normal
working case.
• Large Frame Reception Case: If the expected frame size is 48x48 (set CAL_CSI2_CTXx.LINES = 48)
and the received input frame is 64x64, only 48 of the 64 lines are written out. The extra data is not
written out.
The CAL + VIP is able to receive frames normally after short and long frames are received.
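The three frame reception cases can be summarized in a small behavioral model. This is a sketch of the observed behavior only; `vip_lines_written` is a hypothetical helper, not a driver API, and it models the line count alone.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lines VIP writes out for one frame, following the three cases above.
 * expected_lines models the CAL_CSI2_CTXx.LINES setting.
 */
uint32_t vip_lines_written(uint32_t expected_lines, uint32_t received_lines)
{
    assert(expected_lines > 0);
    if (received_lines < expected_lines) {
        return 0;           /* short frame: no data is written out */
    }
    return expected_lines;  /* exact or long frame: extra lines are dropped */
}
```

An application that needs every received line therefore has to program LINES to match the sensor output exactly, because a value that is too large suppresses the whole frame.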
The CAL + VIP can operate at a maximum VP_CLK frequency of 90% of VIP functional clock frequency.
In TDA2Px, the VIP clock frequency is 266 MHz. Thus, 239 MHz is the maximum VPORT pixel clock
frequency.
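The 90% rule can be checked with integer arithmetic. The helpers below are a hypothetical sketch working in kHz; with a 266 MHz VIP clock the limit evaluates to 239.4 MHz, which the report quotes as 239 MHz.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest supported VP_CLK: 90% of the VIP functional clock (both in kHz). */
uint32_t max_vp_clk_khz(uint32_t vip_fclk_khz)
{
    assert(vip_fclk_khz > 0);
    return (uint32_t)(((uint64_t)vip_fclk_khz * 9u) / 10u);
}

/* True when the requested VPORT pixel clock is within the 90% limit. */
bool vp_clk_is_supported(uint32_t vp_clk_khz, uint32_t vip_fclk_khz)
{
    return vp_clk_khz <= max_vp_clk_khz(vip_fclk_khz);
}
```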
The expected bandwidth in this test is 1920 x 1080 x 1.5 Bytes per pixel x 30 fps = 93.312 MBps.
The observed bandwidth was VIP1_P1 + VIP1_P2 = 32.36 + 64.75 = 97.12 MBps.
[Plot: VIP1_P1 and VIP1_P2 bandwidth (MBps) versus time (ms), 0 to 65 ms; annotated interval of 28.6 ms.]
Figure 5. VIP Initiator 1 Channel YUV420 Bandwidth Statcoll Measurement @ 239 MHz VP_CLK
2.1.4 CAL to VIP 4 Channel Capture
In this configuration, the EN_BASELINE_MODE is set to 0 (default). This enables using all of the four
VPORTs available in TDA2Px. CAL PPI0 (4L) is configured to capture four channels of 1920x1080 frames
@ 30 FPS, 2 bytes per pixel. The CAL captured data is then sent to VIP 1A, 2A, 3A, 4A. VIP performs
conversion from YUV422 to YUV420 for all 4 channels.
This configuration is useful in surround view applications that are targeted for TDA2Px with an external
ISP as shown in Figure 6.
[Figure 6: block diagram. Four 2 MP 30 fps cameras, each behind a DS90UB953 serializer (video and I2C), feed a DS90UB964 deserializer. The deserializer connects over a 4-lane (4L) link to CAL on the TDA2 PLUS device. CAL forwards the four channels over four VPs to VIP1 and VIP2 (ports 1A, 1B, 2A, 2B on each), and the 4-channel YUV420 data is written through EMIF to DDR3L.]
Figure 6. Channel CAL + VIP Configuration for Surround View Applications
The expected bandwidth is 1920 x 1080 x 1.5 Bytes per pixel x 30 fps x 4 Channels = 373.25 MBps.
The observed average bandwidth was VIP1_P1 + VIP1_P2 + VIP2_P1 + VIP2_P2 = 72.34 + 143.32 +
72.44 + 143.35 = 431.45 MBps.
[Plot: VIP1_P1, VIP1_P2, VIP2_P1 and VIP2_P2 bandwidth (MBps) versus time (ms), 0 to 65 ms; annotated interval of 27.4 ms.]
Figure 7. VIP Initiator 4 Channel YUV420 Bandwidth Statcoll Measurement @ 133 MHz VP_CLK
2.2 CAL Performance With Multiple Initiators
In this section, the CAL MFLAG behavior is analyzed when multiple initiators are executed in parallel with
CAL, which causes the CAL FIFO to overflow and not be able to meet its real-time performance.
The CAL performance is further discussed in the context of planned automotive usecases for the TDA2Px
device in Section 6.
Hard real time traffic can’t be stalled for long periods of time. Indeed, the camera sends data at constant
speed and it can only be stalled until FIFOs on the path are filled up. When FIFOs become full, data is
discarded and the frame is therefore corrupted. To minimize the risk of real time data corruption, CAL IP
supports the MFLAG based Quality of Service Mechanism.
Dynamic MFLAG generation is used when the write DMA operates on real time data. In that case, the
MFLAG value depends on the number of slots ready to generate transactions in the write DMA (n):
• 00: SAFE (n< CAL_CTRL.MFLAGL)
• 01: VULNERABLE (CAL_CTRL.MFLAGL <=n < CAL_CTRL.MFLAGH)
• 11: ENDANGERED (CAL_CTRL.MFLAGH <=n)
Software should ensure that:
• CAL_CTRL.MFLAGL <= CAL_CTRL.MFLAGH (only 00 or 11 is generated when
CAL_CTRL.MFLAGL = CAL_CTRL.MFLAGH)
• CAL_CTRL.MFLAGL = 0x00, 0xFF, or less than or equal to 2^(WFIFO-3)
• CAL_CTRL.MFLAGH = 0x00, 0xFF, or less than or equal to 2^(WFIFO-3)
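The dynamic MFLAG rules and the software constraints above can be written out as a short model. This is an illustrative sketch: the type `cal_mflag_t` and both function names are hypothetical, and the code mirrors only the inequalities listed in this section, not the hardware implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    CAL_MFLAG_SAFE       = 0x0, /* 00 */
    CAL_MFLAG_VULNERABLE = 0x1, /* 01 */
    CAL_MFLAG_ENDANGERED = 0x3  /* 11 */
} cal_mflag_t;

/* MFLAG level for n write DMA slots that are ready to generate transactions. */
cal_mflag_t cal_mflag_level(uint32_t n, uint32_t mflagl, uint32_t mflagh)
{
    assert(mflagl <= mflagh); /* software must keep MFLAGL <= MFLAGH */
    if (n < mflagl) {
        return CAL_MFLAG_SAFE;
    }
    if (n < mflagh) {
        return CAL_MFLAG_VULNERABLE;
    }
    return CAL_MFLAG_ENDANGERED;
}

/* Checks one threshold against the rule: 0x00, 0xFF, or <= 2^(WFIFO-3). */
bool cal_mflag_threshold_ok(uint32_t value, uint32_t wfifo)
{
    assert(wfifo >= 3u && wfifo < 35u); /* keeps the shift below 32 bits */
    return value == 0x00u || value == 0xFFu || value <= (1u << (wfifo - 3u));
}
```

When MFLAGL equals MFLAGH, the middle branch can never be taken, which is why only the 00 and 11 levels are generated in that configuration.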
In this experiment, the following initiators were enabled along with the CAL capture of 4 channels of
1920x1080 Raw 12 bit @ 30 FPS:
• DSS: 1 video pipe traffic of 1920x826 ARGB8888 @ 60 FPS
• DSP1 EDMA and DSP2 EDMA
The DDR frequency was set to 532 MHz.
Table 3 shows the measured bandwidth of the different initiators at which CAL overflows.
Table 3. Initiator Average Bandwidth at Which CAL Overflows With DDR @ 532 MHz

Initiators    Average Bandwidth (MBps)
CAL           747.31
DSP1_EDMA     2840.98
DSP2_EDMA     1850.67
DSS           377.81
IPU1          49.48
IPU2          17.31
EMIF1_SYS     2891.95
EMIF2_SYS     2891.59
Total EMIF    5783.54 (67.9% efficiency)
With this initiator bandwidth profile, the CAL MFLAG behavior is as shown in Table 4.
Table 4. CAL MFLAG Setting Behaviors

CAL MFLAG Setting                            CAL Overflow Behavior
No CAL MFLAG                                 CAL overflows
Always on CAL MFLAG                          No CAL overflows
Adaptive MFLAG (50% - 75% of 8KB WFIFO)      CAL overflows
Adaptive MFLAG (25% - 75% of 8KB WFIFO)      No CAL overflows
The following adaptive MFLAG setting is recommended to avoid CAL overflow:
/* Set adaptive MFLAG for 25% to 75% of the WFIFO size (64 x 16 bytes).
* WFIFO = 9 for TDA2PX.
*/
WR_FIELD_32(CAL_INST, CAL__CAL_CTRL, CAL__CAL_CTRL__MFLAGH, 0x30);
WR_FIELD_32(CAL_INST, CAL__CAL_CTRL, CAL__CAL_CTRL__MFLAGL, 0x10);
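The register values 0x10 and 0x30 can be derived rather than hard-coded. The sketch below assumes that the thresholds are expressed on the 2^(WFIFO-3) scale given in the software constraints, so WFIFO = 9 yields 64 units; the helper names are hypothetical and are not part of the TI driver.

```c
#include <assert.h>
#include <stdint.h>

/* Number of MFLAG threshold units: 2^(WFIFO-3), that is 64 for WFIFO = 9. */
uint32_t cal_mflag_slots(uint32_t wfifo)
{
    assert(wfifo >= 3u && wfifo < 35u); /* keeps the shift below 32 bits */
    return 1u << (wfifo - 3u);
}

/* Threshold value for a given fill percentage of the write FIFO. */
uint32_t cal_mflag_threshold(uint32_t wfifo, uint32_t percent)
{
    assert(percent <= 100u);
    return (uint32_t)(((uint64_t)cal_mflag_slots(wfifo) * percent) / 100u);
}
```

With WFIFO = 9, 25% gives 16 (0x10) for MFLAGL and 75% gives 48 (0x30) for MFLAGH, matching the values written in the snippet above.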
A snapshot of the bandwidth profile with this CAL MFLAG setting is as shown in Figure 8.
[Plot: bandwidth (MBps) versus time (ms), 0 to 65 ms, for CAL, DSP1_EDMA, DSP2_EDMA, DSS, EMIF1_SYS, EMIF2_SYS, IPU1 and IPU2.]
Figure 8. CAL Bandwidth Along With Other Initiators With Adaptive MFLAG Setting
3 Imaging Subsystem (ISS)
The imaging subsystem (ISS) (see Figure 9) deals with the processing of the pixel data coming from
memory (image format encoding and decoding can be done to and from memory). With its subparts, such
as interfaces and interconnects, image signal processor (ISP), and still image coprocessor (SIMCOP), the
ISS is a key component for the following applications:
• Rear View Camera
• Front View Stereo Camera
• Surround View Camera
[Figure 9: block diagram. The Imaging Subsystem contains CAL_B, a video mux, the ISP (IPIPEIF, ISIF, IPIPE, H3A, NSF3V, GLBCE, CNF, RSZ1, RSZ2, with VBUS buffer logic and a VBUS2OCP bridge) and SIMCOP (LDC with LDC LUT, VTNF, DMA, HW sequencer and image buffers). These connect through the ISS interconnects (128-bit data and 32-bit configuration) and the ISS top-level resources to three master interfaces and one slave interface via bridges.]
Figure 9. ISS Overview
3.1 ISS Standalone Performance
This section reviews the ISS standalone performance for different modes of operation planned for the ISS
based usecases in TDA2Px. In all the experiments the following IP frequencies are applied:
• ISS : 354 MHz (OPP_NOM), 425 MHz (OPP_OD), 532 MHz (OPP_HIGH)
• L3 Interconnect : 266 MHz
• EMIF Controller: 266 MHz
• DDR3 Clock : 666 MHz
3.1.1 ISP Memory to Memory Single Pass WDR
The ISP processing in memory-to-memory mode for a single pass wide dynamic range is shown in
Figure 10.
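[Figure 10: data flow diagram. RAW 12-bit data is read from DDR and passes through the blocks CAL, IPIPEIF (FE), DPC, NSFv3 (12b), GLBCE (16b), IPIPEIF (BE, 20b), ISIF, IPIPE, RSZ and CNF; the YUV420 8-bit output is written to DDR.]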
Figure 10. Single Pass ISP Sub Block Data Processing Flow
The expected bandwidth in this configuration is calculated as shown below:
• Input = 1920 x 1080 x 30 FPS x 12 bits per pixel / 8 = 93.312 MBps
• Output = 1920 x 1080 x 30 FPS x 1.5 Bytes per pixel = 93.312 MBps
In measurements with a FPS of 39.5, the average bandwidth was found to be 255.88 MBps. The
bandwidth profile is shown in Figure 11.
[Plot: ISS_NRT1 bandwidth (MBps) versus time (ms), 0 to 65 ms.]
Figure 11. ISS 1 Channel ISP Single Pass WDR Bandwidth
In order to further understand the impact of the frequency and maximum performance achieved by the
ISP, multiple experiments were conducted for OPP_NOM, OPP_OD, OPP_HIGH frequencies. The results
were compared with the TDA3x ISP performance at 212.8 MHz operation. The ISP efficiency for single
pass WDR was found to be approximately 94%.
Table 5. ISS ISP Performance and Efficiency at Different Operating Frequencies

ISS Frequency   Freq    Single Pass WDR   Single Pass WDR   ISP          Max Number of      Headroom (ms)
(MHz)           Ratio   Time (ms)         Mpix/s            Efficiency   1080p30 Channels   in 33 ms
212.8           0%      10.4              199.38            94%          3                  1.8
354             66%     6.2               334.45            94%          5                  2
425.6           100%    5.2               398.77            94%          6                  1.8
532             150%    4.15              499.66            94%          7                  3.95
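The throughput, channel count and headroom columns of Table 5 follow from the measured per-frame time. The helpers below are a hypothetical sketch that reproduces them with integer arithmetic in microseconds, assuming the 33 ms frame budget named in the table header.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_BUDGET_US 33000u /* 33 ms budget used in the report's tables */

/* Pixels processed per second when one frame takes frame_time_us. */
uint64_t pixels_per_second(uint32_t width, uint32_t height,
                           uint32_t frame_time_us)
{
    assert(frame_time_us > 0);
    return ((uint64_t)width * height * 1000000u) / frame_time_us;
}

/* Number of 1080p30 channels that fit in the 33 ms budget. */
uint32_t max_channels(uint32_t frame_time_us)
{
    assert(frame_time_us > 0);
    return FRAME_BUDGET_US / frame_time_us;
}

/* Time left in the budget after processing max_channels() frames. */
uint32_t headroom_us(uint32_t frame_time_us)
{
    return FRAME_BUDGET_US - max_channels(frame_time_us) * frame_time_us;
}
```

For the 212.8 MHz row (10.4 ms per frame) this yields about 199.38 Mpix/s, 3 channels and 1.8 ms of headroom. Dividing the Mpix/s figure by the ISS frequency in MHz gives the roughly 94% efficiency quoted in the text.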
3.1.2 Simcop Memory-to-Memory Lens Distortion Correction
The lens distortion correction operation involves reading the input distorted frame based on a mesh table
input that maps the input block to output blocks. The LDC read interface is used to read the input image
and mesh table. The SIMCOP DMA is used to write out the output blocks to DDR. The data flow is shown
in Figure 12.
[Figure 12: data flow diagram. LDC reads the YUV420 input image and the LDC mesh table from DDR and writes the corrected output blocks to DDR.]
Figure 12. Single Channel SIMCOP LDC Operation
The block size of the output frame determines the bandwidth degradation for the input LDC bandwidth.
The calculation of the degradation factor can be understood with the example in Table 6.
Table 6. Calculating the LDC Bandwidth Degradation Factor

Example 1:
• A 64x8 output block needs an input block of 74x18 with pix_pad of 5.
• After OCP alignment, the luma block is 96x18 bytes and the chroma block is 96x9 bytes.
• The degradation factor is therefore 3.375 (96*27/(64*8*1.5)).

Example 2:
• A 64x32 output block needs an input block of 70x38 with pix_pad of 3.
• After OCP alignment, the luma block is 96x38 bytes and the chroma block is 96x19 bytes.
• The degradation factor is therefore 1.78 (96*57/(64*32*1.5)).

[Illustration for Example 1: a 64x8 output block inside a 74x18 luma input block (pad of 5 on each side) widened to 96 bytes, and the matching 96x9 chroma block.]
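Both examples in Table 6 can be reproduced with the sketch below. The 32-byte alignment is an assumption: the report only says "OCP alignment", and 32 bytes is the round-up that turns widths of 74 and 70 into 96. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OCP_ALIGN_BYTES 32u /* assumption: reproduces the 96-byte widths */

static uint32_t align_up(uint32_t value, uint32_t align)
{
    return (value + align - 1u) / align * align;
}

/* Ratio of bytes read by LDC to bytes produced for one YUV420 output block. */
double ldc_degradation_factor(uint32_t out_w, uint32_t out_h, uint32_t pix_pad)
{
    assert(out_w > 0 && out_h > 0 && out_h % 2u == 0);

    uint32_t in_w = out_w + 2u * pix_pad;  /* padding on both sides */
    uint32_t in_h = out_h + 2u * pix_pad;
    uint32_t aligned_w = align_up(in_w, OCP_ALIGN_BYTES);

    /* Luma rows plus half as many chroma rows, all at the aligned width. */
    double bytes_read = (double)aligned_w * (double)(in_h + in_h / 2u);
    double bytes_out = (double)out_w * (double)out_h * 1.5;
    return bytes_read / bytes_out;
}
```

Calling it with (64, 8, 5) gives 3.375 and with (64, 32, 3) gives about 1.78, which shows why taller output blocks amortize the padding and alignment overhead better.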
The following SIMCOP settings provide the best performance. In all of the SIMCOP based measurements,
these settings have been applied. Application developers and ISS driver developers are recommended to
set:
• SIMCOP DMA Burst Size = 8 x 128 Bit Burst
• SIMCOP DMA Tags = 0xF
• LDC Read Tags = 0xF
With these settings, the SIMCOP LDC performance was analyzed with a unity mesh table. The
parameters shown in Table 7 were used for the analysis:
Table 7. LDC Parameters for Unity Mesh Table Based Performance Analysis

LDC Parameter               LDC Parameter Value
Input Width                 1920
Input Height                1088
Output Width                1920
Output Height               1088
Output Block Width          64
Output Block Height         32
Pixel Pad                   3
Interpolation               Bi-Cubic / Bi-Linear
LDC LUT Downscale factor    4
The SIMCOP performance was analyzed for the different OPPs planned in TDA2Px for both Bi-Linear and
Bi-cubic interpolation. As expected the Bi-cubic interpolation leads to approximately 50% SIMCOP
utilization. Table 8 shows the detailed analysis for Bilinear Interpolation and Table 9 shows the Bicubic
interpolation. These provide the best case SIMCOP LDC efficiency.
Table 8. SIMCOP LDC Bi-Linear Interpolation Performance for Unity Mesh

ISS Frequency   Frequency   LDC Time (ms)   Standalone LDC (Unity Mesh)   Efficiency   Max Number of      Headroom (ms)
(MHz)           Ratio       (Bi-linear)     Mpix/s (Bi-linear)                         1080p30 Channels   in 33 ms
212.8           0%          10.00           207.36                        97%          3                  3
354             66%         6.02            344.22                        97%          5                  2.88
425.6           100%        5.03            412.66                        97%          6                  2.85
532             150%        4.02            516.33                        97%          8                  0.872
Table 9. SIMCOP LDC Bi-Cubic Interpolation Performance for Unity Mesh

ISS Frequency   Freq    LDC Time (ms)   LDC Mpix/s   Efficiency   Max Number of      Headroom (ms)
(MHz)           Ratio   (Bicubic)       (Bicubic)                 1080p30 Channels   in 33 ms
212.8           0%      19.82           104.62       49%          1                  13.18
354             66%     11.9            174.25       49%          2                  9.2
425.6           100%    10              207.36       49%          3                  3
532             150%    8.85            234.31       44%          3                  6.45
Analysis was also performed with a valid mesh table. For the Valid mesh table analysis the output block
width, block height and pixel pad were set to 32, 36 and 2 respectively. The impact in Table 10 is visible
with Bi-linear interpolation where the efficiency drops from 97% to 93%.
Table 10. SIMCOP LDC Performance With a Valid Mesh Table

Condition                     ISS Frequency   LDC Time   LDC      Efficiency   Max Number of      Headroom (ms)   Average Bandwidth
                              (MHz)           (ms)       Mpix/s                1080p30 Channels   in 33 ms        (MBps)
LDC Bi-linear Interpolation   532             4.20       493.71   93%          7                  3.6             1917.45
LDC Bi-cubic Interpolation    532             7.94       261.16   49%          4                  1.24            1049.98
The bandwidth profiles for bi-linear and bi-cubic interpolation with a valid mesh table are shown in Figure 13 and Figure 14, respectively.
[Plot: ISS_NRT2 bandwidth (MBps) versus time (ms), 0 to 65 ms.]
Figure 13. Bilinear Interpolation Bandwidth Profile With a Valid Mesh Table at 532 MHz ISS Operation
[Plot: ISS_NRT2 bandwidth (MBps) versus time (ms), 0 to 65 ms.]
Figure 14. Bi-Cubic Interpolation Bandwidth Profile With a Valid Mesh Table at 532 MHz ISS Operation
3.1.3 ISS Performance With Multiple Initiators
In this section, the impact of other initiator traffic on the ISS performance is discussed when they are
running simultaneously. The following additional initiators were enabled, as shown in Table 11.
Table 11. ISS Multi-Initiator Bandwidth Analysis

Initiator   Initiator Bandwidth (MBps)   Remarks
ISP         1488.47                      RAW to YUV420
BB2D        1959.07                      SGX 3D Synthesis (Output RGB24)
DSS         360.63
EDMAs       449.25                       Loading EDMA traffic
MPU         1088.25                      A15 mem copy loading traffic
Total       5345.67
The impact of the other initiators on the ISP Single Pass WDR performance was measured in terms of
increase in the frame processing time, decrease in the efficiency and overall Mpix/second processing. The
results are as shown in Table 12.
Table 12. Impact of System Traffic on ISP Single Pass WDR Performance

Condition             ISS Frequency   Single Pass WDR   Single Pass WDR   Efficiency   Max Number of      Headroom (ms)
                      (MHz)           Time (ms)         Mpix/s                         1080p30 Channels   in 33 ms
Standalone ISP        532             4.15              499.66            94%          7                  3.95
With System Traffic   532             4.45              465.98            88%          7                  1.85
Lens distortion correction with Bi-linear interpolation was further added to this multi-initiator traffic to
analyze the impact of system traffic on the LDC performance. It was found that, due to the 2D block access
nature of the LDC reads and Simcop DMA writes, system traffic has a significant impact on LDC
performance. Typical measures that can be used to mitigate this are:
• Place a Bandwidth regulator on the NRT2 port to give required priority to the LDC traffic.
• Place the mesh table data on the OCMC RAM of the device to reduce LDC traffic contention at the
DDR.
NOTE: In the TDA2Px integration, the NRT1 and NRT2 ports have the same number of L3 switch
hops. Thus unlike TDA3xx the swapping of NRT1 and NRT2 does not give any noticeable
benefit in terms of bandwidth performance.
Table 13. Impact of System Traffic on LDC Performance

Condition                                   LDC Time (ms)   LDC Mpix/s    Efficiency   Max Number of      Headroom (ms)
                                            (Bi-linear)     (Bi-linear)                1080p30 Channels   in 33 ms
LDC Only (1)                                4.20            493.71        93%          7                  3.6
LDC + Single Pass WDR (2)                   4.36            475.60        89%          7                  2.48
With Other System Traffic (3)               5.80            357.52        67%          5                  4
With Other System Traffic
(BR on NRT2 = 1600 MBps) (4)                5.40            384.00        72%          6                  0.6

(1) Standalone, no other initiators
(2) SIMCOP + ISP Traffic
(3) Just by adding LDC to SRV traffic. Overall DDR traffic = 5.217 GBps
(4) Bandwidth regulator enabled on NRT2 port. Overall DDR traffic = 5.15 GBps. Swapping NRT1 and NRT2 does not make any
difference (expected as the L3 switch levels are the same).
4 EMIF EDMA Performance
The TDA2Px EMIF controller and DDR PHY have the following enhancements with respect to the TDA2xx
EMIF controller and PHY:
• Support for 666 MHz DDR3 clock
• Optimized Command and Write Data FIFO sizing (see Table 14)
• ECC Read Modify write support
Table 14. EMIF FIFO Sizing Differences Between TDA2xx and TDA2Px

                             TDA2xx                                                TDA2Px
Parameter                    System Local Interface      MPU Local Interface       System Local Interface   MPU Local Interface
                             Entries                     Entries                   Entries                  Entries
Pre Command FIFO             6                           4                         6                        4
Command FIFO                 Up to 10                    Up to 10                  Up to 16                 Up to 16
Pre Write FIFO               6                           8                         10                       12
RMW FIFO                     NA                          NA                        Up to 16                 Up to 16
Write Data FIFO (256-bit)    Up to (19 × 256 bits) + 6   Up to 19 + 8              Up to (16 × 256 bits)    Up to 16
Return Command FIFO          22                          24                        22                       24
SDRAM Read Data FIFO         22                          24                        22                       24
Register Read Data FIFO      2                           0                         2                        0
RMW Read Data FIFO           NA                          NA                        Up to 16                 Up to 16
To understand the impact of the difference between TDA2xx and TDA2Px EMIF performance, a 2 TC
EDMA transfer was performed.
The effect of ECC transfers versus non ECC transfers was analyzed and the results show that the
performance between ECC and non ECC transfers is comparable.
[Figure: bar chart of bandwidth (MBps) for COPY, WR and RD transfers, Non ECC versus ECC. Bar labels in the plot: 3555.03, 3550.44, 3388.23, 3320.94, 3227.36, 3227.36]
Figure 15. EDMA 2 TC ECC vs Non ECC Performance @ 532 MHz
In another experiment to understand the impact of the frequency upgrade from 532 MHz to 666 MHz
along with the FIFO sizing changes, the partial IVI usecase was run with additional EDMA load in
parallel to keep the EMIF FIFOs fully occupied and so expose the EMIF behavior. More details regarding the
IVI traffic are discussed in Section 5. As can be seen in Table 15, with a frequency scaling from 532 MHz to
666 MHz (approximately 25%), an equivalent performance gain from 5486.26 MBps to 6863.22 MBps
(approximately 25%) is achieved.
Table 15. TDA2Px EMIF Performance Analysis @ 532 MHz and @ 666 MHz
| Expt Name | TDA2Px-DDR3 @ 532 MHz | TDA2Px-DDR3 @ 666 MHz |
| EMIF1_SYS (MBps) | 2045.97 | 2799.14 |
| EMIF2_SYS (MBps) | 2046.54 | 2790.17 |
| MA_MPU_P1 (MBps) | 696.62 | 634.85 |
| MA_MPU_P2 (MBps) | 697.13 | 639.06 |
| Total DDR BW | 5486.26 | 6863.22 |
| Efficiency | 64.5% | 64.4% |
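The efficiency figures in Table 15 are consistent with dividing the total DDR bandwidth by a theoretical peak for two 32-bit DDR3 interfaces that transfer data on both clock edges. That bus configuration is an assumption inferred from the numbers, not a statement taken from this section. The Python sketch below shows the arithmetic; all names are illustrative.

```python
NUM_EMIFS = 2    # assumption: two EMIF instances (EMIF1 and EMIF2)
BUS_BYTES = 4    # assumption: 32-bit data bus per EMIF
DDR_EDGES = 2    # double data rate: two transfers per clock cycle

def peak_mbps(ddr_clock_mhz):
    """Theoretical peak DDR bandwidth in MBps under the assumptions above."""
    return ddr_clock_mhz * DDR_EDGES * BUS_BYTES * NUM_EMIFS

def efficiency_pct(total_mbps, ddr_clock_mhz):
    """Measured bandwidth as a percentage of the theoretical peak."""
    return 100.0 * total_mbps / peak_mbps(ddr_clock_mhz)

# Total DDR BW values from Table 15, keyed by DDR clock in MHz
measured = {532: 5486.26, 666: 6863.22}
for clk, bw in measured.items():
    print(f"{clk} MHz: peak {peak_mbps(clk)} MBps, "
          f"efficiency {efficiency_pct(bw, clk):.1f}%")

clock_gain = 100.0 * (666 / 532 - 1)
bw_gain = 100.0 * (measured[666] / measured[532] - 1)
print(f"clock +{clock_gain:.1f}%, bandwidth +{bw_gain:.1f}%")
```

Because the efficiency stays nearly constant between the two clock settings, the bandwidth gain tracks the clock gain.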
To stretch the TDA2Px device to its limit, the EDMA transfers were increased until the DSS started underflowing. Since the IVI real-time DSS traffic requirement is high, in this experiment the DSS
underflows due to L3 limits. This analysis gave insight into the maximum EMIF performance available by
maximizing the EMIF FIFO usage. In this configuration, Table 16 gives the relative performance of the
TDA2Px EMIF and the TDA2xx EMIF, both operating at 532 MHz.
Table 16. TDA2Px EMIF Performance vs TDA2xx @ 532 MHz
| Expt Name | TDA2Px-EMIF @ 532 MHz | TDA2xx EMIF @ 532 MHz |
| EMIF1_SYS (MBps) | 2496.503864 | 2002.71136 |
| EMIF2_SYS (MBps) | 2493.411011 | 1985.6016 |
| MA_MPU_P1 (MBps) | 396.9076276 | 720.65952 |
| MA_MPU_P2 (MBps) | 395.2099299 | 719.42016 |
| Total DDR BW | 5782.032432 | 5428.39264 |
| Efficiency | 67.9% | 63.8% |
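To make the comparison in Table 16 easier to read, the Python sketch below computes the per-port and total percentage change from TDA2xx to TDA2Px. The values are copied from the table; the helper name `percent_change` is illustrative.

```python
# Port-level average bandwidth from Table 16, in MBps, both devices at 532 MHz
tda2px = {"EMIF1_SYS": 2496.503864, "EMIF2_SYS": 2493.411011,
          "MA_MPU_P1": 396.9076276, "MA_MPU_P2": 395.2099299}
tda2xx = {"EMIF1_SYS": 2002.71136, "EMIF2_SYS": 1985.6016,
          "MA_MPU_P1": 720.65952, "MA_MPU_P2": 719.42016}

def percent_change(new, old):
    """Relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old

for port in tda2px:
    print(f"{port}: {percent_change(tda2px[port], tda2xx[port]):+.1f}%")

total_px = sum(tda2px.values())
total_xx = sum(tda2xx.values())
print(f"Total DDR BW: {total_px:.2f} vs {total_xx:.2f} MBps "
      f"({percent_change(total_px, total_xx):+.1f}%)")
```

In this data set the system ports carry more traffic on TDA2Px while the MPU ports carry less, and the total rises by a few percent.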
5 IVI Usecase Performance
In an automotive infotainment system, TDA2Px can be used either as the main processor in the head unit
(Integrated Head Unit) or as a co-processor (either the applications (HMI) processor or a co-processor for
radio and audio). An infotainment system requires support for a rich set of features: a high-level OS,
high-resolution multi-display, camera input, navigation, speech, radio, multimedia, and connectivity.
TDA2Px extends the target use cases for TDA2x. The list of planned usecases for IVI is given in Table 17.
Table 17. IVI Usecases and Different Initiator Roles
| Usecase | 2xA15 | GPU | IPU2 + IVA-HD + VPE Decode | IPU2 + IVA-HD Encode | C66x DSPs / EVEs | IPU1 |
| UC1 (Highway Driving + Dual-Navi + Media): Infotainment + Cluster Info | HLOS, HMI, Nav (main disp.) + Nav (info disp.), Connectivity, and so forth. | HMI, Nav (maps on both displays) | 1080i-60 decode DTV w/external decoder De-Interlace 1x 1080i60 (for RSE display) | 1080p30 encode (for remote eAVB display) | Audio mixing & routing | CAN stack |
| UC2 (Highway Driving + Projection + Navi + Media): Infotainment + Cluster Info + Multi-DAB Radio | HLOS, Projection mode, HMI, Nav (info disp.), Connectivity, etc. | HMI, Nav | Phone projection mode: 1080p30 decode | 1080p30 encode (for remote CE device over WiFi) | Multi DAB Radio + Audio mixing & routing | CAN stack |
| UC3 (Street Driving + Navi + 2D Surround-view, or LDW/TSR/OD): Infotainment + Cluster Info + Multi-DAB Radio + Info ADAS or Driver Monitoring | HLOS, HMI, Nav, Connectivity, and so forth. | HMI, Nav | 1080i-60 decode DTV w/external decoder De-Interlace 1x 1080i60 (for CE device, or RSE display) | 1080p30 encode (for remote CE device over WiFi), OR Car black-box encoding (recording at least 2 cameras – front/back: 720p resolution) | InfoADAS (2D SRV + OD/PD) (or LDW + TSR + OD/PD, or Driver monitor.) + Multi DAB Radio + Audio mixing & routing | CAN stack |
| UC4 (3D SRV w/ Park Assist): Infotainment + Cluster Info + Dual DAB Radio | HLOS, HMI, 3D SRV (main disp.) + Cluster Info (info disp.), Connectivity, and so forth. | HMI, 3D SRV processing | | | 3D SRV + Park Assist + Multi-DAB Radio + Audio mixing & routing | |
| UC5 (Street Driving w/ 3D SRV + Navi + Media): Infotainment + Cluster Info + Multi-DAB Radio | HLOS, HMI, 3D SRV (main disp.) + Nav (info disp.), Connectivity, and so forth. | HMI, 3D SRV processing, Nav | 1080i-60 decode DTV w/external decoder De-Interlace 1x 1080i60 (for RSE display) | 1080p30 encode (for remote eAVB display) OR Car black-box encoding (recording at least 2 cameras – front/back: 720p resolution) | 3D SRV + Multi-DAB Radio + Audio mixing & routing | CAN stack |
| UC6 (Integrated Cockpit + Navi + Media): Infotainment + Digital Cluster | Hypervisor, IVI HLOS, Cluster OS, Nav, Connectivity | HMI, Nav, & Digital Cluster | 1080i-60 decode DTV w/external decoder De-Interlace 1x 1080i60 (for RSE display) | 1080p30 encode (for remote eAVB display) | Audio mixing & routing | CAN stack |
| UC7 (Integrated Cockpit + Navi + Media + Radio): Infotainment + Digital Cluster + Dual DAB Radio | Hypervisor, IVI HLOS, Cluster OS, Nav, Connectivity | HMI, Nav, & Digital Cluster | 1080i-60 decode DTV w/external decoder De-Interlace 1x 1080i60 (for RSE display) | 1080p30 encode (for remote eAVB display) | Dual DAB Radio + Audio mixing & routing | CAN stack |
The UC7 IVI usecase is the heaviest with respect to bandwidth requirements. A summary of the data flow
in UC7 is shown in Figure 16.
Figure 16. UC7 (Integrated Cockpit + Navi + Media + Radio)
The top three worst-case IVI usecase bandwidth requirements are summarized in Table 18.
Table 18. Top Three Worst Case Bandwidth Requirements for IVI
IVI UC Traffic Requirement (MBps)
| Initiator | UC5 | UC6 | UC7 |
| MPU | 678.8 | 1280.8 | 1280.8 |
| GPU | 2217.31 | 2103.72 | 2103.72 |
| DSP | 227.56 | 2.54 | 61.67 |
| IVA | 1348.43 | 1348.43 | 1348.43 |
| CAL | 165.69 | 0 | 0 |
| DSS | 1168.13 | 1426.64 | 1426.64 |
| VPE | 466.56 | 466.56 | 466.56 |
| Misc | 33.90858 | 32.03222 | 33.03422 |
| Total | 6306.389 | 6660.722 | 6720.854 |
The following subsections discuss the different configurations used for each initiator to generate the
overall IVI UC7.
5.1 MPU CPU Traffic
The Cortex A15 is responsible for generating the OS traffic, TV Bit stream Out, Modem/WiFi data Out, TV
AAC stream In, BT SCO Audio, Microphone Audio, and Filesystem transfers in UC7. To mimic this, the
A15 was configured to perform memory copy transfers with the MPU I-Cache and D-Cache enabled.
Hardware default A15 configurations were used to get optimal performance.
In standalone mode, the MPU was found to generate a total average bandwidth of 3011.5 MBps.
[Figure: line plot of bandwidth (MBps) versus time (ms) for MA_MPU_P1 and MA_MPU_P2]
Figure 17. MPU (Standalone) OS Mimic Memory Copy Performance Bandwidth Plot
5.2 Graphics Processing Traffic
The GPU is responsible for generating the HMI, Navigation and the 3D cluster graphics as a part of IVI
UC7. GPU programming is tightly coupled with Linux OS. In the non-OS based test, the GPU traffic was
mimicked by the BB2D 2D graphics engine which has a relatively similar access pattern compared to
other initiators and also connects to the same L3 switch fabric as the GPU. The BB2D was configured to
perform a 4 layer 1080p YUV420 overlay.
With the BB2D processing such frames back to back the total average bandwidth was measured to be
3306 MBps in the standalone mode.
[Figure: line plot of bandwidth (MBps) versus time (ms) for BBD2D_P1 and BBD2D_P2]
Figure 18. BB2D (Standalone) Graphics Mimic Performance Bandwidth Plot
5.3 Display Traffic
The display traffic as a part of UC7 is listed below:
• One Video output (1920x1080 @ 60 FPS) with the blend of the following three:
– DSS read of HMI (Keyboard layer) Buf (Display1) @ 60fps (1920x826) - 4 BPP
– DSS read of Navi Layer1 Buf (Display1) @ 60fps (1920x543) - 4 BPP
– DSS read of Navi Layer2 Buf (Display1) @ 60fps (1920x1007) - 4 BPP
• Second Video output (1920x720 @ 60 FPS) with following:
– DSS read of 3DCluster (1920x720) for Display2 @ 60 fps- 4 BPP
In order for such high real-time display traffic to be supported without sync losses and underflows, the
following settings are recommended:
• BURSTSIZE = 8 x 128-bit bursts
• BUFPRELOAD = 1 (Hardware pre-fetches pixels up to high threshold value)
With the above settings in place the DSS was able to achieve an average bandwidth of 1445.56 MBps
without any underflows and sync losses.
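The layer list above determines the DSS read bandwidth requirement directly. The Python sketch below adds up the four layers, assuming that "4 BPP" means 4 bytes per pixel and that 1 MBps is 10^6 bytes per second. Under those assumptions the total reproduces the 1426.64 MBps DSS requirement listed for UC7 in Table 18. The names used are illustrative.

```python
BYTES_PER_PIXEL = 4   # assumption: "4 BPP" is read as 4 bytes per pixel
FPS = 60

# (description, width, height) for each layer that the DSS reads in UC7
layers = [
    ("HMI keyboard layer (Display1)", 1920, 826),
    ("Navi Layer1 (Display1)", 1920, 543),
    ("Navi Layer2 (Display1)", 1920, 1007),
    ("3DCluster (Display2)", 1920, 720),
]

def layer_mbps(width, height, bpp=BYTES_PER_PIXEL, fps=FPS):
    """Read bandwidth of one layer in MBps (1 MBps = 1e6 bytes/s)."""
    return width * height * bpp * fps / 1e6

for name, w, h in layers:
    print(f"{name}: {layer_mbps(w, h):.2f} MBps")

total_mbps = sum(layer_mbps(w, h) for _, w, h in layers)
print(f"Total DSS read requirement: {total_mbps:.2f} MBps")
```

The measured standalone average of 1445.56 MBps sits slightly above this computed requirement.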
The DSS bandwidth profile for this configuration is shown in Figure 19.
[Figure: line plot of DSS bandwidth (MBps) versus time (ms)]
Figure 19. DSS Standalone Bandwidth Profile for IVI Usecase Traffic
5.4 VPE Processing Traffic
The Video Processing Engine (VPE) is responsible for de-interlacing 1080i decoded YUV420 streams to
generate a progressive 1080p YUV420 stream at 30 FPS.
In this operation, the VPE traffic in standalone mode was found to average 485.69 MBps with
a peak of 2194 MBps. Section 5.6 shows how placing a BW limiter on the VPE ports
keeps the DSS traffic from underflowing when all the initiators are executed together.
The VPE standalone BW profile is shown in Figure 20.
[Figure: line plot of bandwidth (MBps) versus time (ms) for VPE_P1 and VPE_P2]
Figure 20. VPE Standalone Bandwidth Profile for IVI Usecase
5.5 IVAHD Decode Traffic
The IVI UC7 requires decoding two 1080i60 TV streams and encoding one 1080p30 stream. To
mimic this traffic, a standalone codec decoder client application was run to decode a continuously looping
sequence of I, P, B and B frames. The decoded sequence was chosen such that the IVA bandwidth
was close to the requirement.
With the stream HD_CR_KyuRyu.264, the IVA average bandwidth for decode was found to be 1008.2
MBps. The bandwidth profile is shown in Figure 21.
[Figure: line plot of IVA bandwidth (MBps) versus time (ms)]
Figure 21. IVA Standalone 1080p60 Decode Bandwidth Profile for IVI Usecase
5.6 IVI Usecase Integrated Bandwidth
Once the different initiators from the different sub-experiments were run together with DDR set at 666
MHz, multiple bandwidth knobs had to be employed to ensure that the IVI usecase requirements are met with
no DSS underflows and with in-time processing of the video frames. The step-by-step application of bandwidth
knobs enables understanding where the system-level bottlenecks reside.
Table 19. BW Knobs to Make IVI UC7 Work on TDA2Px
| Expt Name | DDR3 - 666 MHz (DSS Adaptive MFLAG + BB2D BR + No Extra EDMA + max sys = 12 + A15 lower priority) |
| OCP_CONFIG | 0x0C500000 |
| EMIF1_SYS (MBps) | 2799.14 |
| EMIF2_SYS (MBps) | 2790.17 |
| MA_MPU_P1 (MBps) | 634.85 |
| MA_MPU_P2 (MBps) | 639.06 |
| Total Avg. DDR BW | 6863.21 |
| Avg. Efficiency | 64.4% |
| Remarks | 1. No DSS Underflows. 2. IVA 1080p60, no drops. 3. BB2D higher BW (2.6 GBps versus 2.1 GBps requirement). 4. MPU bandwidth meeting requirement. |
| BW Knobs Used | 1. VPE BW Limiter to both ports (700 MBps). 2. DSS Adaptive MFLAG (50%-75% thresholds) + DSS Priority Highest. 3. BB2D BW Regulator 1000 MBps. 4. Sys Threshold kept to 12. 5. MPU_MA_PRIORITY = 6 |
With respect to the requirement the initiator wise traffic is as given in Table 20.
Table 20. Initiator Wise Break Down of IVI UC7 Validation
| Initiator | UC7 Requirement From UCAD (MBPS) | Test Traffic (MBPS) | Remarks |
| MPU | 1280.8 | 1273.9 | Just short, with realistic GPU traffic, should adjust. |
| GPU | 2103.72 | 2612.94 | Mimic via BB2D 1080p 4 layer overlay. Higher BW likely to absorb DSP and IVA difference. |
| DSP | 61.67 | – | |
| IVA | 1348.43 | 962.08 | Mimic via IPBB 1080p60 decode. (Stream: HD_CR_KyuRyu.264) |
| CAL | 0 | – | |
| DSS | 1426.64 | 1477.33 | Matched VID/GFX display dimensions, format and pixel clock rate. |
| VPE | 466.56 | 510.19 | |
| Miscellaneous | 33.03422 | 57.94 | IPU CPU traffic |
| Total | 6720.854 | 6863.218 | Approximately 140 MBps higher traffic than requirement |
6 ADAS Usecase Performance
TDA2Px extends the following target use cases:
• ADAS 6-8 Ch smart surround view + CMS – Capture: 6-8 Ch, 2-3ch HD display (2 x 2MP@30/60fps),
SRV processing: GPU @665 MHz, Analytics, A15 @500 MHz, IPU1@ 212 MHz, 2x DSPs, and 2x
EVEs. Function: N-view, 3D stitch view for rendering, CMS, Rear View.
The ADAS 6 Channel Surround view + ISP based system description is shown in Figure 22. This is
the primary usecase targeted for ADAS.
[Figure: block diagram of the surround view system. Labels in the diagram: 2MP 30fps cameras, DS90UB95x/DS90UB96x serializers and deserializers, DS90UB954, CSI-2 (2 lane and 4 lane, 1.5 Gbps/lane), additional cameras over parallel interface (VIP, 165 MHz px clk), TDA2x (Plus) with 2x A15, 2x C66 DSP, 2x EVE, 1x SGX544 MP2, ISP, 2x Dual Cortex M4, H.264 Enc/Dec and EMIF, 2x DDR3L (32), FLASH, DC/DC, PMIC, VOUT0/VOUT1/VOUT2 RGB output, Ethernet, I2C, SPI, MCAN, DCAN, PCIe, USB, expansion port]
Figure 22. TDA2Px Surround View System
The expected bandwidth analysis of the usecase corresponding to 6 channel input and processing is
shown in Table 21.
Table 21. ADAS 6 Channel Surround View + CMS With ISP
| Operation | Type | IP | H | V | BPP | FPS | CH | BW (MB/S) |
| 4x2 MP@30fps capture for SV (WR) RAW | WR | CAL | 1920 | 1080 | 2 | 30 | 4 | 497.66 |
| ISP read of all SV Channels | RD | ISP | 1920 | 1080 | 2 | 30 | 4 | 497.66 |
| 4x 2MP 30fps capture for SV (WR) | WR | ISP | 1920 | 1080 | 1.5 | 30 | 4 | 373.25 |
| 2x1 MP@60fps capture for CMS (WR) RAW | WR | VIP | 1280 | 720 | 2 | 60 | 2 | 221.18 |
| ISP read of all SV Channels | RD | ISP | 1280 | 720 | 2 | 60 | 2 | 221.18 |
| 4x 2MP 30fps capture for SV (WR) | WR | ISP | 1280 | 720 | 1.5 | 60 | 2 | 165.89 |
| SGX 3D Synthesis (Output RGB24) | RD/WR | SGX | 1920 | 1080 | 3 | 30 | 1 | 1740.00 |
| Display RGB24 | RD | DSS | 1920 | 1080 | 3 | 30 | 1 | 186.62 |
| CMS O/P (2 displays) | RD | DSS | 1280 | 720 | 1.5 | 60 | 2 | 165.89 |
| Deflicker for 2x1MP@60 CMS cameras | RD/WR | IVA+DSP | | | | | | 1000.00 |
| Analytics for 2MP, 3 camera@10fps | RD/WR | | | | | | | 1800.00 |
| Total | | | | | | | | 6869.34 |
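For the capture, ISP and display rows of Table 21, the bandwidth column is the product H × V × BPP × FPS × CH, expressed in units of 10^6 bytes per second. The Python sketch below checks this for those rows, assuming that BPP is bytes per pixel. The SGX, deflicker and analytics rows are fixed budgets in the table and are not derived from this formula. Row labels and function names are illustrative.

```python
def stream_mbps(h, v, bpp, fps, channels):
    """Bandwidth of a pixel stream in MB/s (1 MB = 1e6 bytes); bpp is bytes per pixel."""
    return h * v * bpp * fps * channels / 1e6

# (label, H, V, BPP, FPS, CH, BW listed in Table 21)
rows = [
    ("CAL write, SV RAW",    1920, 1080, 2.0, 30, 4, 497.66),
    ("ISP read, SV",         1920, 1080, 2.0, 30, 4, 497.66),
    ("ISP write, SV YUV420", 1920, 1080, 1.5, 30, 4, 373.25),
    ("VIP write, CMS RAW",   1280,  720, 2.0, 60, 2, 221.18),
    ("ISP read, CMS",        1280,  720, 2.0, 60, 2, 221.18),
    ("ISP write, CMS",       1280,  720, 1.5, 60, 2, 165.89),
    ("DSS read, RGB24",      1920, 1080, 3.0, 30, 1, 186.62),
    ("DSS read, CMS output", 1280,  720, 1.5, 60, 2, 165.89),
]

for label, h, v, bpp, fps, ch, listed in rows:
    computed = stream_mbps(h, v, bpp, fps, ch)
    print(f"{label}: computed {computed:.2f}, listed {listed:.2f}")
```

The same formula applies to the 4 channel configuration, which uses only the 1920 × 1080 rows.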
Other variations of the ADAS usecases include:
• ADAS_4CH_SRV + ISS: Concurrent execution of the following initiators:
– CAL_4_CHANNEL_RADAR_CAPTURE
– DSS_3VID_3VENC
– GC320 traffic to mimic SGX traffic
– ISS_4CH_ISP_PROCESSING
– DSP1 and DSP2 EDMA, EVE EDMA
• ADAS_7CH_SRV_CMS_UC: Concurrent execution of the following initiators:
– CAL_VIP_7_CHANNEL_CAPTURE
– DSS_3VID_3VENC
– GC320 traffic to mimic SGX traffic
– DSP1 and DSP2 EDMA, EVE EDMA
• ADAS_8CH_SRV_CMS_UC: Concurrent execution of the following initiators:
– CAL_VIP_8_CHANNEL_CAPTURE
– DSS_3VID_3VENC
– GC320 traffic to mimic SGX traffic
– DSP1 and DSP2 EDMA, EVE EDMA
6.1 Display Traffic
The display traffic as a part of the ADAS usecase is listed below:
• One display with SV Output RGB888: 1920x1080 @ 3 bpp @ 30 FPS
• Second display CMS Output - 2 Channels: 1280x720 @ 1.5 bpp @ 60 FPS
With YUV420, the DISPC Scaler is enabled automatically (the Scaler is used to convert YUV420 to
YUV444 before YUV2RGB conversion). The Scaler requests multiple lines from the DMA at the start (to pre-fill the Scaler line buffers), even when the display is in the blanking state. This can cause the DMA to underflow
without any display sync-lost. This is a harmless condition. To avoid it, force the DMA to pre-fetch
up to the high threshold value (set DISPC_VID3_ATTRIBUTES.BUFPRELOAD to ‘1’).
With the above settings in place the DSS was able to achieve an average bandwidth of 363.87 MBps
without any underflows and sync losses.
The DSS bandwidth profile for this configuration is shown in Figure 23.
[Figure: line plot of DSS bandwidth (MBps) versus time (ms)]
Figure 23. DSS Standalone Bandwidth Profile for IVI Usecase Traffic
6.2 ADAS 6 Channel + ISP Integrated Bandwidth
Once the different initiators from the different sub-experiments were run together with DDR set at 666
MHz, multiple bandwidth knobs had to be employed to ensure that the ADAS usecase requirements are met
with no DSS underflows and with in-time processing of the frames. The step-by-step application of bandwidth
knobs enables understanding where the system-level bottlenecks reside.
Table 22. BW Knobs to Make ADAS 6Ch SRV + ISP Work on TDA2Px
| Expt Name | DDR3 - 666 MHz (ISP + DSP1 + EVE + BL & BR on BB2D + MPU Lower) |
| OCP_CONFIG | 0xC500000 |
| Avg. EMIF1_SYS (MBps) | 3587.231926 |
| Avg. EMIF2_SYS (MBps) | 3589.380462 |
| Avg. MA_MPU_P1 (MBps) | 208.1853929 |
| Avg. MA_MPU_P2 (MBps) | 207.546379 |
| Total Average DDR BW | 7592.34 |
| Avg. Efficiency | 71.2% |
| Peak Total DDR BW (MBps) | 7907.84 |
| Peak Efficiency | 74.2% |
| Remarks | 1. DSS not underflowing (YUV420). 2. CAL no overflows. 3. DMA traffic block based (128x128) - 3 GBps. 4. ISP 6 Channel Single Pass WDR completes in time. 5. BB2D Traffic just meeting requirement (1.67 GBps). |
| BW Knobs Used | 1. DSS Adaptive MFLAG (50%-75% thresholds) + High Priority + BUFPRELOAD = 1. 2. BB2D BW Limiter 1100 MBps and BB2D BW Regulator 900 MBps. 3. MPU_MA_PRIORITY = 6. 4. CAL Adaptive MFLAG (25%-75% thresholds). 5. Sys Threshold = 12 |
The DMA traffic was tuned further to be block based and further ISP traffic was added. With further
settings of DSS BufPreload = 1, DSS dynamic MFLAG and priority, EMIF OCP_CONFIG, and appropriate
MPU priority the ADAS usecase goals were met.
With respect to the requirement, the initiator wise traffic is given in Table 23.
Table 23. Initiator Wise Break Down of ADAS 6Ch SRV + ISP Validation
| Initiator | Expected BW (MB/S) | Test Case BW | Remarks |
| CAL | 718.85 | 749.15 | 6x2 MP@30fps capture for SV (WR) RAW |
| ISP | 1257.98 | 1279.47 | RAW to YUV420 |
| GPU | 1740.00 | 1666.27 | SGX 3D Synthesis (Output RGB24) - Slightly lower bandwidth can be traded off with EDMA traffic in real system. |
| DSS | 352.51 | 371.20 | YUV 420 BUFPRELOAD = 1 |
| EDMAs | 2800 | 3010.56 | Approximately 200 MBps higher than required |
| MPU | – | 415.73 | Extra traffic in the system |
| Total | 6869.34 | 7492.38 | Approximately 623.03 MBps higher traffic than requirement |
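The margins in Table 23 can be recomputed per initiator. The Python sketch below does so with the values from the table, treating the MPU requirement (shown as a dash) as zero, which is an assumption made only so that the extra MPU traffic appears as surplus. All names are illustrative.

```python
# Expected and measured bandwidth from Table 23, in MB/s
expected = {"CAL": 718.85, "ISP": 1257.98, "GPU": 1740.00,
            "DSS": 352.51, "EDMAs": 2800.00,
            "MPU": 0.0}   # assumption: no stated MPU requirement, counted as 0
measured = {"CAL": 749.15, "ISP": 1279.47, "GPU": 1666.27,
            "DSS": 371.20, "EDMAs": 3010.56, "MPU": 415.73}

def margins(exp, meas):
    """Return measured minus expected bandwidth for every initiator."""
    return {name: meas[name] - exp[name] for name in exp}

for name, delta in margins(expected, measured).items():
    print(f"{name}: {delta:+.2f} MB/s")

total_expected = sum(expected.values())
total_measured = sum(measured.values())
print(f"Total: expected {total_expected:.2f}, measured {total_measured:.2f}, "
      f"surplus {total_measured - total_expected:.2f} MB/s")
```

Only the GPU mimic traffic falls below its expected value. Every other initiator meets or exceeds its requirement.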
6.3 ADAS 4 Channel SRV + ISP Integrated Bandwidth
With the settings found during the 6 Channel ADAS SRV + ISP bandwidth analysis the 4 Channel ADAS
SRV + ISP was analyzed. The expected bandwidth in this configuration is given in Table 24.
Table 24. ADAS 4 Channel SRV + ISP Expected Bandwidth Analysis
| Operation | Type | IP | H | V | BPP | FPS | CH | BW (MB/S) |
| 4x2 MP@30fps capture for SV (WR) RAW | WR | CAL | 1920 | 1080 | 2 | 30 | 4 | 497.66 |
| ISP read of all SV Channels | RD | ISP | 1920 | 1080 | 2 | 30 | 4 | 497.66 |
| 4x 2MP 30fps capture for SV (WR) | WR | ISP | 1920 | 1080 | 1.5 | 30 | 4 | 373.25 |
| SGX 3D Synthesis (Output RGB24) | RD/WR | SGX | 1920 | 1080 | 3 | 30 | 1 | 1740.00 |
| Display RGB24 | RD | DSS | 1920 | 1080 | 3 | 30 | 1 | 186.62 |
| Analytics for 2MP, 3 camera@10fps | RD/WR | | | | | | | 1800.00 |
| Total | | | | | | | | 5095.20 |
With no CAL or VIP overflows and no DSS underflows the initiator wise break down of bandwidth is as
shown in Table 25.
Table 25. Initiator Wise Break Down of ADAS 4Ch SRV + ISP Validation
| Initiator | Expected BW (MB/S) | Test Case BW (MB/s) | Remarks |
| Capture | 497.66 | 436.33 | 4x2 MP@30fps capture for SV (WR) 12 bit packed data captured at higher FPS. |
| ISP | 870.91 | 874.72 | RAW to YUV420 |
| GPU | 1740.00 | 1762.65 | SGX 3D synthesis (Output RGB24) |
| DSS | 186.62 | 371.20 | 6 channel DSS configuration used. Higher than required bandwidth |
| EDMAs | 1800.00 | 3316.82 | Approximately 1.5 GBps higher than required |
| MPU | - | 510.19 | Extra traffic in the system |
| Total | 5095.20 | 7271.91 | Approximately 2.17 GBps higher traffic than requirement |
7 References
• TDA2Px SoC for Advanced Driver Assistance Systems (ADAS) Silicon Revision 1.0 Technical
Reference Manual
Revision History
NOTE: Page numbers for previous revisions may differ from page numbers in the current version.
Changes from Original (April 2018) to A Revision
• Update was made in Section 1.
• Update was made in Section 2.1.4.
• Updates were made in Section 6.
IMPORTANT NOTICE AND DISCLAIMER
TI PROVIDES TECHNICAL AND RELIABILITY DATA (INCLUDING DATASHEETS), DESIGN RESOURCES (INCLUDING REFERENCE
DESIGNS), APPLICATION OR OTHER DESIGN ADVICE, WEB TOOLS, SAFETY INFORMATION, AND OTHER RESOURCES “AS IS”
AND WITH ALL FAULTS, AND DISCLAIMS ALL WARRANTIES, EXPRESS AND IMPLIED, INCLUDING WITHOUT LIMITATION ANY
IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT OF THIRD
PARTY INTELLECTUAL PROPERTY RIGHTS.
These resources are intended for skilled developers designing with TI products. You are solely responsible for (1) selecting the appropriate
TI products for your application, (2) designing, validating and testing your application, and (3) ensuring your application meets applicable
standards, and any other safety, security, or other requirements. These resources are subject to change without notice. TI grants you
permission to use these resources only for development of an application that uses the TI products described in the resource. Other
reproduction and display of these resources is prohibited. No license is granted to any other TI intellectual property right or to any third
party intellectual property right. TI disclaims responsibility for, and you will fully indemnify TI and its representatives against, any claims,
damages, costs, losses, and liabilities arising out of your use of these resources.
TI’s products are provided subject to TI’s Terms of Sale (www.ti.com/legal/termsofsale.html) or other applicable terms available either on
ti.com or provided in conjunction with such TI products. TI’s provision of these resources does not expand or otherwise alter TI’s applicable
warranties or warranty disclaimers for TI products.
Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265
Copyright © 2018, Texas Instruments Incorporated