Application Report
SPRA643 - April 2000
TMS320C6000 Expansion Bus Host Port Performance
Zoran Nikolic
Digital Signal Processing Solutions
ABSTRACT
The expansion bus is a 32-bit wide bus that supports interfaces to PCI bridge chips, to
synchronous or asynchronous external masters, to a variety of asynchronous peripherals,
and to asynchronous or synchronous FIFOs. The expansion bus has two major
subblocks: the I/O port and the host port interface. This application report discusses
the performance of the TMS320C6000 expansion bus host port.
The expansion bus host port performance is described here using:
• The maximum data throughput (in Mbytes/s),
• The number of clocks required to perform a transfer, and
• The latencies required to start and complete a transfer.
System performance is affected by a variety of factors including:
• The DMA auxiliary channel (which performs expansion bus transfers) competes for control of
the DMA with other DMA channels.
• The DMA controller competes with the CPU for resources (external memory interface (EMIF),
internal memory).
• If the resource accessed is within the EMIF, it is susceptible to stalls such as SDRAM page
misses and asynchronous not-ready conditions.
Many of these factors are within the system designer's control. This application note focuses
on best-case performance to give the system designer an idea of the upper bound of available
bandwidth.
The performance was evaluated using VHDL simulation.
Contents
1 TMS320C6000 Expansion Bus Host Port
1.1 Synchronous Host Port Performance
1.1.1 Synchronous Slave Performance
1.1.2 Synchronous Master Performance
1.2 Asynchronous Host Port Performance
2 References
TMS320C6000 is a trademark of Texas Instruments.
List of Figures
Figure 1. Parameters Used to Describe Performance of the Synchronous Slave Host Port
Figure 2. Data Throughput of the Synchronous Slave Host Port – Burst Read from the Internal Data Memory
Figure 3. Data Throughput of the Synchronous Slave Host Port – Burst Write to the Internal Data Memory
Figure 4. Data Throughput of the Synchronous Slave Host Port – Burst Read from the Internal Program Memory
Figure 5. Data Throughput of the Synchronous Slave Host Port – Burst Write to the Internal Program Memory
Figure 6. Data Throughput of the Synchronous Slave Host Port – Burst Read from EMIF SBSRAM
Figure 7. Data Throughput of the Synchronous Slave Host Port – Burst Write to EMIF SBSRAM
Figure 8. Data Throughput of the Synchronous Slave Host Port – Burst Read from EMIF SDRAM
Figure 9. Data Throughput of the Synchronous Slave Host Port – Burst Write to EMIF SDRAM
Figure 10. Parameters Used to Describe Performance of the Synchronous Master Host Port
Figure 11. Data Throughput of the Synchronous Master Host Port – Moving Data to the Internal Data Memory
Figure 12. Data Throughput of the Synchronous Master Host Port – Moving Data from the Internal Data Memory
Figure 13. Data Throughput of the Synchronous Master Host Port – Moving Data to the Internal Program Memory
Figure 14. Data Throughput of the Synchronous Master Host Port – Moving Data from the Internal Program Memory
Figure 15. Data Throughput of the Synchronous Master Host Port – Moving Data to EMIF SBSRAM
Figure 16. Data Throughput of the Synchronous Master Host Port – Moving Data from EMIF SBSRAM
Figure 17. Data Throughput of the Synchronous Master Host Port – Moving Data to EMIF SDRAM
Figure 18. Data Throughput of the Synchronous Master Host Port – Moving Data from EMIF SDRAM
Figure 19. Parameter Used to Measure the Asynchronous Host Port Performance (Tnr Measured in the DSP Clock Cycles)
List of Tables
Table 1. Transfer Duration for a Burst Read from the Internal Data Memory Using Synchronous Slave Host Port
Table 2. The XRDY Latency – Read from the Internal Data Memory Using Synchronous Slave Host Port
Table 3. Transfer Duration for a Burst Write to the Internal Data Memory Using Synchronous Slave Host Port
Table 4. Transfer Duration for a Burst Read from the Internal Program Memory Using Synchronous Slave Host Port
Table 5. Latency of the XRDY – Read from the Internal Program Memory Using Synchronous Slave Host Port
Table 6. Transfer Duration for a Burst Write to the Internal Program Memory Using Synchronous Slave Host Port
Table 7. Transfer Duration for a Burst Read from EMIF SBSRAM Using Synchronous Slave Host Port
Table 8. Latency of the XRDY – Read from EMIF SBSRAM Using Synchronous Slave Host Port
Table 9. Transfer Duration for a Burst Write to EMIF SBSRAM Using Synchronous Slave Host Port
Table 10. Transfer Duration for a Burst Read from EMIF SDRAM Using Synchronous Slave Host Port
Table 11. Latency of the XRDY – Read from EMIF SDRAM Using Synchronous Slave Host Port
Table 12. Transfer Duration for a Burst Write to EMIF SDRAM Using Synchronous Slave Host Port
Table 13. Transfer Duration of a Synchronous Master Burst Read to the Internal DSP Data Memory
Table 14. Transfer Duration of a Synchronous Master Burst Write from the Internal DSP Data Memory
Table 15. Transfer Overhead for a Synchronous Master Burst to/from the Internal DSP Data Memory
Table 16. Transfer Duration of a Synchronous Master Burst Read to the Internal DSP Program Memory
Table 17. Transfer Duration of a Synchronous Master Burst from the Internal DSP Program Memory
Table 18. Transfer Overhead for a Synchronous Master Burst to/from the Internal DSP Program Memory
Table 19. Transfer Duration of a Synchronous Master Burst Read to EMIF SBSRAM
Table 20. Transfer Duration of a Synchronous Master Burst Write from the EMIF SBSRAM
Table 21. Transfer Overhead for a Synchronous Master Burst to/from EMIF SBSRAM
Table 22. Transfer Duration of a Synchronous Master Burst Read to EMIF SDRAM
Table 23. Transfer Duration of a Synchronous Master Burst Write from the EMIF SDRAM
Table 24. Transfer Overhead for a Synchronous Master Burst to/from EMIF SDRAM
Table 25. Asynchronous Expansion Bus Host Port – Transfer Duration for Different Memory Types
Table 26. Data Throughput in [Mbytes/s] for Asynchronous Host Port (DSP Clock is Set to 250 MHz)
1 TMS320C6000 Expansion Bus Host Port
The TMS320C6000 expansion bus host port interface can operate in two modes:
synchronous and asynchronous. In synchronous mode, the host port has address and data
signals multiplexed. The asynchronous mode is a slave-only mode. It is similar to the HPI on the
'C6201/C6211/C6701/C6711 and is used to interface to microprocessors that use an
asynchronous bus.
The expansion bus host channel, through the DMA auxiliary port, provides connectivity between
the processor and the host port interface. Dedicated address and data registers connect the
host port interface to the expansion bus host channel. An external master accesses these
registers using external data and interface control signals. To initiate transfers via the
synchronous host port interface, the CPU has to configure a set of registers.
1.1 Synchronous Host Port Performance
In synchronous mode, the expansion bus host port has multiplexed address and data signals.
The synchronous host port can interface with minimal glue logic to many popular processors
and PCI bridge chips. In this mode, the TMS320C6000 expansion bus can initiate
and receive burst transfers. By simply not initiating master transactions, the expansion bus
synchronous host port can essentially act in a slave-only mode.
1.1.1 Synchronous Slave Performance
The expansion bus synchronous slave host port performance depends on:
1. The ratio between the DSP CPU clock rate and the expansion bus clock (XCLKIN)
2. The transfer size
3. The type of memory used as the source (destination) of a transfer (internal program
memory, internal data memory, SDRAM connected to the EMIF, or SBSRAM connected to
the EMIF could be used).
The performance of the expansion bus synchronous slave host port is evaluated as a function of
the ratio between the DSP clock and the expansion bus clock (XCLKIN). Clock ratios from 4 to
20 in increments of one were used, and at each clock ratio performance was evaluated for
bursts of 1, 2, 3, 4, 8, 16, 32, and 64 words.
The performance is characterized in two ways: as a transfer throughput (expressed in Mbytes/s) and
as the duration of a transfer (expressed in expansion bus clock (XCLKIN) cycles). Transfer throughputs
in this document are calculated assuming that the DSP is running at 250 MHz. Transfer rates for
different DSP clocks can be easily determined, since the transfer duration is presented in XCLKIN cycles
for each transfer type. Data throughput is calculated using the following formula:
\[
\text{Data Throughput [Mbytes/s]} = \frac{4 \times \text{Burst Size [words]} \times \text{DSP\_Clock [MHz]}}{\text{Number of Cycles} \times \text{Clock Ratio}} \tag{1}
\]
where Number of Cycles is the number of XCLKIN cycles required to complete a burst
transfer and Burst Size is the number of words transferred. The factor of 4 is the number of
bytes in one 32-bit word.
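Formula (1) can be sketched in a few lines of Python. The function name and argument names below are illustrative and do not come from the report. The sample values (a 64-word burst taking 134 XCLKIN cycles at a clock ratio of 4) are taken from Table 1, and the 200 MHz case is an assumed alternative DSP clock, included only to show how a result scales to other clock rates.

```python
def data_throughput_mbytes_per_s(burst_words, dsp_clock_mhz, xclkin_cycles, clock_ratio):
    """Formula (1): throughput in Mbytes/s for one burst transfer.

    XCLKIN runs at dsp_clock_mhz / clock_ratio, and each word is 4 bytes.
    """
    if xclkin_cycles <= 0 or clock_ratio <= 0:
        raise ValueError("cycle count and clock ratio must be positive")
    return 4.0 * burst_words * dsp_clock_mhz / (xclkin_cycles * clock_ratio)


# 64-word burst, 134 XCLKIN cycles, clock ratio 4 (values from Table 1).
example_250 = data_throughput_mbytes_per_s(64, 250.0, 134, 4)
# Same transfer with an assumed 200 MHz DSP clock.
example_200 = data_throughput_mbytes_per_s(64, 200.0, 134, 4)

print(f"{example_250:.1f} Mbytes/s at 250 MHz")
print(f"{example_200:.1f} Mbytes/s at 200 MHz")
```

Because the cycle counts in the tables are expressed in XCLKIN cycles, only the `dsp_clock_mhz` argument changes when the DSP runs at a different rate.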
Figure 1 illustrates the parameters observed in order to describe a synchronous slave host port
transfer. Transfer duration is measured in the XCLKIN clock periods (from the XAS\ asserted to
the XBLAST\ de-asserted).
[Figure 1: timing diagram of XCLKIN, XAS (input), XBLAST (input), and XRDY (output), with the Ready Latency and Transfer Duration intervals marked.]
Figure 1. Parameters Used to Describe Performance of the Synchronous Slave Host Port
The delay that the synchronous slave host port requires to respond to an external host read
request is described by latency of the XRDY signal. The XRDY latency was also measured in
XCLKIN cycles.
1.1.1.1 Synchronous Slave Transfers to/from Internal Data Memory
The number of XCLKIN cycles required to transfer a burst of data from the internal data memory
of the DSP using the synchronous slave host port is presented in Table 1. The transfer duration for
different burst lengths and different clock ratios is shown in the table. Shaded fields in the table
correspond to transfers throttled by the XRDY signal (after the initial latency, the synchronous
slave host port occasionally deasserts XRDY during a transfer).
Table 1. Transfer Duration for a Burst Read from the Internal Data Memory
Using Synchronous Slave Host Port
(Entries are the number of XCLKIN clocks; rows are burst lengths in words, columns are clock ratios.)
Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
1         8    7    6    6    5    5    5    4    4    4    4    4    4    4    4    4    4
2         9    8    7    7    6    6    6    5    5    5    5    5    5    5    5    5    5
3        12   11    9    8    7    7    7    6    6    6    6    6    6    6    6    6    6
4        13   12   10    9    8    8    8    7    7    7    7    7    7    7    7    7    7
8        21   20   16   14   12   12   12   11   11   11   11   11   11   11   11   11   11
16       37   37   28   24   20   20   20   19   19   19   19   19   19   19   19   19   19
32       70   68   52   44   36   36   36   35   35   35   35   35   35   35   35   35   35
64      134  132  100   84   68   68   68   67   67   67   67   67   67   67   67   67   67
The latency of the XRDY signal when an external host reads from the Internal Data Memory
using the synchronous slave host port is presented in Table 2.
Table 2. The XRDY Latency – Read from the Internal Data Memory
Using Synchronous Slave Host Port
(XRDY latency is given in number of XCLKIN clocks.)
Clock Ratio     4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
XRDY Latency    6    5    4    4    3    3    3    2    2    2    2    2    2    2    2    2    2
Figure 2 presents data throughput information in Mbytes/s for different clock ratios and different
burst sizes. The throughputs are calculated assuming that the DSP is running at 250 MHz. The data
throughput graphs in Figure 2 are constructed by applying formula (1) to the transfer duration
numbers (Table 1). Each of the eight panels in Figure 2 is associated with one burst length. The
panels show how the clock ratio affects data throughput when the size of the burst is kept fixed.
The dashed line in the figure shows the theoretical maximum throughput.
An external host requires a theoretical minimum of N+1 cycles to read a burst of N words from
the synchronous slave host port (the address and data buses are multiplexed, so one address
cycle precedes the N data cycles). The observed
transfer duration deviates from this ideal case (Figure 2) because of: 1) the initial latency of the
XRDY signal and 2) the inability of the synchronous slave host port to keep up with the external
host. The synchronous slave host port cannot keep up with the external host for
some combinations of burst length and clock ratio (DSP clock to expansion bus clock
ratio). When it cannot keep up with the external host, the synchronous slave host
port deasserts the XRDY signal, indicating not-ready status.
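As an illustration, the following Python sketch applies formula (1) to the 8-word row of Table 1 and compares the result with the bound obtained from the N+1-cycle minimum. The variable and function names are hypothetical. Whether the dashed line in Figure 2 was drawn with exactly this N+1 cycle count is an assumption made here, not something the report states.

```python
DSP_CLOCK_MHZ = 250.0                     # DSP clock assumed throughout the report
CLOCK_RATIOS = list(range(4, 21))         # clock ratios 4 through 20
BURST_WORDS = 8

# Table 1, row for the 8-word burst (XCLKIN cycles per clock ratio).
TABLE1_BURST8_CYCLES = [21, 20, 16, 14, 12, 12, 12,
                        11, 11, 11, 11, 11, 11, 11, 11, 11, 11]


def throughput(burst_words, cycles, ratio, dsp_clock_mhz=DSP_CLOCK_MHZ):
    """Formula (1), result in Mbytes/s."""
    return 4.0 * burst_words * dsp_clock_mhz / (cycles * ratio)


# Observed throughput from the measured transfer durations.
observed = [throughput(BURST_WORDS, c, r)
            for c, r in zip(TABLE1_BURST8_CYCLES, CLOCK_RATIOS)]

# Upper bound, assuming the ideal N+1-cycle transfer at every clock ratio.
calculated_max = [throughput(BURST_WORDS, BURST_WORDS + 1, r)
                  for r in CLOCK_RATIOS]

for r, obs, best in zip(CLOCK_RATIOS, observed, calculated_max):
    print(f"ratio {r:2d}: observed {obs:6.1f}  maximum {best:6.1f} Mbytes/s")
```

Both series fall as the clock ratio grows, because XCLKIN slows down relative to the fixed 250 MHz DSP clock.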
The initial latency of XRDY is the time required by the auxiliary DMA channel to provide the
first data demanded by the external host. For clock ratios of eleven and higher, after the initial
XRDY latency the synchronous slave host port keeps XRDY asserted, indicating ready status,
during the rest of a transfer for all burst sizes. For clock ratios between four and seven and bursts
longer than two words, the synchronous slave host port cannot always keep up with the external
host. For clock ratios between eight and ten and bursts longer than eight words, the
synchronous slave host port cannot always keep up with the external host.
The difference between the calculated maximum throughput and the observed throughput in
Figure 2 for clock ratios of eleven and higher depends purely on the initial latency of XRDY for all
burst lengths. For bursts larger than eight words this difference is almost negligible.
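The effect of a fixed initial latency on bursts of different sizes can be checked with a short Python sketch. It uses the Table 1 durations for clock ratios 11 through 20 (which are identical across those columns) and compares them with the N+1-cycle minimum. The names `overhead_cycles` and `efficiency` are illustrative, and "efficiency" here simply means the ratio of the ideal duration to the observed duration.

```python
BURST_WORDS = [1, 2, 3, 4, 8, 16, 32, 64]

# Table 1 durations (XCLKIN cycles) for any clock ratio from 11 to 20.
OBSERVED_CYCLES = [4, 5, 6, 7, 11, 19, 35, 67]


def overhead_cycles(n_words, observed):
    """Cycles spent beyond the theoretical N+1 minimum."""
    return observed - (n_words + 1)


def efficiency(n_words, observed):
    """Ideal duration divided by observed duration (1.0 means no overhead)."""
    return (n_words + 1) / observed


overheads = [overhead_cycles(n, c) for n, c in zip(BURST_WORDS, OBSERVED_CYCLES)]
efficiencies = [efficiency(n, c) for n, c in zip(BURST_WORDS, OBSERVED_CYCLES)]

for n, extra, eff in zip(BURST_WORDS, overheads, efficiencies):
    print(f"{n:2d} words: {extra} extra cycles, {eff:.1%} of the maximum")
```

The overhead is two cycles for every burst length, which matches the XRDY latency listed in Table 2 for these clock ratios, so its relative cost shrinks as the burst grows.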
[Figure 2: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel plots Calculated Maximum Throughput and Observed Throughput in Bytes/s against Clock Ratio (4 to 20).]
Figure 2. Data Throughput of the Synchronous Slave Host Port –
Burst Read from the Internal Data Memory
The number of XCLKIN cycles required for writing a burst of data to the internal data memory of
the DSP using the synchronous slave host port is presented in Table 3. The transfer duration for
different burst lengths and different clock ratios is shown in the table. The shaded fields in the
table correspond to transfers throttled by the XRDY signal. After the initial latency, the
synchronous slave host port occasionally deasserts XRDY during a transfer.
Table 3. Transfer Duration for a Burst Write to the Internal Data Memory
Using Synchronous Slave Host Port
(Entries are the number of XCLKIN clocks; rows are burst lengths in words, columns are clock ratios.)
Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
1         3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
2         4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4
3         5    5    5    5    5    5    5    5    5    5    5    5    5    5    5    5    5
4         8    7    7    7    6    6    6    6    6    6    6    6    6    6    6    6    6
8        13   12   11   10   10   10   10   10   10   10   10   10   10   10   10   10   10
16       24   23   19   18   18   18   18   18   18   18   18   18   18   18   18   18   18
32       47   44   35   34   34   34   34   34   34   34   34   34   34   34   34   34   34
64       92   87   67   66   66   66   66   66   66   66   66   66   66   66   66   66   66
Figure 3 presents data throughput information in Mbytes/s for different clock ratios and different burst
sizes when the DSP clock is set to 250 MHz. The data throughput graphs in Figure 3 are constructed
by applying formula (1) to the transfer duration numbers (Table 3). Each of the eight panels in Figure 3 is
associated with one burst length. The panels show how the clock ratio affects data throughput when the
size of the burst is kept fixed. The dashed line in the figure shows the theoretical maximum throughput.
The external host requires a theoretical minimum of N+1 cycles to write a burst of N words to
the synchronous slave host port (address and data buses are multiplexed). The observed
transfer duration deviates from this ideal case (Figure 3) because of: 1) the initial latency of
XRDY and 2) the inability of the synchronous slave host port to keep up with the external host. The
synchronous slave host port may not keep up with the external host for some combinations of
burst size and clock ratio (DSP clock to expansion bus clock ratio). When it cannot
keep up with the external host, the synchronous slave host port deasserts the XRDY signal,
indicating not-ready status.
The initial latency of XRDY for a write is always one cycle. For clock ratios of seven and
higher, after the initial XRDY latency of one cycle, the synchronous slave host port keeps
XRDY asserted, indicating ready status, during the rest of a transfer for all burst lengths. For clock
ratios between four and seven, and bursts longer than three words, the synchronous slave host
port cannot always keep up with the external host.
The difference between the calculated maximum throughput and the observed throughput in Figure 3 for
clock ratios higher than seven depends purely on the initial one-clock latency of XRDY for
all burst lengths. For bursts larger than eight words this difference is almost negligible.
The difference between the calculated maximum throughput and the observed throughput in
Figure 3 for a burst write of one, two, and three words depends purely on the initial one-clock latency
of XRDY for all clock ratios.
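The two sources of deviation named above can be separated numerically. The Python sketch below takes the clock-ratio-4 column of Table 3 and subtracts both the N+1-cycle minimum and the one-cycle write latency. Treating the remainder as cycles lost to XRDY throttling is an assumption of this sketch (it presumes the two listed causes account for all extra cycles), and the name `stall_cycles` is illustrative.

```python
WRITE_XRDY_LATENCY = 1   # initial XRDY latency for a write, in XCLKIN cycles

BURST_WORDS = [1, 2, 3, 4, 8, 16, 32, 64]

# Table 3, column for clock ratio 4 (XCLKIN cycles per burst length).
TABLE3_RATIO4_CYCLES = [3, 4, 5, 8, 13, 24, 47, 92]


def stall_cycles(n_words, observed, latency=WRITE_XRDY_LATENCY):
    """Cycles left after removing the N+1 minimum and the initial latency.

    Assumption: whatever remains is due to XRDY being deasserted mid-burst.
    """
    return observed - (n_words + 1) - latency


stalls = {n: stall_cycles(n, c)
          for n, c in zip(BURST_WORDS, TABLE3_RATIO4_CYCLES)}

for n, lost in stalls.items():
    status = "throttled" if lost > 0 else "not throttled"
    print(f"{n:2d} words: {lost:2d} stall cycles ({status})")
```

Under this assumption, bursts of one to three words show no stall cycles at a clock ratio of 4, while longer bursts lose a number of cycles that grows with the burst length.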
[Figure 3: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel plots Calculated Maximum Throughput and Observed Throughput in Bytes/s against Clock Ratio (4 to 20).]
Figure 3. Data Throughput of the Synchronous Slave Host Port –
Burst Write to the Internal Data Memory
1.1.1.2 Synchronous Slave Transfers to/from Internal Program Memory
The number of XCLKIN cycles required for reading a burst of data from the internal program
memory of the DSP using the synchronous slave host port is presented in Table 4. The transfer
duration for different burst lengths and different clock ratios is shown in the table. Shaded
fields in the table correspond to transfers throttled by the XRDY signal (after the initial latency,
the synchronous slave host port occasionally deasserts XRDY during a transfer).
Table 4. Transfer Duration for a Burst Read from the Internal Program Memory
Using Synchronous Slave Host Port
(Entries are the number of XCLKIN clocks; rows are burst lengths in words, columns are clock ratios.)
Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
1         7    6    6    5    5    5    4    4    4    4    4    4    4    4    4    4    4
2         9    7    7    6    6    6    5    5    5    5    5    5    5    5    5    5    5
3        11    9    9    7    7    7    6    6    6    6    6    6    6    6    6    6    6
4        13   10   10    8    8    8    7    7    7    7    7    7    7    7    7    7    7
8        21   16   16   13   12   12   11   11   11   11   11   11   11   11   11   11   11
16       37   28   28   22   20   20   19   19   19   19   19   19   19   19   19   19   19
32       69   52   52   39   36   36   35   35   35   35   35   35   35   35   35   35   35
64      133  100  100   73   68   68   67   67   67   67   67   67   67   67   67   67   67
The latency of the XRDY signal when an external host reads from the internal program memory
using the synchronous slave host port is presented in Table 5.
Table 5. Latency of the XRDY – Read from the Internal Program Memory
Using Synchronous Slave Host Port
(XRDY latency is given in number of XCLKIN clocks.)
Clock Ratio     4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
XRDY Latency    5    4    4    3    3    3    2    2    2    2    2    2    2    2    2    2    2
Figure 4 presents the data throughput information in Mbytes/s for different clock ratios and
different burst sizes when the DSP clock is set to 250 MHz. The data throughput graphs in
Figure 4 are constructed by applying formula (1) to the transfer duration numbers (Table 4).
Each of the eight panels in Figure 4 is associated with one burst length. The panels show how
the clock ratio affects data throughput when the size of the burst is kept fixed. The dashed line in the
figure shows the theoretical maximum throughput.
The external host requires a theoretical minimum of N+1 cycles to read a burst of N words
from the synchronous slave host port (address and data buses are multiplexed). The observed
transfer duration deviates from this ideal case (Figure 4) because of: 1) the initial latency of
XRDY and 2) the inability of the synchronous slave host port to always keep up with the external
host. The synchronous slave host port cannot keep up with the external host for
some combinations of burst length and clock ratio (DSP clock to expansion bus clock
ratio). When it cannot keep up with the external host, the synchronous slave host
port deasserts the XRDY signal, indicating not-ready status.
The initial latency of XRDY is the time required by the auxiliary DMA channel to provide the
first data demanded by the external host. For clock ratios of ten and higher, after the initial XRDY
latency, the synchronous slave host port keeps XRDY asserted, indicating ready status,
during the rest of a transfer for all burst lengths. For clock ratios between four and nine, the
synchronous slave host port cannot always keep up with the external host.
The difference between the calculated maximum throughput and the observed throughput in Figure 4 for
clock ratios higher than nine depends purely on the initial latency of XRDY for all burst lengths.
For bursts larger than eight words this difference is almost negligible.
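Because the XRDY latency in Table 5 is counted in XCLKIN cycles, it can be useful to express it in DSP clock cycles or in nanoseconds when comparing clock ratios. The Python sketch below does this conversion for the 250 MHz DSP clock used in this report. The helper names are illustrative, and the conversion assumes only that one XCLKIN period equals clock-ratio DSP periods.

```python
DSP_CLOCK_MHZ = 250.0
CLOCK_RATIOS = list(range(4, 21))   # clock ratios 4 through 20

# Table 5: XRDY latency in XCLKIN cycles for each clock ratio.
TABLE5_XRDY_LATENCY = [5, 4, 4, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]


def latency_in_dsp_cycles(latency_xclkin, clock_ratio):
    """One XCLKIN cycle lasts clock_ratio DSP cycles."""
    return latency_xclkin * clock_ratio


def latency_in_ns(latency_xclkin, clock_ratio, dsp_clock_mhz=DSP_CLOCK_MHZ):
    """DSP cycle count times the DSP clock period (1000 / MHz nanoseconds)."""
    return latency_in_dsp_cycles(latency_xclkin, clock_ratio) * 1000.0 / dsp_clock_mhz


dsp_cycles = [latency_in_dsp_cycles(lat, r)
              for lat, r in zip(TABLE5_XRDY_LATENCY, CLOCK_RATIOS)]
nanoseconds = [latency_in_ns(lat, r)
               for lat, r in zip(TABLE5_XRDY_LATENCY, CLOCK_RATIOS)]

for r, lat, cyc, ns in zip(CLOCK_RATIOS, TABLE5_XRDY_LATENCY, dsp_cycles, nanoseconds):
    print(f"ratio {r:2d}: {lat} XCLKIN = {cyc:2d} DSP cycles = {ns:5.1f} ns")
```

Once the latency has settled at two XCLKIN cycles, the time it represents grows with the clock ratio, since each XCLKIN cycle becomes longer relative to the DSP clock.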
[Figure 4: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel plots Calculated Maximum Throughput and Observed Throughput in Bytes/s against Clock Ratio (4 to 20).]
Figure 4. Data Throughput of the Synchronous Slave Host Port –
Burst Read from the Internal Program Memory
The number of XCLKIN cycles required for writing a burst of data to the internal program
memory of the DSP using the synchronous slave host port is presented in Table 6. The transfer
duration for different burst lengths and different clock ratios is shown in the table. The shaded
fields in the table correspond to transfers throttled by the XRDY signal. After the initial latency,
the synchronous slave host port occasionally deasserts XRDY during a transfer.
Table 6. Transfer Duration for a Burst Write to the Internal Program Memory
Using Synchronous Slave Host Port
(Entries are the number of XCLKIN clocks; rows are burst lengths in words, columns are clock ratios.)
Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
1         3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
2         4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4
3         6    6    6    5    5    5    5    5    5    5    5    5    5    5    5    5    5
4         8    7    7    6    6    6    6    6    6    6    6    6    6    6    6    6    6
8        13   13   12   10   10   10   10   10   10   10   10   10   10   10   10   10   10
16       26   25   22   20   18   18   18   18   18   18   18   18   18   18   18   18   18
32       51   49   42   38   34   34   34   34   34   34   34   34   34   34   34   34   34
64      101   97   82   74   66   66   66   66   66   66   66   66   66   66   66   66   66
Figure 5 presents data throughput information in Mbytes/s for different clock ratios and different burst
sizes when the DSP clock is set to 250 MHz. The data throughput graphs in Figure 5 are constructed by
applying formula (1) to the transfer duration numbers (Table 6). Each of the eight panels in Figure 5 is
associated with one burst length. The panels show how the clock ratio affects data throughput when the
size of the burst is kept fixed. The dashed line in the figure shows the theoretical maximum throughput.
The external host requires a theoretical minimum of N+1 cycles to write a burst of N words to
the synchronous slave host port (the address and data buses are multiplexed). The observed
transfer duration deviates from this ideal case (Figure 5) for two reasons: 1) the initial latency
of the XRDY and 2) the inability of the synchronous slave host port to always keep up with the
external host. The synchronous slave host port cannot keep up with the external host for some
combinations of burst length and clock ratio (DSP clock to expansion bus clock ratio). When it
cannot keep up, the synchronous slave host port deasserts the XRDY signal to indicate not-ready
status.
The initial latency of the XRDY for a write is always one cycle. For clock ratios of eight and
higher, after the initial XRDY latency of one cycle, the synchronous slave host port keeps the
XRDY asserted (ready status) during the rest of the transfer for all burst sizes. For a clock
ratio of seven and bursts of 16 words or longer (see Table 6), the synchronous slave host port
cannot always keep up with the external host. For clock ratios between four and six, and bursts
longer than two words, the synchronous slave host port cannot always keep up with the external
host.
The difference between the calculated maximum throughput and the observed throughput in Figure 5
for clock ratios of eight and higher depends purely on the initial one-clock latency of the XRDY.
For bursts larger than eight words this difference is almost negligible.
The difference between the calculated maximum throughput and the observed throughput in Figure 5
for a burst write of one or two words depends purely on the initial one-clock latency of the XRDY
for all clock ratios.
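To illustrate why the one-clock XRDY latency matters less as bursts grow, the sketch below
compares the N+1 minimum with N+1 plus the initial latency. It assumes no further XRDY
throttling, which the text describes for the higher clock ratios; the function name is
hypothetical.

```python
def latency_limited_efficiency(burst_words, initial_latency=1):
    """Observed/ideal throughput when only the initial XRDY latency adds clocks."""
    ideal_clocks = burst_words + 1  # N + 1: address and data share the multiplexed bus
    observed_clocks = ideal_clocks + initial_latency
    # Throughput is inversely proportional to the clock count for a fixed burst.
    return ideal_clocks / observed_clocks


for n in (1, 2, 3, 4, 8, 16, 32, 64):
    print(f"{n:2d} words: {latency_limited_efficiency(n):.1%} of calculated maximum")
```

For a single word the extra clock costs a third of the ideal throughput, while for 64 words the
loss is under two percent.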
[Figure 5: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel plots
Observed Throughput and Calculated Maximum Throughput in Bytes/s against Clock Ratio (4 to 20).]
Figure 5. Data Throughput of the Synchronous Slave Host Port –
Burst Write to the Internal Program Memory
1.1.1.3 Synchronous Slave Transfers to/from EMIF SBSRAM
The number of XCLKIN cycles required to read a burst of data from SBSRAM connected to the
EMIF using the synchronous slave host port is presented in Table 7. The table shows the transfer
duration for different burst lengths and different clock ratios. Shaded fields in the table
correspond to transfers throttled by the XRDY signal (after the initial latency, the synchronous
slave host port occasionally deasserts the XRDY during a transfer).
Table 7. Transfer Duration for a Burst Read from EMIF SBSRAM
Using Synchronous Slave Host Port
Number of XCLKIN clocks for each burst length (rows) and clock ratio (columns):

 Burst  Clock Ratio
Length   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
     1  10   9   8   7   6   6   6   5   5   5   5   5   5   5   5   5   5
     2  13  11  10   9   7   7   7   6   6   6   6   6   6   6   6   6   6
     3  17  14  12  11   9   9   8   8   7   7   7   7   7   7   7   7   7
     4  20  17  15  13  11  11  10   9   9   8   8   8   8   8   8   8   8
     8  34  28  24  21  18  17  15  14  13  12  12  12  12  12  12  12  12
    16  62  51  43  37  32  29  27  24  23  21  20  20  20  20  20  20  20
    32 118  96  80  69  60  54  49  45  41  38  36  36  36  36  36  36  36
    64 230 185 155 133 116 104  94  85  79  73  68  68  68  68  68  68  68
The latency of the XRDY signal when an external host reads from EMIF SBSRAM using the
synchronous slave host port is presented in Table 8.
Table 8. Latency of the XRDY – Read from EMIF SBSRAM Using Synchronous Slave Host Port
Clock Ratio       4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
XRDY Latency      8   7   6   5   4   4   4   3   3   3   3   3   3   3   3   3   3
(number of XCLKIN clocks)
Figure 6 shows data throughput in Mbytes/s for different clock ratios and different burst sizes
when the DSP clock is set to 250 MHz. The data throughput graphs in Figure 6 are constructed by
applying formula (1) to the transfer durations in Table 7. Each of the eight panels in Figure 6
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is kept fixed. The dashed line in each panel shows the theoretical maximum throughput.
The external host requires a theoretical minimum of N+1 cycles to read a burst of N words
from the synchronous slave host port (the address and data buses are multiplexed). The observed
transfer duration deviates from this ideal case (Figure 6) for two reasons: 1) the initial latency
of the XRDY and 2) the inability of the synchronous slave host port to always keep up with the
external host. The synchronous slave host port cannot keep up with the external host for some
combinations of burst length and clock ratio (DSP clock to expansion bus clock ratio). When it
cannot keep up, the synchronous slave host port deasserts the XRDY signal to indicate not-ready
status.
The initial latency of the XRDY is the time required by the auxiliary DMA channel to provide the
first data demanded by the external host. For clock ratios of 14 and higher, after the initial
XRDY latency, the synchronous slave host port keeps the XRDY asserted (ready status) during the
rest of the transfer for all burst lengths. For clock ratios between four and 13, the
synchronous slave host port cannot always keep up with the external host.
For clock ratios of 14 and higher, the difference between the calculated maximum throughput and
the observed throughput in Figure 6 depends purely on the initial latency of the XRDY for all
burst lengths. For bursts larger than 16 words this difference is almost negligible.
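One way to separate initial latency from later XRDY throttling is to compare each Table 7
duration with N+1 plus the Table 8 latency for the same clock ratio. The sketch below does this
for the 16-word row. It assumes that any clocks beyond N+1 plus the initial latency are due to
XRDY being deasserted mid-transfer; the helper name is illustrative.

```python
CLOCK_RATIOS = list(range(4, 21))
# Table 8: initial XRDY latency (XCLKIN clocks) for clock ratios 4..20.
XRDY_LATENCY = [8, 7, 6, 5, 4, 4, 4] + [3] * 10
# Table 7, 16-word burst read: XCLKIN clocks for clock ratios 4..20.
READ_16_WORDS = [62, 51, 43, 37, 32, 29, 27, 24, 23, 21] + [20] * 7


def throttled_ratios(burst_words, durations):
    """Clock ratios where the burst took longer than N + 1 + initial latency."""
    result = []
    for ratio, latency, observed in zip(CLOCK_RATIOS, XRDY_LATENCY, durations):
        if observed > burst_words + 1 + latency:
            result.append(ratio)
    return result


print(throttled_ratios(16, READ_16_WORDS))
```

Applied to the single-word row of Table 7, the same check reports no throttled ratios, since
those durations equal N+1 plus the initial latency.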
[Figure 6: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel plots
Observed Throughput and Calculated Maximum Throughput in Bytes/s against Clock Ratio (4 to 20).]
Figure 6. Data Throughput of the Synchronous Slave Host Port – Burst Read from EMIF SBSRAM
The number of XCLKIN cycles required to write a burst of data to EMIF SBSRAM of the DSP
using the synchronous slave host port is presented in Table 9. The table shows the transfer
duration for different burst lengths and different clock ratios. Shaded fields in the table
correspond to transfers throttled by the XRDY signal (after the initial latency, the synchronous
slave host port occasionally deasserts the XRDY during a transfer).
Table 9. Transfer Duration for a Burst Write to EMIF SBSRAM
Using Synchronous Slave Host Port
Number of XCLKIN clocks for each burst length (rows) and clock ratio (columns):

 Burst  Clock Ratio
Length   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
     1   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
     2   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
     3   6   6   6   5   5   5   5   5   5   5   5   5   5   5   5   5   5
     4   9   8   8   7   7   6   6   6   6   6   6   6   6   6   6   6   6
     8  19  18  14  13  12  11  10  10  10  10  10  10  10  10  10  10  10
    16  39  32  28  24  23  21  18  18  18  18  18  18  18  18  18  18  18
    32  79  64  54  47  44  40  34  34  34  34  34  34  34  34  34  34  34
    64 159 128 108  93  87  78  66  66  66  66  66  66  66  66  66  66  66
Figure 7 shows data throughput in Mbytes/s for different clock ratios and different burst sizes
when the DSP clock is set to 250 MHz. The data throughput graphs in Figure 7 are constructed by
applying formula (1) to the transfer durations in Table 9. Each of the eight panels in Figure 7
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is kept fixed. The dashed line in each panel shows the theoretical maximum throughput.
The external host requires a theoretical minimum of N+1 cycles to write a burst of N words to
the synchronous slave host port (the address and data buses are multiplexed). The observed
transfer duration deviates from this ideal case (Figure 7) for two reasons: 1) the initial latency
of the XRDY and 2) the inability of the expansion bus synchronous slave to always keep up with
the external host. The synchronous slave host port cannot keep up with the external host for
some combinations of burst length and clock ratio (DSP clock to expansion bus clock ratio). When
it cannot keep up, the synchronous slave host port deasserts the XRDY signal to indicate
not-ready status.
The initial latency of the XRDY for a write is always one cycle. For clock ratios of ten and
higher, after the initial XRDY latency of one cycle, the synchronous slave host port keeps the
XRDY asserted (ready status) during the rest of the transfer for all burst lengths. For clock
ratios between four and six, and bursts longer than two words, the synchronous slave host port
cannot always keep up with the external host. For clock ratios of seven and eight, and bursts
longer than three words, the synchronous slave host port cannot always keep up with the external
host. For a clock ratio of nine and bursts of eight words or longer (see Table 9), the
synchronous slave host port cannot always keep up with the external host.
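The thresholds above can be cross-checked against Table 9. The sketch below finds, for each
clock ratio, the shortest measured burst whose duration exceeds N+1 plus the one-cycle XRDY
latency. It treats any such excess as XRDY throttling, which is an assumption, and the names
used are illustrative.

```python
CLOCK_RATIOS = list(range(4, 21))
# Table 9: XCLKIN clocks per burst length, one value per clock ratio 4..20.
WRITE_SBSRAM = {
    1: [3] * 17,
    2: [4] * 17,
    3: [6, 6, 6] + [5] * 14,
    4: [9, 8, 8, 7, 7] + [6] * 12,
    8: [19, 18, 14, 13, 12, 11] + [10] * 11,
    16: [39, 32, 28, 24, 23, 21] + [18] * 11,
    32: [79, 64, 54, 47, 44, 40] + [34] * 11,
    64: [159, 128, 108, 93, 87, 78] + [66] * 11,
}


def shortest_throttled_burst(table, initial_latency=1):
    """Map each clock ratio to the shortest measured burst exceeding N + 1 + latency."""
    result = {}
    for column, ratio in enumerate(CLOCK_RATIOS):
        throttled = [n for n, row in sorted(table.items())
                     if row[column] > n + 1 + initial_latency]
        result[ratio] = throttled[0] if throttled else None
    return result


for ratio, burst in shortest_throttled_burst(WRITE_SBSRAM).items():
    print(f"clock ratio {ratio:2d}: shortest throttled burst = {burst}")
```

A value of None means that no measured burst length was throttled at that clock ratio.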
The difference between the calculated maximum throughput and the observed throughput in Figure 7
for clock ratios of ten and higher depends purely on the initial one-clock latency of the XRDY.
For bursts larger than four words, this difference is almost negligible.
The difference between the calculated maximum throughput and the observed throughput in Figure 7
for a burst write of one or two words depends purely on the initial one-clock latency of the XRDY
for all clock ratios.
[Figure 7: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel plots
Observed Throughput and Calculated Maximum Throughput in Bytes/s against Clock Ratio (4 to 20).]
Figure 7. Data Throughput of the Synchronous Slave Host Port –
Burst Write to EMIF SBSRAM
1.1.1.4 Synchronous Slave Transfers to/from EMIF SDRAM
The number of XCLKIN cycles required to read a burst of data from SDRAM connected to the
EMIF using the synchronous slave host port is presented in Table 10. The table shows the transfer
duration for different burst lengths and different clock ratios. Transfer performance
to/from EMIF SDRAM depends on the SDRAM refresh rate. Shaded fields in the table
correspond to transfers throttled by the XRDY signal (after the initial latency, the synchronous
slave host port occasionally deasserts the XRDY during a transfer).
Table 10. Transfer Duration for a Burst Read from EMIF SDRAM
Using Synchronous Slave Host Port
Number of XCLKIN clocks for each burst length (rows) and clock ratio (columns):

 Burst  Clock Ratio
Length   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
     1  15  11  10  13   8   8   7   7   6   8   7   5   5   5   5   5   5
     2  19  15  12  15  10   9   8   8   7   9   8   7   6   6   6   6   6
     3  23  18  15  18  12  11  10   9   9  10   9   8   7   7   7   7   7
     4  27  21  18  20  14  13  12  11  10  12  11   9   8   8   8   8   8
     8  54  34  36  36  28  25  19  21  19  20  18  13  12  12  12  12  13
    16  98  87  65  61  55  45  44  41  30  34  31  27  28  27  25  24  24
    32 196 165 138 117 104  88  79  77  66  64  59  54  48  50  48  47  45
    64 381 313 269 222 191 170 162 140 136 121 112 103 100  97  94  90  87
The latency of the XRDY signal when an external host reads from EMIF SDRAM using the
synchronous slave host port is presented in Table 11.
Table 11. Latency of the XRDY – Read from EMIF SDRAM Using Synchronous Slave Host Port
Clock Ratio       4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
XRDY Latency     13   9   9  11   6   6   5   5   4   6   5   3   3   3   3   3   3
(number of XCLKIN clocks)
Figure 8 shows data throughput in Mbytes/s for different clock ratios and different burst sizes
when the DSP clock is set to 250 MHz. The data throughput graphs in Figure 8 are constructed by
applying formula (1) to the transfer durations in Table 10. Each of the eight panels in Figure 8
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is kept fixed. The dashed line in each panel shows the theoretical maximum throughput.
The external host requires a theoretical minimum of N+1 cycles to read a burst of N words
from the synchronous slave host port (the address and data buses are multiplexed). The observed
transfer duration deviates from this ideal case (Figure 8) for two reasons: 1) the initial latency
of the XRDY and 2) the inability of the synchronous slave host port to always keep up with the
external host. When it cannot keep up, the synchronous slave host port deasserts the XRDY signal
to indicate not-ready status.
The initial latency of the XRDY is the time required by the auxiliary DMA channel to provide the
first data demanded by the external host.
[Figure 8: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel plots
Observed Throughput and Calculated Maximum Throughput in Bytes/s against Clock Ratio (4 to 20).]
Figure 8. Data Throughput of the Synchronous Slave Host Port –
Burst Read from EMIF SDRAM
The number of XCLKIN cycles required by an external host to complete a write transfer to EMIF
SDRAM of the DSP using the expansion bus host port is presented in Table 12. The table shows the
transfer duration for different burst lengths and different clock ratios. The shaded fields in
the table correspond to transfers throttled by the XRDY signal (after the initial latency, the
synchronous slave host port occasionally deasserts the XRDY during a transfer).
Table 12. Transfer Duration for a Burst Write to EMIF SDRAM
Using Synchronous Slave Host Port
Number of XCLKIN clocks for each burst length (rows) and clock ratio (columns):

 Burst  Clock Ratio
Length   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
     1   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
     2   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
     3   6   6   6   5   5   5   5   5   5   5   5   5   5   5   5   5   5
     4  12  10   9   9   8   8   7   9   8   8   7   7   8   7   7   6   6
     8  35  18  16  15  13  12  17  13  12  12  14  11  12  11  11  11  10
    16  55  34  38  26  22  22  30  21  24  20  25  19  20  21  20  21  21
    32 131  84  80  61  54  50  46  45  47  42  50  45  44  43  42  41  40
    64 233 194 157 140 123 114 101  93  93  92  94  90  94  85  84  85  84
Figure 9 shows data throughput in Mbytes/s for different clock ratios and different burst sizes
when the DSP clock is set to 250 MHz. The data throughput graphs in Figure 9 are constructed by
applying formula (1) to the transfer durations in Table 12. Each of the eight panels in Figure 9
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is kept fixed. The dashed line in each panel shows the theoretical maximum throughput.
The external host requires a theoretical minimum of N+1 cycles to write a burst of N words to
the synchronous slave host port (the address and data buses are multiplexed). The observed
transfer duration deviates from this ideal case (Figure 9) for two reasons: 1) the initial latency
of the XRDY and 2) the inability of the synchronous slave host port to always keep up with the
external host for clock ratios lower than ten (DSP clock to expansion bus clock ratio). When it
cannot keep up, the synchronous slave host port deasserts the XRDY signal to indicate not-ready
status. The initial latency of the XRDY for a write is always one cycle.
The difference between the calculated maximum throughput and the observed throughput in Figure 9
for a burst write of one or two words depends purely on the initial one-clock latency of the XRDY
for all clock ratios.
[Figure 9: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel plots
Observed Throughput and Calculated Maximum Throughput in Bytes/s against Clock Ratio (4 to 20).]
Figure 9. Data Throughput of the Synchronous Slave Host Port –
Burst Write to EMIF SDRAM
1.1.2 Synchronous Master Performance
The performance of the expansion bus synchronous host port in master mode varies with:
1. The ratio between the DSP clock and the expansion bus clock (XCLKIN)
2. The transfer size
3. The type of memory used as the source (or destination) of a transfer (internal program
memory, internal data memory, SDRAM connected to the EMIF, or SBSRAM connected to the
EMIF)
To avoid any contention during the performance evaluation, the CPU was in the idle state
and the auxiliary DMA channel was the only active DMA channel.
The performance of the master mode of the expansion bus synchronous host port was evaluated as
a function of the ratio between the DSP clock and the expansion bus clock (XCLKIN). Clock ratios
from 4 to 20 in increments of one were used, and at each clock ratio performance was
evaluated for bursts of 1, 2, 3, 4, 8, 16, 32, and 64 words.
The performance is characterized in two ways: as transfer throughput (expressed in Mbytes/s)
and as transfer duration (expressed in expansion bus XCLKIN clock cycles). Transfer
throughput in this document is calculated assuming that the DSP is running at 250 MHz. Transfer
rates for other DSP clocks can be determined easily because the transfer duration in
XCLKIN cycles is tabulated for each transfer type. Data rates are calculated using formula (1).
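As a rough illustration of adapting the 250 MHz figures to another DSP clock: for a fixed clock
ratio the duration in XCLKIN cycles is taken from the same table entry, so throughput should
scale linearly with the DSP clock. The sketch below assumes the cycle counts themselves do not
change with the absolute clock frequency, which may not hold for EMIF SDRAM, where performance
depends on the refresh rate. The names are hypothetical.

```python
REFERENCE_DSP_HZ = 250e6  # DSP clock assumed for the throughput figures in this document


def rescale_throughput(bytes_per_s_at_reference, dsp_hz):
    """Scale a throughput computed at 250 MHz to another DSP clock, same clock ratio."""
    return bytes_per_s_at_reference * dsp_hz / REFERENCE_DSP_HZ


# Hypothetical example: 100 Mbytes/s at a 250 MHz DSP clock, rescaled to 200 MHz.
print(f"{rescale_throughput(100e6, 200e6) / 1e6:.1f} Mbytes/s")
```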
Figure 10 illustrates the parameters used to describe a master transfer initiated by the
expansion bus synchronous host port. The active transfer duration is measured in XCLKIN clock
periods (from /XAS asserted to /XBLAST deasserted). Three types of overhead can be added to the
active transfer duration to obtain the total duration of a master transfer: the transfer start
overhead, the transfer end overhead, and the software overhead required to set up a master
transfer.
[Figure 10: timing diagram showing the DSP clock, START[1:0], XCLKIN, XAS (output), XBLAST
(output), and the internal DSPINT interrupt signal. The transfer start overhead runs from the
write to the START bit-field in the XBHC to XAS asserted, the active transfer duration runs from
XAS asserted to XBLAST deasserted, and the transfer end overhead runs from XBLAST deasserted to
the DSPINT interrupt being set.]
Figure 10. Parameters Used to Describe Performance of the Synchronous Master Host Port
An expansion bus synchronous master transfer is initiated by writing to the START bit-field of
the expansion bus host port control (XBHC) register. Because of internal delays and the overhead
of the auxiliary DMA channel, the active transfer does not start immediately after the write to
the START bit-field. The delay between the write to the START bit-field and /XAS being asserted
is the transfer start overhead. It is expressed in DSP clocks between the moment the START
bit-field is set and the beginning of the active transfer (/XAS asserted).
The DSPINT interrupt flag announces completion of the expansion bus synchronous master
transfer. Because of internal delays and the overhead of the auxiliary DMA channel, the transfer
does not complete internally immediately after /XBLAST is deasserted. The delay between the
rising edge of /XBLAST and the DSPINT interrupt is the transfer end overhead (Figure 10). It is
expressed in DSP clocks between /XBLAST being deasserted and the DSPINT interrupt.
The software overhead required to program and start a master transfer (setting the XBIMA, XBEA,
and XBHC registers) is around 10 DSP cycles. If all parameters required for initiating the
transfer are stored in internal DSP memory, these 10 cycles execute in 10 DSP clocks.
Only the active transfer duration is used to calculate data throughput in this document.
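The pieces described above can be combined into an estimate of the total time for one master
transfer. The sketch below is only an illustration: it assumes the four contributions simply
add, converts the active duration from XCLKIN clocks to DSP clocks using the clock ratio, and
takes the start and end overheads as inputs because no measured values are given here. The
function name and the example overheads are hypothetical.

```python
def master_transfer_dsp_clocks(active_xclkin_cycles, clock_ratio,
                               start_overhead, end_overhead, software_overhead=10):
    """Estimated DSP clocks from register setup to the DSPINT interrupt.

    start_overhead and end_overhead are in DSP clocks and must be supplied
    by the caller; software_overhead defaults to the roughly 10 DSP cycles
    quoted for setting XBIMA, XBEA, and XBHC.
    """
    # One XCLKIN period lasts clock_ratio DSP clocks.
    active_dsp_clocks = active_xclkin_cycles * clock_ratio
    return software_overhead + start_overhead + active_dsp_clocks + end_overhead


# Hypothetical overheads of 20 and 12 DSP clocks; 9 XCLKIN clocks of active transfer at ratio 8.
total = master_transfer_dsp_clocks(9, 8, start_overhead=20, end_overhead=12)
print(f"{total} DSP clocks, {total / 250e6 * 1e6:.3f} us at 250 MHz")
```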
1.1.2.1 Synchronous Master Transfers to/from Internal Data Memory
The number of XCLKIN cycles required by the expansion bus synchronous host port to complete a
master read transfer from an external device to the internal DSP data memory is presented in
Table 13. The table shows the transfer duration for different burst lengths and different clock
ratios. Shaded fields in the table correspond to transfers throttled by the XWAIT signal
(the synchronous master host port occasionally asserts the XWAIT during a transfer).
Table 13. Transfer Duration of a Synchronous Master Burst Read to the Internal DSP Data Memory
Number of XCLKIN clocks for each burst length (rows) and clock ratio (columns):

 Burst  Clock Ratio
Length   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
     1   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
     2   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
     3   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
     4   6   6   6   5   5   5   5   5   5   5   5   5   5   5   5   5   5
     8  12  11  12   9   9   9   9   9   9   9   9   9   9   9   9   9   9
    16  23  22  24  17  17  17  17  17  17  17  17  17  17  17  17  17  17
    32  45  43  48  33  33  33  33  33  33  33  33  33  33  33  33  33  33
    64  90  86  96  65  65  65  65  65  65  65  65  65  65  65  65  65  65
Figure 11 shows data throughput in Mbytes/s for different clock ratios and different burst sizes
when the DSP clock is set to 250 MHz. The data throughput graphs in Figure 11 are constructed by
applying formula (1) to the transfer durations in Table 13. Each of the eight panels in Figure 11
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is kept fixed. The dashed line in each panel shows the theoretical maximum throughput.
The expansion bus synchronous master requires a theoretical minimum of N+1 cycles to read a
burst of N words from an external device (the address and data buses are multiplexed). The
observed transfer duration deviates from this ideal case (Figure 11) for two reasons: 1) the
external target is not always ready to send new data (indicated by the XRDY signal) and 2) the
expansion bus synchronous master cannot always receive new data every clock. In the test setup,
the external target was capable of sending new data on every rising edge of the clock; therefore,
the observed performance deviates from the ideal solely because of the synchronous master host
port. When it cannot accept new data every clock, the expansion bus synchronous master asserts
the XWAIT signal to indicate not-ready status. In these cases, the transfer duration is longer
than in the ideal case and the throughput is therefore lower. In Table 13, this happens for clock
ratios between four and six (DSP clock to expansion bus clock ratio) and bursts longer than
three words.
The difference between the calculated maximum throughput and the observed throughput in Figure 11
for clock ratios of seven and higher is zero for all burst lengths, because the expansion bus
synchronous master can receive new data every cycle (XWAIT is never asserted).
The difference between the calculated maximum throughput and the observed throughput for burst
reads of one, two, and three words is zero for all clock ratios because of the internal register
pipeline.
[Figure 11: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64 words). Each panel
plots Observed Throughput and Calculated Maximum Throughput in Bytes/s against Clock Ratio
(4 to 20).]
Figure 11. Data Throughput of the Synchronous Master Host Port –
Moving Data to the Internal Data Memory
The number of XCLKIN cycles required by the expansion bus synchronous host port to complete a
master write transfer to an external device from the internal DSP data memory is presented in
Table 14. The table shows the transfer duration for different burst lengths and different clock
ratios. Shaded fields in the table correspond to transfers throttled by the XWAIT signal
(the synchronous master host port occasionally asserts the XWAIT during a transfer).
Table 14. Transfer Duration of a Synchronous Master Burst Write
from the Internal DSP Data Memory
Number of XCLKIN clocks for each burst length (rows) and clock ratio (columns):

 Burst  Clock Ratio
Length   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
     1   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
     2   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
     3   5   5   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
     4   6   6   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5
     8  14  14  11  10   9   9   9   9   9   9   9   9   9   9   9   9   9
    16  30  30  24  21  17  17  17  17  17  17  17  17  17  17  17  17  17
    32  63  62  47  40  33  33  33  33  33  33  33  33  33  33  33  33  33
    64 127 126  95  80  65  65  65  65  65  65  65  65  65  65  65  65  65
Figure 12 presents data throughput in Mbytes/s for different clock ratios and burst sizes when
the DSP clock is set to 250 MHz. The throughput graphs in Figure 12 are constructed by applying
formula (1) to the transfer durations in Table 14. Each of the eight panels in Figure 12
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is held fixed. The dashed line in the figure represents the theoretical maximum
throughput.
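The relationship between transfer duration and throughput can be approximated with a short calculation. The sketch below is not formula (1) itself, which is defined earlier in this report and may include terms not modelled here; it simply assumes that throughput equals the bytes moved divided by the time the burst occupies the bus, with XCLKIN equal to the DSP clock divided by the clock ratio and 4 bytes per 32-bit word. The function name and constants are illustrative, and the absolute values it produces may differ from those plotted in the figures.

```python
# Rough per-burst throughput estimate (illustrative; not formula (1) itself).
DSP_CLOCK_HZ = 250e6   # DSP clock used for the figures in this report
BYTES_PER_WORD = 4     # the expansion bus is 32 bits wide


def burst_throughput(burst_words, xclkin_cycles, clock_ratio,
                     dsp_clock_hz=DSP_CLOCK_HZ):
    """Return bytes/s for a burst that occupies `xclkin_cycles` XCLKIN clocks.

    Assumes XCLKIN = DSP clock / clock_ratio and ignores the start and end
    transfer overheads.
    """
    xclkin_hz = dsp_clock_hz / clock_ratio
    burst_time_s = xclkin_cycles / xclkin_hz
    return burst_words * BYTES_PER_WORD / burst_time_s


# Table 14, burst length 64 at clock ratio 4: 127 clocks observed,
# against the N+1 = 65 clock minimum.
observed = burst_throughput(64, 127, 4)
ideal = burst_throughput(64, 65, 4)
print(f"observed {observed / 1e6:.1f} Mbytes/s, ideal {ideal / 1e6:.1f} Mbytes/s")
```

Under this simplified model the ratio of observed to ideal throughput depends only on the cycle counts, so it can be read directly from the tables regardless of the DSP clock frequency.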
The expansion bus synchronous master requires a theoretical minimum of N+1 cycles to write
a burst of N words to an external device (the address and data buses are multiplexed). The
observed transfer duration can deviate from this ideal case (Figure 12) for two reasons: 1) the
external target is not always ready to accept new data (indicated by the XRDY signal), and 2) the
expansion bus synchronous master cannot always send new data on every clock. In the test setup,
the external target was capable of receiving new data on every rising edge of the clock;
therefore, the observed performance deviates from the ideal solely because of the synchronous
master host port. When it cannot keep up with sending new data on every clock, the expansion bus
synchronous master asserts the XWAIT signal to indicate a not-ready status. In these cases the
transfer takes longer than in the ideal case, and the throughput is therefore lower. For clock
ratios (DSP clock to expansion bus clock) between four and eight and bursts longer than two
words, the expansion bus synchronous master is not capable of sending new data on every clock.
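The N+1 minimum makes it straightforward to quantify how many extra clocks each transfer spent beyond the ideal. The following sketch takes the Table 14 values and subtracts N+1 from every entry; the helper name `stall_cycles` and the dictionary layout are illustrative choices, not part of the original report.

```python
# Table 14: XCLKIN clocks per burst write from internal data memory.
# Keys are burst lengths in words; list entries follow clock ratios 4..20.
CLOCK_RATIOS = list(range(4, 21))
TABLE_14 = {
    1: [2] * 17,
    2: [3] * 17,
    3: [5, 5] + [4] * 15,
    4: [6, 6] + [5] * 15,
    8: [14, 14, 11, 10] + [9] * 13,
    16: [30, 30, 24, 21] + [17] * 13,
    32: [63, 62, 47, 40] + [33] * 13,
    64: [127, 126, 95, 80] + [65] * 13,
}


def stall_cycles(table):
    """Map (burst_length, clock_ratio) to the clocks beyond the N+1 minimum."""
    return {
        (n, ratio): observed - (n + 1)
        for n, row in table.items()
        for ratio, observed in zip(CLOCK_RATIOS, row)
    }


stalls = stall_cycles(TABLE_14)
# Entries with a positive value took longer than the theoretical minimum.
throttled = sorted(key for key, extra in stalls.items() if extra > 0)
print(throttled)
```

Because the external target in the test setup was always ready, any positive entry here is attributable to the synchronous master host port rather than to XRDY.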
The difference between the calculated maximum throughput and the observed throughput in
Figure 12 is zero for all burst sizes at clock ratios higher than eight, because the expansion
bus synchronous master can then send new data on every cycle (XWAIT is never asserted). For
burst writes of one and two words, the difference is zero for all clock ratios because of the
internal register pipeline.
[Figure 12: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64). Each panel plots Observed Throughput and Calculated Maximum Throughput (Bytes/s) against Clock Ratio (4 to 20).]
Figure 12. Data Throughput of the Synchronous Master Host Port – Moving Data from the Internal Data Memory
Because two different clock domains are involved (the expansion bus clock and the DSP clock),
the start and end transfer overheads are difficult to predict. The overhead numbers vary with
the phase relationship between the clocks. Table 15 gives a rough estimate of the transfer
overheads.
Table 15. Transfer Overhead for a Synchronous Master Burst to/from the Internal DSP Data Memory
(Number of DSP clocks)

                                  Clock Ratio
                                    4    8   12   16   20
READ   Start Transfer Overhead     12   16   17   24   27
READ   End Transfer Overhead        9    8    8    8    8
WRITE  Start Transfer Overhead     28   38   45   50   57
WRITE  End Transfer Overhead        4    4    4    4    4
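As a rough illustration of how these overheads might be combined with a burst duration, the sketch below adds the start overhead, the burst time converted to DSP clocks, and the end overhead. It assumes that the three phases do not overlap and that the first pair of rows in Table 15 is the READ case and the second pair the WRITE case; the function name and data layout are hypothetical, and the result should be treated as an estimate only, since the report itself describes the overheads as rough.

```python
# Table 15 overheads in DSP clocks, keyed by clock ratio (internal data memory).
OVERHEAD = {
    "read": {
        "start": {4: 12, 8: 16, 12: 17, 16: 24, 20: 27},
        "end": {4: 9, 8: 8, 12: 8, 16: 8, 20: 8},
    },
    "write": {
        "start": {4: 28, 8: 38, 12: 45, 16: 50, 20: 57},
        "end": {4: 4, 8: 4, 12: 4, 16: 4, 20: 4},
    },
}


def total_dsp_clocks(direction, clock_ratio, xclkin_cycles):
    """Estimate total DSP clocks: start overhead + burst + end overhead.

    Assumes the three phases do not overlap, which is a simplification.
    """
    burst = xclkin_cycles * clock_ratio  # convert XCLKIN clocks to DSP clocks
    overhead = OVERHEAD[direction]
    return overhead["start"][clock_ratio] + burst + overhead["end"][clock_ratio]


# An 8-word write at clock ratio 4 takes 14 XCLKIN clocks (Table 14).
print(total_dsp_clocks("write", 4, 14))
```

For short bursts the overhead terms dominate this sum, which is one way to see why longer bursts make better use of the bus.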
1.1.2.2 Synchronous Master Transfers to/from Internal Program Memory
The number of XCLKIN cycles required by the expansion bus synchronous host port to complete
a master read transfer from an external device to the internal DSP program memory is presented
in Table 16. The table shows the transfer duration for different burst lengths and clock ratios.
Shaded fields in the table correspond to transfers throttled by the XWAIT signal (the
synchronous master host port occasionally asserts XWAIT during a transfer).
Table 16. Transfer Duration of a Synchronous Master Burst Read to the Internal DSP Program Memory
(Number of XCLKIN clocks; rows are burst length in words, columns are clock ratio)

Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     1    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
     2    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
     3    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4
     4    6    6    6    5    5    5    5    5    5    5    5    5    5    5    5    5    5
     8   12   11   11    9   10    9    9    9    9    9    9    9    9    9    9    9    9
    16   25   23   21   19   18   17   17   17   17   17   17   17   17   17   17   17   17
    32   50   47   41   38   34   33   33   33   33   33   33   33   33   33   33   33   33
    64   99   95   81   73   66   65   65   65   65   65   65   65   65   65   65   65   65
Figure 13 presents data throughput in Mbytes/s for different clock ratios and burst sizes when
the DSP clock is set to 250 MHz. The throughput graphs in Figure 13 are constructed by applying
formula (1) to the transfer durations in Table 16. Each of the eight panels in Figure 13
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is held fixed. The dashed line in the figure represents the theoretical maximum
throughput.
The expansion bus synchronous master requires a theoretical minimum of N+1 cycles to read
a burst of N words from an external device (the address and data buses are multiplexed). The
observed transfer duration can deviate from this ideal case (Figure 13) for two reasons: 1) the
external target is not always ready to send new data (indicated by the XRDY signal), and 2) the
expansion bus synchronous master cannot always receive new data on every clock. In the test
setup, the external target was capable of sending new data on every rising edge of the clock;
therefore, the observed performance deviates from the ideal solely because of the synchronous
master host port. When it cannot keep up with accepting new data on every clock, the expansion
bus synchronous master asserts the XWAIT signal to indicate a not-ready status. In these cases
the transfer takes longer than in the ideal case, and the throughput is therefore lower. For
clock ratios (DSP clock to expansion bus clock) between four and nine and bursts longer than
three words, the expansion bus synchronous master is not capable of receiving new data on every
clock.
The difference between the calculated maximum throughput and the observed throughput in
Figure 13 is zero for all burst lengths at clock ratios higher than nine, because the expansion
bus synchronous master can then receive new data on every cycle (XWAIT is never asserted). For
burst reads of one, two, and three words, the difference is zero for all clock ratios because
of the internal register pipeline.
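The clock ratio at which transfers stop being throttled can be located mechanically from the table. The sketch below scans each row of Table 16 and reports the smallest clock ratio from which every listed ratio reaches the N+1 minimum; the function name `first_stall_free_ratio` is an illustrative choice rather than terminology from this report.

```python
# Table 16: XCLKIN clocks per burst read to internal program memory.
# Keys are burst lengths in words; list entries follow clock ratios 4..20.
CLOCK_RATIOS = list(range(4, 21))
TABLE_16 = {
    1: [2] * 17,
    2: [3] * 17,
    3: [4] * 17,
    4: [6, 6, 6] + [5] * 14,
    8: [12, 11, 11, 9, 10] + [9] * 12,
    16: [25, 23, 21, 19, 18] + [17] * 12,
    32: [50, 47, 41, 38, 34] + [33] * 12,
    64: [99, 95, 81, 73, 66] + [65] * 12,
}


def first_stall_free_ratio(burst_words, row):
    """Smallest clock ratio from which every listed ratio takes N+1 clocks.

    Returns None if the row never settles at the N+1 minimum.
    """
    ideal = burst_words + 1
    threshold = None
    for ratio, observed in zip(CLOCK_RATIOS, row):
        if observed > ideal:
            threshold = None      # a later stall resets the candidate
        elif threshold is None:
            threshold = ratio     # first ratio of the current ideal run
    return threshold


for n, row in TABLE_16.items():
    print(n, first_stall_free_ratio(n, row))
```

Resetting the candidate on every stall matters for the 8-word row, where the duration touches the minimum at ratio 7 but exceeds it again at ratio 8.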
[Figure 13: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64). Each panel plots Observed Throughput and Calculated Maximum Throughput (Bytes/s) against Clock Ratio (4 to 20).]
Figure 13. Data Throughput of the Synchronous Master Host Port – Moving Data to the Internal Program Memory
The number of XCLKIN cycles required by the expansion bus synchronous host port to complete a
master write transfer to an external device from the internal DSP program memory is presented
in Table 17. The table shows the transfer duration for different burst lengths and clock ratios.
Shaded fields in the table correspond to transfers throttled by the XWAIT signal (the
synchronous master host port occasionally asserts XWAIT during a transfer).
Table 17. Transfer Duration of a Synchronous Master Burst Write from the Internal DSP Program Memory
(Number of XCLKIN clocks; rows are burst length in words, columns are clock ratio)

Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     1    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
     2    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
     3    5    5    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4
     4    6    6    5    5    5    5    5    5    5    5    5    5    5    5    5    5    5
     8   14   13   11    9    9    9    9    9    9    9    9    9    9    9    9    9    9
    16   30   25   23   18   17   17   17   17   17   17   17   17   17   17   17   17   17
    32   63   49   47   35   33   33   33   33   33   33   33   33   33   33   33   33   33
    64  126   97   95   69   65   65   65   65   65   65   65   65   65   65   65   65   65
Figure 14 presents data throughput in Mbytes/s for different clock ratios and burst sizes when
the DSP clock is set to 250 MHz. The throughput graphs in Figure 14 are constructed by applying
formula (1) to the transfer durations in Table 17. Each of the eight panels in Figure 14
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is held fixed. The dashed line in the figure represents the theoretical maximum
throughput.
The expansion bus synchronous master requires a theoretical minimum of N+1 cycles to write
a burst of N words to an external device (the address and data buses are multiplexed). The
observed transfer duration can deviate from this ideal case (Figure 14) for two reasons: 1) the
external target is not always ready to accept new data (indicated by the XRDY signal), and 2) the
expansion bus synchronous master cannot always send new data on every clock. In the test setup,
the external target was capable of receiving new data on every rising edge of the clock;
therefore, the observed performance deviates from the ideal solely because of the synchronous
master host port. When it cannot keep up with sending new data on every clock, the expansion bus
synchronous master asserts the XWAIT signal to indicate a not-ready status. In these cases the
transfer takes longer than in the ideal case, and the throughput is therefore lower. For clock
ratios (DSP clock to expansion bus clock) between four and eight and bursts longer than two
words, the expansion bus synchronous master is not capable of sending new data on every clock.
The difference between the calculated maximum throughput and the observed throughput in
Figure 14 is zero for all burst lengths at clock ratios higher than eight, because the expansion
bus synchronous master can then send new data on every cycle (XWAIT is never asserted). For
burst writes of one and two words, the difference is zero for all clock ratios because of the
internal register pipeline.
[Figure 14: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64). Each panel plots Observed Throughput and Calculated Maximum Throughput (Bytes/s) against Clock Ratio (4 to 20).]
Figure 14. Data Throughput of the Synchronous Master Host Port – Moving Data from the Internal Program Memory
Because two different clock domains are involved (the expansion bus clock and the DSP clock),
the start and end transfer overheads are difficult to predict. The overhead numbers vary with
the phase relationship between the clocks. Table 18 gives a rough estimate of the transfer
overheads.
Table 18. Transfer Overhead for a Synchronous Master Burst to/from the Internal DSP Program Memory
(Number of DSP clocks)

                                  Clock Ratio
                                    4    8   12   16   20
READ   Start Transfer Overhead     11   15   17   24   29
READ   End Transfer Overhead        8    9    8   12    8
WRITE  Start Transfer Overhead     28   30   33   41   57
WRITE  End Transfer Overhead        4    4    4    4    4
1.1.2.3 Synchronous Master Transfers to/from EMIF SBSRAM
The number of XCLKIN cycles required by the expansion bus synchronous host port to complete
a master read transfer from an external device to the EMIF SBSRAM is presented in Table 19.
The table shows the transfer duration for different burst lengths and clock ratios.
Shaded fields in the table correspond to transfers throttled by the XWAIT signal (the
synchronous master host port occasionally asserts XWAIT during a transfer).
Table 19. Transfer Duration of a Synchronous Master Burst Read to EMIF SBSRAM
(Number of XCLKIN clocks; rows are burst length in words, columns are clock ratio)

Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     1    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
     2    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
     3    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4
     4    8    7    6    6    6    5    5    5    5    5    5    5    5    5    5    5    5
     8   18   15   13   12   11   10    9    9    9    9    9    9    9    9    9    9    9
    16   35   31   25   23   22   20   17   17   17   17   17   17   17   17   17   17   17
    32   78   61   51   46   43   39   33   33   33   33   33   33   33   33   33   33   33
    64  158  127  106   92   86   77   65   65   65   65   65   65   65   65   65   65   65
Figure 15 presents data throughput in Mbytes/s for different clock ratios and burst sizes when
the DSP clock is set to 250 MHz. The throughput graphs in Figure 15 are constructed by applying
formula (1) to the transfer durations in Table 19. Each of the eight panels in Figure 15
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is held fixed. The dashed line in the figure represents the theoretical maximum
throughput.
The expansion bus synchronous master requires a theoretical minimum of N+1 cycles to read
a burst of N words from an external device (the address and data buses are multiplexed). The
observed transfer duration can deviate from this ideal case (Figure 15) for two reasons: 1) the
external target is not always ready to send new data (indicated by the XRDY signal), and 2) the
expansion bus synchronous master cannot always receive new data on every clock. In the test
setup, the external target was capable of sending new data on every rising edge of the clock;
therefore, the observed performance deviates from the ideal solely because of the synchronous
master host port. When it cannot keep up with accepting new data on every clock, the expansion
bus synchronous master asserts the XWAIT signal to indicate a not-ready status. In these cases
the transfer takes longer than in the ideal case, and the throughput is therefore lower. For
clock ratios (DSP clock to expansion bus clock) between four and ten and bursts longer than
three words, the expansion bus synchronous master is not capable of receiving new data on every
clock.
The difference between the calculated maximum throughput and the observed throughput in
Figure 15 is zero for all burst sizes at clock ratios higher than ten, because the expansion
bus synchronous master can then receive new data on every cycle (XWAIT is never asserted). For
burst reads of one, two, and three words, the difference is zero for all clock ratios because
of the internal register pipeline.
[Figure 15: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64). Each panel plots Observed Throughput and Calculated Maximum Throughput (Bytes/s) against Clock Ratio (4 to 20).]
Figure 15. Data Throughput of the Synchronous Master Host Port – Moving Data to EMIF SBSRAM
The number of XCLKIN cycles required by the expansion bus synchronous host port to complete
a master write transfer to an external device from the EMIF SBSRAM is presented in Table 20.
The table shows the transfer duration for different burst lengths and clock ratios.
Shaded fields in the table correspond to transfers throttled by the XWAIT signal (the
synchronous master host port occasionally asserts XWAIT during a transfer).
Table 20. Transfer Duration of a Synchronous Master Burst Write from the EMIF SBSRAM
(Number of XCLKIN clocks; rows are burst length in words, columns are clock ratio)

Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     1    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
     2    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
     3    7    5    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4
     4   10    8    7    6    5    5    5    5    5    5    5    5    5    5    5    5    5
     8   24   19   16   14   13   11   11    9   10    9    9    9    9    9    9    9    9
    16   52   42   37   30   27   24   22   20   18   18   17   17   17   17   17   17   17
    32  108   87   72   62   55   49   45   41   38   35   33   33   33   33   33   33   33
    64  220  176  147  126  111   99   89   77   75   69   65   65   65   65   65   65   65
Figure 16 presents data throughput in Mbytes/s for different clock ratios and burst sizes when
the DSP clock is set to 250 MHz. The throughput graphs in Figure 16 are constructed by applying
formula (1) to the transfer durations in Table 20. Each of the eight panels in Figure 16
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is held fixed. The dashed line in the figure represents the theoretical maximum
throughput.
The expansion bus synchronous master requires a theoretical minimum of N+1 cycles to write
a burst of N words to an external device (the address and data buses are multiplexed). The
observed transfer duration can deviate from this ideal case (Figure 16) for two reasons: 1) the
external target is not always ready to accept new data (indicated by the XRDY signal), and 2) the
expansion bus synchronous master cannot always send new data on every clock. In the test setup,
the external target was capable of receiving new data on every rising edge of the clock;
therefore, the observed performance deviates from the ideal solely because of the synchronous
master host port. When it cannot keep up with sending new data on every clock, the expansion bus
synchronous master asserts the XWAIT signal to indicate a not-ready status. In these cases the
transfer takes longer than in the ideal case, and the throughput is therefore lower. For clock
ratios (DSP clock to expansion bus clock) between four and 13 and bursts longer than two words,
the expansion bus synchronous master is not capable of sending new data on every clock.
The difference between the calculated maximum throughput and the observed throughput in
Figure 16 is zero at clock ratios higher than 13, because the expansion bus synchronous master
can then send new data on every cycle (XWAIT is never asserted). For burst writes of one and
two words, the difference is zero for all clock ratios because of the internal register
pipeline.
[Figure 16: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64). Each panel plots Observed Throughput and Calculated Maximum Throughput (Bytes/s) against Clock Ratio (4 to 20).]
Figure 16. Data Throughput of the Synchronous Master Host Port – Moving Data from EMIF SBSRAM
Because two different clock domains are involved (the expansion bus clock and the DSP clock),
the start and end transfer overheads are difficult to predict. The overhead numbers vary with
the phase relationship between the clocks. Table 21 gives a rough estimate of the transfer
overheads.
Table 21. Transfer Overhead for a Synchronous Master Burst to/from EMIF SBSRAM
(Number of DSP clocks)

                                  Clock Ratio
                                    4    8   12   16   20
READ   Start Transfer Overhead      9   16   17   24   29
READ   End Transfer Overhead       20   15   14   12   12
WRITE  Start Transfer Overhead     48   54   57   57   77
WRITE  End Transfer Overhead        4    4    4    4    4
1.1.2.4 Synchronous Master Transfers to/from EMIF SDRAM
The number of XCLKIN cycles required by the expansion bus synchronous host port to complete
a master read transfer from an external device to the EMIF SDRAM is presented in Table 22.
The table shows the transfer duration for different burst lengths and clock ratios.
Transfer performance to/from EMIF SDRAM depends on the SDRAM refresh rate. Shaded fields
in the table correspond to transfers throttled by the XWAIT signal (the synchronous master host
port occasionally asserts XWAIT during a transfer).
Table 22. Transfer Duration of a Synchronous Master Burst Read to EMIF SDRAM
(Number of XCLKIN clocks; rows are burst length in words, columns are clock ratio)

Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     1    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
     2    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
     3    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4    4
     4   17   11    8    8    7    7    6    6    6    7    6    6    5    5    5    5    5
     8   27   19   15   21   12   17   10   15   10   11   10   10    9   12   11   11   11
    16   47   35   28   32   22   27   18   28   18   19   18   18   17   20   19   19   18
    32  109   84   70   68   53   55   43   44   48   41   43   41   41   44   41   42   42
    64  233  184  152  146  122  109   98  100   91   91   93   90   91   88   85   84   83
Figure 17 presents data throughput in Mbytes/s for different clock ratios and burst sizes when
the DSP clock is set to 250 MHz. The throughput graphs in Figure 17 are constructed by applying
formula (1) to the transfer durations in Table 22. Each of the eight panels in Figure 17
corresponds to one burst length and shows how the clock ratio affects data throughput when the
burst size is held fixed. The dashed line in the figure represents the theoretical maximum
throughput.
The expansion bus synchronous master requires a theoretical minimum of N+1 cycles to read
a burst of N words from an external device (the address and data buses are multiplexed). The
observed transfer duration can deviate from this ideal case (Figure 17) for two reasons: 1) the
external target is not always ready to send new data (indicated by the XRDY signal), and 2) the
expansion bus synchronous master cannot always receive new data on every clock. In the test
setup, the external target was capable of sending new data on every rising edge of the clock;
therefore, the observed performance deviates from the ideal solely because of the synchronous
master host port. When it cannot keep up with accepting new data on every clock, the expansion
bus synchronous master asserts the XWAIT signal to indicate a not-ready status. In these cases
the transfer takes longer than in the ideal case, and the throughput is therefore lower. At
higher clock ratios (DSP clock to expansion bus clock) the expansion bus synchronous master
asserts the XWAIT signal less frequently, and the observed throughput comes very close to the
calculated maximum throughput.
The difference between the calculated maximum throughput and the observed throughput for burst
reads of one, two, and three words is zero for all clock ratios because of the internal
register pipeline.
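Unlike the internal memory and SBSRAM cases, the SDRAM durations do not fall steadily as the clock ratio rises. The sketch below uses the 8-word row of Table 22 to list the clock ratios at which a burst took longer than at the previous ratio, and to express each duration as a fraction of the N+1 minimum. Attributing these irregularities to SDRAM refresh is an assumption on our part; the report states only that performance depends on the refresh rate. The helper names are illustrative.

```python
CLOCK_RATIOS = list(range(4, 21))
# Table 22, burst length 8: XCLKIN clocks per burst read to EMIF SDRAM,
# one entry per clock ratio from 4 to 20.
BURST8_READ_SDRAM = [27, 19, 15, 21, 12, 17, 10, 15, 10, 11, 10, 10, 9, 12, 11, 11, 11]


def efficiency(burst_words, observed_cycles):
    """Fraction of the ideal rate achieved: (N+1) / observed clocks."""
    return (burst_words + 1) / observed_cycles


def ratios_with_regression(row):
    """Clock ratios where the burst took longer than at the previous ratio."""
    return [
        ratio
        for ratio, previous, current in zip(CLOCK_RATIOS[1:], row, row[1:])
        if current > previous
    ]


print(ratios_with_regression(BURST8_READ_SDRAM))
for ratio, cycles in zip(CLOCK_RATIOS, BURST8_READ_SDRAM):
    print(ratio, round(efficiency(8, cycles), 2))
```

A designer choosing a clock ratio for SDRAM transfers may therefore want to check the specific table entry rather than assume that a higher ratio always shortens the burst.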
[Figure 17: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64). Each panel plots Observed Throughput and Calculated Maximum Throughput (Bytes/s) against Clock Ratio (4 to 20).]
Figure 17. Data Throughput of the Synchronous Master Host Port – Moving Data to EMIF SDRAM
The number of XCLKIN cycles required by the expansion bus synchronous host port to complete
a master write transfer to an external device from the EMIF SDRAM is presented in Table 23.
The table shows the transfer duration for different burst lengths and clock ratios.
Shaded fields in the table correspond to transfers throttled by the XWAIT signal (the
synchronous master host port occasionally asserts XWAIT during a transfer).
Table 23. Transfer Duration of a Synchronous Master Burst Write from the EMIF SDRAM
(Number of XCLKIN clocks; rows are burst length in words, columns are clock ratio)

Burst   Clock Ratio
Length    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
     1    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2
     2    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
     3   17    7    5    4    7    7    4    4    4    4    4    4    4    4    4    4    4
     4   21    9    7    6    9    9    5    5    5    5    5    5    5    5    5    5    5
     8   49   22   18   22   23   21   15   15   14   13   13   11    9   11   11   11   11
    16   85   74   62   40   45   35   37   31   29   27   25   19   17   19   23   23   22
    32  190  153  127  103   94   84   76   70   65   57   53   49   45   44   46   44   43
    64  386  292  243  221  192  171  146  141  130  114  109  104   97   91   90   88   87
Figure 18 presents data throughput information in Mbytes/s for different clock ratios and different
burst sizes when the DSP clock is set to 250MHz. Data throughput graphs in Figure 18 are
constructed by applying formula (1) to the transfer duration numbers (Table 23). Each of eight
panels in Figure 18 is associated with one burst length. The panels present how the clock ratio
affects data throughput when size of the burst is kept fixed. Dashed line in the figure presents
theoretical maximum throughput.
The expansion bus synchronous master requires a theoretical minimum of N+1 cycles to write
a burst of N words to an external device, because the address and data buses are multiplexed.
The observed transfer duration can deviate from this ideal case (Figure 18) for two reasons:
1) the external target is not always ready to accept new data (indicated by the XRDY signal),
and 2) the expansion bus synchronous master cannot always supply new data on every clock. In
the test setup, the external target can accept new data on every rising edge of the clock;
therefore, the observed performance deviates from the ideal solely because of the synchronous
master host port. When it cannot supply new data on every clock, the expansion bus synchronous
master asserts the XWAIT signal to indicate a not-ready status. In these cases the transfer
takes longer than in the ideal case, and the throughput is therefore lower. At higher clock
ratios (DSP clock to expansion bus clock), the expansion bus synchronous master asserts XWAIT
less frequently, and the observed throughput comes very close to the calculated maximum
throughput.
For burst writes of one and two words, the difference between the calculated maximum throughput
and the observed throughput is zero at all clock ratios because of the internal register pipeline.
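The comparison between the ideal N+1 cycles and the observed durations can be sketched in a few lines of Python. The cycle counts below are copied from the clock ratio 4 and clock ratio 20 columns of Table 23. The names `ideal_cycles`, `stall_cycles`, and `efficiency` are illustrative helpers, not terms used in this report, and the "efficiency" ratio is only a convenience metric assumed here for comparison. It is not formula (1).

```python
# Observed XCLKIN cycle counts from Table 23, keyed by clock ratio and
# then by burst length in words (only two clock ratios are copied here).
OBSERVED = {
    4:  {1: 2, 2: 3, 3: 17, 4: 21, 8: 49, 16: 85, 32: 190, 64: 386},
    20: {1: 2, 2: 3, 3: 4, 4: 5, 8: 11, 16: 22, 32: 43, 64: 87},
}


def ideal_cycles(burst_words):
    """Theoretical minimum: one address cycle plus one cycle per data word."""
    return burst_words + 1


def stall_cycles(burst_words, observed_cycles):
    """Cycles beyond the ideal N+1 (hypothetical helper)."""
    return observed_cycles - ideal_cycles(burst_words)


def efficiency(burst_words, observed_cycles):
    """Ratio of ideal to observed duration; 1.0 means no stalls (assumed metric)."""
    return ideal_cycles(burst_words) / observed_cycles


if __name__ == "__main__":
    for ratio, row in OBSERVED.items():
        for burst, cycles in row.items():
            print(f"ratio {ratio:2d}, burst {burst:2d}: "
                  f"{stall_cycles(burst, cycles):3d} extra cycles, "
                  f"efficiency {efficiency(burst, cycles):.2f}")
```

For bursts of one and two words the extra-cycle count is zero at both clock ratios, which matches the statement above.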
[Figure 18: eight panels, one per burst size (1, 2, 3, 4, 8, 16, 32, and 64). Each panel plots
observed throughput and calculated maximum throughput in bytes/s against clock ratio (4 to 20).]
Figure 18. Data Throughput of the Synchronous Master Host Port – Moving Data from EMIF SDRAM
Because the expansion bus clock and the DSP clock belong to two different clock domains, the
Start Transfer and End Transfer overheads are very difficult to predict. The overhead numbers
vary with the phase relationship between the clocks. Table 24 gives a rough estimate of the
transfer overheads.
Table 24. Transfer Overhead for a Synchronous Master Burst to/from EMIF SDRAM
(Number of DSP Clocks)

                                          Clock Ratio
                                     4     8    12    16    20
READ    Start Transfer Overhead     12    17    20    24    28
        End Transfer Overhead       23    18    17    14    17
WRITE   Start Transfer Overhead     61    71    80   105    90
        End Transfer Overhead        4     4     4     4     4
1.2 Asynchronous Host Port Performance
The asynchronous host port is slave only, uses a 32-bit data path, and is similar to the HPI
on the ’C6201. This mode is used to interface to genuine asynchronous microprocessor buses.
If the expansion bus host port is configured to operate in asynchronous mode, the /XCS signal is
used for four purposes:
1. To select the expansion bus host port as the target of an external master
2. On a read, the falling edge of /XCS initiates the read access
3. On a write, the rising edge of /XCS initiates the write access
4. The falling edge of /XCS latches the expansion bus host port control inputs, including XW/R, XBE[3:0], and XCNTL
The XRDY signal of the expansion bus functions differently from the ’C6201 HPI READY signal.
XRDY normally indicates a not-ready condition (the active-low READY signal is internally OR-ed
with the /XCS signal to obtain XRDY).
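The OR relationship described above can be written as a one-line logic function. This is a minimal sketch that treats both inputs as ideal logic levels (0 or 1) and ignores all timing. The function name `xrdy` and the argument names `ready_n` and `xcs_n` are illustrative and do not come from the device documentation.

```python
def xrdy(ready_n, xcs_n):
    """Level of XRDY given the internal active-low READY and /XCS levels.

    Both inputs are active low, so XRDY is low (ready) only while the
    port is selected (/XCS = 0) and the internal READY is asserted (0).
    """
    return ready_n | xcs_n


if __name__ == "__main__":
    # Print the full truth table.
    for ready_n in (0, 1):
        for xcs_n in (0, 1):
            print(f"READY_n={ready_n} XCS_n={xcs_n} -> XRDY={xrdy(ready_n, xcs_n)}")
```

With /XCS high (port not selected), XRDY is high whatever the internal READY level is, which is why the signal normally indicates not ready.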
The performance of the expansion bus asynchronous port is characterized by counting the number
of DSP clock cycles for which the XRDY signal stays high after a transition on the /XCS signal
(this parameter is denoted Tnr). The parameter is illustrated in Figure 19.
[Figure 19: timing diagram showing the DSP clock, XCS (input), and XRDY (output), with the
interval Tnr marked.]
Figure 19. Parameter Used to Measure the Asynchronous Host Port Performance
(Tnr Measured in the DSP Clock Cycles)
The number of DSP clock cycles needed to perform a transfer to/from the asynchronous host
port is presented in Table 25. Note that the XRDY signal waveform follows a pattern (due to an
internal register pipeline).
Table 25. Asynchronous Expansion Bus Host Port – Transfer Duration for Different Memory Types
(Number of DSP Clocks)

                               Source/Destination Memory Type
                           Internal    Internal    EMIF      EMIF
                           Data        Program     SBSRAM    SDRAM
                           Memory      Memory
WRITE   1st word               0           0          0       200
        2nd word               0           0          0         0
        3rd word, etc.         0           0          0         0
READ    1st word              18          20         29        31
        2nd word               0           0          5        50
        3rd word               0           0         13         6
        4th word               6           8          5        62
        5th word               5           0         13         6
        6th word               5           8          5        52
        7th word, etc.         5           0         13         5
The transfer durations in Table 25 can be used to calculate the data throughput of the
asynchronous expansion bus host port, assuming that the external host can de-assert /XCS on
the same clock edge on which it detects XRDY asserted.
Data throughput depends on the burst length because the XRDY signal follows a pattern caused by
the internal register pipeline. After the initial transient interval, the XRDY signal follows a
pattern with a period of two DSP clock cycles. The minimum transfer duration (assuming an
always-ready condition) is eight DSP clocks: the minimum /XCS low pulse and the minimum /XCS
high pulse are each four DSP clocks.
Assuming a DSP clock speed of 250 MHz and an infinitely long data burst (so that the throughput
is not affected by the initial transient interval), Table 26 presents the data throughput to
different memory types.
Table 26. Data Throughput in [Mbytes/s] for Asynchronous Host Port
(DSP Clock is Set to 250 MHz)

                    Source/Destination Memory Type
           Internal    Internal    EMIF      EMIF
           Data        Program     SBSRAM    SDRAM
           Memory      Memory
WRITE        125         125        125       125
READ         100         95.2       71.4       31
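The arithmetic linking DSP clocks per word to throughput can be checked with a short Python sketch. It assumes a 32-bit (4-byte) word per transfer, a 250 MHz DSP clock, and 1 Mbyte = 10^6 bytes. The function names are illustrative. The eight-clock minimum transfer duration yields the 125 Mbytes/s WRITE entries of Table 26, and the inverse function shows the average number of DSP clocks per word implied by each READ entry.

```python
DSP_CLOCK_HZ = 250e6   # DSP clock assumed in Table 26
BYTES_PER_WORD = 4     # 32-bit data path


def throughput_mbytes_per_s(clocks_per_word, dsp_clock_hz=DSP_CLOCK_HZ):
    """Sustained throughput for a given average number of DSP clocks per word."""
    return BYTES_PER_WORD * dsp_clock_hz / clocks_per_word / 1e6


def clocks_per_word(mbytes_per_s, dsp_clock_hz=DSP_CLOCK_HZ):
    """Average DSP clocks per word implied by a throughput figure."""
    return BYTES_PER_WORD * dsp_clock_hz / (mbytes_per_s * 1e6)


if __name__ == "__main__":
    # Minimum transfer duration: 4 clocks /XCS low + 4 clocks /XCS high.
    print("8 clocks/word ->", throughput_mbytes_per_s(8), "Mbytes/s")
    # READ row of Table 26, converted back to average clocks per word.
    for name, rate in [("Internal data", 100), ("Internal program", 95.2),
                       ("EMIF SBSRAM", 71.4), ("EMIF SDRAM", 31)]:
        print(f"{name}: {clocks_per_word(rate):.1f} clocks/word")
```

The back-calculated values are averages over the steady-state XRDY pattern. Because the table entries are rounded, the SDRAM result in particular is only approximate.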
References
1. TMS320C6000 Peripherals Reference Guide, SPRU190.
2. TMS320C6202 Fixed-Point Digital Signal Processor Data Sheet, SPRS072.
IMPORTANT NOTICE
Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue
any product or service without notice, and advise customers to obtain the latest version of relevant information
to verify, before placing orders, that information being relied on is current and complete. All products are sold
subject to the terms and conditions of sale supplied at the time of order acknowledgment, including those
pertaining to warranty, patent infringement, and limitation of liability.
TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in
accordance with TI’s standard warranty. Testing and other quality control techniques are utilized to the extent
TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily
performed, except those mandated by government requirements.
Customers are responsible for their applications using TI components.
In order to minimize risks associated with the customer’s applications, adequate design and operating
safeguards must be provided by the customer to minimize inherent or procedural hazards.
TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent
that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other
intellectual property right of TI covering or relating to any combination, machine, or process in which such
semiconductor products or services might be or are used. TI’s publication of information regarding any third
party’s products or services does not constitute TI’s approval, warranty or endorsement thereof.
Copyright © 2000, Texas Instruments Incorporated