Application Report
SPRA297 - November 2002
Extended Precision Radix-4 Fast Fourier Transform
Implemented on the TMS320C62x
Robert Matusiak
Digital Signal Processing Solutions
ABSTRACT
This application report discusses a method by which the Texas Instruments TMS320C62x
high-performance, fixed-point digital signal processors (DSPs) overcome the traditional
advantage held by floating-point DSPs – precision and speed.
Using the Radix-4 Fast Fourier Transform (FFT), this document illustrates how extended
precision arithmetic, multiplication in particular, can be performed on the C62x. Using the
techniques outlined here, the 16-bit multipliers of the C62x can exceed the performance of
the 32-bit floating-point arithmetic logic units and multipliers found in floating-point DSPs.
List of Figures
Figure 1. 32-Bit Multiplication Using 16-Bit Multiplies
Figure 2. C-Source Listing for an Extended Precision Multiply
Figure 3. C62x Assembly Listing for a C-Callable Extended Precision Multiply
Figure 4. C62x C-Callable Assembly Language Function Source Listing for an Extended Precision Radix-4 FFT
The TMS320C62x generation of high-performance fixed-point DSPs features two independent
16-bit multiplier units. Because the multiplier units have been designed primarily for the
processing of 16-bit data, special consideration must be given to implementing algorithms that
require multiplication of numbers with more than 16 bits of precision. This application report
uses the Radix-4 Fast Fourier Transform (FFT) as an example of how extended precision
arithmetic, multiplication in particular, can be performed on the C62x. The finding of this
exercise is that, using the techniques outlined, the C62x can exceed the performance of
floating-point DSPs.
Typically, the two most notable advantages that a floating-point DSP has over a fixed-point DSP
are precision and range. Most floating-point DSPs feature 32-bit floating-point arithmetic logic
units (ALUs), multipliers, and a 32-bit register file, whereas most fixed-point DSPs feature 16-bit
integer units and a register file. In addition, the floating-point arithmetic units feature hardware
that allows numbers to be represented over a wider range than in fixed-point. Floating-point units
can scale numbers automatically. For example, if a 16-bit fixed-point adder unit
adds two numbers whose sum is larger than 16 bits, an overflow
occurs and the result is erroneous. A floating-point adder, by contrast, detects
the condition, scales the number, and moves the binary point to the right. In effect, the
floating-point adder gains precision in the integer portion of the result and loses precision in
the fraction portion of the result.
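The overflow case described above can be illustrated with a short C sketch. The function names (`add16_wrap`, `add32`, `demo_overflow`) are hypothetical and are not part of this report; the sketch only models a 16-bit adder that keeps the low 16 bits of the sum, and contrasts it with the same addition carried out at 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Sum as a 16-bit adder would produce it: only the low 16 bits survive. */
int16_t add16_wrap(int16_t a, int16_t b)
{
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);
    /* Reinterpret the 16-bit pattern as a two's complement value. */
    return (sum & 0x8000u) ? (int16_t)((int32_t)sum - 65536) : (int16_t)sum;
}

/* Same sum formed in a 32-bit register: the extra bits absorb the carry. */
int32_t add32(int16_t a, int16_t b)
{
    return (int32_t)a + (int32_t)b;
}

/* Usage: 30000 + 10000 does not fit in 16 bits but fits easily in 32. */
void demo_overflow(void)
{
    assert(add16_wrap(30000, 10000) == -25536);  /* 40000 - 65536 */
    assert(add32(30000, 10000) == 40000);
}
```

With 32-bit ALUs and registers the second form is the natural one, which is the point the next paragraph makes about the C62x.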
TMS320C62x and C62x are trademarks of Texas Instruments.
Trademarks are the property of their respective owners.
Because the C62x DSP features 32-bit ALUs and a 32-bit register file, the precision of
floating-point DSPs can be easily achieved. The only stumbling block is that the C62x features a
16-bit multiplier. However, we will show how the C62x can easily perform a 32-bit multiply.
The C62x can perform extended precision multiplication by performing several 16-bit multiplies.
In the case of performing a 32-bit multiply, four 16-bit multiplies and some additional arithmetic
are required. Let’s take a look at how we can multiply two 32-bit numbers, A times B, using
16-bit multiplies. Figure 1 pictorially describes how this would be performed. Note that in the
multiplies the u and s to the right and/or left of the multiplication symbol indicate whether the
operand is signed or unsigned. Also, it should be noted that a 32-bit multiply generates a 64-bit
result. In the example, we keep the most significant, or upper 32-bits. Figure 2 shows a listing of
a C function for an extended precision multiply. Figure 3 shows a listing of a C62x C-callable
assembly function for an extended precision multiply.
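The decomposition can also be written out algebraically. The notation below is ours, not the report's: $A_H$, $B_H$ are the signed upper 16-bit halves and $A_L$, $B_L$ the unsigned lower 16-bit halves of the operands.

```latex
\begin{aligned}
A &= A_H\,2^{16} + A_L, \qquad B = B_H\,2^{16} + B_L,\\
A \cdot B &= A_H B_H\,2^{32} + \left(A_H B_L + A_L B_H\right)2^{16} + A_L B_L,\\
\left\lfloor \frac{A \cdot B}{2^{32}} \right\rfloor
  &= A_H B_H + \left\lfloor \frac{\left(A_H B_L + A_L B_H\right)2^{16} + A_L B_L}{2^{32}} \right\rfloor .
\end{aligned}
```

The last line shows why the lower partial products cannot simply be dropped: they contribute their upper halves and their carries to the upper 32 bits of the product.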
Figure 4 contains the source listing for an extended precision radix-4 FFT implemented as a
C62x C-callable assembly language function. This implementation executes a 1024-point radix-4
FFT in 704 µs. Using the 32-bit multiplication technique, the C62x can
perform computations with precision comparable to that of 32-bit floating-point processors, at a
performance greater than that of most floating-point processors.
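[Figure 1 diagram: the 32-bit operands A and B are each split into 16-bit upper and lower halves (AH, AL and BH, BL). The four 16-bit partial products AL(U)*(U)BL, AH(S)*(U)BL, AL(U)*(S)BH, and AH(S)*(S)BH are sign-extended or zero-padded, aligned, rounded (RND), and summed to form the 64-bit product A*B.]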
Figure 1. 32-Bit Multiplication Using 16-Bit Multiplies
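/***********************************************************************
dpmpy() - function used to multiply two signed 32-bit integers and
return the most significant 32 bits of the result.
***********************************************************************/
int dpmpy(int A, int B)
{
    int AhBh, AhBl, AlBh, AlBhH, AhBlH;
    unsigned int AlBl, AhBlL, AlBhL;
    short Ah, Bh;
    unsigned short Al, Bl;
    long ABL;           /* must be wider than 32 bits to hold the carries */
    int ABLov, ABH;

    /* split each operand into a signed upper and an unsigned lower half */
    Ah = A>>16;
    Bh = B>>16;
    Al = A & 0x0000FFFF;
    Bl = B & 0x0000FFFF;

    /* four 16-bit by 16-bit partial products */
    AhBh = Ah * Bh;
    AlBl = (unsigned int)Al * Bl;   /* unsigned multiply, cannot overflow */
    AlBh = Al * Bh;
    AhBl = Ah * Bl;

    /* upper and lower 16-bit contributions of the cross products */
    AhBlH = AhBl >> 16;
    AhBlL = (unsigned int)AhBl << 16;
    AlBhH = AlBh >> 16;
    AlBhL = (unsigned int)AlBh << 16;

    /* add the lower words as long values so that the carries are kept */
    ABL = (long)AlBl + (long)AlBhL;
    ABL = ABL + (long)AhBlL;
    ABLov = (int)(ABL >> 32);

    ABH = AhBh + AhBlH + AlBhH + ABLov;
    return(ABH<<1);     /* shift out the redundant sign bit of the product */
}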
Figure 2. C-Source Listing for an Extended Precision Multiply
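As a portable cross-check of the partial-product method, the following sketch computes the upper 32 bits of the product from four 16-bit multiplies and compares it with a direct 64-bit multiply. It is not the report's code: the names `mulhi32_partial`, `mulhi32_ref`, and `demo_mulhi32` are hypothetical, fixed-width types from `<stdint.h>` stand in for the C62x data types, the final left shift by one is omitted, and an arithmetic right shift of negative values is assumed (as on mainstream compilers).

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of the signed 64-bit product, from four 16x16 multiplies. */
int32_t mulhi32_partial(int32_t a, int32_t b)
{
    int32_t  ah = a >> 16;                  /* signed upper half   */
    int32_t  bh = b >> 16;
    uint32_t al = (uint32_t)a & 0xFFFFu;    /* unsigned lower half */
    uint32_t bl = (uint32_t)b & 0xFFFFu;

    int32_t  hh = ah * bh;                  /* signed   x signed   */
    int32_t  hl = ah * (int32_t)bl;         /* signed   x unsigned */
    int32_t  lh = (int32_t)al * bh;         /* unsigned x signed   */
    uint32_t ll = al * bl;                  /* unsigned x unsigned */

    /* Low word: accumulate in 64 bits so the carries are not lost. */
    uint64_t low = (uint64_t)ll
                 + ((uint64_t)((uint32_t)hl & 0xFFFFu) << 16)
                 + ((uint64_t)((uint32_t)lh & 0xFFFFu) << 16);
    int32_t carry = (int32_t)(low >> 32);

    /* High word: upper halves of the cross products plus the carries. */
    return hh + (hl >> 16) + (lh >> 16) + carry;
}

/* Reference result using a native 64-bit multiply. */
int32_t mulhi32_ref(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Usage: both routes must agree. */
void demo_mulhi32(void)
{
    assert(mulhi32_partial(0x40000000, 0x40000000) == 0x10000000);
    assert(mulhi32_partial(-1, 1) == -1);
    assert(mulhi32_partial(123456789, -987654321)
           == mulhi32_ref(123456789, -987654321));
}
```

The 64-bit accumulator plays the role of the wider-than-32-bit `long` in Figure 2: without it, the carry out of the low word would be dropped.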
;******************************************************************************
; dpmpy.asm - C6x assembly source code for a fixed-point double precision
; multiply C-callable assembly language function. The function takes two 32-bit
; signed integers and performs a 32-bit by 32-bit multiply, which produces a
; 64-bit product. The upper 32 bits of the product are returned as a signed
; integer. The C6x CPU core has 16-bit multipliers, thus four 16-bit multiplies
; are required to realize a 32-bit multiply.
;******************************************************************************
; PROTOTYPE
;
;       int dpmpy(int, int);
;
;******************************************************************************
; USAGE
;
;       int A, B, prod;
;
;       prod = dpmpy(A,B);
;
;******************************************************************************
        .global _dpmpy

A       .set    a4
B       .set    b4
AlBl    .set    a1
AhBl    .set    b1
AlBh    .set    a2
AhBh    .set    b2
AlBhH   .set    a3
AhBlH   .set    b7
AlBhL   .set    a8
AhBlL   .set    b6
ABH     .set    b5
return  .set    a4
ABLo    .set    a7
ABLe    .set    a6

_dpmpy:
        mpyhslu .m1x    B,A,AlBh                ; Al u*s Bh
||      mpyhslu .m2x    A,B,AhBl                ; Ah s*u Bl

        mpyu    .m1x    A,B,AlBl                ; Al u*u Bl
||      mpyh    .m2x    B,A,AhBh                ; Ah s*s Bh

        shr     .s1     AlBh,16,AlBhH           ; AlBhH = AlBh >>s 16
||      shr     .s2     AhBl,16,AhBlH           ; AhBlH = AhBl >>s 16

        b       .s2     b3                      ; return

        shl     .s1     AlBh,16,AlBhL           ; AlBhL = AlBh << 16
||      shl     .s2     AhBl,16,AhBlL           ; AhBlL = AhBl << 16
||      add     .l2     AhBh,AhBlH,ABH          ; ABH = AhBh + AhBlH

        add     .l2x    ABH,AlBhH,ABH           ; ABH = ABH + AlBhH
||      addu    .l1x    AlBl,AhBlL,ABLo:ABLe    ; (long)ABL = AlBl + AhBlL

        addu    .l1     AlBhL,ABLo:ABLe,ABLo:ABLe ; (long)ABL = AlBhL + (long)ABL

        add     .l1x    ABLo,ABH,return         ; ABH = ABLhigh + ABH

        shl     .s1     return, 1, return
Figure 3. C62x Assembly Listing for a C-Callable Extended Precision Multiply
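To make the instruction-level flow of Figure 3 easier to follow, the sketch below models the three multiply instructions it uses as plain C functions and then repeats the same sequence of operations. The `model_*` functions are hypothetical stand-ins written for this illustration; they are not TI compiler intrinsics. A 64-bit integer stands in for the 40-bit register pair written by ADDU, and an arithmetic right shift of negative values is assumed.

```c
#include <assert.h>
#include <stdint.h>

static int32_t hi16s(int32_t v) { return v >> 16; }   /* signed upper half   */
static int32_t lo16u(int32_t v) { return (int32_t)((uint32_t)v & 0xFFFFu); }

/* MPYHSLU: signed upper half of src1 times unsigned lower half of src2. */
int32_t model_mpyhslu(int32_t src1, int32_t src2)
{
    return hi16s(src1) * lo16u(src2);
}

/* MPYU: unsigned lower half times unsigned lower half. */
uint32_t model_mpyu(int32_t src1, int32_t src2)
{
    return (uint32_t)lo16u(src1) * (uint32_t)lo16u(src2);
}

/* MPYH: signed upper half times signed upper half. */
int32_t model_mpyh(int32_t src1, int32_t src2)
{
    return hi16s(src1) * hi16s(src2);
}

/* Same order of operations as the listing in Figure 3. */
int32_t model_dpmpy(int32_t A, int32_t B)
{
    int32_t  AlBh = model_mpyhslu(B, A);
    int32_t  AhBl = model_mpyhslu(A, B);
    uint32_t AlBl = model_mpyu(A, B);
    int32_t  AhBh = model_mpyh(B, A);

    int32_t  ABH = AhBh + (AhBl >> 16) + (AlBh >> 16);
    uint64_t ABL = (uint64_t)AlBl + (uint32_t)((uint32_t)AhBl << 16);
    ABL += (uint32_t)((uint32_t)AlBh << 16);

    int32_t hi = ABH + (int32_t)(ABL >> 32);
    return (int32_t)((uint32_t)hi << 1);    /* final SHL by 1 */
}

/* Usage: read the operands as Q31 fractions, so 0.5 * 0.5 = 0.25. */
void demo_model_dpmpy(void)
{
    assert(model_dpmpy(0x40000000, 0x40000000) == 0x20000000);
    assert(model_dpmpy(0x40000000, -0x40000000) == -0x20000000);
}
```

Read this way, the final shift by one is what turns the upper word of the 64-bit product back into a value in the same fractional format as the inputs, at the cost of the least significant bit.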
;******************************************************************************
; FILE
;
;       dpfft.asm - C6x assembly source code for a C-callable, double precision
;       fixed-point, radix-4, in-place, complex FFT assembly language function.
;
;******************************************************************************
; PROTOTYPE
;
;       void dfft (int *x, int *y, int *wcos, int *wsin, int n, int m);
;
;       where:  x is a pointer to the real data array of length n
;               y is a pointer to the imaginary data array of length n
;               wcos is a pointer to the real twiddle factors array of length n
;               wsin is a pointer to the imaginary twiddle factors array, length n
;               n is the number of data points (must be a power of 4)
;               m is the number of stages in the radix-4 FFT
;
;******************************************************************************
; PERFORMANCE
;
;       ~ # of cycles = M * (N/4 * 54 + 37) + N/3 * 7
;
;******************************************************************************
; MEMORY REQUIREMENTS
;
;       4*N bytes (real data)
;       4*N bytes (imag data)
;       4*N bytes (real coefficients)
;       4*N bytes (imag coefficients)
;       200 bytes (stack)
;       ----------------------------
;       16*N + 200 bytes (Total)
;
;******************************************************************************
; ASSUMPTIONS
;
;       1) All data is assumed to be in on-chip data memory
;       2) Digit reversal is not performed
;       3) Further optimization could improve performance
;
;******************************************************************************
xaddr       .set    a4
yaddr       .set    b4
wcaddr      .set    a6
wsaddr      .set    b6
npoints     .set    a8
mstages     .set    b8

STACKSIZE   .set    200
N2          .set    1
E           .set    2
A           .set    3
B           .set    4
C           .set    5
Figure 4. C62x C-Callable Assembly Language Function Source Listing
for an Extended Precision Radix-4 FFT
I0          .set    6
I1          .set    7
I2          .set    8
I3          .set    9
R1          .set    10
R2          .set    11
S1          .set    12
S2          .set    13
N1          .set    14
N           .set    15
K           .set    16
J           .set    17
stack       .set    b7
KCNT        .set    b0
JCNT        .set    b1
ICNT        .set    b2
n1          .set    b2
n2          .set    b3
e           .set    b5
ea          .set    a5
j           .set    b9
ja          .set    a9
m           .set    a10
n           .set    b8
nt          .set    b1
i0t         .set    b0
i1t         .set    b4
n1t         .set    b4
r4          .set    a4
s4          .set    b4
co1         .set    a8
r3          .set    a1
co1hr3l     .set    a2
co1lr3h     .set    a3
co1lr3l     .set    a4
co1hr3h     .set    a5
co1lr3hH    .set    a0
co1hr3lH    .set    a1
co1lr3hL    .set    a2
co1hr3lL    .set    a3
co1r3H      .set    a5
co1r3Lo     .set    a3
co1r3Le     .set    a2
si1         .set    b8
s3          .set    b1
si1hs3l     .set    b2
si1ls3h     .set    b3
si1ls3l     .set    b4
si1hs3h     .set    b5
si1ls3hH    .set    b0
si1hs3lH    .set    b1
si1ls3hL    .set    b2
si1hs3lL    .set    b3
si1s3H      .set    b5
si1s3Lo     .set    b3
si1s3Le     .set    b2
s3_A        .set    a9
co1hs3l     .set    a10
co1ls3h     .set    a11
co1ls3l     .set    a12
co1hs3h     .set    a13
co1ls3hH    .set    a8
co1hs3lH    .set    a9
co1ls3hL    .set    a10
co1hs3lL    .set    a11
co1s3H      .set    a13
co1s3Lo     .set    a11
co1s3Le     .set    a10
xi1         .set    a1
xi0         .set    a5
r3_B        .set    b9
si1hr3l     .set    b10
si1lr3h     .set    b11
si1lr3l     .set    b12
si1hr3h     .set    b13
si1lr3hH    .set    b8
si1hr3lH    .set    b9
si1lr3hL    .set    b10
si1hr3lL    .set    b11
si1r3H      .set    b13
si1r3Lo     .set    b11
si1r3Le     .set    b10
yi1         .set    b13
yi0         .set    b5
co2         .set    a0
r2          .set    a1
co2hr2l     .set    a2
co2lr2h     .set    a3
co2lr2l     .set    a4
co2hr2h     .set    a5
co2lr2hH    .set    a0
co2hr2lH    .set    a1
co2lr2hL    .set    a2
co2hr2lL    .set    a3
co2r2H      .set    a5
co2r2Lo     .set    a3
co2r2Le     .set    a2
si2         .set    b0
s2          .set    b1
si2hs2l     .set    b2
si2ls2h     .set    b3
si2ls2l     .set    b4
si2hs2h     .set    b5
si2ls2hH    .set    b0
si2hs2lH    .set    b1
si2ls2hL    .set    b2
si2hs2lL    .set    b3
si2s2H      .set    b5
si2s2Lo     .set    b3
si2s2Le     .set    b2
co2_A       .set    a8
s2_A        .set    a9
co2hs2l     .set    a10
co2ls2h     .set    a11
co2ls2l     .set    a12
co2hs2h     .set    a13
co2ls2hH    .set    a8
co2hs2lH    .set    a9
co2ls2hL    .set    a10
co2hs2lL    .set    a11
co2s2H      .set    a13
co2s2Lo     .set    a11
co2s2Le     .set    a10
xi2         .set    a1
xi0         .set    a5
si2_B       .set    b8
r2_B        .set    b9
si2hr2l     .set    b10
si2lr2h     .set    b11
si2lr2l     .set    b12
si2hr2h     .set    b13
si2lr2hH    .set    b8
si2hr2lH    .set    b9
si2lr2hL    .set    b10
si2hr2lL    .set    b11
si2r2H      .set    b13
si2r2Lo     .set    b11
si2r2Le     .set    b10
yi2         .set    b13
yi0         .set    b5
co3         .set    a0
r1          .set    a1
co3hr1l     .set    a2
co3lr1h     .set    a3
co3lr1l     .set    a4
co3hr1h     .set    a5
co3lr1hH    .set    a0
co3hr1lH    .set    a1
co3lr1hL    .set    a2
co3hr1lL    .set    a3
co3r1H      .set    a5
co3r1Lo     .set    a3
co3r1Le     .set    a2
si3         .set    b0
s1          .set    b1
si3hs1l     .set    b2
si3ls1h     .set    b3
si3ls1l     .set    b4
si3hs1h     .set    b5
si3ls1hH    .set    b0
si3hs1lH    .set    b1
si3ls1hL    .set    b2
si3hs1lL    .set    b3
si3s1H      .set    b5
si3s1Lo     .set    b3
si3s1Le     .set    b2
co3_A       .set    a8
s1_A        .set    a9
co3hs1l     .set    a10
co3ls1h     .set    a11
co3ls1l     .set    a12
co3hs1h     .set    a13
co3ls1hH    .set    a8
co3hs1lH    .set    a9
co3ls1hL    .set    a10
co3hs1lL    .set    a11
co3s1H      .set    a13
co3s1Lo     .set    a11
co3s1Le     .set    a10
xi3         .set    a5
si3_B       .set    b8
r1_B        .set    b9
si3hr1l     .set    b10
si3lr1h     .set    b11
si3lr1l     .set    b12
si3hr1h     .set    b13
si3lr1hH    .set    b8
si3hr1lH    .set    b9
si3lr1hL    .set    b10
si3hr1lL    .set    b11
si3r1H      .set    b13
si3r1Lo     .set    b11
si3r1Le     .set    b10
yi3         .set    b13
y           .set    b6
x           .set    b7
i0          .set    b14
i1          .set    b14
i2          .set    b14
i3          .set    b14
wc          .set    a7
ws          .set    a6
a           .set    a14
b           .set    a14
c           .set    a14
bb          .set    a13
cc          .set    a12
xi0t        .set    a0
xi1t        .set    a1
xi2t        .set    a2
xi3t        .set    a3
r1t         .set    a13
r2t         .set    a9
r3t         .set    a10
r4t         .set    a11
yi0t        .set    b0
yi1t        .set    b1
yi2t        .set    b2
yi3t        .set    b3
s1t         .set    b13
s2t         .set    b9
s3t         .set    b10
s4t         .set    b11

        .global _fft4
_fft4:
; code to preserve the C runtime environment
        mvk     .s2     STACKSIZE,stack         ; move stack size into a reg.
        sub     .l2     B15, stack, B15         ; allocate space on stack
        stw     .d2     B14, *B15++[1]          ; push B14 onto stack
        stw     .d2     B13, *B15++[1]          ; push B13 onto stack
        stw     .d2     B12, *B15++[1]          ; push B12 onto stack
        stw     .d2     B11, *B15++[1]          ; push B11 onto stack
        stw     .d2     B10, *B15++[1]          ; push B10 onto stack
        stw     .d2     B3, *B15++[1]           ; push B3 onto stack
        stw     .d2     A15, *B15++[1]          ; push A15 onto stack
        stw     .d2     A14, *B15++[1]          ; push A14 onto stack
||      mv      .l      wcaddr,wc               ; copy argument to register
        stw     .d2     A13, *B15++[1]          ; push A13 onto stack
||      mv      .l      wsaddr,ws               ; copy argument to register
        stw     .d2     A12, *B15++[1]          ; push A12 onto stack
||      mv      .l      xaddr,x                 ; copy argument to register
        stw     .d2     A11, *B15++[1]          ; push A11 onto stack
||      mv      .l      yaddr,y                 ; copy argument to register
        stw     .d2     A10, *B15++[1]          ; push A10 onto stack
        mv      .l      B15,A15                 ; copy stack pointer to A15
; begin FFT processing
;n2 = n;
;e = 1;
;
stw
mvk
stw
stw
||
||
.d2
.s2
.d2
.d
npoints,*+B15[N2]
1,
e
e,
*+B15[E]
npoints,*+A15[N]
;for(k=0; k<m; k++)
;{
;
n1 = n2;
;
n2 = n2 >> 2;
;
a = 0;
mv
stw
.l2
.d
mstages, KCNT
KCNT,*+B15[K]
ldw
.d2
*+B15[N2],
||
zero
zero
.l
.l
a
j
; a = 0
; j = 0
stw
stw
.d
.d
a, *+A15[A]
j,*+B15[J]
; store a on stack
||
||
||
stw
stw
mv
.d
.d
.l
j,*+B15[I0]
a, *+A15[B]
j,i0
; store i0 on stack
; store b on stack
ldw
.d
*+x[i0], xi0t
; xi0 = x[i0]
KLOOP:
n1
; n1 = n2
||
stw
shr
stw
mv
.d
.s2
.d2
.l2
n1,*+B15[N1]
n1,
2, n2
n2,
*+B15[N2]
n2,
JCNT
; n2 = n2 >> 2;
; store n2 on stack
; Initialize JLOOP counter
||
||
stw
ldw
add
.d
.d
.l
a, *+A15[C]
*+y[i0], yi0t
i0,
n2, i1
; store c on stack
; yi0 = y[i0]
; i1 = i0 + n2
||
ldw
stw
.d
.d
*+x[i1], xi1t
i1,*+A15[I1]
; xi1 = x[i1]
; store i1 on stack
||
||
ldw
add
ldw
.d
.l
.d
*+y[i1], yi1t
i1,
n2, i2
*+A15[A],a
; yi1 = y[i1]
; i2 = i1 + n2
||
ldw
stw
.d
.d
*+x[i2], xi2t
i2,*+A15[I2]
; xi2 = x[i2]
; store i2 on stack
||
ldw
add
.d
.l
*+y[i2], yi2t
i2,
n2, i3
; yi2 = y[i2]
; i3 = i2 + n2
||
ldw
stw
.d
.d
*+x[i3], xi3t
i3,*+A15[I3]
; xi3 = x[i3]
; store i3 on stack
ldw
.d
*+y[i3], yi3t
; yi3 = y[i3]
ldw
ldw
.d
.d
*+B15[I0], i0
*+wc[a],co1
; store i0 on stack
||
||
||
add
sub
ldw
.l
.s
.d
xi0t, xi2t, r1t
xi0t, xi2t, r3t
*+ws[a],si1
; r1 = x[i0] + x[i2]
; r3 = x[i0] – x[i2]
||
add
sub
.l
.s
yi0t, yi2t, s1t
yi0t, yi2t, s3t
; s1 = y[i0] + y[i2]
; s3 = y[i0] – y[i2]
||
add
sub
.l
.s
xi1t, xi3t, r2t
xi1t, xi3t, r4t
; r2 = x[i1] + x[i3]
; r4 = x[i1] – x[i3]
||
||
add
sub
add
.l
.s
.l
yi1t, yi3t, s2t
yi1t, yi3t, s4t
r1t, r2t,
xi0t
; s2 = y[i1] + y[i3]
; s4 = y[i1] – y[i3]
; xi0 = r1 + r2
||
||
sub
add
stw
.l
.l
.d
s3t, r4t,
s3
r3t, s4t,
r3
xi0t,
*+x[i0]
; s3 = s3 – r4
; r3 = r3 + s4
; x[i0] = r1 + r2
||
||
||
||
mpyhslu
mpyhslu
ldw
add
sub
.m1
.m2
.d
.l
.l
r3,co1,co1lr3h
s3,si1,si1ls3h
*+A15[B],b
s1t, s2t,
yi0t
r1t, r2t,
r2t
; co1l u*s r3h
; si1l u*s s3h
||
JLOOP:
ILOOP:
; yi0 = s1 + s2
; r2 = r1 – r2
||
||
||
||
||
mpyhslu
mpyhslu
sub
stw
stw
sub
.m1
.m2
.l
.d
.d
.l
co1,r3,co1hr3l
si1,s3,si1hs3l
s1t, s2t,
s2t
yi0t,
*+y[i0]
r2t,*+A15[R2]
r3t, s4t,
r1t
;
;
;
;
co1h s*u r3l
si1h s*u s3l
s2 = s1 – s2
y[i0] = s1 + s2
||
||
||
||
mpyu
mpyu
stw
add
stw
.m1
.m2
.d
.l
.d
co1,r3,co1lr3l
si1,s3,si1ls3l
s2t,*+B15[S2]
s3t, r4t,
s1t
r1t,*+A15[R1]
; co1l u*u r3l
; si1l u*u s3l
||
||
||
||
mpyh
mpyh
mv
mv
stw
.m1
.m2
.l
.l
.d
r3,co1,co1hr3h
s3,si1,si1hs3h
s3,s3_A
r3,r3_B
s1t,*+B15[S1]
; co1h s*s r3h
; si1h s*s s3h
||
||
||
shr
shr
mpyhslu
mpyhslu
.s1
.s2
.m1
.m2
co1lr3h,16,co1lr3hH
si1ls3h,16,si1ls3hH
s3_A,co1,co1ls3h
r3_B,si1,si1lr3h
;
;
;
;
co1lr3hH
si1ls3hH
co1l u*s
si1l u*s
= co1lr3h >>s 16
= si1ls3h >>s 16
s3h
r3h
||
||
||
||
||
shr
shr
mpyhslu
mpyhslu
ldw
ldw
.s1
.s2
.m1
.m2
.d
.d
co1hr3l,16,co1hr3lH
si1hs3l,16,si1hs3lH
co1,s3_A,co1hs3l
si1,r3_B,si1hr3l
*+ws[b],si2
*+B15[R2],r2
;
;
;
;
co1hr3lH
si1hs3lH
co1h s*u
si1h s*u
= co1hr3l >>s 16
= si1hs3l >>s 16
s3l
r3l
||
||
||
||
||
shl
shl
mpyu
mpyu
ldw
ldw
.s1
.s2
.m1
.m2
.d
.d
co1lr3h,16,co1lr3hL
si1ls3h,16,si1ls3hL
co1,s3_A,co1ls3l
si1,r3_B,si1lr3l
*+wc[b],co2
*+B15[S2],s2
;
;
;
;
co1lr3hL
si1ls3hL
co1l u*u
si1l u*u
= co1lr3h << 16
= si1ls3h << 16
s3l
r3l
||
||
||
shl
shl
mpyh
mpyh
.s1
.s2
.m1
.m2
co1hr3l,16,co1hr3lL
si1hs3l,16,si1hs3lL
s3_A,co1,co1hs3h
r3_B,si1,si1hr3h
;
;
;
;
co1hr3lL
si1hs3lL
co1h s*s
si1h s*s
= co1hr3l << 16
= si1hs3l << 16
s3h
r3h
||
||
||
add
add
shr
shr
.l1
.l2
.s1
.s2
co1hr3h,co1hr3lH,co1r3H
si1hs3h,si1hs3lH,si1s3H
co1ls3h,16,co1ls3hH
si1lr3h,16,si1lr3hH
;
;
;
;
co1r3H =
si1s3H =
co1ls3hH
si1lr3hH
co1hr3l +
si1hs3l +
= co1ls3h
= si1lr3h
||
||
||
add
add
shr
shr
.l1
.l2
.s1
.s2
co1r3H,co1lr3hH,co1r3H
si1s3H,si1ls3hH,si1s3H
co1hs3l,16,co1hs3lH
si1hr3l,16,si1hr3lH
;
;
;
;
co1r3H =
si1s3H =
co1hs3lH
si1hr3lH
co1r3H + co1lr3hH
si1s3H + si1ls3hH
= co1hs3l >>s 16
= si1hr3l >>s 16
; r1 = r3 – s4
; s1 = s3 + r4
co1hr3lH
si1hs3lH
>>s 16
>>s 16
addu
.l1
co1lr3hL,co1hr3lL,co1r3Lo:co1r3Le
; (long)co1r3L =
co1lr3l + co1hr3lL
si1ls3hL,si1hs3lL,si1s3Lo:si1s3Le
; (long)si1s3L =
si1ls3l + si1hs3lL
co1ls3h,16,co1ls3hL
; co1ls3hL = co1ls3h << 16
si1lr3h,16,si1lr3hL
; si1lr3hL = si1lr3h << 16
*+B15[C],c
||
addu
.l2
||
||
||
shl
shl
ldw
.s1
.s2
.d
addu
.l1
||
addu
.l2
||
||
||
||
||
shl
shl
ldw
mpyhslu
mpyhslu
.s1
.s2
.d2
.m1
.m2
co1lr3l,co1r3Lo:co1r3Le,co1r3Lo:co1r3Le ; (long)co1r3L =
co1lr3hL + (long)co1r3L
si1ls3l,si1s3Lo:si1s3Le,si1s3Lo:si1s3Le ;(long)si1s3L =
si1ls3hL + (long)si1s3L
co1hs3l,16,co1hs3lL
; co1hs3lL = co1hs3l << 16
si1hr3l,16,si1hr3lL
; si1hr3lL = si1hr3l << 16
*+B15[I1],i1
; load i1 from stack
r2,co2,co2lr2h
; co2l u*s r2h
s2,si2,si2ls2h
; si2l u*s s2h
||
||
||
||
||
||
||
add
add
add
add
mv
mv
mpyhslu
mpyhslu
.l1
.l2
.d1
.d2
.s
.s
.m1
.m2
co1r3Lo,co1r3H,co1r3H
si1s3Lo,si1s3H,si1s3H
co1hs3h,co1hs3lH,co1s3H
si1hr3h,si1hr3lH,si1r3H
s2,s2_A
r2,r2_B
co2,r2,co2hr2l
si2,s2,si2hs2l
||
||
||
||
||
||
||
shl
shl
add
add
mv
mv
mpyu
mpyu
.s1
.s2
.l1
.l2
.d
.d
.m1
.m2
co1r3H,1,co1r3H
si1s3H,1,si1s3H
co1s3H,co1ls3hH,co1s3H
si1r3H,si1lr3hH,si1r3H
co2,co2_A
si2,si2_B
co2,r2,co2lr2l
si2,s2,si2ls2l
||
add
addu
.s1x
.l1
||
addu
.l2
||
||
mpyh
mpyh
.m1
.m2
co1r3H,si1s3H,xi1
; xi1 = co1*r3 + si1*s3
co1ls3hL,co1hs3lL,co1s3Lo:co1s3Le
; (long)co1s3L =
co1ls3l + co1hs3lL
si1lr3hL,si1hr3lL,si1r3Lo:si1r3Le
; (long)si1r3L =
si1lr3l + si1hr3lL
r2,co2,co2hr2h
; co2h s*s r2h
s2,si2,si2hs2h
; si2h s*s s2h
addu
.l1
||
addu
.l2
||
||
||
||
shr
shr
mpyhslu
mpyhslu
.s1
.s2
.m1
.m2
;
;
;
;
co1r3H
si1s3H
co1s3H
si1r3H
=
=
=
=
co1r3Lhigh + co1r3H
si1s3Lhigh + si1s3H
co1hs3l + co1hs3lH
si1hr3l + si1hr3lH
; co2h s*u r2l
; si2h s*u s2l
; co1s3H = co1s3H + co1ls3hH
; si1r3H = si1r3H + si1lr3hH
; co2l u*u r2l
; si2l u*u s2l
co1ls3l,co1s3Lo:co1s3Le,co1s3Lo:co1s3Le ; (long)co1s3L =
co1ls3hL + (long)co1s3L
si1lr3l,si1r3Lo:si1r3Le,si1r3Lo:si1r3Le ; (long)si1r3L =
si1lr3hL + (long)si1r3L
co2lr2h,16,co2lr2hH
; co2lr2hH = co2lr2h >>s 16
si2ls2h,16,si2ls2hH
; si2ls2hH = si2ls2h >>s 16
s2_A,co2_A,co2ls2h
; co2l u*s s2h
r2_B,si2_B,si2lr2h
; si2l u*s r2h
||
||
||
||
||
||
stw
add
add
shr
shr
mpyhslu
mpyhslu
.d2
.l1
.l2
.s1
.s2
.m1
.m2
xi1,*+x[i1]
; x[i1] = co1*r3 + si1*s3
co1s3Lo,co1s3H,co1s3H
; co1s3H = co1s3Lhigh + co1s3H
si1r3Lo,si1r3H,si1r3H
; si1r3H = si1r3Lhigh + si1r3H
co2hr2l,16,co2hr2lH ; co2hr2lH = co2hr2l >>s 16
si2hs2l,16,si2hs2lH ; si2hs2lH = si2hs2l >>s 16
co2_A,s2_A,co2hs2l ; co2h s*u s2l
si2_B,r2_B,si2hr2l ; si2h s*u r2l
||
||
||
shl
shl
ldw
ldw
.s1
.s2
.d
.d
co1s3H,1,co1s3H
si1r3H,1,si1r3H
*+ws[c],si3
*+B15[R1],r1
||
||
||
||
||
||
sub
shl
shl
mpyu
mpyu
ldw
ldw
.l2x
.s1
.s2
.m1
.m2
.d
.d
co1s3H,si1r3H,yi1 ;
co2lr2h,16,co2lr2hL ;
si2ls2h,16,si2ls2hL ;
co2_A,s2_A,co2ls2l ;
si2_B,r2_B,si2lr2l ;
*+wc[c],co3
*+B15[S1],s1
||
||
||
||
stw
shl
shl
mpyh
mpyh
.d2
.s1
.s2
.m1
.m2
yi1,*+y[i1]
; y[i1] = co1*s3 – si1*r3
co2hr2l,16,co2hr2lL ; co2hr2lL = co2hr2l << 16
si2hs2l,16,si2hs2lL ; si2hs2lL = si2hs2l << 16
s2_A,co2_A,co2hs2h ; co2h s*s s2h
r2_B,si2_B,si2hr2h ; si2h s*s r2h
||
||
||
add
add
shr
shr
.l1
.l2
.s1
.s2
co2hr2h,co2hr2lH,co2r2H
; co2r2H = co2hr2l + co2hr2lH
si2hs2h,si2hs2lH,si2s2H
; si2s2H = si2hs2l + si2hs2lH
co2ls2h,16,co2ls2hH ; co2ls2hH = co2ls2h >>s 16
si2lr2h,16,si2lr2hH ; si2lr2hH = si2lr2h >>s 16
||
||
||
add
add
shr
shr
.l1
.l2
.s1
.s2
co2r2H,co2lr2hH,co2r2H ; co2r2H = co2r2H
si2s2H,si2ls2hH,si2s2H ; si2s2H = si2s2H
co2hs2l,16,co2hs2lH ; co2hs2lH = co2hs2l
si2hr2l,16,si2hr2lH ; si2hr2lH = si2hr2l
addu
.l1
||
addu
.l2
||
||
shl
shl
.s1
.s2
co2lr2hL,co2hr2lL,co2r2Lo:co2r2Le
; (long)co2r2L =
co2lr2l + co2hr2lL
si2ls2hL,si2hs2lL,si2s2Lo:si2s2Le
; (long)si2s2L =
si2ls2l + si2hs2lL
co2ls2h,16,co2ls2hL ; co2ls2hL = co2ls2h << 16
si2lr2h,16,si2lr2hL ; si2lr2hL = si2lr2h << 16
addu
.l1
addu
.l2
||
yi1 = co1*s3 – si1*r3
co2lr2hL = co2lr2h << 16
si2ls2hL = si2ls2h << 16
co2l u*u s2l
si2l u*u r2l
+ co2lr2hH
+ si2ls2hH
>>s 16
>>s 16
co2lr2l,co2r2Lo:co2r2Le,co2r2Lo:co2r2Le ;(long)co2r2L =
co2lr2hL + (long)co2r2L
si2ls2l,si2s2Lo:si2s2Le,si2s2Lo:si2s2Le ;(long)si2s2L =
si2ls2hL + (long)si2s2L
||
||
||
||
||
shl
shl
ldw
mpyhslu
mpyhslu
.s1
.s2
.d2
.m1
.m2
co2hs2l,16,co2hs2lL ; co2hs2lL = co2hs2l << 16
si2hr2l,16,si2hr2lL ; si2hr2lL = si2hr2l << 16
*+B15[I2],i2
; load i2 from stack
r1,co3,co3lr1h
; co3l u*s r1h
s1,si3,si3ls1h
; si3l u*s s1h
||
||
||
||
||
||
||
add
add
add
add
mpyhslu
mpyhslu
mv
mv
.l1
.l2
.d1
.d2
.m1
.m2
.s
.s
co2r2Lo,co2r2H,co2r2H ; co2r2H = co2r2Lhigh
si2s2Lo,si2s2H,si2s2H ; si2s2H = si2s2Lhigh
co2hs2h,co2hs2lH,co2s2H
; co2s2H = co2hs2l
si2hr2h,si2hr2lH,si2r2H
; si2r2H = si2hr2l
co3,r1,co3hr1l
; co3h s*u r1l
si3,s1,si3hs1l
; si3h s*u s1l
s1,s1_A
r1,r1_B
||
||
||
||
||
||
||
shl
shl
add
add
mpyu
mpyu
mv
mv
.s1
.s2
.l1
.l2
.m1
.m2
.d
.d
co2r2H,1,co2r2H
si2s2H,1,si2s2H
co2s2H,co2ls2hH,co2s2H ; co2s2H = co2s2H + co2ls2hH
si2r2H,si2lr2hH,si2r2H
; si2r2H = si2r2H + si2lr2hH
co3,r1,co3lr1l
; co3l u*u r1l
si3,s1,si3ls1l
; si3l u*u s1l
co3,co3_A
si3,si3_B
||
add
addu
||
addu
||
||
mpyh
mpyh
.s1x
co2r2H,si2s2H,xi2 ; xi2 = co2*r2 + si2*s2
.l1
co2ls2hL,co2hs2lL,co2s2Lo:co2s2Le
; (long)co2s2L =
co2ls2l + co2hs2lL
.l2
si2lr2hL,si2hr2lL,si2r2Lo:si2r2Le
; (long)si2r2L =
si2lr2l + si2hr2lL
.m1
r1,co3,co3hr1h
; co3h s*s r1h
.m2
s1,si3,si3hs1h
; si3h s*s s1h
addu
.l1
||
addu
.l2
||
||
||
||
shr
shr
mpyhslu
mpyhslu
.s1
.s2
.m1
.m2
co2ls2l,co2s2Lo:co2s2Le,co2s2Lo:co2s2Le
co2ls2hL + (long)co2s2L
si2lr2l,si2r2Lo:si2r2Le,si2r2Lo:si2r2Le
si2lr2hL + (long)si2r2L
co3lr1h,16,co3lr1hH ; co3lr1hH = co3lr1h
si3ls1h,16,si3ls1hH ; si3ls1hH = si3ls1h
s1_A,co3_A,co3ls1h ; co3l u*s s1h
r1_B,si3_B,si3lr1h ; si3l u*s r1h
||
||
||
||
||
||
stw
add
add
shr
shr
mpyhslu
mpyhslu
.d2
.l1
.l2
.s1
.s2
.m1
.m2
xi2,*+x[i2]
; x[i2] = co2*r2 + si2*s2
co2s2Lo,co2s2H,co2s2H ; co2s2H = co2s2Lhigh
si2r2Lo,si2r2H,si2r2H ; si2r2H = si2r2Lhigh
co3hr1l,16,co3hr1lH ; co3hr1lH = co3hr1l >>s
si3hs1l,16,si3hs1lH ; si3hs1lH = si3hs1l >>s
co3_A,s1_A,co3hs1l ; co3h s*u s1l
si3_B,r1_B,si3hr1l ; si3h s*u r1l
||
||
shl
shl
ldw
.s1
.s2
.d
co2s2H,1,co2s2H
si2r2H,1,si2r2H
*+B15[N1],n1t
+
+
+
+
co2r2H
si2s2H
co2hs2lH
si2hr2lH
; (long)co2s2L =
; (long)si2r2L =
>>s 16
>>s 16
+ co2s2H
+ si2r2H
16
16
||
||
||
||
||
sub
shl
shl
mpyu
mpyu
ldw
.l2x
.s1
.s2
.m1
.m2
.d
co2s2H,si2r2H,yi2 ;
co3lr1h,16,co3lr1hL ;
si3ls1h,16,si3ls1hL ;
co3_A,s1_A,co3ls1l ;
si3_B,r1_B,si3lr1l ;
*+B15[I0],i0t
yi2 = co2*s2 – si2*r2
co3lr1hL = co3lr1h << 16
si3ls1hL = si3ls1h << 16
co3l u*u s1l
si3l u*u r1l
||
||
||
||
shl
shl
mpyh
mpyh
stw
.s1
.s2
.m1
.m2
.d2
co3hr1l,16,co3hr1lL ; co3hr1lL = co3hr1l << 16
si3hs1l,16,si3hs1lL ; si3hs1lL = si3hs1l << 16
s1_A,co3_A,co3hs1h ; co3h s*s s1h
r1_B,si3_B,si3hr1h ; si3h s*s r1h
yi2,*+y[i2]
; y[i2] = co2*s2 – si2*r2
||
||
||
||
add
add
shr
shr
ldw
.l1
.l2
.s1
.s2
.d
co3hr1h,co3hr1lH,co3r1H
; co3r1H = co3hr1l + co3hr1lH
si3hs1h,si3hs1lH,si3s1H
; si3s1H = si3hs1l + si3hs1lH
co3ls1h,16,co3ls1hH ; co3ls1hH = co3ls1h >>s 16
si3lr1h,16,si3lr1hH ; si3lr1hH = si3lr1h >>s 16
*+B15[N],nt
||
||
||
||
add
add
shr
shr
ldw
.l1
.l2
.s1
.s2
.d2
co3r1H,co3lr1hH,co3r1H ; co3r1H = co3r1H
si3s1H,si3ls1hH,si3s1H ; si3s1H = si3s1H
co3hs1l,16,co3hs1lH ; co3hs1lH = co3hs1l
si3hr1l,16,si3hr1lH ; si3hr1lH = si3hr1l
*+B15[N2],
n2
addu
.l1
||
addu
.l2
||
||
shl
shl
.s1
.s2
co3lr1hL,co3hr1lL,co3r1Lo:co3r1Le
; (long)co3r1L =
co3lr1l + co3hr1lL
si3ls1hL,si3hs1lL,si3s1Lo:si3s1Le
; (long)si3s1L =
si3ls1l + si3hs1lL
co3ls1h,16,co3ls1hL ; co3ls1hL = co3ls1h << 16
si3lr1h,16,si3lr1hL ; si3lr1hL = si3lr1h << 16
addu
.l1
||
addu
.l2
||
||
||
shl
shl
ldw
.s1
.s2
.d2
co3lr1l,co3r1Lo:co3r1Le,co3r1Lo:co3r1Le
co3lr1hL + (long)co3r1L
si3ls1l,si3s1Lo:si3s1Le,si3s1Lo:si3s1Le
si3ls1hL + (long)si3s1L
co3hs1l,16,co3hs1lL ; co3hs1lL = co3hs1l
si3hr1l,16,si3hr1lL ; si3hr1lL = si3hr1l
*+B15[I3],i3
; load i3 from stack
||
||
||
||
||
add
add
add
add
add
ldw
.l1
.l2
.s
.d2
.s
.d
co3r1Lo,co3r1H,co3r1H ; co3r1H = co3r1Lhigh
si3s1Lo,si3s1H,si3s1H ; si3s1H = si3s1Lhigh
co3hs1h,co3hs1lH,co3s1H
; co3s1H = co3hs1l
si3hr1h,si3hr1lH,si3r1H
; si3r1H = si3hr1l
i0t, n1t,
i0t
*+A15[J],ja
||
||
||
shl
shl
add
add
.s1
.s2
.l1
.d2
co3r1H,1,co3r1H
si3s1H,1,si3s1H
co3s1H,co3ls1hH,co3s1H ; co3s1H = co3s1H + co3ls1hH
si3r1H,si3lr1hH,si3r1H
; si3r1H = si3r1H + si3lr1hH
+ co3lr1hH
+ si3ls1hH
>>s 16
>>s 16
; (long)co3r1L =
; (long)si3s1L =
<< 16
<< 16
+
+
+
+
co3r1H
si3s1H
co3hs1lH
si3hr1lH
||
||
cmplt
stw
.l
.d
i0t, nt, ICNT
i0t,*+A15[I0]
||
add
addu
.s1x
.l1
||
addu
.l2
co3r1H,si3s1H,xi3 ; xi3 = co3*r1 + si3*s1
co3ls1hL,co3hs1lL,co3s1Lo:co3s1Le
; (long)co3s1L = co3ls1l + co3hs1lL
si3lr1hL,si3hr1lL,si3r1Lo:si3r1Le
; (long)si3r1L = si3lr1l + si3hr1lL
.s
ILOOP
*+y[i0t], yi0t
; yi0 = y[i0]
*+A15[E],ea
||[ICNT]
||
ldw
||
ldw
b
.d
.d
; store i0 on stack
addu
.l1
||
addu
.l2
||
||
ldw
add
.d
.s
co3ls1l,co3s1Lo:co3s1Le,co3s1Lo:co3s1Le ; (long)co3s1L =
co3ls1hL + (long)co3s1L
si3lr1l,si3r1Lo:si3r1Le,si3r1Lo:si3r1Le ; (long)si3r1L =
si3lr1hL + (long)si3r1L
*+x[i0t],
xi0t
; xi0 = x[i0]
i0t, n2, i1t ; i1 = i0 + n2
||
||
||
stw
add
add
stw
.d2
.l1
.l2
.d
xi3,*+x[i3]
; x[i3] = co3*r1 + si3*s1
co3s1Lo,co3s1H,co3s1H ; co3s1H = co3s1Lhigh + co3s1H
si3r1Lo,si3r1H,si3r1H ; si3r1H = si3r1Lhigh + si3r1H
i1t,*+A15[I1]
; store i1 on stack
||
||
||
shl
shl
ldw
add
.s1
.s2
.d
.l
co3s1H,1,co3s1H
si3r1H,1,si3r1H
*+x[i1t],
xi1t
ja,
1, ja
.l2x
ldw
stw
co3s1H,si3r1H,yi3 ; yi3 = co3*s1 – si3*r1
.d
*+y[i1t], yi1t
; yi1 = y[i1]
.d
ja,*+A15[J]
.d2
.s
.l
.d
yi3,*+y[i3]
i1t, n2, i2
ja,
n2, JCNT
*+A15[A],a
.s2
.m
JLOOP
ja,
ea, a
mv
.l
ja,i0
||
||
stw
add
stw
.d
.l
.d
a, *+A15[A]
a,
a, bb
i0,*+B15[I0]
; store a on stack
; b = a + a
; store i0 on stack
||
stw
add
.d
.l
bb, *+A15[B]
bb,
a, cc
; store b on stack
; c = b + a
ldw
.d
*+x[i0], xi0t
; xi0 = x[i0]
sub
||[ICNT]
||[!ICNT]
||
||
||
stw
add
cmplt
ldw
; xi1 = x[i1]
; j=j+1
; y[i3] = co3*s1 – si3*r1
; i2 = i1 + n2
ILOOPEND:
[JCNT] b
||
mpy
; a = (j + 1)*e
stw
ldw
add
.d
.d
.l
cc, *+A15[C]
*+y[i0], yi0t
i0,
n2, i1
JLOOPEND:
ldw
ldw
.d
.d
*+B15[E],e
*+B15[K],KCNT
.s
.d
sub
.d
b
3
e,
2, e
; e = e << 2;
e,*+B15[E]
.l
KCNT, 1, KCNT
KCNT,*+B15[K]
.s
KLOOP
5
||
||
nop
shl
stw
||[KCNT]
stw
||[KCNT]
nop
KLOOPEND:
; store c on stack
; yi0 = y[i0]
; i1 = i0 + n2
; code to restore the C runtime enviroment, and return to calling
function
||
ldw
ldw
ldw
ldw
ldw
ldw
ldw
ldw
ldw
ldw
ldw
b
ldw
mvk
add
nop
.d2
.d2
.d2
.d2
.d2
.d2
.d2
.d2
.d2
.d2
.d2
.s2
.d2
.s2
.l2
*––B15[1],
A10 ;
*––B15[1],
A11 ;
*––B15[1],
A12 ;
*––B15[1],
A13 ;
*––B15[1],
A14 ;
*––B15[1],
A15 ;
*––B15[1],
B3 ;
*––B15[1],
B10 ;
*––B15[1],
B11 ;
*––B15[1],
B12 ;
*––B15[1],
B13 ;
B3
*––B15[1],
B14 ;
STACKSIZE,stack
B15, stack, B15 ;
3
pop
pop
pop
pop
pop
pop
pop
pop
pop
pop
pop
A10 from the stack
A11 from the stack
A12 from the stack
A13 from the stack
A14 from the stack
A15 from the stack
B3 from the stack
B10 from the stack
B11 from the stack
B12 from the stack
B13 from the stack
pop B14 from the stack
; move stack size into a reg.
de-allocate space on stack
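For readers who want to follow the algorithm without tracing the scheduled assembly, the butterfly equations and loop structure given in the comments of Figure 4 can be collected into a plain C reference. This is a sketch under stated assumptions, not the report's code: `fft4_ref` and `q31_mul` are hypothetical names, the multiply uses a native 64-bit product instead of the four-multiply sequence (so it keeps one more low-order bit), and the twiddle tables are assumed to hold cos(2*pi*k/n) and sin(2*pi*k/n) scaled to 32-bit fractions. As in the listing, no digit reversal and no scaling are performed, so the caller must leave enough headroom in the input.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional multiply via a 64-bit product (stand-in for the extended
   precision multiply; both operands equal to INT32_MIN is not handled). */
static int32_t q31_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 31);
}

/* In-place radix-4 FFT; n must be a power of 4 and m = log4(n). */
void fft4_ref(int32_t *x, int32_t *y,
              const int32_t *wcos, const int32_t *wsin, int n, int m)
{
    assert(n >= 4 && m >= 1);
    int n2 = n, e = 1;
    for (int k = 0; k < m; k++) {
        int n1 = n2;
        n2 >>= 2;
        for (int j = 0; j < n2; j++) {
            int a = j * e, b = a + a, c = b + a;    /* twiddle indices */
            int32_t co1 = wcos[a], si1 = wsin[a];
            int32_t co2 = wcos[b], si2 = wsin[b];
            int32_t co3 = wcos[c], si3 = wsin[c];
            for (int i0 = j; i0 < n; i0 += n1) {
                int i1 = i0 + n2, i2 = i1 + n2, i3 = i2 + n2;
                int32_t r1 = x[i0] + x[i2], r3 = x[i0] - x[i2];
                int32_t s1 = y[i0] + y[i2], s3 = y[i0] - y[i2];
                int32_t r2 = x[i1] + x[i3], r4 = x[i1] - x[i3];
                int32_t s2 = y[i1] + y[i3], s4 = y[i1] - y[i3];
                x[i0] = r1 + r2;
                y[i0] = s1 + s2;
                r2 = r1 - r2;
                s2 = s1 - s2;
                r1 = r3 - s4;
                s1 = s3 + r4;
                r3 = r3 + s4;
                s3 = s3 - r4;
                x[i1] = q31_mul(co1, r3) + q31_mul(si1, s3);
                y[i1] = q31_mul(co1, s3) - q31_mul(si1, r3);
                x[i2] = q31_mul(co2, r2) + q31_mul(si2, s2);
                y[i2] = q31_mul(co2, s2) - q31_mul(si2, r2);
                x[i3] = q31_mul(co3, r1) + q31_mul(si3, s1);
                y[i3] = q31_mul(co3, s1) - q31_mul(si3, r1);
            }
        }
        e <<= 2;
    }
}
```

For a 4-point transform only table entry 0 is used (cosine close to 1.0, sine 0), and the real input {1, 2, 3, 4} yields approximately {10, -2+2j, -2, -2-2j} in the same scale as the input, which is a convenient first check before comparing against the assembly version.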
IMPORTANT NOTICE
Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications,
enhancements, improvements, and other changes to its products and services at any time and to discontinue
any product or service without notice. Customers should obtain the latest relevant information before placing
orders and should verify that such information is current and complete. All products are sold subject to TI’s terms
and conditions of sale supplied at the time of order acknowledgment.
TI warrants performance of its hardware products to the specifications applicable at the time of sale in
accordance with TI’s standard warranty. Testing and other quality control techniques are used to the extent TI
deems necessary to support this warranty. Except where mandated by government requirements, testing of all
parameters of each product is not necessarily performed.
TI assumes no liability for applications assistance or customer product design. Customers are responsible for
their products and applications using TI components. To minimize the risks associated with customer products
and applications, customers should provide adequate design and operating safeguards.
TI does not warrant or represent that any license, either express or implied, is granted under any TI patent right,
copyright, mask work right, or other TI intellectual property right relating to any combination, machine, or process
in which TI products or services are used. Information published by TI regarding third-party products or services
does not constitute a license from TI to use such products or services or a warranty or endorsement thereof.
Use of such information may require a license from a third party under the patents or other intellectual property
of the third party, or a license from TI under the patents or other intellectual property of TI.
Reproduction of information in TI data books or data sheets is permissible only if reproduction is without
alteration and is accompanied by all associated warranties, conditions, limitations, and notices. Reproduction
of this information with alteration is an unfair and deceptive business practice. TI is not responsible or liable for
such altered documentation.
Resale of TI products or services with statements different from or beyond the parameters stated by TI for that
product or service voids all express and any implied warranties for the associated TI product or service and
is an unfair and deceptive business practice. TI is not responsible or liable for any such statements.
Mailing Address:
Texas Instruments
Post Office Box 655303
Dallas, Texas 75265
Copyright © 2002, Texas Instruments Incorporated