# Tutorial-1 (Micro-architecture and Finite Length) (2013)

```
1. Micro Architecture and Finite Length
Olle Seger (olle.seger@liu.se)
Andreas Ehliar (ehliar@isy.liu.se)
Dake Liu, Rizwan Azhgar
Outline
• Introduction
• Basic Components
• Finite Length, Overflow, 2's complement, rounding, saturation
• About Lab-1, the Senior Processor …..
• Labs
– In groups of two students
– No written report
– Be prepared to answer questions
– Mandatory
Some Basic Components
• Busses
• Multiplexers
• Registers
• Multipliers
On-chip Busses
• Sharing data between different modules.
• All can read from the bus at the same time
• Only one can write to it at one time
VHDL
C<= A(4 downto 1) & B(6 downto 0);
Verilog
assign C = {A[4:1], B[6:0]};
MUX or Multiplexer
[Figure: 4-to-1 multiplexer with inputs A, B, C, D, 2-bit select ctrl and output Y; ctrl = 00, 01, 10, 11 selects A, B, C, D]
VHDL
Y <= A when ctrl = "00" else
     B when ctrl = "01" else
     C when ctrl = "10" else
     D;
(or)
with ctrl select
  Y <= A when "00",
       B when "01",
       C when "10",
       D when others;
(or)
process(A,B,C,D,ctrl) is
begin
  case ctrl is
    when "00" => Y <= A;
    when "01" => Y <= B;
    when "10" => Y <= C;
    when others => Y <= D;
  end case;
end process;
Verilog
assign Y = (ctrl == 2'b00) ? A :
           (ctrl == 2'b01) ? B :
           (ctrl == 2'b10) ? C :
           D;
(or)
always @(*) begin
  case (ctrl)
    2'b00 : Y = A;
    2'b01 : Y = B;
    2'b10 : Y = C;
    2'b11 : Y = D;
  endcase
end
Registers
VHDL
process(clk) is
begin
  if rising_edge(clk) then
    if rst = '1' then
      q <= (others => '0');  -- assumes q is a std_logic_vector
    elsif ld = '1' then
      q <= d;
    end if;
  end if;
end process;
Verilog
always @(posedge clk)
begin
  if (rst)
    q <= 0;
  else if (ld)
    q <= d;
end
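[Figure: register with data input d, control inputs ld and rst, clock clk and N-bit output q; a block K turns (rst, ld) into the 2-bit select "out" of a multiplexer with inputs 0, 1, 2]
rst ld | out
 0  0  |  2
 0  1  |  0
 1  -  |  1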
2’s Complement Number Representation
weights:  -1   ½   ¼   1/8
bits:      1   0   1   0      = −3/4 ∈ [-1, 1-1/8]
It’s easy to increase the number of bits. It’s still the same number.
weights:  -4   2   1   ½   ¼   1/8  1/16  1/32
bits:      1   1   1   0   1   0    0     0     = −3/4 ∈ [-4, 4-1/32]
• duplicate sign bit (new bits on the left)
• concatenate zeros (new bits on the right)
…
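2’s Complement Number Representation
[Figure: reducing the number of bits of a word with weights -4, 2, 1, ½, ¼, 1/8, 1/16, 1/32; the two upper bits are labelled x1 x2 and the two lowest bits y1 y2]
• truncate: drop the low-order bits
• rounding: add the highest dropped bit, then drop the low-order bits
• saturate: drop the upper bits x1 x2; the figure distinguishes the cases x1=0, x1x2=11 and x1x2=10, and a value that does not fit is replaced by MAX or MIN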
Implicitly: integer, two’s complement
{c_o, O[15:0]} <= A[15:0] + B[15:0] + {15'b0, c_i}
Alternatively
{c_o, O[15:0], x} <= {A[15], A[15:0], 1'b1} + {B[15], B[15:0], c_i}
Input operands: N bit; Output: N+1 bit
Subtraction: using 2’s complement
[Figure: adder (+) with inputs A, B and carry-in c_i, output O and carry-out c_o]
Example: Integer or Fractional Multiplication
0111 × 0111 = 00110001 or
0.111 × 0.111 = 00.110001 = 0.1100010
Input operands: N bit; Output: 2N bit
O[7:0] <= A[3:0] × B[3:0]
[Figure: signed multiplier MULS with inputs OMA[15:0] and OMB[15:0] and 32-bit output Mul_Output[31:0]]
Signed multiplication
paper&pencil algorithm

    0111    7
  * 0111    7
00000111    7
00001110   14
00011100   28
00000000    0
00110001   49

    1001   -7
  * 0111    7
11111001   -7
11110010  -14
11100100  -28
00000000    0
11001111  -49

    0111    7
  * 1001   -7
00000111    7
00000000    0
00000000    0
11001000  -56
11001111  -49

    1001   -7
  * 1001   -7
11111001   -7
00000000    0
00000000    0
00111000   56
00110001   49
[Figure: data path with the word format at each stage]
Register File, DM0, DM1 (addressed by ar): operands (1,15)
X: (1,15) × (1,15) -> (2,30)
sign extend to (10,30)
ALU, accumulator: (10,30)
scale: (10,30)
round: (10,15)
sat: (1,15)
A Rounding Example
Round Arithmetic Example: Round 8 bits to 4 bits
Before round: 8 bits, A7 is the sign bit
A7 A6 A5 A4 A3 A2 A1 A0
Round to nearest arithmetic
b[3:0] = a[7:4] + {3'b000, a[3]}
To round up, add A3 as the carry in
4 bits left after round and truncation, B3 is the sign bit
B3 B2 B1 B0

   A7 A6 A5 A4
 +  0  0  0 A3
   B7 B6 B5 B4

   A7 A6 A5 A4 A3
 +  0  0  0  0  1
   B7 B6 B5 B4  X
Senior Assembler & Simulator
• Assembly Code Includes:
  – Assembly Instructions: LD, ST, ADD, CMP, …
  – Symbolic name for memory locations: labels
  – Assembler directives: .skip 31, .df 0.125, …
• Senior Assembler: Translates assembly code into an executable binary code (Hex Format).
• Senior Simulator: Takes the hex file and provides a debugging environment.
Assembly Code (ex.asm) -> Assembler (srasm) -> ex.hex -> Simulator (srsim) -> Debugging + Output text file
Senior
Senior: DSP with lots of bells and whistles
• 32 16-bit general regs (r0-r31)
• 32 16-bit special purpose regs
• 4 32-bit accumulator regs + 8 guard bits (acr0-acr3)
Special purpose registers
• Memory
  – Where is the data? rom0
  – Where are the coefficients? rom0
  – But you need them at the same time. So?
• How to save the output to a text file
  – out 0x11, r31
• Important instructions
  – convxx
  – repeat vs cmp & jump
  – set, clr
  – move, ld, st
• Hint: check the cycles required for data to be ready and use NOP accordingly.
[Figure: block diagram with the labels dm0, dm1, ram0, ram1, rom0, RF, DP, PM, CP]
• move, load and store instructions
move     r7,r14
move.eq  r22, rnd mul2 acr3
set      r21,711
ld0      r1,(ar1,r9)    ; r1 <- M0(ar1+r9)
ld1      r1,(ar0++%)    ; r1 <- M1(ar0)
                        ; ar0 = (ar0==top0)?bot0:ar0+step0
st1      (ar2++),r5     ; M1(ar2)<-r5,ar2++
• Short arithmetic, logic, shift instructions
r7,r14,r15
r7,r12
• Long instructions
acr2,acr1,acr0
acr1,acr3,r2:0
convss   acr0,(ar0++%),(ar1++%)   ; acr0 += M0(ar0)*M1(ar1), ar0 ⊕, ar1 ⊕
• How to use "repeat"
– Hardware loop!
……
repeat   label_end, 32
set      r4,0xfa72
move     r1,sr3
mac      acr0, r0, r1
label_end
move     r17,sr31
……
These 3 instructions are repeated. No (visible) loop counter. No test. No jump.
How to use conditional branch "jump"
set r0,32             ; set loop counter
label_start
…
dec r0                ; decrement loop counter
jump.ne label_start   ; no delay slots
xxx                   ; branch delayed
yyy                   ; 3 cycles
zzz                   ;
• jump instruction – Another Example
……
jump.ne  ds2,label4
move     r1,sr3       ; this will always execute
set      r2,7         ; so will this
move     r12,r3       ; but not this
label4
set      r7,3
……

set r0,32             ; set loop counter
label_start
…
dec r0                ; decrement loop counter
jump.ne ds3 label_start
xxx
yyy
zzz
• How to debug in simulator (srsim)
– r<n>: execute n instructions
– l: list the instructions around the pc
– p: print the values in registers
  • Special registers: which are ar0 and ar1?
  • Accumulation registers: which is acr0?
– g: run the whole program
Exercise
Convolution
y(n) = \sum_{k=0}^{4} h(k) x(n-k)
     = h(0) x(n) + h(1) x(n-1) + h(2) x(n-2) + h(3) x(n-3) + h(4) x(n-4)
x(n): present sample; x(n-1): previous sample; …
[Figure: direct-form FIR filter. x(n) passes through four registers giving x(n-1), x(n-2), x(n-3), x(n-4); the five taps are multiplied by h(0) … h(4) and added, followed by Round and Saturation to give y(n).]
Exercise 1.2
y(n) = \sum_{k=0}^{31} h(k) x(n-k),   0 ≤ n < 1000
[Figure: memory layout. ram1 holds coeffs h(0) h(1) … h(31), addressed by ar1, with bot1 and top1 marking the ends. rom0 holds 0 … 0 followed by signal x(0) x(1) … x(999), addressed by ar0.]
;; coeffs copied rom0 -> ram1
fir_filter
        set     r3,signal
        set     r1,1000         ; loop counter
        set     ar1,coeffs      ; ar1->coeffs
        set     ar0,zeros       ; ar0->signals
        set     step1,1
        set     bot1,coeffs
        set     top1,coeffs_end
;;
loop
        inc     r3
        move    ar0,r3
        repeat  falt,32
        convss  acr0,(--ar0),(ar1++%)
falt
        dec     r1
        jump.ne ds3 loop
        move    r31,rnd div2 acr0
        clr     acr0            ; clear accu
        out     0x11,r31
;;
;; end of code
        out     0x13,r0
        .rom0
        .scale  2.0
signal
        .df     0.0000
        .df     0.588059
;; …
Exercise 1.2 with ringbuffer
[Figure: memory layout. rom0 holds coeffs h(0) h(1) … h(31), addressed by ar0, with bot0 and top0 marking the ends. ram1 holds the ringbuffer, addressed by ar1, with bot1 and top1 marking the ends. ekg x(0) x(1) … x(999) is addressed by ar2.]
;; ekg copied rom0->ram1
;; zeros in ringbuffer
;; pointers fixed
;;
        set     r1,1000         ; loop counter
;;
loop
        ld1
        dec     r1              ; dec loop cnt
        st1     (ar1),r0        ; write r.b.
        repeat  falt,31
        convss  acr0,(ar0++%),(ar1++%)
falt
        move    r2,ar1
        convss  acr0,(ar0++%),(ar1++%)
        move    ar1,r2
        jump.ne ds3 loop
        move    r31,rnd div2 acr0
        clr     acr0            ; clear accu
        out     0x11,r31
…
;; end of code
        out     0x13,r0
[Figure: input signal x = x0 + sin filtered by h, y(n) = \sum_{k=0}^{31} h(k) x(n-k)]
Frequency domain
Exercise 1.3
ringbuffer: r0 r1 r2 r3 r4 = 0  0  0  0  0
coeffs:     r5 r6 r7 r8 r9 = h0 h1 h2 h3 h4
Unroll the loop 5 times!
Step h,x forward
Fill in x backward

in      r0,0x10
clr     acr0
macss   acr0,r0,r5
macss   acr0,r1,r6
macss   acr0,r2,r7
macss   acr0,r3,r8
macss   acr0,r4,r9
move    r10,sat rnd acr0
nop
out     0x11,r10

in      r4,0x10
clr     acr0
macss   acr0,r4,r5
macss   acr0,r0,r6
macss   acr0,r1,r7
macss   acr0,r2,r8
macss   acr0,r3,r9
move    r10,sat rnd acr0
nop
out     0x11,r10
…
```
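
The finite-length operations on the slides (two's complement interpretation, rounding by adding the highest dropped bit, saturation, and the multiply-accumulate sum of the FIR exercise) can be checked with a small bit-level model. The sketch below uses Python only as a calculator; the function names `to_signed`, `round_bits`, `saturate` and `fir_q15` are hypothetical, and it does not claim to reproduce the exact behaviour of the Senior `rnd`, `sat` or `div2` options.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set: weight is negative
        return value - (1 << bits)
    return value


def round_bits(value, drop):
    """Drop `drop` low-order bits, adding the highest dropped bit as carry-in.

    This mirrors b[3:0] = a[7:4] + {3'b000, a[3]} from the rounding slide.
    """
    return (value >> drop) + ((value >> (drop - 1)) & 1)


def saturate(value, bits):
    """Clamp value to the range of a `bits`-bit two's complement word."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def fir_q15(h, x):
    """y(n) = sum_k h(k) x(n-k) for (1,15) operands, x(n) = 0 for n < 0.

    Products are (2,30); the Python integer plays the role of the wide
    accumulator. The result is rounded to 15 fraction bits and saturated
    to a 16-bit word.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(saturate(round_bits(acc, 15), 16))
    return y


if __name__ == "__main__":
    # 1.010 and 111.01000 are the same number, -3/4
    print(to_signed(0b1010, 4) / 8, to_signed(0b11101000, 8) / 32)
    # 0.5 * 0.5 in (1,15) format
    print(fir_q15([16384], [16384]))
```

Rounding can overflow the shorter word (0111 1000 rounds up to 1000, which is not a valid positive 4-bit value), which is why the model saturates after rounding.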