ECE 411 Exam 1

Name:
• This exam has 5 problems. Make sure you have a complete exam before you begin.
• Write your name on every page in case pages become separated during grading.
• You will have three hours to complete this exam.
• Write all of your answers on the exam itself. If you need more space to answer a given problem,
continue on the back of the page, but clearly indicate that you have done so.
• This exam is closed-book. You may use one sheet of notes.
• You may use a calculator.
• Do not do anything that might be perceived as cheating. The minimum penalty for cheating will
be a grade of zero.
• Show all of your work on all problems. Correct answers that do not include work demonstrating
how they were generated may not receive full credit, and answers that show no work cannot receive
partial credit.
• The exam is meant to test your understanding. Ample time has been provided, so be patient and
read the questions carefully before you answer.
• Good luck!
Question          Points   Score
ISA                 14
MP                  15
Cache               16
Cache and VM        16
Pipelining           9
Total:              70
1. ISA (14 points)
Branch predication is a computer architecture design strategy that allows each instruction to either
perform an operation or do nothing based on a condition. The condition, called a predicate, is determined based on the value of a general purpose register. For example, the following instruction
executes when R1 is positive and does nothing if R1 is non-positive.
(R1) ADD R2, R3, 3
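The predication rule above can be modeled in a few lines. This is a minimal Python sketch, assuming a hypothetical register file stored as a dict and treating register contents as plain signed integers (16-bit wraparound is ignored); the helper name `predicated_add` is illustrative only.

```python
def predicated_add(regs: dict, pred: str, dest: str, src: str, imm: int) -> None:
    """Model `(pred) ADD dest, src, imm`.

    The ADD takes effect only when the predicate register holds a positive
    value; otherwise the instruction leaves every register unchanged.
    """
    if regs[pred] > 0:
        regs[dest] = regs[src] + imm


regs = {"R1": 5, "R2": 0, "R3": 10}
predicated_add(regs, "R1", "R2", "R3", 3)    # R1 > 0, so R2 becomes 13
regs["R1"] = 0
predicated_add(regs, "R1", "R2", "R3", 100)  # R1 <= 0, so R2 stays 13
```

The key point the sketch captures is that a false predicate turns the instruction into a no-op rather than redirecting control flow.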
Answer the questions below.
(a) (2 points) List one advantage and one disadvantage of implementing predication compared with
using traditional branch instructions.
(b) (5 points) Rewrite the following simple program using predication. The resulting program should
not have any branch or jump instructions. Explain any optimizations that you make. Feel free to
overwrite registers R1 and R2 as long as the result is not affected. Note: each instruction
needs to have a predicate register.
    ; R0 = 0, R5 = 1
    ; VALUE1 and VALUE2 are both non-zero
            LDR  R1, R0, VALUE1
            BRp  FOO
    CONT:   LDR  R2, R0, VALUE2
            BRn  BAR
    END:    ADD  R3, R3, 0
            BRp  GOOD
            ADD  R3, R0, 0xBADD
            HALT
    GOOD:   ADD  R3, R0, 0x600D
            HALT
    FOO:    ADD  R3, R0, 1
            LEA  R7, CONT
            JMP  R7
    BAR:    AND  R3, R0, 1
            LEA  R7, END
            JMP  R7
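A predicated rewrite is easier to check against a reference model of the branching listing above. This Python sketch follows the listing's control flow under two stated assumptions: register values are treated as signed integers, and `r3_init` is a hypothetical stand-in for whatever R3 holds on entry, because R3 is not written on every path before END reads it.

```python
def original_program(value1: int, value2: int, r3_init: int) -> int:
    """Trace the branching program and return the final value of R3."""
    r0 = 0
    r3 = r3_init
    r1 = value1                  # LDR R1, R0, VALUE1 (sets CC from R1)
    if r1 > 0:                   # BRp FOO
        r3 = r0 + 1              # FOO: ADD R3, R0, 1, then jump back to CONT
    r2 = value2                  # CONT: LDR R2, R0, VALUE2 (sets CC from R2)
    if r2 < 0:                   # BRn BAR
        r3 = r0 & 1              # BAR: AND R3, R0, 1, then jump back to END
    if r3 > 0:                   # END: ADD R3, R3, 0 sets CC; BRp GOOD
        return 0x600D            # GOOD: ADD R3, R0, 0x600D; HALT
    return 0xBADD                # ADD R3, R0, 0xBADD; HALT


for v1, v2 in [(3, 4), (3, -4), (-3, 4), (-3, -4)]:
    print(v1, v2, hex(original_program(v1, v2, r3_init=0)))
```

A candidate predicated version can be modeled the same way and compared with `original_program` over VALUE1/VALUE2 pairs of both signs and several `r3_init` values.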
(c) (2 points) What characteristics should VALUE1 and VALUE2 satisfy to make R3 end with the
value 0x600D, and to make it end with the value 0xBADD?
(d) (5 points) Given a system that can execute two instructions in parallel when they have no
data dependencies on each other, rewrite the following code using predication and explain how
predication can help speed it up.
If you need additional registers in your code, you are free to use registers T1, T2, and T3.
    ; Assume R0 has value 0,
    ;        R9 has value 1
            LDR  R2, R0, VALUE1
            LDR  R3, R0, VALUE2
            ...
            ; Some instructions that
            ; compute the value to
            ; put in location ANS
            ; based on R2 and R3.
            ...
            LDR  R1, R0, ANS
            BRp  SEG1
            BRn  SEG2
    END:    ...
            ; Wrapping up
            ...
            HALT
    SEG1:   ; Do multiply
            MULT R4, R5, R6
            STR  R4, R0, ANS
            LEA  R7, END
            JMP  R7
    SEG2:   ; Do divide
            DIV  R8, R5, R6
            STR  R8, R0, ANS
            LEA  R7, END
            JMP  R7
2. MP (15 points)
(a) (6 points) A copy of the MP 1 datapath is attached at the end of this exam. Implement the new
instruction Memory Increment by modifying the datapath. The instruction definition is:
MEMINC BaseR, offset6
memWord[BaseR + (SEXT(offset6) << 1)] ← memWord[BaseR + (SEXT(offset6) << 1)] + 1
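The architectural effect of MEMINC (one read-modify-write of a memory word) can be sketched in Python. This models only the instruction's semantics, not the datapath changes the question asks for; memory is a hypothetical dict from byte address to 16-bit word, unmapped words read as 0, and addresses and data are assumed to wrap at 16 bits.

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value`."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def meminc(mem: dict, base_r: int, offset6: int) -> int:
    """Apply MEMINC BaseR, offset6 and return the effective address."""
    addr = (base_r + (sext(offset6, 6) << 1)) & 0xFFFF   # word offset scaled to bytes
    mem[addr] = (mem.get(addr, 0) + 1) & 0xFFFF          # read, add 1, write back
    return addr


mem = {0x3000: 5}
meminc(mem, 0x3004, 0b111110)   # offset6 = -2, so the word at 0x3000 is incremented
```

The sketch makes explicit that the datapath must perform a memory read, an ALU add of 1, and a memory write to the same address.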
In the table below, list all the components of the given datapath that you need to change and
then give the specific change for each component. You should not add any new components,
just modify the original components and add signals. For example, you cannot add a MUX, but
you can expand an existing MUX to be of larger size. Only the table will be graded. No exceptions.
Component          Change
(b) (4 points) Complete the state machine below to implement the new instruction.
[Partial state diagram: From Fetch, Decode; complete the states for MEMINC.]
(c) (3 points) For the given C code, complete the equivalent LC-3b assembly programs below. One
uses the original LC-3b ISA and the other uses the ISA with the Memory Increment instruction
included.
    for (int i = 0; i < N; i++)
        vals[i]++;
Original ISA:
            LDR   R1, R0, N
            LEA   R2, VALS
    LOOP:
            ________
            ADD   R2, R2, #1
            ADD   R1, R1, #-1
            BRnz  LOOP
    END:    BRnzp END
    N:      DATA2 0x40
    VALS:   DATA2 0x02
            DATA2 0x21
            ...

ISA w/ Memory Increment instruction:
            LDR   R1, R0, N
            LEA   R2, VALS
    LOOP:
            ________
            ADD   R2, R2, #1
            ADD   R1, R1, #-1
            BRnz  LOOP
    END:    BRnzp END
    N:      DATA2 0x40    ; Variable N
    VALS:   DATA2 0x02
            DATA2 0x21
            ...           ; Rest of Array Omitted
Give an expression for the number of instructions saved in the new ISA’s program as a function
of N.
(d) (2 points) A student proposes putting a loop counter in memory and using the new memory
increment instruction to increment the counter every iteration. What is a performance issue
with this proposed method even if condition codes could be generated for memory locations?
3. Cache (16 points)
(a) A 2-way set associative write-back cache with a true LRU replacement policy requires 15 × 2^9 bits
of storage to implement its tag store (including bits for valid, dirty, and LRU). The cache is
virtually indexed and physically tagged. The virtual address space is 1 MB, the page size is 2 KB,
and each cache block is 8 bytes.
i. (2 points) What is the size of the data store in bytes?
ii. (2 points) How many bits of the virtual index come from the virtual page number?
iii. (2 points) What is the physical address space of this memory system?
(b) (8 points) Below are four different sequences of memory addresses generated by a program
running on a processor with a cache. The cache hit ratio for each sequence is also shown.
Sequence No.   Address Sequence                                 Hit Ratio
1              0, 512, 1024, 1536, 2048, 1536, 1024, 512, 0     0.33
2              0, 2, 4, 8, 16, 32                               0.33
3              0, 512, 1024, 0, 1536, 0, 2048, 512              0.25
4              0, 64, 128, 256, 512, 256, 128, 64, 0            0.33
Assume that
• the cache is initially empty at the beginning of each sequence,
• all memory accesses are one byte accesses,
• all addresses are byte addresses.
Find values for the parameters below such that a cache with those parameters would produce the hit ratios in the table above.
i. Associativity
ii. Block size
iii. Total cache size
iv. Replacement policy (LRU, Pseudo LRU, or FIFO)
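Candidate parameters can be checked by simulation. This Python sketch models a set-associative cache that starts empty, with byte addresses and one-byte accesses as stated above; it supports LRU and FIFO only (pseudo-LRU is not modeled), and the configuration in the loop is an arbitrary hypothetical guess, not a proposed answer.

```python
from collections import OrderedDict


def count_hits(addresses, num_sets, ways, block_bytes, policy="LRU"):
    """Count hits for a set-associative cache that starts empty ("LRU" or "FIFO")."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        block = addr // block_bytes          # block number = tag and index bits
        lines = sets[block % num_sets]       # oldest first: by use (LRU) or insertion (FIFO)
        if block in lines:
            hits += 1
            if policy == "LRU":
                lines.move_to_end(block)     # only LRU refreshes an entry on a hit
        else:
            if len(lines) == ways:
                lines.popitem(last=False)    # evict the LRU or first-in entry
            lines[block] = True
    return hits


SEQUENCES = {
    1: [0, 512, 1024, 1536, 2048, 1536, 1024, 512, 0],
    2: [0, 2, 4, 8, 16, 32],
    3: [0, 512, 1024, 0, 1536, 0, 2048, 512],
    4: [0, 64, 128, 256, 512, 256, 128, 64, 0],
}

# Hypothetical candidate configuration; change these values to test other guesses.
for n, seq in SEQUENCES.items():
    ratio = count_hits(seq, num_sets=2, ways=2, block_bytes=4) / len(seq)
    print(f"sequence {n}: hit ratio {ratio:.2f}")
```

Running the same sequence under "LRU" and "FIFO" is a quick way to see which sequences distinguish the replacement policies.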
(c) (2 points) Lyle is dreaming of a multicore LC-3s processor with virtual memory support. The
processor will have 10 KB pages, a block size of 256 bytes, and a shared L3 cache. Lyle wants to
make the L3 cache as large as 128 MB, but is concerned about the synonym problem and thinks
it will complicate the design. What solution would you suggest to Lyle to solve the synonym
problem?
(d) (8 bonus points) You have three processors, D, E, and F, each with only one level of cache. The
caches have the following parameters:
• All caches have
– a total size of 128 bytes
– a block size of 32 bytes
– LRU replacement policy
• D uses a direct mapped cache
• E uses a 2-way set associative cache
• F uses a fully associative cache
A benchmark that tests memory read performance by issuing read requests to the cache was run to evaluate the processors. Assume the caches are empty at the beginning of the benchmark.
The benchmark generates the following cache accesses:
A B A H B G H H A E H D H G C C G C A B H D E C C B A D E F
Each letter is a unique cache block and all eight cache blocks are contiguous in memory. However, the ordering of letters does not correspond to the cache block ordering in memory.
i. The benchmark running on processor D generates the following sequence of cache misses:
A B A H B G A E D H C G C B D A F
Identify which cache blocks belong in the same set (for the cache in processor D).
ii. For processor E, the benchmark crashes after the following sequence of cache misses:
A B H G E
Can you identify which cache blocks are in the same set for the cache of processor E? Explain your answer.
iii. Write down, in order of generation, the sequence of cache misses for the benchmark running on processor F.
iv. What is the cache miss rate for the benchmark running on processor F?
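The miss sequence for processor F can be checked mechanically. This Python sketch models F's cache as a fully associative LRU cache holding 128 / 32 = 4 blocks and treats each letter as a block identifier; it does not model processors D or E.

```python
from collections import OrderedDict


def lru_miss_sequence(accesses, capacity_blocks):
    """Return the blocks that miss, in order, in a fully associative LRU cache."""
    cache = OrderedDict()                  # least recently used block first
    misses = []
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # hit: mark most recently used
        else:
            misses.append(block)
            if len(cache) == capacity_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return misses


BENCHMARK = "A B A H B G H H A E H D H G C C G C A B H D E C C B A D E F".split()
CAPACITY = 128 // 32                       # 128-byte cache, 32-byte blocks

misses = lru_miss_sequence(BENCHMARK, CAPACITY)
print(" ".join(misses))
print(f"miss rate = {len(misses)}/{len(BENCHMARK)}")
```

Because every block can go anywhere, only the capacity and the LRU order matter, which is why no set mapping is needed for F.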
4. Cache and VM (16 points)
Consider a memory system with the following parameters and components:
• Byte addressable
• 256-byte (2^8-byte) page size
Cache:
• Virtually indexed and physically tagged
• 4-way set-associative with 6 index bits
• 4 KB (2^12 bytes) of data storage (excluding bits for dirty, valid, tag, and LRU)
• Read allocate policy
• Indexing the data array takes 10 ns
• Indexing the tag array takes 8 ns
• Tag comparison takes 4 ns
• Multiplexing the output data takes 3 ns
• A cache miss takes 100 ns to access the main memory and allocate to the cache line
• Assume a hit or miss is detected immediately after the tag comparison
• Initially empty (all lines are invalid)
TLB:
• Fully-associative
• A TLB access takes 5 ns
• Read allocate policy
• TLB is updated on a TLB miss
• All entries are listed below

Valid   VPN            PPN
0       00 0000 0001   0001 0000
1       00 0000 0010   0000 1110
0       00 0001 0110   0011 0011
0       00 0001 1011   0000 0000
1       11 1010 0100   1000 0100
1       11 0100 0101   0011 0011
1       10 0010 1010   1100 0110
0       00 0000 0000   0000 0001

Page Table:
• Single level page table
• A page table access takes 80 ns
• Some of the entries are listed below

Valid   VPN            PPN
0       00 0000 0000   0001 0000
1       00 0000 0001   0011 0011
1       00 1101 0010   0000 0000
0       01 0000 0011   0010 0100
1       01 1010 0010   1110 0001
1       01 1111 1101   0000 1110
1       10 0000 0110   1100 0110
0       11 1111 1111   0101 0110
(a) (12 points) Fill in the blanks and calculate the cache access times for the following actions in
sequence (the second action follows immediately after the first one). Show your calculations
for full credit. Write the address in hex and circle hit or miss.
i. Read virtual address 4x00107
   • Cache access time: ________ ns
   • Physical address: ________
   • Cache: hit / miss
   • TLB: hit / miss
ii. Read virtual address 4x34500
   • Cache access time: ________ ns
   • Physical address: ________
   • Cache: hit / miss
   • TLB: hit / miss
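The translation step can be sketched in Python from the TLB and page-table listings. Assumptions, stated plainly: the exam's 4x00107 and 4x34500 are read as the hex values 0x00107 and 0x34500; a virtual address is a 10-bit VPN followed by an 8-bit page offset (matching the table widths); blocks are 16 bytes, derived from 4 KB / (4 ways × 64 sets). The sketch covers translation and the virtual set index only. It does not compute access times, which depend on how the TLB lookup overlaps tag indexing, and it does not model the TLB refill on a miss.

```python
PAGE_OFFSET_BITS = 8        # 256-byte pages
BLOCK_OFFSET_BITS = 4       # 4 KB / (4 ways * 64 sets) = 16-byte blocks
INDEX_BITS = 6              # 64 sets


class PageFault(Exception):
    """Raised when the page-table entry for a VPN is invalid."""


# VPN -> (valid, PPN), transcribed from the TLB table
TLB = {
    0b00_0000_0001: (0, 0b0001_0000),
    0b00_0000_0010: (1, 0b0000_1110),
    0b00_0001_0110: (0, 0b0011_0011),
    0b00_0001_1011: (0, 0b0000_0000),
    0b11_1010_0100: (1, 0b1000_0100),
    0b11_0100_0101: (1, 0b0011_0011),
    0b10_0010_1010: (1, 0b1100_0110),
    0b00_0000_0000: (0, 0b0000_0001),
}

# VPN -> (valid, PPN), transcribed from the listed page-table entries
PAGE_TABLE = {
    0b00_0000_0000: (0, 0b0001_0000),
    0b00_0000_0001: (1, 0b0011_0011),
    0b00_1101_0010: (1, 0b0000_0000),
    0b01_0000_0011: (0, 0b0010_0100),
    0b01_1010_0010: (1, 0b1110_0001),
    0b01_1111_1101: (1, 0b0000_1110),
    0b10_0000_0110: (1, 0b1100_0110),
    0b11_1111_1111: (0, 0b0101_0110),
}


def translate(va: int) -> tuple:
    """Return (physical address, tlb_hit) for a virtual address."""
    vpn = va >> PAGE_OFFSET_BITS
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = TLB.get(vpn, (0, 0))
    if valid:
        return (ppn << PAGE_OFFSET_BITS) | offset, True
    if vpn not in PAGE_TABLE:
        raise LookupError(f"VPN {vpn:#05x} is not among the listed page-table entries")
    valid, ppn = PAGE_TABLE[vpn]
    if not valid:
        raise PageFault(f"VPN {vpn:#05x} is not resident")
    return (ppn << PAGE_OFFSET_BITS) | offset, False


def virtual_set_index(va: int) -> int:
    """Set index taken from the virtual address (the cache is virtually indexed)."""
    return (va >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


for va in (0x00107, 0x34500):
    pa, tlb_hit = translate(va)
    print(f"VA {va:#07x} -> PA {pa:#06x}, TLB {'hit' if tlb_hit else 'miss'}, "
          f"set {virtual_set_index(va)}")
```

Note that the 6 index bits plus 4 block-offset bits exceed the 8-bit page offset, so 2 of the index bits come from the VPN; the sketch indexes with the virtual address accordingly.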
(b) (2 points) Write a virtual address in hex that will cause a page fault. How will it be handled when
this address is accessed?
(c) (2 points) When should the TLB be flushed? Explain why.
5. Pipelining (9 points)
(a) (3 points) What is the highest speedup possible through pipelining for a 6-instruction program if
the latch delay is 2 ns and the total combinational logic delay of the non-pipelined design is 10 ns?
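The trade-off between more stages (shorter cycles) and more latch overhead plus pipeline fill can be explored numerically. This Python sketch is one timing model under stated assumptions, not the official solution: the 10 ns of logic splits evenly across k stages, each stage adds one 2 ns latch, the non-pipelined design has no latch, and a k-stage pipeline needs k + n - 1 cycles for n instructions. Other conventions (for example, charging the non-pipelined design one latch delay) give a different number.

```python
from fractions import Fraction


def pipelined_time(n_instr, stages, logic_ns, latch_ns):
    """Total time for n_instr instructions on an evenly split pipeline."""
    cycle = Fraction(logic_ns, stages) + latch_ns   # per-stage logic plus one latch
    return (stages + n_instr - 1) * cycle            # fill the pipe, then one per cycle


def best_speedup(n_instr=6, logic_ns=10, latch_ns=2, max_stages=40):
    """Return (speedup, stages) maximizing speedup over 1..max_stages stages."""
    baseline = Fraction(n_instr * logic_ns)          # non-pipelined: no latch assumed
    return max(
        (baseline / pipelined_time(n_instr, k, logic_ns, latch_ns), k)
        for k in range(1, max_stages + 1)
    )


speedup, stages = best_speedup()
print(f"best speedup {speedup} ({float(speedup):.2f}x) with {stages} stages")
```

Exact fractions avoid floating-point ties when comparing stage counts; with only 6 instructions, the fill cost k - 1 keeps the optimum well below the latch-limited asymptote.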
(b) (2 points) Give a specific example of a sequence of assembly instructions that includes a data
dependency (hazard) that cannot be resolved by forwarding alone.
(c) (4 points) Consider the code below.
    add $t0, $s0, $s1
    xor $t1, $t0, $s2
    lw  $s0, -12($a0)
    sub $s5, $s0, $s1
Is it possible to resolve any of the hazards in the above code by reordering the instructions so
that forwarding would be unnecessary? If yes, show how. If not, explain why not.
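Candidate reorderings can be screened with a small read-after-write (RAW) checker. This Python sketch assumes a classic in-order five-stage MIPS pipeline whose register file writes in the first half of a cycle and reads in the second half, so a reader within two instructions of its producer still needs forwarding (`window=2`). The `Instr` encoding is hypothetical, and only RAW dependences are reported, since WAR and WAW do not cause hazards in such a pipeline.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def raw_hazards(program, window=2):
    """List (producer, consumer, register, distance) for RAW dependences.

    Only the nearest earlier writer of each source register is considered,
    and only if it lies within `window` instructions of the reader.
    """
    found = []
    for j, consumer in enumerate(program):
        for reg in consumer.srcs:
            for i in range(j - 1, max(-1, j - 1 - window), -1):
                if program[i].dest == reg:
                    found.append((i, j, reg, j - i))
                    break
    return found


ORIGINAL = [
    Instr("add", "$t0", ("$s0", "$s1")),
    Instr("xor", "$t1", ("$t0", "$s2")),
    Instr("lw", "$s0", ("$a0",)),
    Instr("sub", "$s5", ("$s0", "$s1")),
]

for i, j, reg, dist in raw_hazards(ORIGINAL):
    kind = "load-use" if ORIGINAL[i].op == "lw" and dist == 1 else "RAW"
    print(f"{kind}: {ORIGINAL[i].op} -> {ORIGINAL[j].op} on {reg}, distance {dist}")
```

Any reordering must also keep `add` ahead of `lw`, because `lw` overwrites `$s0`, which `add` reads.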
Appendix: MP 0 Datapath
[Figure: MP 0 datapath. Components: PC with a +2 incrementer, an adder, MAR, MDR, IR, register file,
ALU, ADJ6 and ADJ9 offset adjusters, GENCC, CC, and CCCOMP. IR fields: opcode, dest, sr1, sr2,
offset6, offset9. Control signals: load_pc, pcmux_sel, load_mar, marmux_sel, load_mdr, mdrmux_sel,
load_ir, load_regfile, regfilemux_sel, storemux_sel, alumux_sel, aluop, load_cc. Other signals:
sr1_out, sr2_out, ir_nzp, branch_enable, mem_address, mem_wdata, mem_rdata.]
Appendix: LC-3b State Diagram
Appendix: LC-3b ISA
(Fields are listed from bit 15 down to bit 0; + marks instructions that set the condition codes.)
ADD+    0001  DR  SR1    0  00  SR2
ADD+    0001  DR  SR1    1  imm5
AND+    0101  DR  SR1    0  00  SR2
AND+    0101  DR  SR1    1  imm5
BR      0000  n  z  p  PCoffset9
JMP     1100  000  BaseR  000000
JSR     0100  1  PCoffset11
JSRR    0100  0  00  BaseR  000000
LDB+    0010  DR  BaseR  offset6
LDI+    1010  DR  BaseR  offset6
LDR+    0110  DR  BaseR  offset6
LEA+    1110  DR  PCoffset9
NOT+    1001  DR  SR  111111
RET     1100  000  111  000000
RTI     1000  000000000000
SHF+    1101  DR  SR  A  D  imm4
STB     0011  SR  BaseR  offset6
STI     1011  SR  BaseR  offset6
STR     0111  SR  BaseR  offset6
TRAP    1111  0000  trapvect8