CSEE 4340 Final Review

Final — Spring 2024

CSEE 4340 · Spring 2024

Historically numbered as CSEE 4824.

Question 1(a) 4 pts

What kind of dependences does register renaming eliminate?

Register renaming eliminates false (name) dependences: write-after-read (anti-) and write-after-write (output) dependences. True read-after-write dependences remain.

Question 1(b) 6 pts

In the P6 microarchitecture, an instruction’s source operand value may come from three places before being placed in the reservation station. Name all three.

The value may come from the architectural register file (already committed), from the reorder buffer (produced but not yet retired), or be captured off the bypass/CDB broadcast (still in flight).

Question 1(c) 6 pts

What are the main advantages/disadvantages of the R10k in comparison to the P6-style architecture?

  • P6 style: easier rollback and a simpler implementation (operand values are copied into the ROB and reservation stations), but more data movement.
  • R10k style: less data movement and faster, more scalable implementations (values live in a single physical register file), but implementation and rollback are harder (the rename map must be checkpointed or restored).

(Both rely on register renaming and a ROB for precise interrupts; the difference is where operand values live.)

Question 1(d) 3 pts

Name a technique that can eliminate compulsory cache misses.

Prefetching can eliminate compulsory (cold) misses; larger cache blocks also reduce them by exploiting spatial locality.

Question 1(e) 4 pts

What is the main advantage of using a virtually-addressed cache over a physically-addressed one?

A virtually addressed cache can be accessed before virtual-to-physical translation completes, so cache lookup latency can be lower (the TLB lookup is removed from the critical path of a hit).

Question 2(a) 6 pts

A repeating branch pattern of T, T, T, T, T, N, N is predicted with a PC-indexed 2-bit saturating-counter predictor. What is the steady-state prediction accuracy?

The steady-state accuracy is:

4 / 7 ≈ 0.5714 ≈ 57.1%

Reasoning in steady state:

| Actual outcome | State before outcome | Prediction | Result | State after outcome |
|---|---|---|---|---|
| T1 | weakly not taken | N | wrong | weakly taken |
| T2 | weakly taken | T | correct | strongly taken |
| T3 | strongly taken | T | correct | strongly taken |
| T4 | strongly taken | T | correct | strongly taken |
| T5 | strongly taken | T | correct | strongly taken |
| N1 | strongly taken | T | wrong | weakly taken |
| N2 | weakly taken | T | wrong | weakly not taken |

After N2 the counter is back at weakly not taken, so the period repeats.

Per 7-outcome period, there are 3 mispredictions and 4 correct predictions.
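The steady state can be checked with a short simulation (a sketch; the function name and warm-up defaults are mine). Counter states 0–1 predict not-taken, states 2–3 predict taken:

```python
def steady_state_accuracy(pattern, warmup=10, measure=1000):
    state = 3  # start strongly taken; the initial state is arbitrary,
               # the warm-up periods wash it out
    correct = 0
    for period in range(warmup + measure):
        for taken in pattern:  # True = taken
            predict_taken = state >= 2
            if period >= warmup and predict_taken == taken:
                correct += 1
            # saturating increment on taken, decrement on not-taken
            state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / (measure * len(pattern))

acc = steady_state_accuracy([True] * 5 + [False] * 2)
print(f"{acc:.4f}")  # -> 0.5714
```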

Question 2(b) 8 pts

CPI is 1.2; ideal CPI is 1; 20% of instructions are branches; branch-prediction accuracy is 90%. How many cycles does each misprediction cost? What accuracy is needed to reduce CPI to 1.01 only by improving branch prediction?

Current CPI overhead:

1.2 - 1 = 0.2 cycles/instruction

Branch misprediction rate per instruction:

0.20 branch/instruction * (1 - 0.90) = 0.02 mispredictions/instruction

Misprediction cost:

0.2 / 0.02 = 10 cycles per misprediction

For CPI = 1.01:

1.01 = 1 + 0.20 * (1 - accuracy) * 10
0.01 = 2 * (1 - accuracy)
1 - accuracy = 0.005
accuracy = 0.995 = 99.5%
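
The same arithmetic, as a quick script (a sketch; variable names are mine):

```python
ideal_cpi, cpi = 1.0, 1.2
branch_frac, accuracy = 0.20, 0.90

# back out the per-misprediction cost from the CPI overhead
mispred_per_instr = branch_frac * (1 - accuracy)   # 0.02
penalty = (cpi - ideal_cpi) / mispred_per_instr    # 10 cycles

# target = ideal + branch_frac * (1 - acc) * penalty, solved for acc
target_cpi = 1.01
needed = 1 - (target_cpi - ideal_cpi) / (branch_frac * penalty)
print(round(penalty, 3), round(needed, 4))  # -> 10.0 0.995
```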

Question 2(c) 6 pts

A two-level/correlated branch predictor uses 8 PC bits to select a BHT entry; each BHT entry holds an 8-bit branch history; the PHT is indexed by that 8-bit history and stores 2-bit saturating counters. How many storage bits are needed for the BHT and the PHT?

BHT:

2^8 entries * 8 bits/entry = 256 * 8 = 2048 bits

PHT:

2^8 entries * 2 bits/entry = 256 * 2 = 512 bits

Total predictor table storage:

2048 + 512 = 2560 bits
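
The storage totals follow directly from the table sizes (a sketch of the arithmetic; variable names are mine):

```python
pc_bits, hist_bits, counter_bits = 8, 8, 2

bht_bits = (2 ** pc_bits) * hist_bits       # 256 entries * 8 bits each
pht_bits = (2 ** hist_bits) * counter_bits  # 256 entries * 2 bits each
print(bht_bits, pht_bits, bht_bits + pht_bits)  # -> 2048 512 2560
```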

Question 3(a) 8 pts

Compute the cache index bits for L1i, L1d, L2, and L3 caches with the following parameters. The block size is 64 B for all caches.

| Cache | Size | Associativity | Block size |
|---|---|---|---|
| L1 instruction | 32 KB | 8-way | 64 B |
| L1 data | 32 KB | 8-way | 64 B |
| L2 | 256 KB | 8-way | 64 B |
| L3 | 20 MB | 20-way | 64 B |

Formula:

# sets = cache size / (block size * associativity)
index bits = log2(# sets)

Answers:

| Cache | Sets | Index bits |
|---|---|---|
| L1 instruction | 32 KB / (64 B × 8) = 64 | 6 |
| L1 data | 32 KB / (64 B × 8) = 64 | 6 |
| L2 | 256 KB / (64 B × 8) = 512 | 9 |
| L3 | 20 MB / (64 B × 20) = 16384 | 14 |
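
The formula can be applied to all four caches at once (a sketch; the dictionary layout is mine):

```python
import math

block = 64  # bytes, same for all caches
caches = {  # name: (total size in bytes, associativity)
    "L1i": (32 * 1024, 8),
    "L1d": (32 * 1024, 8),
    "L2": (256 * 1024, 8),
    "L3": (20 * 1024 * 1024, 20),
}

index_bits = {}
for name, (size, ways) in caches.items():
    sets = size // (block * ways)          # sets = size / (block * ways)
    index_bits[name] = int(math.log2(sets))
    print(f"{name}: {sets} sets, {index_bits[name]} index bits")
```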

Question 3(b) 9 pts

Compute the average access time (AAT) for each associativity given the per-associativity hit access time, a 10 ns miss access time, 0.3 data references per instruction, and the table’s MPKI values. Which associativity performs best?

Miss rate conversion:

miss rate = misses / accesses
= (MPKI / 1000) / 0.3

Average access time:

AAT = hit_rate * hit_access_time + miss_rate * miss_access_time
= (1 - miss_rate) * hit_access_time + miss_rate * 10 ns

| Associativity | Hit access time | MPKI | Miss rate | Average access time |
|---|---|---|---|---|
| Direct-mapped | 0.86 ns | 6.64 | 0.02213 | ~1.062 ns |
| 2-way | 1.12 ns | 3.66 | 0.01220 | ~1.228 ns |
| 4-way | 1.37 ns | 0.987 | 0.00329 | ~1.398 ns |
| 8-way | 2.03 ns | 0.266 | 0.000887 | ~2.037 ns |

The direct-mapped cache performs best by average access time.
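
The table can be reproduced with a few lines (a sketch; variable names are mine):

```python
miss_time_ns = 10.0
refs_per_instr = 0.3

configs = [  # (associativity, hit access time in ns, MPKI)
    ("direct-mapped", 0.86, 6.64),
    ("2-way", 1.12, 3.66),
    ("4-way", 1.37, 0.987),
    ("8-way", 2.03, 0.266),
]

aat = {}
for name, hit_ns, mpki in configs:
    miss_rate = (mpki / 1000) / refs_per_instr  # misses per access
    aat[name] = (1 - miss_rate) * hit_ns + miss_rate * miss_time_ns
    print(f"{name}: AAT = {aat[name]:.3f} ns")
```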