CSEE 4340 Final Review

Final — Spring 2024

CSEE 4340 · Spring 2024

Historically numbered as CSEE 4824.

Question 1(a) 4 pts

What kind of dependences does register renaming eliminate?

Register renaming eliminates false (name) dependences: write-after-read (anti-) and write-after-write (output) dependences. True read-after-write dependences remain.

Question 1(b) 6 pts

In the P6 microarchitecture, an instruction’s source operand value may come from three places before being placed in the reservation station. Name all three.

The value may come from the architectural register file (already committed), from the reorder buffer (produced but not yet retired), or be captured off the bypass/CDB broadcast (still in flight).

Question 1(c) 6 pts

What are the main advantages/disadvantages of the R10k in comparison to the P6-style architecture?

  • P6 style: easier rollback and a simpler implementation (operand values are copied into the ROB and reservation stations), but more data movement.
  • R10k style: less data movement and faster, more scalable implementations (values live in a single physical register file), but implementation and rollback are harder (the rename map must be checkpointed or restored).

(Both rely on register renaming and a ROB for precise interrupts; the difference is where operand values live.)

Question 1(d) 3 pts

Name a technique that can eliminate compulsory cache misses.

Prefetching can eliminate compulsory (cold) misses; larger cache blocks also reduce them by exploiting spatial locality.

Question 1(e) 4 pts

What is the main advantage of using a virtually-addressed cache over a physically-addressed one?

A virtually addressed cache can be accessed before virtual-to-physical translation completes, so cache lookup latency can be lower (the TLB lookup is removed from the critical path of a hit).

Question 2(a) 6 pts

A repeating branch pattern of T, T, T, T, T, N, N is predicted with a PC-indexed 2-bit saturating-counter predictor. What is the steady-state prediction accuracy?

The steady-state accuracy is:

4 / 7 ≈ 0.5714 ≈ 57.1%

Reasoning in steady state:

| Actual outcome | State before outcome | Prediction | Result | State after outcome |
|---|---|---|---|---|
| T1 | weakly not taken | N | wrong | weakly taken |
| T2 | weakly taken | T | correct | strongly taken |
| T3 | strongly taken | T | correct | strongly taken |
| T4 | strongly taken | T | correct | strongly taken |
| T5 | strongly taken | T | correct | strongly taken |
| N1 | strongly taken | T | wrong | weakly taken |
| N2 | weakly taken | T | wrong | weakly not taken |

After N2 the counter is back at weakly not taken, so the period repeats.

Per 7-outcome period, there are 3 mispredictions and 4 correct predictions.
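The steady state can be checked with a short simulation (a sketch; the function name and warm-up defaults are mine). Counter states 0–1 predict not-taken, states 2–3 predict taken:

```python
def steady_state_accuracy(pattern, warmup=10, measure=1000):
    state = 3  # start strongly taken; the initial state is arbitrary,
               # the warm-up periods wash it out
    correct = 0
    for period in range(warmup + measure):
        for taken in pattern:  # True = taken
            predict_taken = state >= 2
            if period >= warmup and predict_taken == taken:
                correct += 1
            # saturating increment on taken, decrement on not-taken
            state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / (measure * len(pattern))

acc = steady_state_accuracy([True] * 5 + [False] * 2)
print(f"{acc:.4f}")  # -> 0.5714
```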

Question 2(b) 8 pts

CPI is 1.2; ideal CPI is 1; 20% of instructions are branches; branch-prediction accuracy is 90%. How many cycles does each misprediction cost? What accuracy is needed to reduce CPI to 1.01 only by improving branch prediction?

Current CPI overhead:

1.2 - 1 = 0.2 cycles/instruction

Branch misprediction rate per instruction:

0.20 branch/instruction * (1 - 0.90) = 0.02 mispredictions/instruction

Misprediction cost:

0.2 / 0.02 = 10 cycles per misprediction

For CPI = 1.01:

1.01 = 1 + 0.20 * (1 - accuracy) * 10
0.01 = 2 * (1 - accuracy)
1 - accuracy = 0.005
accuracy = 0.995 = 99.5%
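
The same arithmetic, as a quick script (a sketch; variable names are mine):

```python
ideal_cpi, cpi = 1.0, 1.2
branch_frac, accuracy = 0.20, 0.90

# back out the per-misprediction cost from the CPI overhead
mispred_per_instr = branch_frac * (1 - accuracy)   # 0.02
penalty = (cpi - ideal_cpi) / mispred_per_instr    # 10 cycles

# target = ideal + branch_frac * (1 - acc) * penalty, solved for acc
target_cpi = 1.01
needed = 1 - (target_cpi - ideal_cpi) / (branch_frac * penalty)
print(round(penalty, 3), round(needed, 4))  # -> 10.0 0.995
```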

Question 2(c) 6 pts

A two-level/correlated branch predictor uses 8 PC bits to select a BHT entry; each BHT entry holds an 8-bit branch history; the PHT is indexed by that 8-bit history and stores 2-bit saturating counters. How many storage bits are needed for the BHT and the PHT?

BHT:

2^8 entries * 8 bits/entry = 256 * 8 = 2048 bits

PHT:

2^8 entries * 2 bits/entry = 256 * 2 = 512 bits

Total predictor table storage:

2048 + 512 = 2560 bits
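
The storage totals follow directly from the table sizes (a sketch of the arithmetic; variable names are mine):

```python
pc_bits, hist_bits, counter_bits = 8, 8, 2

bht_bits = (2 ** pc_bits) * hist_bits       # 256 entries * 8 bits each
pht_bits = (2 ** hist_bits) * counter_bits  # 256 entries * 2 bits each
print(bht_bits, pht_bits, bht_bits + pht_bits)  # -> 2048 512 2560
```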

Question 3(a) 8 pts

Compute the cache index bits for L1i, L1d, L2, and L3 caches with the following parameters. The block size is 64 B for all caches.

| Cache | Size | Associativity | Block size |
|---|---|---|---|
| L1 instruction | 32 KB | 8-way | 64 B |
| L1 data | 32 KB | 8-way | 64 B |
| L2 | 256 KB | 8-way | 64 B |
| L3 | 20 MB | 20-way | 64 B |

Formula:

# sets = cache size / (block size * associativity)
index bits = log2(# sets)

Answers:

| Cache | Sets | Index bits |
|---|---|---|
| L1 instruction | 32 KB / (64 B × 8) = 64 | 6 |
| L1 data | 32 KB / (64 B × 8) = 64 | 6 |
| L2 | 256 KB / (64 B × 8) = 512 | 9 |
| L3 | 20 MB / (64 B × 20) = 16384 | 14 |
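
The formula can be applied to all four caches at once (a sketch; the dictionary layout is mine):

```python
import math

block = 64  # bytes, same for all caches
caches = {  # name: (total size in bytes, associativity)
    "L1i": (32 * 1024, 8),
    "L1d": (32 * 1024, 8),
    "L2": (256 * 1024, 8),
    "L3": (20 * 1024 * 1024, 20),
}

index_bits = {}
for name, (size, ways) in caches.items():
    sets = size // (block * ways)          # sets = size / (block * ways)
    index_bits[name] = int(math.log2(sets))
    print(f"{name}: {sets} sets, {index_bits[name]} index bits")
```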

Question 3(b) 9 pts

Compute the average access time (AAT) for each associativity given the per-associativity hit access time, a 10 ns miss access time, 0.3 data references per instruction, and the table’s MPKI values. Which associativity performs best?

Miss rate conversion:

miss rate = misses / accesses
= (MPKI / 1000) / 0.3

Average access time:

AAT = hit_rate * hit_access_time + miss_rate * miss_access_time
= (1 - miss_rate) * hit_access_time + miss_rate * 10 ns

| Associativity | Hit access time | MPKI | Miss rate | Average access time |
|---|---|---|---|---|
| Direct-mapped | 0.86 ns | 6.64 | 0.02213 | ~1.062 ns |
| 2-way | 1.12 ns | 3.66 | 0.01220 | ~1.228 ns |
| 4-way | 1.37 ns | 0.987 | 0.00329 | ~1.398 ns |
| 8-way | 2.03 ns | 0.266 | 0.000887 | ~2.037 ns |

The direct-mapped cache performs best by average access time.
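
The table can be reproduced with a few lines (a sketch; variable names are mine):

```python
miss_time_ns = 10.0
refs_per_instr = 0.3

configs = [  # (associativity, hit access time in ns, MPKI)
    ("direct-mapped", 0.86, 6.64),
    ("2-way", 1.12, 3.66),
    ("4-way", 1.37, 0.987),
    ("8-way", 2.03, 0.266),
]

aat = {}
for name, hit_ns, mpki in configs:
    miss_rate = (mpki / 1000) / refs_per_instr  # misses per access
    aat[name] = (1 - miss_rate) * hit_ns + miss_rate * miss_time_ns
    print(f"{name}: AAT = {aat[name]:.3f} ns")
```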