Cache Lookup Strategies
Let's say we have a 32KB cache with 32-byte blocks (lines). This gives us 1024 blocks which can be held in the cache. When we want to look in the cache to see if a particular block is present, we can use one of 3 basic strategies:
- look in every one of the 1024 blocks to see if the block's tag matches our access address. This is known as a "fully-associative" cache, because any block from memory can reside in any block in the cache.
- use 10 bits of the access address as a selection index to one of the 1024 blocks, then compare that block's tag with our access address. This is known as a "direct-mapped" cache, where a particular block in memory can only go into one particular block in the cache. All the memory blocks that go to the same cache block are said to be in the same "congruence class".
- use a fewer number of bits of the access address as a selection index into a "set" of cache blocks, then search each element of that set for a tag match. This method is somewhere between a) and b), above, and is known as "set-associative". For example, if I divide the cache up into 128 sets of 8 blocks each, I have an 8-way set-associative cache which is indexed by 7 bits of the access address. A particular block in memory can then go into any of the 8 blocks in its associated set.
Direct-mapped caches (2) are the easiest to build, but suffer the most from thrashing due to the "conflict misses" you mention above, where more than one frequently-used memory block competes for the same cache block.
Fully-associative caches (1) don't suffer from conflict misses, since any memory block can go anywhere in the cache. However, they are typically larger, more complicated, and slower than the other types, and usually require specialized CAM (Content Addressable Memory) cells.
Most caches are implemented using set-associativity (3), which lets you build the equivalent of N direct-mapped caches, each with its own tag comparator, all operating in parallel, with a final selection mux at the bottom.
Set-associative caches can also suffer from conflict misses, but to a much smaller degree than direct-mapped caches: in an 8-way set-associative cache, you have to have 8 frequently-used memory blocks all mapping to the same set before you get measureable conflict misses.
Set-associativity also gives you another benefit. The bits you use for index selection from the access address should be from the (untranslated) lower bits of the address to allow cache lookup in parallel with MMU address translation. However, as caches grow larger, the number of cache block index bits plus the low-order bits used to select a byte within a block exceed the number of untranslated bits.
For example, with a 4KB page (12 bits) and a 32-byte block (5 bits), there are only 7 bits available for index. Therefore you could only build a 4KB direct-mapped cache. But if you go to 8-way set-associative, you can build a 32KB cache.
Cache Replacement Strategies
When a lookup misses, we need to pick a victim block to replace in the cache. In a direct-mapped cache this is easy; it's the one block we selected during lookup. In set-associative and fully-associative caches, we must use some algorithm to determine which block to replace. The best algorithm is to pick the block which will be used latest in the future, but that's a bit tricky, given that there are few workable precognition circuits available. Instead, one typically uses a form of LRU (Least Recently Used) as a predictor of future access patterns.
When a block enters the cache or is referenced, it is marked as "Most Recently Used" (MRU), and all other blocks in the set are adjusted accordingly. When a miss occurs, the LRU block is picked for replacement:
mru lru access: 1 set: [ ][ ][ ][ ] -> [1][ ][ ][ ]
access: 2 set: [1][ ][ ][ ] -> [2][1][ ][ ]
access: 3 set: [2][1][ ][ ] -> [3][2][1][ ]
access: 4 set: [3][2][1][ ] -> [4][3][2][1]
access: 2 set: [4][3][2][1] -> [2][4][3][1]
access: 4 set: [2][4][3][1] -> [4][2][3][1]
access: 5 set: [4][2][3][1] -> [5][4][2][3]
^replace lru





