

## Basic virtual memory caching questions

- Where can a block be placed?
  - Since miss penalties are very high, OS designers always choose lower miss rates over simple placement algorithms
  - VM is almost always fully-associative (blocks can be placed anywhere in main memory)
- Which block is replaced?
  - Most operating systems use LRU or an approximation to it
  - Page table entries often include a reference bit to help approximate LRU replacement
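
The reference-bit idea above can be sketched with the clock (second-chance) algorithm, one common way to approximate LRU. The slides do not name a specific algorithm, so this choice is an assumption, and `ClockReplacer`, its fields, and its return convention are hypothetical names used only for illustration.

```python
class ClockReplacer:
    """Approximates LRU with one reference bit per physical frame (clock algorithm)."""

    def __init__(self, num_frames):
        self.frames = [None] * num_frames     # virtual page held by each frame
        self.ref_bit = [False] * num_frames   # set whenever the page is accessed
        self.hand = 0                         # next frame the clock hand inspects

    def access(self, page):
        """Touch a page. Returns (hit, victim); victim is the evicted page or None."""
        if page in self.frames:
            # Hit: hardware would set the reference bit in the page table entry.
            self.ref_bit[self.frames.index(page)] = True
            return True, None

        # Miss: sweep the hand, clearing reference bits, until a frame
        # whose bit is already clear is found. That frame is the victim.
        while self.ref_bit[self.hand]:
            self.ref_bit[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)

        victim = self.frames[self.hand]
        self.frames[self.hand] = page
        self.ref_bit[self.hand] = True
        self.hand = (self.hand + 1) % len(self.frames)
        return False, victim
```

A page that was referenced since the hand last passed gets a second chance, so recently used pages tend to survive, which is the property LRU is after without the cost of tracking exact access order.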

- How is a block found?
  - Paging systems use a page table to translate virtual page numbers into physical page numbers
  - The physical address is constructed by concatenating the physical page number (found in the table) with the page offset
  - Segmented systems use a similar structure except that the segment's physical address is ADDED to the offset
  - The page table needs enough entries to map the entire virtual address space since it is accessed using virtual page numbers
    - This results in lots of space dedicated just to the page table
    - One optimization is to use hashing to restrict the number of page table entries to the number of physical pages (inverted page table)
  - Translation lookaside buffers (TLBs) are used to cache these translations, and reduce address translation time
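
The translation steps above can be sketched as follows. This is a minimal illustration, assuming 4 KiB pages and modelling both the page table and the TLB as plain dictionaries; `translate`, `segment_address`, and `PageFault` are hypothetical names, not part of any real system.

```python
PAGE_SIZE = 4096      # assumed page size (4 KiB)
OFFSET_BITS = 12      # log2(PAGE_SIZE)


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the page table."""


def translate(vaddr, page_table, tlb):
    """Translate a virtual address using a TLB backed by a page table."""
    vpn = vaddr >> OFFSET_BITS            # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)      # offset within the page

    ppn = tlb.get(vpn)                    # fast path: cached translation
    if ppn is None:
        if vpn not in page_table:
            raise PageFault(vpn)
        ppn = page_table[vpn]             # slow path: page table lookup
        tlb[vpn] = ppn                    # cache the translation for next time

    # Paging: the physical page number is concatenated with the offset.
    return (ppn << OFFSET_BITS) | offset


def segment_address(segment_base, offset):
    """Segmentation: the segment's physical base is added to the offset."""
    return segment_base + offset
```

For example, with `page_table = {1: 2}` and an empty `tlb`, translating `0x1234` splits the address into virtual page 1 and offset `0x234`, then joins physical page 2 with the same offset. The second lookup of the same page is served from the TLB without consulting the page table.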

Chapter 5

16-Apr-00



UMBC, CMSC 611 (Advanced Computer Architecture), Spring 2000

## Using virtual memory for protection

- VM is often used to protect a program from other programs
  - $\Rightarrow$ Protection mechanisms must have hardware support
- Base & bounds
  - Each reference must fall between two addresses, given by the base & bound registers
  - This method also allows some relocation
  - User processes cannot be allowed to change these registers, but the OS must be able to do so on a process switch
- Therefore, the hardware must provide:
  - At least two modes of operation, user and kernel mode, and a mechanism to switch between them
  - A protection mechanism for other portions of the CPU state to prevent user processes from being malicious

## Memory protection

- To ensure protection, the CPU provides:
  - User/supervisor mode bit(s): separation of user & OS functions
  - Interrupt enable/disable bit(s): atomic operations
- Virtual memory offers a more fine-grained alternative
  - Each process has its own page table, which it cannot modify itself
  - Permission flags are provided with each segment or page
    - Read/write
    - Execute
  - Concentric rings of security and capability lists are more fine-grained alternatives, allowing more than two levels of protection
  - $\Rightarrow$ The OS course discusses VM in more detail

## Effect of CPU design on memory hierarchy

- Superscalar & vector execution
  - A superscalar or vector machine may fetch several words per cycle
  - Clearly, the memory system must deliver the bandwidth to handle this; otherwise the benefit is lost
  - The brunt of the load falls upon the L1 cache
    - Bandwidth can be increased by widening the path to the cache or by providing extra ports to the cache
    - However, cache access is often the bottleneck in modern CPUs
- Speculative execution
  - Speculative execution and conditional instructions may generate invalid addresses that would not occur otherwise
  - The memory system must recognize and suppress these exceptions
  - Similarly, it must not stall the cache on a miss caused by a speculative instruction

## I/O and cache consistency

- I/O devices move data from peripherals to memory
- This has two pitfalls:
  - Data written into memory is not automatically updated in the cache
  - Data in a writeback cache is not written to memory immediately, so memory has stale data
- One solution is to flush blocks from the cache that are used in the I/O operation
  - Before the I/O for a write (so the write operation uses up-to-date information)
  - After the I/O for a read (before the I/O should work as well; the CPU should not access the data as it is being read into memory)
- An alternate method is simply to mark the blocks from I/O buffers as *uncacheable*
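
The two I/O pitfalls, and the flush-based fix, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: memory is a dictionary, each cache block holds one value, and `WriteBackCache`, `device_output`, and `device_input` are hypothetical names that do not correspond to any real hardware or driver interface.

```python
class WriteBackCache:
    """Toy writeback cache: dirty data reaches memory only when flushed."""

    def __init__(self, memory):
        self.memory = memory     # backing store: address -> value
        self.lines = {}          # address -> (value, dirty flag)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def write(self, addr, value):
        # Only the cache is updated, so memory now holds stale data.
        self.lines[addr] = (value, True)

    def flush(self, addrs):
        """Write back dirty blocks, then invalidate them."""
        for addr in addrs:
            line = self.lines.pop(addr, None)
            if line is not None and line[1]:
                self.memory[addr] = line[0]


def device_output(memory, addrs):
    """I/O write: the device reads memory directly, bypassing the cache."""
    return [memory[a] for a in addrs]


def device_input(memory, data):
    """I/O read: the device writes memory directly, bypassing the cache."""
    memory.update(data)
```

In this model, calling `device_output` after a CPU `write` but before `flush` hands the device the old value from memory. Likewise, a block cached before `device_input` keeps returning the old value from `read` until it is flushed, which is why the flush has to bracket the I/O operation.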

