Lecture 19: Virtual memory - practice
Computer Architecture and Systems Programming (252-0061-00)
Timothy Roscoe
Herbstsemester 2012
© Systems Group | Department of Computer Science | ETH Zürich

Last Time: Address Translation
[Figure: the virtual address is split into a virtual page number (VPN) and a virtual page offset (VPO). The page table base register (PTBR) holds the page table address for the process; the VPN indexes a page table entry containing a valid bit and a physical page number (PPN). Valid bit = 0 means the page is not in memory (page fault). The physical address is the PPN concatenated with the physical page offset (PPO), which equals the VPO.]
Last Time: Page Fault Exception
[Figure: (1) the CPU sends the virtual address (VA) to the MMU; (2)-(3) the MMU fetches the PTE from cache/memory and finds it invalid; (4) it raises a page fault exception and the page fault handler runs; (5) the handler selects a victim page and writes it to disk if dirty; (6) it reads the new page from disk into memory and updates the PTE; (7) it returns, and the faulting instruction is restarted.]

TLB Hit
[Figure: (1) the CPU sends the VA to the MMU; (2) the MMU sends the VPN to the TLB; (3) the TLB returns the PTE; (4) the MMU sends the physical address (PA) to cache/memory; (5) cache/memory returns the data to the CPU.]
A TLB hit eliminates a memory access.

TLB Miss
[Figure: (1) the CPU sends the VA to the MMU; (2) the MMU sends the VPN to the TLB, which misses; (3) the MMU sends the PTE address (PTEA) to cache/memory; (4) cache/memory returns the PTE; (5) the MMU sends the PA to cache/memory; (6) cache/memory returns the data to the CPU.]
A TLB miss incurs an additional memory access (the PTE).
Fortunately, TLB misses are rare.

Today
• A note on Terminology
• Virtual memory (VM)
  – Multi-level page tables
• Case study: VM system on P6
• Historical aside: VM system on the VAX
• x86-64 and 64-bit paging
• Performance optimization for VM system
Terminology
• Virtual Page may refer to:
  – Page-aligned region of virtual address space, and
  – Contents thereof (different, might appear on disk)
• Physical Page:
  – Page-aligned region of physical memory (RAM)
• Physical Frame (= Physical Page)
  – Alternative terminology
  – Page = contents, Frame = container
  – Page size may be ≠ frame size (rarely)
Multi-Level Page Tables
• Given:
  – 4 KB (2^12) page size
  – 48-bit address space
  – 4-byte PTE
• Problem:
  – Would need a 256 GB page table!
    • 2^48 * 2^-12 * 2^2 = 2^38 bytes
• Common solution
  – Multi-level page tables
  – Example: 2-level page table
    • Level 1 table: each PTE points to a page table; the level 1 table stays in memory
    • Level 2 table: each PTE points to a page; level 2 tables are paged in and out like other data

2-Level Page Table Hierarchy
[Figure: a level 1 page table with 1024 PTEs. PTE 0 and PTE 1 point to level 2 page tables covering VP 0-1023 and VP 1024-2047: 2K allocated VM pages for code and data. PTE 2 through PTE 7 are null: a gap of 6K unallocated VM pages. PTE 8 points to a level 2 table with 1023 null PTEs and one valid PTE 1023 mapping VP 9215: 1023 unallocated pages and 1 allocated VM page for the stack. The remaining (1K - 9) level 1 PTEs are null.]

Translating with a k-level Page Table
[Figure: the n-bit virtual address is partitioned into VPN 1, VPN 2, ..., VPN k and the p-bit VPO. VPN 1 indexes the level 1 page table, whose entry gives the base of a level 2 page table; each subsequent VPN indexes the next level, until the level k page table entry supplies the PPN. The m-bit physical address is the PPN followed by the PPO, where PPO = VPO.]
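To make the k-level lookup concrete, here is a minimal software model in C of the walk the MMU performs in hardware. The PTE format, the identity-mapped `table_at` helper, and the 10-bit index width are illustrative assumptions, not any real MMU's format; only the idea (each VPN field indexes one level, the last level yields the PPN, and the VPO is appended unchanged) comes from the slides.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT  12                          /* 4 KB pages: p = 12           */
#define PAGE_MASK   ((1ull << PAGE_SHIFT) - 1)
#define IDX_BITS    10                          /* VPN bits per level (example) */
#define IDX_MASK    ((1ull << IDX_BITS) - 1)
#define PTE_VALID   1ull                        /* hypothetical valid bit       */

typedef uint64_t pte_t;                         /* hypothetical PTE: base | flags */

/* Pretend physical memory is identity-mapped so we can follow table pointers. */
static const pte_t *table_at(uint64_t pa) { return (const pte_t *)(uintptr_t)pa; }

/* Walk a k-level page table: returns false on a "page fault". */
static bool translate(uint64_t root_pa, int k, uint64_t va, uint64_t *pa)
{
    uint64_t table = root_pa;
    for (int level = 1; level <= k; level++) {
        int shift = PAGE_SHIFT + (k - level) * IDX_BITS;  /* position of VPN_level */
        uint64_t idx = (va >> shift) & IDX_MASK;
        pte_t pte = table_at(table)[idx];
        if (!(pte & PTE_VALID))
            return false;                                 /* page fault            */
        table = pte & ~PAGE_MASK;                         /* next table, or PPN<<p */
    }
    *pa = table | (va & PAGE_MASK);                       /* PPN . PPO (PPO = VPO) */
    return true;
}
```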
Intel P6
• Internal designation for successor to Pentium
  – Which had internal designation P5
• Fundamentally different from Pentium
  – Out-of-order, superscalar operation
• Resulting processors
  – Pentium Pro (1996)
  – Pentium II (1997)
  – Pentium III (1999)
    • L2 cache on same chip
• Different microarchitecture to the Pentium 4
  – Similar memory system
  – P4 abandoned by Intel in 2005 for P6-based Core 2 Duo

P6 Memory System
• 32 bit address space
• 4 KB page size
• L1, L2, and TLBs
  – 4-way set associative
• Inst TLB
  – 32 entries
  – 8 sets
• Data TLB
  – 64 entries
  – 16 sets
• L1 i-cache and d-cache
  – 16 KB
  – 32 B line size
  – 128 sets
• L2 cache
  – unified
  – 128 KB – 2 MB
[Figure: the processor package contains the instruction fetch unit, the inst and data TLBs, the L1 i-cache and d-cache, and a bus interface unit; a dedicated cache bus connects to the L2 cache, and an external system bus (e.g. PCI) connects to DRAM.]

Review of Abbreviations
• Components of the virtual address (VA)
  – TLBI: TLB index
  – TLBT: TLB tag
  – VPO: virtual page offset
  – VPN: virtual page number
• Components of the physical address (PA)
  – PPO: physical page offset (same as VPO)
  – PPN: physical page number
  – CO: byte offset within cache line
  – CI: cache index
  – CT: cache tag

Overview: P6 Address Translation
[Figure: the CPU issues a 32-bit virtual address (VA), split into a 20-bit VPN and a 12-bit VPO; the VPN is further split into a 16-bit TLBT and a 4-bit TLBI for the TLB (16 sets, 4 entries/set). On a TLB hit the 20-bit PPN comes straight from the TLB; on a TLB miss the VPN is split into VPN1 and VPN2 (10 bits each), PDBR locates the page directory, a PDE locates a page table, and a PTE supplies the PPN. The physical address (PA) is the PPN plus the 12-bit PPO; the L1 cache (128 sets, 4 lines/set) views it as a 20-bit CT, 7-bit CI, and 5-bit CO. On an L1 miss the access goes to L2 and DRAM; the 32-bit result returns to the CPU.]

P6 2-level Page Table Structure
• Page directory
  – 1024 4-byte page directory entries (PDEs) that point to page tables
  – One page directory per process
  – Page directory must be in memory when its process is running
  – Always pointed to by PDBR
  – Large page support:
    • Make PD the page table
    • Fixes page size to 4MB (why?)
• Page tables:
  – 1024 4-byte page table entries (PTEs) that point to pages
  – Size: exactly one page
  – Page tables can be paged in and out
[Figure: the page directory (1024 PDEs) points to up to 1024 page tables of 1024 PTEs each.]

P6 Page Directory Entry (PDE)
Layout when P=1 (page table present):
  [31:12] Page table physical base address | [11:9] Avail | [8] G | [7] PS | [5] A | [4] CD | [3] WT | [2] U/S | [1] R/W | [0] P=1
Page table physical base address: 20 most significant bits of physical page table address (forces page tables to be 4KB aligned)
Avail: these bits available for system programmers
G: global page (don't evict from TLB on task switch)
PS: page size 4K (0) or 4M (1)
A: accessed (set by MMU on reads and writes, cleared by software)
CD: cache disabled (1) or enabled (0)
WT: write-through or write-back cache policy for this page table
U/S: user or supervisor mode access
R/W: read-only or read-write access
P: page table is present in memory (1) or not (0)
Layout when P=0 (page table not present):
  [31:1] Available for OS (page table location in secondary storage) | [0] P=0
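As a quick reference, here is a small sketch in C of how the 32-bit P6 address is carved up, using the field widths from the slides (20/12 VPN/VPO, 10/10 VPN1/VPN2, 16/4 TLBT/TLBI, 20/7/5 CT/CI/CO). The macro names are ours, chosen only for illustration.

```c
#include <stdint.h>

/* Virtual address components (widths taken from the slides). */
#define VPO(va)    ((uint32_t)(va) & 0xFFFu)           /* bits 11:0             */
#define VPN(va)    ((uint32_t)(va) >> 12)              /* bits 31:12            */
#define VPN1(va)   ((uint32_t)(va) >> 22)              /* bits 31:22, PDE index */
#define VPN2(va)   (((uint32_t)(va) >> 12) & 0x3FFu)   /* bits 21:12, PTE index */
#define TLBI(va)   (VPN(va) & 0xFu)                    /* low 4 VPN bits: set   */
#define TLBT(va)   (VPN(va) >> 4)                      /* high 16 VPN bits: tag */

/* Physical address components for the 128-set, 32 B/line L1 cache. */
#define CO(pa)     ((uint32_t)(pa) & 0x1Fu)            /* bits 4:0  byte offset */
#define CI(pa)     (((uint32_t)(pa) >> 5) & 0x7Fu)     /* bits 11:5 cache index */
#define CT(pa)     ((uint32_t)(pa) >> 12)              /* bits 31:12 cache tag  */
```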
P6 Page Table Entry (PTE)
Layout when P=1 (page present):
  [31:12] Page physical base address | [11:9] Avail | [8] G | [7] 0 | [6] D | [5] A | [4] CD | [3] WT | [2] U/S | [1] R/W | [0] P=1
Page base address: 20 most significant bits of physical page address (forces pages to be 4 KB aligned)
Avail: available for system programmers
G: global page (don't evict from TLB on task switch)
D: dirty (set by MMU on writes)
A: accessed (set by MMU on reads and writes)
CD: cache disabled or enabled
WT: write-through or write-back cache policy for this page
U/S: user/supervisor
R/W: read/write
P: page is present in physical memory (1) or not (0)
Layout when P=0 (page not present):
  [31:1] Available for OS (page location in secondary storage) | [0] P=0

Representation of VM Addr. Space
• Simplified Example
  – 16 page virtual address space
• Flags
  – P: Is entry in physical memory?
  – M: Has this part of VA space been mapped?
[Figure: a page directory whose entries carry P and M flags points to page tables PT 0, PT 2, and PT 3; their PTEs (also with P and M flags) map Page 0 through Page 15. A mapped page is either in memory ("In Mem", with a Mem Addr) or on disk ("On Disk", with a Disk Addr); pages whose part of the VA space was never mapped are "Unmapped" (P=0, M=0).]

P6 TLB
• TLB entry (not all documented, so this is speculative):
  [20-bit PPN | 16-bit TLBTag | 1-bit V | 1-bit G | 1-bit S | 1-bit W | 1-bit D]
  – V: indicates a valid (1) or invalid (0) TLB entry
  – TLBTag: disambiguates entries cached in the same set
  – PPN: translation of the address indicated by index & tag
  – G: page is "global" according to PDE, PTE
  – S: page is "supervisor-only" according to PDE, PTE
  – W: page is writable according to PDE, PTE
  – D: PTE has already been marked "dirty" (once is enough)
• Structure of the data TLB:
  – 16 sets, 4 entries/set

P6 TLB Translation
[Figure: the overview diagram with the TLB path highlighted: the 20-bit VPN is split into a 16-bit TLBT and a 4-bit TLBI; the TLB (16 sets, 4 entries/set) is probed; on a TLB hit the PPN is available immediately, on a TLB miss the page tables are walked via PDBR, PDE, and PTE before the PA can be formed.]

Translating with the P6 TLB
1. Partition VPN into TLBT and TLBI.
2. Is the PTE for VPN cached in set TLBI?
3. Yes: check permissions, build physical address.
4. No: read PTE (and PDE if not cached) from memory and build physical address.
[Figure: the virtual address (VPN, VPO) probes one of the 16 TLB sets (set 0 ... set 15, 4 entries each) using TLBI; a TLBT match is a TLB hit and yields the PPN directly; otherwise the page table translation runs, as a "partial TLB hit" if the PDE is already cached.]
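The fast path can be modelled directly from the entry format above. The following C sketch assumes the speculative 16-set, 4-way organisation and field names from the slide; the struct layout, array storage, and omission of permission checks are our simplifications.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_SETS 16
#define TLB_WAYS 4

typedef struct {
    uint32_t ppn : 20;                 /* physical page number            */
    uint32_t tag : 16;                 /* TLBTag: high 16 bits of the VPN */
    uint32_t v   : 1;                  /* valid                           */
    uint32_t g   : 1, s : 1, w : 1, d : 1;  /* global/supervisor/writable/dirty */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SETS][TLB_WAYS];

/* Steps 1-3 from the slide: split the VPN into TLBT/TLBI, probe the set,
 * and on a hit build PA = PPN . PPO (permission checks omitted). */
static bool tlb_lookup(uint32_t va, uint32_t *pa)
{
    uint32_t vpn  = va >> 12;
    uint32_t tlbi = vpn & 0xF;         /* low 4 VPN bits select the set  */
    uint32_t tlbt = vpn >> 4;          /* remaining 16 bits are the tag  */

    for (int way = 0; way < TLB_WAYS; way++) {
        tlb_entry_t e = tlb[tlbi][way];
        if (e.v && e.tag == tlbt) {
            *pa = ((uint32_t)e.ppn << 12) | (va & 0xFFF);
            return true;               /* TLB hit                        */
        }
    }
    return false;                      /* TLB miss: walk PDE/PTE instead */
}
```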
Translating with P6 Page Tables (case 1/1)
• Page table and page present
• MMU Action:
  – MMU builds physical address and fetches data word
• OS action
  – None
[Figure: PDBR points to the page directory in memory; VPN1 (10 bits) selects a PDE with p=1, which points to a page table in memory; VPN2 (10 bits) selects a PTE with p=1; the 20-bit PPN from the PTE plus the 12-bit PPO (= VPO) addresses the data word in the data page.]

Translating with P6 Page Tables (case 1/0)
• Page table present, page missing
• MMU Action:
  – Page fault exception
  – Handler receives the following args:
    • %eip that caused fault
    • VA that caused fault
    • Fault caused by non-present page or page-level protection violation
      – Read/write
      – User/supervisor
[Figure: the PDE (p=1) points to a page table in memory, but the PTE selected by VPN2 has p=0; the data page is on disk.]
Translating with P6 Page Tables (case 1/0, cont.)
• OS Action:
  – Check for a legal virtual address.
  – Read PTE through PDE.
  – Find free physical page (swapping out current page if necessary)
  – Read virtual page from disk into physical page
  – Adjust PTE to point to physical page, set p=1
  – Restart faulting instruction by returning from exception handler
[Figure: after the handler runs, both PDE and PTE have p=1 and the data page is in memory.]

Translating with P6 Page Tables (case 0/1)
• Page table missing, page present
• Introduces consistency issue
  – Potentially every page-out requires update of disk page table
• Linux disallows this
  – If a page table is swapped out, then swap out its data pages too
[Figure: the PDE has p=0 (the page table is on disk) while the data page itself is still in memory.]

Translating with P6 Page Tables (case 0/0)
• Page table and page missing
• MMU Action:
  – Page fault
[Figure: the PDE has p=0; both the page table and the data page are on disk.]

Translating with P6 Page Tables (case 0/0, cont.)
• OS action:
  – Swap in page table
  – Restart faulting instruction by returning from handler
• Like case 1/0 from here on
  – Two disk reads
[Figure: the page table is now in memory (PDE p=1) but its PTE still has p=0; the data page remains on disk.]
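The case 1/0 handler steps can be made concrete with a small user-space model. Everything here (the arrays standing in for RAM and disk, the PTE struct, the helper names) is an illustrative assumption; it only mirrors the sequence above: read the PTE, find a free frame, read the page from disk, update the PTE, and return so the instruction restarts.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define NFRAMES 4
#define PAGESZ  4096

typedef struct { bool present; uint32_t frame_or_disk; } pte_t;

static uint8_t ram[NFRAMES][PAGESZ];     /* toy physical memory            */
static uint8_t disk[64][PAGESZ];         /* toy swap space                 */
static bool    frame_used[NFRAMES];
static pte_t   page_table[16];           /* 16-page virtual address space  */

static int alloc_frame(void)             /* find a free physical page      */
{
    for (int f = 0; f < NFRAMES; f++)
        if (!frame_used[f]) { frame_used[f] = true; return f; }
    return -1;                           /* a real OS would swap out a victim */
}

static void handle_fault_case_1_0(unsigned vpn)
{
    pte_t *pte = &page_table[vpn];       /* read PTE (reached through the PDE) */
    int frame = alloc_frame();
    if (frame < 0)
        return;                          /* victim selection omitted in sketch */

    memcpy(ram[frame], disk[pte->frame_or_disk], PAGESZ);  /* read page from disk */
    pte->frame_or_disk = (uint32_t)frame;                  /* point PTE at frame  */
    pte->present = true;                                   /* set p = 1           */
    /* Returning from the exception restarts the faulting instruction,
     * which now takes the case 1/1 path. */
}
```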
P6 L1 Cache Access
• Partition physical address into CO, CI, and CT
• Use CT to determine if the line containing the word at address PA is cached in set CI
  – No: check L2
  – Yes: extract word at byte offset CO and return to processor
[Figure: the physical address (PA) produced by translation is split into a 20-bit CT, a 7-bit CI, and a 5-bit CO; CI selects one of the 128 sets of the 4-way L1 cache; on an L1 hit the 32-bit data word is returned to the CPU, on an L1 miss the request goes on to L2 and DRAM.]
Speeding Up L1 Access
• Observation
  – Bits that determine CI are identical in the virtual and physical address
  – Can index into the cache while address translation is taking place
  – Generally we hit in the TLB, so the PPN bits (CT bits) are available next
  – "Virtually indexed, physically tagged"
  – Cache carefully sized to make this possible
[Figure: the virtual address is VPN (20 bits) | VPO (12 bits); address translation replaces only the VPN with the PPN and leaves the VPO unchanged ("No Change"), so the CI and CO bits of the physical address come straight from the VPO and can index the cache before the tag check on CT = PPN.]
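The "carefully sized" claim can be checked with the L1 parameters from the P6 slides: 128 sets of 32-byte lines means CI and CO together occupy 12 bits, exactly the 4 KB page offset, so set selection never depends on translated bits. A trivial C check of that arithmetic:

```c
#include <assert.h>

int main(void)
{
    const unsigned line_bytes = 32;    /* CO: 5 bits   */
    const unsigned sets       = 128;   /* CI: 7 bits   */
    const unsigned page_bytes = 4096;  /* VPO: 12 bits */

    /* sets * line_bytes == page size  =>  CI + CO fit inside the VPO,
     * which is why the P6 L1 can be indexed before translation finishes. */
    assert(sets * line_bytes == page_bytes);
    return 0;
}
```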
Historical aside: virtual page tables
• Same problem: linear page table can be large.
• On the VAX:
  – Page size = 512 bytes (so offset = 9 bits)
  – Virtual address space = 32 bits
    ⇒ page table index = 23 bits
    ⇒ page table size = 8388608 entries
  – Each PTE = 4 bytes (32 bits)
    ⇒ 32 Mbytes per page table (i.e. per process!)
  – Too much memory in those days…
• Solution: put the linear table into virtual memory!
[Figure: a VAX virtual address is a 2-bit segment selector, a 21-bit page number (Addr VPN), and a 9-bit offset (Addr VPO). Segments: 00 = P0 (user program text and data), 01 = P1 (user stack), 10 = S0 (system: kernel and page tables), 11 = S1 (unused, reserved).]
VAX translation process
(The combination of base and index below is actually an addition ⇒ more bits than shown here.)
[Figure, step by step:
1. Virtual address requested (in seg. Px: user space): [Px | Addr VPN | Addr VPO].
2. PxTB + 4 * Addr VPN gives the virtual address of the PTE (in seg. S0: system space); this needs another virtual-to-physical translation.
3. The VPN of that address (PTE VPN) is translated through the system page table: SPTB + 4 * PTE VPN is the physical address of the PTE mapping the PTE we want. Load it to get the PTE PFN.
4. PTE PFN together with the PTE VPO is the physical address of the PTE we want. Load it to get Addr PFN.
5. Addr PFN together with Addr PFO is the physical address of the value we want. Load it: the memory value (at last!).]
• Of course, most of the PTEs are not used
  – Invalid translation: saves space
• TLB hides most of the double lookups

VAX translation process
• Not so bizarre after all: this really is a 2-level page table!
  – If you can really understand why this is the case, you'll have no problem understanding virtual memory systems.
[Figure: SPTB acts as the page table base address; the upper 14 bits of the user VPN (offset by PxTB) form the 1st level index, selecting one page of the user's linear page table through the system page table; the lower 7 bits of the user VPN form the 2nd level index, selecting the PTE within that page; the 9-bit VPO is appended unchanged.]
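A compact C model of the double translation may help. The memory array, the assumption that system page table entries and PTEs hold bare page frame numbers, and all names are illustrative only; the field widths (2-bit segment, 21-bit VPN, 9-bit offset, 4-byte PTEs) come from the slides.

```c
#include <stdint.h>

#define VPO(va)  ((va) & 0x1FFu)               /* 9-bit byte offset  */
#define VPN(va)  (((va) >> 9) & 0x1FFFFFu)     /* 21-bit page number */

static uint32_t phys_mem[1u << 20];            /* toy physical memory (4 MB)         */
static uint32_t PxTB;                          /* user page table base (virtual, S0) */
static uint32_t SPTB;                          /* system page table base (physical)  */

static uint32_t load_phys(uint32_t pa) { return phys_mem[pa >> 2]; }

/* Translate a user (P0/P1) virtual address to a physical address. */
static uint32_t vax_translate(uint32_t va)
{
    /* 1. Virtual address of the PTE we want: an addition, not a concatenation. */
    uint32_t pte_va = PxTB + 4 * VPN(va);           /* lies in segment S0       */

    /* 2. Translate it with the system page table: physical address of the PTE
     *    that maps our PTE, then load it to get a page frame number (PFN).     */
    uint32_t spte = load_phys(SPTB + 4 * VPN(pte_va));

    /* 3. Physical address of the PTE we want; load it to get the data PFN.     */
    uint32_t pte = load_phys((spte << 9) | VPO(pte_va));

    /* 4. Physical address of the value we want (the TLB would cache this).     */
    return (pte << 9) | VPO(va);
}
```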
x86-64 Paging
• Origin
  – AMD's way of extending x86 to 64-bit instruction set
  – Intel has followed with "EM64T"
• Requirements
  – 48-bit virtual address
    • 256 terabytes (TB)
    • Not yet ready for full 64 bits
      – Nobody can buy that much DRAM yet
      – Mapping tables would be huge
      – Multi-level array map may not be the right data structure
  – 52-bit physical address = 40 bits for PPN
    • Requires 64-bit table entries
  – Keep traditional x86 4KB page size, and same size for page tables
    • (4096 bytes per PT) / (8 bytes per PTE) = only 512 entries per page
[Figure: the 48-bit virtual address is split into four 9-bit fields VPN1-VPN4 plus a 12-bit VPO. A base register (BR) points to the Page Map Table, whose PM4LE selects a Page Directory Pointer Table; its PDPE selects a Page Directory Table; its PDE selects a Page Table; the PTE there supplies the 40-bit PPN, which together with the 12-bit PPO forms the physical address.]
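The four 9-bit indices fall out of the address with simple shifts and masks. A small sketch (macro names are ours; the entries read at each level are the slide's PM4LE, PDPE, PDE, and PTE):

```c
#include <stdint.h>

#define X86_64_VPO(va)       ((uint64_t)(va) & 0xFFFull)
/* n = 1..4: VPN1 is bits 47:39 (top-level index), VPN4 is bits 20:12
 * (index into the final page table); each table has 512 entries.      */
#define X86_64_VPN(va, n)    (((uint64_t)(va) >> (12 + 9 * (4 - (n)))) & 0x1FFull)
```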
Large Pages
• 4MB on 32-bit, 2MB on 64-bit
[Figure: with 4 KB pages the 32-bit address splits into a 20-bit VPN and a 12-bit VPO (and PPN/PPO likewise); with large pages it splits into a 10-bit VPN and a 22-bit VPO (and a 10-bit PPN and a 22-bit PPO).]
• Simplify address translation
• Useful for programs with very large, contiguous working sets
  – Reduces compulsory TLB misses
• How to use (Linux)
  – hugetlbfs support (since at least 2.6.16)
  – Use libhugetlbfs
    • {m,c,re}alloc replacements
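Besides the libhugetlbfs allocator replacements, Linux (kernel 2.6.32 and later) can also hand out huge pages directly through mmap. A minimal sketch, assuming huge pages have been reserved (e.g. via /proc/sys/vm/nr_hugepages) and that the default huge page size is 2 MB:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 2 * 1024 * 1024;          /* one 2 MB huge page on x86-64 */

    /* Anonymous mapping backed by huge pages; returns MAP_FAILED if no
     * huge pages are reserved or the kernel lacks hugetlbfs support.    */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }

    memset(p, 0, len);                     /* touch it: one TLB entry covers 2 MB */
    munmap(p, len);
    return 0;
}
```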
Buffering: Example MMM
• Blocked for cache
  – c = a * b + c, computed block by block with block size B x B
• Assume blocking for L2 cache
  – say, 512 KB = 2^19 B = 2^16 doubles = C
  – 3B^2 < C means B ≈ 150
• But: look at one iteration
  – assume a matrix row is > 4 KB = 512 doubles
  – blocksize B = 150
  – each row used O(B) times, but every time O(B^2) ops between
• Consequence
  – Each row is on a different page
  – More rows than TLB entries: TLB thrashing
  – Solution: buffering = copy block to contiguous memory
    • O(B^2) cost for O(B^3) operations
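A sketch of the buffering idea for the a-block: before the B x B x B multiply, the block of a is copied into a small contiguous array, so it occupies a few dozen pages (and TLB entries) instead of B rows on B different pages. The function name, leading dimensions, and loop structure are illustrative.

```c
#include <stddef.h>

#define B 150                       /* block size from the slide */

/* One block update c_blk += a_blk * b_blk; lda/ldb/ldc are the row lengths
 * of the full matrices (> 512 doubles, i.e. > 4 KB per row, as assumed above). */
static void mmm_block(double *c, const double *a, const double *b,
                      size_t lda, size_t ldb, size_t ldc)
{
    static double a_buf[B * B];     /* contiguous copy of the a-block */

    /* Buffering: O(B^2) copy work... */
    for (size_t i = 0; i < B; i++)
        for (size_t k = 0; k < B; k++)
            a_buf[i * B + k] = a[i * lda + k];

    /* ...amortized over O(B^3) operations, which now touch only
     * ~B*B*8 bytes (about 44 pages) of 'a' instead of B separate pages. */
    for (size_t i = 0; i < B; i++)
        for (size_t j = 0; j < B; j++) {
            double sum = c[i * ldc + j];
            for (size_t k = 0; k < B; k++)
                sum += a_buf[i * B + k] * b[k * ldb + j];
            c[i * ldc + j] = sum;
        }
}
```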
Next time: I/O Devices
• What is a device?
• Registers
– Example: NS16550 UART
• Interrupts
• Direct Memory Access (DMA)
• PCI (Peripheral Component Interconnect)