
Commit bbf45ba

Hollis Blanchard authored and Avi Kivity committed
KVM: ppc: PowerPC 440 KVM implementation
This functionality is definitely experimental, but is capable of running
unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only
tested with 440EP "Bamboo" guests so far, but with appropriate userspace
support other SoC/board combinations should work.) See
Documentation/powerpc/kvm_440.txt for technical details.

[stephen: build fix]

Signed-off-by: Hollis Blanchard <[email protected]>
Acked-by: Paul Mackerras <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
1 parent 513014b commit bbf45ba

File tree

19 files changed: +3159 −2 lines changed

Documentation/powerpc/kvm_440.txt

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
Hollis Blanchard <[email protected]>
15 Apr 2008

Various notes on the implementation of KVM for PowerPC 440:

To enforce isolation, host userspace, guest kernel, and guest userspace all
run at user privilege level. Only the host kernel runs in supervisor mode.
Executing privileged instructions in the guest traps into KVM (in the host
kernel), where we decode and emulate them. Through this technique, unmodified
440 Linux kernels can be run (slowly) as guests. Future performance work will
focus on reducing the overhead and frequency of these traps.

The usual code flow starts with userspace invoking a "run" ioctl, which
causes KVM to switch into guest context. We use IVPR to hijack the host
interrupt vectors while running the guest, which allows us to direct all
interrupts to kvmppc_handle_interrupt(). At this point, we could either
 - handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or
 - let the host interrupt handler run (e.g. when the decrementer fires), or
 - return to host userspace (e.g. when the guest performs device MMIO)
A sketch of this three-way dispatch follows.
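
A minimal sketch of that dispatch, in the shape of the exit handler this
commit adds for booke guests. The names (BOOKE_INTERRUPT_*, RESUME_*,
kvmppc_emulate_instruction()) follow the new code in this commit, but the
body below is an illustration, not the actual handler:

static int handle_exit_sketch(struct kvm_run *run, struct kvm_vcpu *vcpu,
			      unsigned int exit_nr)
{
	switch (exit_nr) {
	case BOOKE_INTERRUPT_PROGRAM:
		/* Privileged instruction trapped: decode and emulate it
		 * entirely in the host kernel (e.g. "mtspr SPRG0"). */
		kvmppc_emulate_instruction(run, vcpu);
		return RESUME_GUEST;
	case BOOKE_INTERRUPT_DECREMENTER:
		/* The host's own interrupt handler has already run via the
		 * hijacked vector; just re-enter the guest. */
		return RESUME_GUEST;
	default:
		/* e.g. guest device MMIO: describe the access in 'run' and
		 * return to host userspace for device emulation. */
		run->exit_reason = KVM_EXIT_MMIO;
		return RESUME_HOST;
	}
}
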
Address spaces: We take advantage of the fact that Linux doesn't use the AS=1
address space (in host or guest), which gives us virtual address space to use
for guest mappings. While the guest is running, the host kernel remains mapped
in AS=0, but the guest can only use AS=1 mappings.

TLB entries: The TLB entries covering the host linear mapping remain
present while running the guest. This reduces the overhead of lightweight
exits, which are handled by KVM running in the host kernel. We keep three
copies of the TLB:
 - guest TLB: contents of the TLB as the guest sees it
 - shadow TLB: the TLB that is actually in hardware while the guest is running
 - host TLB: to restore TLB state when context switching guest -> host
When a TLB miss occurs because a mapping was not present in the shadow TLB,
but was present in the guest TLB, KVM handles the fault without invoking the
guest. Large guest pages are backed by multiple 4KB shadow pages through this
mechanism.
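
A hedged sketch of that lightweight fault path, wiring together the lookup
and mapping helpers this commit adds in 44x_tlb.c. guest_tlbe_to_gpa() is a
hypothetical stand-in for the guest-physical translation, and the exact
arguments should be treated as assumptions:

/* Service a shadow-TLB data miss from the guest TLB, if possible. */
static int dtlb_miss_sketch(struct kvm_vcpu *vcpu, gva_t eaddr)
{
	struct tlbe *gtlbe = kvmppc_44x_dtlb_search(vcpu, eaddr);
	gfn_t gfn;

	if (!gtlbe)
		return -1;	/* Miss in the guest TLB too: reflect to guest. */

	/* guest_tlbe_to_gpa() is hypothetical: extract the guest-physical
	 * address that this guest TLB entry maps eaddr to. */
	gfn = guest_tlbe_to_gpa(gtlbe, eaddr) >> PAGE_SHIFT;
	kvmppc_mmu_map(vcpu, eaddr & PAGE_MASK, gfn, get_tlb_tid(gtlbe),
		       gtlbe->word2);
	return 0;	/* Shadow TLB filled; resume the guest directly. */
}
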
IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network
and block IO, so those drivers must be enabled in the guest. It's possible
that some qemu device emulation (e.g. e1000 or rtl8139) may also work with
little effort.
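
For illustration, the userspace side of such MMIO emulation might look like
the loop below (standard KVM_RUN API; emulate_mmio() is a hypothetical
device model, and error handling is omitted):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical device model supplied by the userspace emulator. */
void emulate_mmio(__u64 phys_addr, __u8 *data, __u32 len, __u8 is_write);

void run_loop(int vcpu_fd, struct kvm_run *run /* mmap'ed from vcpu_fd */)
{
	for (;;) {
		ioctl(vcpu_fd, KVM_RUN, 0);
		switch (run->exit_reason) {
		case KVM_EXIT_MMIO:
			/* The guest touched an unmapped physical address:
			 * emulate the device access, then re-enter. */
			emulate_mmio(run->mmio.phys_addr, run->mmio.data,
				     run->mmio.len, run->mmio.is_write);
			break;
		default:
			return;
		}
	}
}
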

arch/powerpc/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -803,3 +803,4 @@ config PPC_CLOCK
 config PPC_LIB_RHEAP
 	bool
 
+source "arch/powerpc/kvm/Kconfig"

arch/powerpc/Kconfig.debug

Lines changed: 3 additions & 0 deletions
@@ -151,6 +151,9 @@ config BOOTX_TEXT
 
 config PPC_EARLY_DEBUG
 	bool "Early debugging (dangerous)"
+	# PPC_EARLY_DEBUG on 440 leaves AS=1 mappings above the TLB high water
+	# mark, which doesn't work with current 440 KVM.
+	depends on !KVM
 	help
 	  Say Y to enable some early debugging facilities that may be available
 	  for your processor/board combination. Those facilities are hacks

arch/powerpc/Makefile

Lines changed: 1 addition & 0 deletions
@@ -145,6 +145,7 @@ core-y += arch/powerpc/kernel/ \
 				   arch/powerpc/platforms/
 core-$(CONFIG_MATH_EMULATION)	+= arch/powerpc/math-emu/
 core-$(CONFIG_XMON)		+= arch/powerpc/xmon/
+core-$(CONFIG_KVM)		+= arch/powerpc/kvm/
 
 drivers-$(CONFIG_OPROFILE)	+= arch/powerpc/oprofile/

arch/powerpc/kernel/asm-offsets.c

Lines changed: 28 additions & 0 deletions
@@ -23,6 +23,9 @@
 #include <linux/mm.h>
 #include <linux/suspend.h>
 #include <linux/hrtimer.h>
+#ifdef CONFIG_KVM
+#include <linux/kvm_host.h>
+#endif
 #ifdef CONFIG_PPC64
 #include <linux/time.h>
 #include <linux/hardirq.h>
@@ -324,5 +327,30 @@ int main(void)
 
 	DEFINE(PGD_TABLE_SIZE, PGD_TABLE_SIZE);
 
+#ifdef CONFIG_KVM
+	DEFINE(TLBE_BYTES, sizeof(struct tlbe));
+
+	DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack));
+	DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid));
+	DEFINE(VCPU_HOST_TLB, offsetof(struct kvm_vcpu, arch.host_tlb));
+	DEFINE(VCPU_SHADOW_TLB, offsetof(struct kvm_vcpu, arch.shadow_tlb));
+	DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
+	DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr));
+	DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
+	DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer));
+	DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, arch.ctr));
+	DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.pc));
+	DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr));
+	DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4));
+	DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5));
+	DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6));
+	DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7));
+	DEFINE(VCPU_PID, offsetof(struct kvm_vcpu, arch.pid));
+
+	DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
+	DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
+	DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
+#endif
+
 	return 0;
 }
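
Background on why this file changes: the DEFINE() entries generate
assembler-visible constants so the low-level guest entry/exit code can
reference struct kvm_vcpu fields by name (e.g. VCPU_PC) instead of by
hand-maintained offsets. Roughly, the kernel's kbuild mechanism of this era
worked like the sketch below (treat the exact macro home as an assumption):

/* Emit a marker line ("->SYM value comment") into the compiler's assembly
 * output for asm-offsets.c; a kbuild sed script then turns each marker into
 * a #define in the generated asm-offsets.h. */
#define DEFINE(sym, val) \
	asm volatile("\n->" #sym " %0 " #val : : "i" (val))

The generated asm-offsets.h then lets the world-switch assembly load, say,
the guest program counter via the VCPU_PC constant.
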

arch/powerpc/kvm/44x_tlb.c

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License, version 2, as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 *
 * Copyright IBM Corp. 2007
 *
 * Authors: Hollis Blanchard <[email protected]>
 */

#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <asm/mmu-44x.h>
#include <asm/kvm_ppc.h>

#include "44x_tlb.h"

#define PPC44x_TLB_USER_PERM_MASK (PPC44x_TLB_UX|PPC44x_TLB_UR|PPC44x_TLB_UW)
#define PPC44x_TLB_SUPER_PERM_MASK (PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW)

static unsigned int kvmppc_tlb_44x_pos;

static u32 kvmppc_44x_tlb_shadow_attrib(u32 attrib, int usermode)
{
	/* Mask off reserved bits. */
	attrib &= PPC44x_TLB_PERM_MASK|PPC44x_TLB_ATTR_MASK;

	if (!usermode) {
		/* Guest is in supervisor mode, so we need to translate guest
		 * supervisor permissions into user permissions. */
		attrib &= ~PPC44x_TLB_USER_PERM_MASK;
		attrib |= (attrib & PPC44x_TLB_SUPER_PERM_MASK) << 3;
	}

	/* Make sure host can always access this memory. */
	attrib |= PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW;

	return attrib;
}

/* Search the guest TLB for a matching entry. */
int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
                         unsigned int as)
{
	int i;

	/* XXX Replace loop with fancy data structures. */
	for (i = 0; i < PPC44x_TLB_SIZE; i++) {
		struct tlbe *tlbe = &vcpu->arch.guest_tlb[i];
		unsigned int tid;

		if (eaddr < get_tlb_eaddr(tlbe))
			continue;

		if (eaddr > get_tlb_end(tlbe))
			continue;

		tid = get_tlb_tid(tlbe);
		if (tid && (tid != pid))
			continue;

		if (!get_tlb_v(tlbe))
			continue;

		if (get_tlb_ts(tlbe) != as)
			continue;

		return i;
	}

	return -1;
}

struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
{
	unsigned int as = !!(vcpu->arch.msr & MSR_IS);
	unsigned int index;

	index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
	if (index == -1)
		return NULL;
	return &vcpu->arch.guest_tlb[index];
}

struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
{
	unsigned int as = !!(vcpu->arch.msr & MSR_DS);
	unsigned int index;

	index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
	if (index == -1)
		return NULL;
	return &vcpu->arch.guest_tlb[index];
}

static int kvmppc_44x_tlbe_is_writable(struct tlbe *tlbe)
{
	return tlbe->word2 & (PPC44x_TLB_SW|PPC44x_TLB_UW);
}

/* Must be called with mmap_sem locked for writing. */
static void kvmppc_44x_shadow_release(struct kvm_vcpu *vcpu,
                                      unsigned int index)
{
	struct tlbe *stlbe = &vcpu->arch.shadow_tlb[index];
	struct page *page = vcpu->arch.shadow_pages[index];

	kunmap(vcpu->arch.shadow_pages[index]);

	if (get_tlb_v(stlbe)) {
		if (kvmppc_44x_tlbe_is_writable(stlbe))
			kvm_release_page_dirty(page);
		else
			kvm_release_page_clean(page);
	}
}

/* Caller must ensure that the specified guest TLB entry is safe to insert into
 * the shadow TLB. */
void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
                    u32 flags)
{
	struct page *new_page;
	struct tlbe *stlbe;
	hpa_t hpaddr;
	unsigned int victim;

	/* Future optimization: don't overwrite the TLB entry containing the
	 * current PC (or stack?). */
	victim = kvmppc_tlb_44x_pos++;
	if (kvmppc_tlb_44x_pos > tlb_44x_hwater)
		kvmppc_tlb_44x_pos = 0;
	stlbe = &vcpu->arch.shadow_tlb[victim];

	/* Get reference to new page. */
	down_write(&current->mm->mmap_sem);
	new_page = gfn_to_page(vcpu->kvm, gfn);
	if (is_error_page(new_page)) {
		printk(KERN_ERR "Couldn't get guest page!\n");
		kvm_release_page_clean(new_page);
		return;
	}
	hpaddr = page_to_phys(new_page);

	/* Drop reference to old page. */
	kvmppc_44x_shadow_release(vcpu, victim);
	up_write(&current->mm->mmap_sem);

	vcpu->arch.shadow_pages[victim] = new_page;

	/* XXX Make sure (va, size) doesn't overlap any other
	 * entries. 440x6 user manual says the result would be
	 * "undefined." */

	/* XXX what about AS? */

	stlbe->tid = asid & 0xff;

	/* Force TS=1 for all guest mappings. */
	/* For now we hardcode 4KB mappings, but it will be important to
	 * use host large pages in the future. */
	stlbe->word0 = (gvaddr & PAGE_MASK) | PPC44x_TLB_VALID | PPC44x_TLB_TS
	               | PPC44x_TLB_4K;

	stlbe->word1 = (hpaddr & 0xfffffc00) | ((hpaddr >> 32) & 0xf);
	stlbe->word2 = kvmppc_44x_tlb_shadow_attrib(flags,
	                                            vcpu->arch.msr & MSR_PR);
}

void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, u64 eaddr, u64 asid)
{
	unsigned int pid = asid & 0xff;
	int i;

	/* XXX Replace loop with fancy data structures. */
	down_write(&current->mm->mmap_sem);
	for (i = 0; i <= tlb_44x_hwater; i++) {
		struct tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
		unsigned int tid;

		if (!get_tlb_v(stlbe))
			continue;

		if (eaddr < get_tlb_eaddr(stlbe))
			continue;

		if (eaddr > get_tlb_end(stlbe))
			continue;

		tid = get_tlb_tid(stlbe);
		if (tid && (tid != pid))
			continue;

		kvmppc_44x_shadow_release(vcpu, i);
		stlbe->word0 = 0;
	}
	up_write(&current->mm->mmap_sem);
}

/* Invalidate all mappings, so that when they fault back in they will get the
 * proper permission bits. */
void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
{
	int i;

	/* XXX Replace loop with fancy data structures. */
	down_write(&current->mm->mmap_sem);
	for (i = 0; i <= tlb_44x_hwater; i++) {
		kvmppc_44x_shadow_release(vcpu, i);
		vcpu->arch.shadow_tlb[i].word0 = 0;
	}
	up_write(&current->mm->mmap_sem);
}
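
One subtle point above is kvmppc_44x_tlb_shadow_attrib()'s shift-by-3:
because guest kernel code actually runs at user privilege, the guest's
supervisor permissions must reappear as user permission bits in the shadow
entry. A self-contained model of that translation; the bit values are
recalled from asm/mmu-44x.h, where UR/UW/UX sit exactly 3 bits above
SR/SW/SX, but treat them as assumptions:

#include <assert.h>
#include <stdint.h>

#define SR 0x01u	/* supervisor read  */
#define SW 0x02u	/* supervisor write */
#define SX 0x04u	/* supervisor exec  */
#define UR 0x08u	/* user read        */
#define UW 0x10u	/* user write       */
#define UX 0x20u	/* user exec        */

int main(void)
{
	uint32_t attrib = SR | SW;	/* guest supervisor r/w mapping */

	/* Guest kernel runs at user privilege: mirror the supervisor
	 * permissions into the user permission bits (the << 3 trick)... */
	attrib |= (attrib & (SX | SR | SW)) << 3;
	/* ...and force supervisor access so the host kernel can always
	 * reach the page. */
	attrib |= SX | SR | SW;

	assert(attrib == (SR | SW | SX | UR | UW));
	return 0;
}
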
