Commit 5a47555

chao-p authored and bonzini committed
KVM: Introduce per-page memory attributes

In confidential computing usages, whether a page is private or shared is
necessary information for KVM to perform operations like page fault
handling, page zapping etc. There are other potential use cases for
per-page memory attributes, e.g. to make memory read-only (or no-exec,
or exec-only, etc.) without having to modify memslots.

Introduce the KVM_SET_MEMORY_ATTRIBUTES ioctl, advertised by
KVM_CAP_MEMORY_ATTRIBUTES, to allow userspace to set the per-page memory
attributes to a guest memory range.

Use an xarray to store the per-page attributes internally, with a naive,
not fully optimized implementation, i.e. prioritize correctness over
performance for the initial implementation.

Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX
attributes/protections in the future, e.g. to give userspace fine-grained
control over read, write, and execute protections for guest memory.

Provide arch hooks for handling attribute changes before and after common
code sets the new attributes, e.g. x86 will use the "pre" hook to zap all
relevant mappings, and the "post" hook to track whether or not hugepages
can be used to map the range.

To simplify the implementation wrap the entire sequence with
kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly
guaranteed to be an invalidation. For the initial use case, x86 *will*
always invalidate memory, and preventing arch code from creating new
mappings while the attributes are in flux makes it much easier to reason
about the correctness of consuming attributes.

It's possible that future usages may not require an invalidation, e.g. if
KVM ends up supporting RWX protections and userspace grants _more_
protections, but again opt for simplicity and punt optimizations to
if/when they are needed.

Suggested-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Cc: Fuad Tabba <[email protected]>
Cc: Xu Yilun <[email protected]>
Cc: Mickaël Salaün <[email protected]>
Signed-off-by: Chao Peng <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

1 parent 193bbfa commit 5a47555

5 files changed: +288 -0 lines changed

Documentation/virt/kvm/api.rst

Lines changed: 36 additions & 0 deletions
@@ -6214,6 +6214,42 @@ superset of the features supported by the system.

 See KVM_SET_USER_MEMORY_REGION.

+4.141 KVM_SET_MEMORY_ATTRIBUTES
+-------------------------------
+
+:Capability: KVM_CAP_MEMORY_ATTRIBUTES
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_memory_attributes (in)
+:Returns: 0 on success, <0 on error
+
+KVM_SET_MEMORY_ATTRIBUTES allows userspace to set memory attributes for a range
+of guest physical memory.
+
+::
+
+  struct kvm_memory_attributes {
+	__u64 address;
+	__u64 size;
+	__u64 attributes;
+	__u64 flags;
+  };
+
+  #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
+
+The address and size must be page aligned. The supported attributes can be
+retrieved via ioctl(KVM_CHECK_EXTENSION) on KVM_CAP_MEMORY_ATTRIBUTES. If
+executed on a VM, KVM_CAP_MEMORY_ATTRIBUTES precisely returns the attributes
+supported by that VM. If executed at system scope, KVM_CAP_MEMORY_ATTRIBUTES
+returns all attributes supported by KVM. The only attribute defined at this
+time is KVM_MEMORY_ATTRIBUTE_PRIVATE, which marks the associated gfn as being
+guest private memory.
+
+Note, there is no "get" API. Userspace is responsible for explicitly tracking
+the state of a gfn/page as needed.
+
+The "flags" field is reserved for future extensions and must be '0'.
+
 5. The kvm_run structure
 ========================
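The constraints documented above (page-aligned address and size, zero flags, only supported attribute bits) can be checked in userspace before issuing the ioctl. The following is a minimal sketch, not part of the commit: `struct mem_attrs_req`, `ATTR_PRIVATE`, `SKETCH_PAGE_SIZE` and `check_mem_attrs_req()` are hypothetical names, the struct is a local mirror of `struct kvm_memory_attributes`, and a 4 KiB page size is assumed. The kernel remains the authority on what it accepts.

```c
#include <errno.h>
#include <stdint.h>

/* Local mirror of struct kvm_memory_attributes (hypothetical name), so the
 * sketch builds without kernel headers that carry this commit. */
struct mem_attrs_req {
	uint64_t address;
	uint64_t size;
	uint64_t attributes;
	uint64_t flags;
};

#define ATTR_PRIVATE		(1ULL << 3)	/* same bit as KVM_MEMORY_ATTRIBUTE_PRIVATE */
#define SKETCH_PAGE_SIZE	4096ULL		/* assumption: 4 KiB pages */

/*
 * Returns 0 if the request satisfies the documented constraints and
 * -EINVAL otherwise. 'supported' stands for the attribute mask that
 * KVM_CHECK_EXTENSION reports for KVM_CAP_MEMORY_ATTRIBUTES.
 */
int check_mem_attrs_req(const struct mem_attrs_req *req, uint64_t supported)
{
	/* "flags" is reserved and must be zero. */
	if (req->flags)
		return -EINVAL;
	/* Only advertised attribute bits may be set. */
	if (req->attributes & ~supported)
		return -EINVAL;
	/* Reject empty ranges and ranges that wrap around 2^64. */
	if (req->size == 0 || req->address + req->size < req->address)
		return -EINVAL;
	/* Both address and size must be page aligned. */
	if ((req->address | req->size) & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;
	return 0;
}
```

A caller would typically query the supported mask once per VM and run this check before each KVM_SET_MEMORY_ATTRIBUTES call, which keeps avoidable EINVAL failures out of the ioctl path.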

include/linux/kvm_host.h

Lines changed: 19 additions & 0 deletions
@@ -256,6 +256,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
 union kvm_mmu_notifier_arg {
 	pte_t pte;
+	unsigned long attributes;
 };

 struct kvm_gfn_range {
@@ -806,6 +807,10 @@ struct kvm {

 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
+#endif
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	/* Protected by slots_lock (for writes) and RCU (for reads) */
+	struct xarray mem_attr_array;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
 };
@@ -2338,4 +2343,18 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 	vcpu->run->memory_fault.flags = 0;
 }

+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+{
+	return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
+}
+
+bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
+				     unsigned long attrs);
+bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
+					struct kvm_gfn_range *range);
+bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
+					 struct kvm_gfn_range *range);
+#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
+
 #endif
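kvm_get_memory_attributes() relies on how xarray value entries are encoded: an integer is stored shifted left by one with the low bit set, and decoding shifts it back, so a missing entry (NULL) decodes to 0, i.e. "no attributes". The sketch below models that encoding in userspace. `sketch_mk_value()` and `sketch_to_value()` are hypothetical stand-ins for the kernel's xa_mk_value() and xa_to_value(), not the kernel helpers themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for xa_mk_value(): tag an integer so it can live in a pointer slot. */
void *sketch_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* Stand-in for xa_to_value(): drop the tag bit; NULL decodes to 0. */
unsigned long sketch_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}
```

This is why the setter in kvm_main.c can store NULL instead of a value entry when the requested attributes are 0: readers see 0 either way, and the xarray does not have to keep an entry for pages without attributes.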

include/uapi/linux/kvm.h

Lines changed: 13 additions & 0 deletions
@@ -1220,6 +1220,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230
 #define KVM_CAP_USER_MEMORY2 231
 #define KVM_CAP_MEMORY_FAULT_INFO 232
+#define KVM_CAP_MEMORY_ATTRIBUTES 233

 #ifdef KVM_CAP_IRQ_ROUTING

@@ -2288,4 +2289,16 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0)

+/* Available with KVM_CAP_MEMORY_ATTRIBUTES */
+#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd2, struct kvm_memory_attributes)
+
+struct kvm_memory_attributes {
+	__u64 address;
+	__u64 size;
+	__u64 attributes;
+	__u64 flags;
+};
+
+#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
+
 #endif /* __LINUX_KVM_H */

virt/kvm/Kconfig

Lines changed: 4 additions & 0 deletions
@@ -96,3 +96,7 @@ config KVM_GENERIC_HARDWARE_ENABLING
 config KVM_GENERIC_MMU_NOTIFIER
        select MMU_NOTIFIER
        bool
+
+config KVM_GENERIC_MEMORY_ATTRIBUTES
+       select KVM_GENERIC_MMU_NOTIFIER
+       bool

virt/kvm/kvm_main.c

Lines changed: 216 additions & 0 deletions
@@ -1211,6 +1211,9 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	spin_lock_init(&kvm->mn_invalidate_lock);
 	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
 	xa_init(&kvm->vcpu_array);
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	xa_init(&kvm->mem_attr_array);
+#endif

 	INIT_LIST_HEAD(&kvm->gpc_list);
 	spin_lock_init(&kvm->gpc_lock);
@@ -1391,6 +1394,9 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	}
 	cleanup_srcu_struct(&kvm->irq_srcu);
 	cleanup_srcu_struct(&kvm->srcu);
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	xa_destroy(&kvm->mem_attr_array);
+#endif
 	kvm_arch_free_vm(kvm);
 	preempt_notifier_dec();
 	hardware_disable_all();
@@ -2397,6 +2403,200 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
 }
 #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */

+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+/*
+ * Returns true if _all_ gfns in the range [@start, @end) have attributes
+ * matching @attrs.
+ */
+bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
+				     unsigned long attrs)
+{
+	XA_STATE(xas, &kvm->mem_attr_array, start);
+	unsigned long index;
+	bool has_attrs;
+	void *entry;
+
+	rcu_read_lock();
+
+	if (!attrs) {
+		has_attrs = !xas_find(&xas, end - 1);
+		goto out;
+	}
+
+	has_attrs = true;
+	for (index = start; index < end; index++) {
+		do {
+			entry = xas_next(&xas);
+		} while (xas_retry(&xas, entry));
+
+		if (xas.xa_index != index || xa_to_value(entry) != attrs) {
+			has_attrs = false;
+			break;
+		}
+	}
+
+out:
+	rcu_read_unlock();
+	return has_attrs;
+}
+
+static u64 kvm_supported_mem_attributes(struct kvm *kvm)
+{
+	if (!kvm)
+		return KVM_MEMORY_ATTRIBUTE_PRIVATE;
+
+	return 0;
+}
+
+static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
+						 struct kvm_mmu_notifier_range *range)
+{
+	struct kvm_gfn_range gfn_range;
+	struct kvm_memory_slot *slot;
+	struct kvm_memslots *slots;
+	struct kvm_memslot_iter iter;
+	bool found_memslot = false;
+	bool ret = false;
+	int i;
+
+	gfn_range.arg = range->arg;
+	gfn_range.may_block = range->may_block;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+
+		kvm_for_each_memslot_in_gfn_range(&iter, slots, range->start, range->end) {
+			slot = iter.slot;
+			gfn_range.slot = slot;
+
+			gfn_range.start = max(range->start, slot->base_gfn);
+			gfn_range.end = min(range->end, slot->base_gfn + slot->npages);
+			if (gfn_range.start >= gfn_range.end)
+				continue;
+
+			if (!found_memslot) {
+				found_memslot = true;
+				KVM_MMU_LOCK(kvm);
+				if (!IS_KVM_NULL_FN(range->on_lock))
+					range->on_lock(kvm);
+			}
+
+			ret |= range->handler(kvm, &gfn_range);
+		}
+	}
+
+	if (range->flush_on_ret && ret)
+		kvm_flush_remote_tlbs(kvm);
+
+	if (found_memslot)
+		KVM_MMU_UNLOCK(kvm);
+}
+
+static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
+					  struct kvm_gfn_range *range)
+{
+	/*
+	 * Unconditionally add the range to the invalidation set, regardless of
+	 * whether or not the arch callback actually needs to zap SPTEs. E.g.
+	 * if KVM supports RWX attributes in the future and the attributes are
+	 * going from R=>RW, zapping isn't strictly necessary. Unconditionally
+	 * adding the range allows KVM to require that MMU invalidations add at
+	 * least one range between begin() and end(), e.g. allows KVM to detect
+	 * bugs where the add() is missed. Relaxing the rule *might* be safe,
+	 * but it's not obvious that allowing new mappings while the attributes
+	 * are in flux is desirable or worth the complexity.
+	 */
+	kvm_mmu_invalidate_range_add(kvm, range->start, range->end);
+
+	return kvm_arch_pre_set_memory_attributes(kvm, range);
+}
+
+/* Set @attributes for the gfn range [@start, @end). */
+static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
+				     unsigned long attributes)
+{
+	struct kvm_mmu_notifier_range pre_set_range = {
+		.start = start,
+		.end = end,
+		.handler = kvm_pre_set_memory_attributes,
+		.on_lock = kvm_mmu_invalidate_begin,
+		.flush_on_ret = true,
+		.may_block = true,
+	};
+	struct kvm_mmu_notifier_range post_set_range = {
+		.start = start,
+		.end = end,
+		.arg.attributes = attributes,
+		.handler = kvm_arch_post_set_memory_attributes,
+		.on_lock = kvm_mmu_invalidate_end,
+		.may_block = true,
+	};
+	unsigned long i;
+	void *entry;
+	int r = 0;
+
+	entry = attributes ? xa_mk_value(attributes) : NULL;
+
+	mutex_lock(&kvm->slots_lock);
+
+	/* Nothing to do if the entire range has the desired attributes. */
+	if (kvm_range_has_memory_attributes(kvm, start, end, attributes))
+		goto out_unlock;
+
+	/*
+	 * Reserve memory ahead of time to avoid having to deal with failures
+	 * partway through setting the new attributes.
+	 */
+	for (i = start; i < end; i++) {
+		r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT);
+		if (r)
+			goto out_unlock;
+	}
+
+	kvm_handle_gfn_range(kvm, &pre_set_range);
+
+	for (i = start; i < end; i++) {
+		r = xa_err(xa_store(&kvm->mem_attr_array, i, entry,
+				    GFP_KERNEL_ACCOUNT));
+		KVM_BUG_ON(r, kvm);
+	}
+
+	kvm_handle_gfn_range(kvm, &post_set_range);
+
+out_unlock:
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
2571+
static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
2572+
struct kvm_memory_attributes *attrs)
2573+
{
2574+
gfn_t start, end;
2575+
2576+
/* flags is currently not used. */
2577+
if (attrs->flags)
2578+
return -EINVAL;
2579+
if (attrs->attributes & ~kvm_supported_mem_attributes(kvm))
2580+
return -EINVAL;
2581+
if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
2582+
return -EINVAL;
2583+
if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
2584+
return -EINVAL;
2585+
2586+
start = attrs->address >> PAGE_SHIFT;
2587+
end = (attrs->address + attrs->size) >> PAGE_SHIFT;
2588+
2589+
/*
2590+
* xarray tracks data using "unsigned long", and as a result so does
2591+
* KVM. For simplicity, supports generic attributes only on 64-bit
2592+
* architectures.
2593+
*/
2594+
BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long));
2595+
2596+
return kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes);
2597+
}
2598+
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
2599+
24002600
struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
24012601
{
24022602
return __gfn_to_memslot(kvm_memslots(kvm), gfn);
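kvm_vm_set_mem_attributes() reserves xarray memory for every gfn before it changes any attribute, so the only step that can fail runs before anything is modified. The sketch below shows the same reserve-then-commit pattern on a plain array. It is an illustrative userspace model, not kernel code: `struct attr_table`, `reserve_slot()`, `set_range()` and the `budget` field (a stand-in for allocations that may fail) are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16

struct attr_table {
	bool reserved[NPAGES];		/* storage for this slot has been set aside */
	unsigned long attrs[NPAGES];	/* current attributes, 0 means none */
	size_t budget;			/* hypothetical: allocations left before failure */
};

/* Phase 1 helper: the only step that can fail, in the role of xa_reserve(). */
int reserve_slot(struct attr_table *t, size_t i)
{
	if (t->reserved[i])
		return 0;
	if (t->budget == 0)
		return -ENOMEM;
	t->budget--;
	t->reserved[i] = true;
	return 0;
}

/*
 * Set attrs for [start, end), with end <= NPAGES. Either every page in the
 * range is updated or none is: all failures happen before the first store.
 */
int set_range(struct attr_table *t, size_t start, size_t end,
	      unsigned long attrs)
{
	size_t i;
	int r;

	for (i = start; i < end; i++) {
		r = reserve_slot(t, i);
		if (r)
			return r;	/* no attribute has been modified yet */
	}

	for (i = start; i < end; i++)
		t->attrs[i] = attrs;	/* cannot fail: slots already reserved */

	return 0;
}
```

As in the kernel function, a failed attempt may leave some reservations behind, but the visible attributes are untouched, so a retry starts from a consistent state.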
@@ -4641,6 +4841,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
 		return 1;
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	case KVM_CAP_MEMORY_ATTRIBUTES:
+		return kvm_supported_mem_attributes(kvm);
+#endif
 	default:
 		break;
 	}
@@ -5034,6 +5238,18 @@ static long kvm_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif /* CONFIG_HAVE_KVM_IRQ_ROUTING */
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	case KVM_SET_MEMORY_ATTRIBUTES: {
+		struct kvm_memory_attributes attrs;
+
+		r = -EFAULT;
+		if (copy_from_user(&attrs, argp, sizeof(attrs)))
+			goto out;
+
+		r = kvm_vm_ioctl_set_mem_attributes(kvm, &attrs);
+		break;
+	}
+#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 	case KVM_CREATE_DEVICE: {
 		struct kvm_create_device cd;
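Once the ioctl reaches kvm_handle_gfn_range() in the hunk further up, the requested gfn range is intersected with each memslot and empty intersections are skipped. The sketch below isolates that clamping step as a userspace model. `struct gfn_span` and `clamp_to_slot()` are hypothetical names, and the local `gfn_t` typedef only mirrors the kernel type.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Half-open range of guest frame numbers: [start, end). */
struct gfn_span {
	gfn_t start;
	gfn_t end;
};

/*
 * Intersect a requested range with a slot covering
 * [base_gfn, base_gfn + npages). Returns false when the two do not overlap,
 * in which case the caller should skip the slot.
 */
bool clamp_to_slot(struct gfn_span req, gfn_t base_gfn, uint64_t npages,
		   struct gfn_span *out)
{
	gfn_t slot_end = base_gfn + npages;

	out->start = req.start > base_gfn ? req.start : base_gfn;
	out->end = req.end < slot_end ? req.end : slot_end;
	return out->start < out->end;
}
```

For example, a request for gfns [10, 50) against a slot with base_gfn 32 and 64 pages yields [32, 50), while a request for [0, 10) against the same slot yields no overlap.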
