Skip to content

Commit 256c8ae

Browse files
committed
fs: introduce dedicated idmap type for mounts
Last cycle we've already made the interaction with idmapped mounts more robust and type safe by introducing the vfs{g,u}id_t type. This cycle we concluded the conversion and removed the legacy helpers. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate filesystem and mount namespaces and what different roles they have to play. Especially for filesystem developers without much experience in this area this is an easy source for bugs. Instead of passing the plain namespace we introduce a dedicated type struct mnt_idmap and replace the pointer with a pointer to a struct mnt_idmap. There are no semantic or size changes for the mount struct caused by this. We then start converting all places aware of idmapped mounts to rely on struct mnt_idmap. Once the conversion is done all helpers down to the really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two, removing and thus eliminating the possibility of any bugs. Fwiw, I fixed some issues in that area a while ago in ntfs3 and ksmbd in the past. Afterwards, only low-level code can ultimately use the associated namespace for any permission checks. Even most of the vfs can be ultimately completely oblivious about this and filesystems will never interact with it directly in any form in the future. A struct mnt_idmap currently encompasses a simple refcount and a pointer to the relevant namespace the mount is idmapped to. If a mount isn't idmapped then it will point to a static nop_mnt_idmap. If it is an idmapped mount it will point to a new struct mnt_idmap. As usual there are no allocations or anything happening for non-idmapped mounts. Everthing is carefully written to be a nop for non-idmapped mounts as has always been the case. If an idmapped mount or mount tree is created a new struct mnt_idmap is allocated and a reference taken on the relevant namespace. For each mount in a mount tree that gets idmapped or a mount that inherits the idmap when it is cloned the reference count on the associated struct mnt_idmap is bumped. Just a reminder that we only allow a mount to change it's idmapping a single time and only if it hasn't already been attached to the filesystems and has no active writers. The actual changes are fairly straightforward. This will have huge benefits for maintenance and security in the long run even if it causes some churn. I'm aware that there's some cost for all of you. And I'll commit to doing this work and make this as painless as I can. Note that this also makes it possible to extend struct mount_idmap in the future. For example, it would be possible to place the namespace pointer in an anonymous union together with an idmapping struct. This would allow us to expose an api to userspace that would let it specify idmappings directly instead of having to go through the detour of setting up namespaces at all. This just adds the infrastructure and doesn't do any conversions. Reviewed-by: Seth Forshee (DigitalOcean) <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
1 parent f7adeea commit 256c8ae

File tree

4 files changed

+158
-45
lines changed

4 files changed

+158
-45
lines changed

fs/namespace.c

Lines changed: 142 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -75,13 +75,30 @@ static DECLARE_RWSEM(namespace_sem);
7575
static HLIST_HEAD(unmounted); /* protected by namespace_sem */
7676
static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
7777

78+
struct mnt_idmap {
79+
struct user_namespace *owner;
80+
refcount_t count;
81+
};
82+
83+
/*
84+
* Carries the initial idmapping of 0:0:4294967295 which is an identity
85+
* mapping. This means that {g,u}id 0 is mapped to {g,u}id 0, {g,u}id 1 is
86+
* mapped to {g,u}id 1, [...], {g,u}id 1000 to {g,u}id 1000, [...].
87+
*/
88+
struct mnt_idmap nop_mnt_idmap = {
89+
.owner = &init_user_ns,
90+
.count = REFCOUNT_INIT(1),
91+
};
92+
EXPORT_SYMBOL_GPL(nop_mnt_idmap);
93+
7894
struct mount_kattr {
7995
unsigned int attr_set;
8096
unsigned int attr_clr;
8197
unsigned int propagation;
8298
unsigned int lookup_flags;
8399
bool recurse;
84100
struct user_namespace *mnt_userns;
101+
struct mnt_idmap *mnt_idmap;
85102
};
86103

87104
/* /sys/fs */
@@ -193,6 +210,104 @@ int mnt_get_count(struct mount *mnt)
193210
#endif
194211
}
195212

213+
/**
214+
* mnt_idmap_owner - retrieve owner of the mount's idmapping
215+
* @idmap: mount idmapping
216+
*
217+
* This helper will go away once the conversion to use struct mnt_idmap
218+
* everywhere has finished at which point the helper will be unexported.
219+
*
220+
* Only code that needs to perform permission checks based on the owner of the
221+
* idmapping will get access to it. All other code will solely rely on
222+
* idmappings. This will get us type safety so it's impossible to conflate
223+
* filesystems idmappings with mount idmappings.
224+
*
225+
* Return: The owner of the idmapping.
226+
*/
227+
struct user_namespace *mnt_idmap_owner(const struct mnt_idmap *idmap)
228+
{
229+
return idmap->owner;
230+
}
231+
EXPORT_SYMBOL_GPL(mnt_idmap_owner);
232+
233+
/**
234+
* mnt_user_ns - retrieve owner of an idmapped mount
235+
* @mnt: the relevant vfsmount
236+
*
237+
* This helper will go away once the conversion to use struct mnt_idmap
238+
* everywhere has finished at which point the helper will be unexported.
239+
*
240+
* Only code that needs to perform permission checks based on the owner of the
241+
* idmapping will get access to it. All other code will solely rely on
242+
* idmappings. This will get us type safety so it's impossible to conflate
243+
* filesystems idmappings with mount idmappings.
244+
*
245+
* Return: The owner of the idmapped.
246+
*/
247+
struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
248+
{
249+
struct mnt_idmap *idmap = mnt_idmap(mnt);
250+
251+
/* Return the actual owner of the filesystem instead of the nop. */
252+
if (idmap == &nop_mnt_idmap &&
253+
!initial_idmapping(mnt->mnt_sb->s_user_ns))
254+
return mnt->mnt_sb->s_user_ns;
255+
return mnt_idmap_owner(idmap);
256+
}
257+
EXPORT_SYMBOL_GPL(mnt_user_ns);
258+
259+
/**
260+
* alloc_mnt_idmap - allocate a new idmapping for the mount
261+
* @mnt_userns: owning userns of the idmapping
262+
*
263+
* Allocate a new struct mnt_idmap which carries the idmapping of the mount.
264+
*
265+
* Return: On success a new idmap, on error an error pointer is returned.
266+
*/
267+
static struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns)
268+
{
269+
struct mnt_idmap *idmap;
270+
271+
idmap = kzalloc(sizeof(struct mnt_idmap), GFP_KERNEL_ACCOUNT);
272+
if (!idmap)
273+
return ERR_PTR(-ENOMEM);
274+
275+
idmap->owner = get_user_ns(mnt_userns);
276+
refcount_set(&idmap->count, 1);
277+
return idmap;
278+
}
279+
280+
/**
281+
* mnt_idmap_get - get a reference to an idmapping
282+
* @idmap: the idmap to bump the reference on
283+
*
284+
* If @idmap is not the @nop_mnt_idmap bump the reference count.
285+
*
286+
* Return: @idmap with reference count bumped if @not_mnt_idmap isn't passed.
287+
*/
288+
static inline struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap)
289+
{
290+
if (idmap != &nop_mnt_idmap)
291+
refcount_inc(&idmap->count);
292+
293+
return idmap;
294+
}
295+
296+
/**
297+
* mnt_idmap_put - put a reference to an idmapping
298+
* @idmap: the idmap to put the reference on
299+
*
300+
* If this is a non-initial idmapping, put the reference count when a mount is
301+
* released and free it if we're the last user.
302+
*/
303+
static inline void mnt_idmap_put(struct mnt_idmap *idmap)
304+
{
305+
if (idmap != &nop_mnt_idmap && refcount_dec_and_test(&idmap->count)) {
306+
put_user_ns(idmap->owner);
307+
kfree(idmap);
308+
}
309+
}
310+
196311
static struct mount *alloc_vfsmnt(const char *name)
197312
{
198313
struct mount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
@@ -232,7 +347,7 @@ static struct mount *alloc_vfsmnt(const char *name)
232347
INIT_HLIST_NODE(&mnt->mnt_mp_list);
233348
INIT_LIST_HEAD(&mnt->mnt_umounting);
234349
INIT_HLIST_HEAD(&mnt->mnt_stuck_children);
235-
mnt->mnt.mnt_userns = &init_user_ns;
350+
mnt->mnt.mnt_idmap = &nop_mnt_idmap;
236351
}
237352
return mnt;
238353

@@ -602,11 +717,7 @@ int sb_prepare_remount_readonly(struct super_block *sb)
602717

603718
static void free_vfsmnt(struct mount *mnt)
604719
{
605-
struct user_namespace *mnt_userns;
606-
607-
mnt_userns = mnt_user_ns(&mnt->mnt);
608-
if (!initial_idmapping(mnt_userns))
609-
put_user_ns(mnt_userns);
720+
mnt_idmap_put(mnt_idmap(&mnt->mnt));
610721
kfree_const(mnt->mnt_devname);
611722
#ifdef CONFIG_SMP
612723
free_percpu(mnt->mnt_pcp);
@@ -1009,7 +1120,6 @@ static struct mount *skip_mnt_tree(struct mount *p)
10091120
struct vfsmount *vfs_create_mount(struct fs_context *fc)
10101121
{
10111122
struct mount *mnt;
1012-
struct user_namespace *fs_userns;
10131123

10141124
if (!fc->root)
10151125
return ERR_PTR(-EINVAL);
@@ -1027,10 +1137,6 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
10271137
mnt->mnt_mountpoint = mnt->mnt.mnt_root;
10281138
mnt->mnt_parent = mnt;
10291139

1030-
fs_userns = mnt->mnt.mnt_sb->s_user_ns;
1031-
if (!initial_idmapping(fs_userns))
1032-
mnt->mnt.mnt_userns = get_user_ns(fs_userns);
1033-
10341140
lock_mount_hash();
10351141
list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
10361142
unlock_mount_hash();
@@ -1120,9 +1226,8 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
11201226
mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
11211227

11221228
atomic_inc(&sb->s_active);
1123-
mnt->mnt.mnt_userns = mnt_user_ns(&old->mnt);
1124-
if (!initial_idmapping(mnt->mnt.mnt_userns))
1125-
mnt->mnt.mnt_userns = get_user_ns(mnt->mnt.mnt_userns);
1229+
mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt));
1230+
11261231
mnt->mnt.mnt_sb = sb;
11271232
mnt->mnt.mnt_root = dget(root);
11281233
mnt->mnt_mountpoint = mnt->mnt.mnt_root;
@@ -3981,14 +4086,14 @@ static int can_idmap_mount(const struct mount_kattr *kattr, struct mount *mnt)
39814086
struct vfsmount *m = &mnt->mnt;
39824087
struct user_namespace *fs_userns = m->mnt_sb->s_user_ns;
39834088

3984-
if (!kattr->mnt_userns)
4089+
if (!kattr->mnt_idmap)
39854090
return 0;
39864091

39874092
/*
39884093
* Creating an idmapped mount with the filesystem wide idmapping
39894094
* doesn't make sense so block that. We don't allow mushy semantics.
39904095
*/
3991-
if (kattr->mnt_userns == fs_userns)
4096+
if (mnt_idmap_owner(kattr->mnt_idmap) == fs_userns)
39924097
return -EINVAL;
39934098

39944099
/*
@@ -4028,7 +4133,7 @@ static inline bool mnt_allow_writers(const struct mount_kattr *kattr,
40284133
{
40294134
return (!(kattr->attr_set & MNT_READONLY) ||
40304135
(mnt->mnt.mnt_flags & MNT_READONLY)) &&
4031-
!kattr->mnt_userns;
4136+
!kattr->mnt_idmap;
40324137
}
40334138

40344139
static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
@@ -4082,27 +4187,18 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
40824187

40834188
static void do_idmap_mount(const struct mount_kattr *kattr, struct mount *mnt)
40844189
{
4085-
struct user_namespace *mnt_userns, *old_mnt_userns;
4086-
4087-
if (!kattr->mnt_userns)
4190+
if (!kattr->mnt_idmap)
40884191
return;
40894192

40904193
/*
4091-
* We're the only ones able to change the mount's idmapping. So
4092-
* mnt->mnt.mnt_userns is stable and we can retrieve it directly.
4093-
*/
4094-
old_mnt_userns = mnt->mnt.mnt_userns;
4095-
4096-
mnt_userns = get_user_ns(kattr->mnt_userns);
4097-
/* Pairs with smp_load_acquire() in mnt_user_ns(). */
4098-
smp_store_release(&mnt->mnt.mnt_userns, mnt_userns);
4099-
4100-
/*
4101-
* If this is an idmapped filesystem drop the reference we've taken
4102-
* in vfs_create_mount() before.
4194+
* Pairs with smp_load_acquire() in mnt_idmap().
4195+
*
4196+
* Since we only allow a mount to change the idmapping once and
4197+
* verified this in can_idmap_mount() we know that the mount has
4198+
* @nop_mnt_idmap attached to it. So there's no need to drop any
4199+
* references.
41034200
*/
4104-
if (!initial_idmapping(old_mnt_userns))
4105-
put_user_ns(old_mnt_userns);
4201+
smp_store_release(&mnt->mnt.mnt_idmap, mnt_idmap_get(kattr->mnt_idmap));
41064202
}
41074203

41084204
static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
@@ -4136,6 +4232,15 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
41364232
if (path->dentry != mnt->mnt.mnt_root)
41374233
return -EINVAL;
41384234

4235+
if (kattr->mnt_userns) {
4236+
struct mnt_idmap *mnt_idmap;
4237+
4238+
mnt_idmap = alloc_mnt_idmap(kattr->mnt_userns);
4239+
if (IS_ERR(mnt_idmap))
4240+
return PTR_ERR(mnt_idmap);
4241+
kattr->mnt_idmap = mnt_idmap;
4242+
}
4243+
41394244
if (kattr->propagation) {
41404245
/*
41414246
* Only take namespace_lock() if we're actually changing
@@ -4323,6 +4428,9 @@ static void finish_mount_kattr(struct mount_kattr *kattr)
43234428
{
43244429
put_user_ns(kattr->mnt_userns);
43254430
kattr->mnt_userns = NULL;
4431+
4432+
if (kattr->mnt_idmap)
4433+
mnt_idmap_put(kattr->mnt_idmap);
43264434
}
43274435

43284436
SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,

include/linux/fs.h

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2700,18 +2700,22 @@ static inline struct user_namespace *file_mnt_user_ns(struct file *file)
27002700
return mnt_user_ns(file->f_path.mnt);
27012701
}
27022702

2703+
static inline struct mnt_idmap *file_mnt_idmap(struct file *file)
2704+
{
2705+
return mnt_idmap(file->f_path.mnt);
2706+
}
2707+
27032708
/**
27042709
* is_idmapped_mnt - check whether a mount is mapped
27052710
* @mnt: the mount to check
27062711
*
2707-
* If @mnt has an idmapping attached different from the
2708-
* filesystem's idmapping then @mnt is mapped.
2712+
* If @mnt has an non @nop_mnt_idmap attached to it then @mnt is mapped.
27092713
*
27102714
* Return: true if mount is mapped, false if not.
27112715
*/
27122716
static inline bool is_idmapped_mnt(const struct vfsmount *mnt)
27132717
{
2714-
return mnt_user_ns(mnt) != mnt->mnt_sb->s_user_ns;
2718+
return mnt_idmap(mnt) != &nop_mnt_idmap;
27152719
}
27162720

27172721
extern long vfs_truncate(const struct path *, loff_t);

include/linux/mnt_idmapping.h

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5,12 +5,10 @@
55
#include <linux/types.h>
66
#include <linux/uidgid.h>
77

8+
struct mnt_idmap;
89
struct user_namespace;
9-
/*
10-
* Carries the initial idmapping of 0:0:4294967295 which is an identity
11-
* mapping. This means that {g,u}id 0 is mapped to {g,u}id 0, {g,u}id 1 is
12-
* mapped to {g,u}id 1, [...], {g,u}id 1000 to {g,u}id 1000, [...].
13-
*/
10+
11+
extern struct mnt_idmap nop_mnt_idmap;
1412
extern struct user_namespace init_user_ns;
1513

1614
typedef struct {

include/linux/mount.h

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
struct super_block;
1717
struct dentry;
1818
struct user_namespace;
19+
struct mnt_idmap;
1920
struct file_system_type;
2021
struct fs_context;
2122
struct file;
@@ -70,13 +71,15 @@ struct vfsmount {
7071
struct dentry *mnt_root; /* root of the mounted tree */
7172
struct super_block *mnt_sb; /* pointer to superblock */
7273
int mnt_flags;
73-
struct user_namespace *mnt_userns;
74+
struct mnt_idmap *mnt_idmap;
7475
} __randomize_layout;
7576

76-
static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
77+
struct user_namespace *mnt_user_ns(const struct vfsmount *mnt);
78+
struct user_namespace *mnt_idmap_owner(const struct mnt_idmap *idmap);
79+
static inline struct mnt_idmap *mnt_idmap(const struct vfsmount *mnt)
7780
{
7881
/* Pairs with smp_store_release() in do_idmap_mount(). */
79-
return smp_load_acquire(&mnt->mnt_userns);
82+
return smp_load_acquire(&mnt->mnt_idmap);
8083
}
8184

8285
extern int mnt_want_write(struct vfsmount *mnt);

0 commit comments

Comments
 (0)