[1/2] fs: Extend mount_ns with support for a fast namespace to vfsmount function

Submitted by Eric W. Biederman on March 23, 2018, 9:41 p.m.

Details

Message ID 87fu4qo4ff.fsf_-_@xmission.com
State New
Series "mqueue: forbid unprivileged user access to internal mount"
Commit Message

Eric W. Biederman March 23, 2018, 9:41 p.m.
If this function is present, use it to look up the vfsmount, except
when performing internal kernel mounts.  When performing internal
kernel mounts, don't look through the list of superblocks; just create
a new one.

After a quick survey it appears all callers of mount_ns are candidates
for this optimization.  So extending the generic helper appears
like the right thing.

The motivation for this change is that this optimization was performed
recently on mqueuefs and a permission check was dropped and
sb->s_user_ns was set incorrectly.

To enable fixing mqueuefs, this logic was extracted from mqueuefs and
added to mount_ns, which gets the permission check correct and sets
sb->s_user_ns properly.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/nfsd/nfsctl.c      |  3 ++-
 fs/proc/root.c        |  3 ++-
 fs/super.c            | 18 +++++++++++++++---
 include/linux/fs.h    |  1 +
 net/sunrpc/rpc_pipe.c |  2 +-
 5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index d107b4426f7e..ffd8d91a68df 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1182,7 +1182,8 @@  static struct dentry *nfsd_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
 	struct net *net = current->nsproxy->net_ns;
-	return mount_ns(fs_type, flags, data, net, net->user_ns, nfsd_fill_super);
+	return mount_ns(fs_type, flags, data, net, net->user_ns,
+			NULL, nfsd_fill_super);
 }
 
 static void nfsd_umount(struct super_block *sb)
diff --git a/fs/proc/root.c b/fs/proc/root.c
index ede8e64974be..4111565b6944 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -98,7 +98,8 @@  static struct dentry *proc_mount(struct file_system_type *fs_type,
 		ns = task_active_pid_ns(current);
 	}
 
-	return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
+	return mount_ns(fs_type, flags, data, ns, ns->user_ns,
+			NULL, proc_fill_super);
 }
 
 static void proc_kill_sb(struct super_block *sb)
diff --git a/fs/super.c b/fs/super.c
index 672538ca9831..4734d423b403 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1016,18 +1016,30 @@  static int ns_set_super(struct super_block *sb, void *data)
 
 struct dentry *mount_ns(struct file_system_type *fs_type,
 	int flags, void *data, void *ns, struct user_namespace *user_ns,
+	struct vfsmount *(*ns_to_mnt)(void *ns),
 	int (*fill_super)(struct super_block *, void *, int))
 {
 	struct super_block *sb;
-
+	int (*test_super)(struct super_block *, void *) = ns_test_super;
 	/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
 	 * over the namespace.
 	 */
 	if (!(flags & SB_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
-			 user_ns, ns);
+	if (ns_to_mnt) {
+		test_super = NULL;
+		if (!(flags & SB_KERNMOUNT)) {
+			struct vfsmount *m = ns_to_mnt(ns);
+			if (IS_ERR(m))
+				return ERR_CAST(m);
+			atomic_inc(&m->mnt_sb->s_active);
+			down_write(&m->mnt_sb->s_umount);
+			return dget(m->mnt_root);
+		}
+	}
+
+	sb = sget_userns(fs_type, test_super, ns_set_super, flags, user_ns, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2a815560fda0..ca7f59ff144c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2091,6 +2091,7 @@  struct file_system_type {
 
 extern struct dentry *mount_ns(struct file_system_type *fs_type,
 	int flags, void *data, void *ns, struct user_namespace *user_ns,
+	struct vfsmount *(*ns_to_mnt)(void *ns),
 	int (*fill_super)(struct super_block *, void *, int));
 #ifdef CONFIG_BLOCK
 extern struct dentry *mount_bdev(struct file_system_type *fs_type,
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index fc97fc3ed637..824e740fe740 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -1448,7 +1448,7 @@  rpc_mount(struct file_system_type *fs_type,
 		int flags, const char *dev_name, void *data)
 {
 	struct net *net = current->nsproxy->net_ns;
-	return mount_ns(fs_type, flags, data, net, net->user_ns, rpc_fill_super);
+	return mount_ns(fs_type, flags, data, net, net->user_ns, NULL, rpc_fill_super);
 }
 
 static void rpc_kill_sb(struct super_block *sb)

Comments

Al Viro March 23, 2018, 11:15 p.m.
On Fri, Mar 23, 2018 at 04:41:40PM -0500, Eric W. Biederman wrote:

>  struct dentry *mount_ns(struct file_system_type *fs_type,
>  	int flags, void *data, void *ns, struct user_namespace *user_ns,
> +	struct vfsmount *(*ns_to_mnt)(void *ns),
>  	int (*fill_super)(struct super_block *, void *, int))
>  {
>  	struct super_block *sb;
> -
> +	int (*test_super)(struct super_block *, void *) = ns_test_super;
>  	/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
>  	 * over the namespace.
>  	 */
>  	if (!(flags & SB_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
>  		return ERR_PTR(-EPERM);
>  
> -	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
> -			 user_ns, ns);
> +	if (ns_to_mnt) {
> +		test_super = NULL;
> +		if (!(flags & SB_KERNMOUNT)) {
> +			struct vfsmount *m = ns_to_mnt(ns);
> +			if (IS_ERR(m))
> +				return ERR_CAST(m);
> +			atomic_inc(&m->mnt_sb->s_active);
> +			down_write(&m->mnt_sb->s_umount);
> +			return dget(m->mnt_root);

This is completely wrong.  Look:
	* SB_KERNMOUNT and !SB_KERNMOUNT cases are almost entirely isolated;
completely so once that ns_to_mnt becomes unconditionally non-NULL.  
	* in !SB_KERNMOUNT passing ns_to_mnt() is pointless - you might as
well pass existing vfsmount (or ERR_PTR()) and use _that_.  fill_super()
is not used at all in that case.
	* in SB_KERNMOUNT ns_to_mnt serves only as a flag, eventually
constant true.

So let's split it in two helpers and give them sane arguments.
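
One way to picture the suggested split is the userspace model below.  It
is only a sketch under stated assumptions: the struct definitions are
simplified stand-ins rather than the kernel's, and the names
mount_existing(), mount_fresh() and fill_ok() are hypothetical and do
not appear in the thread or in the kernel.  It shows the userspace-mount
path reusing an existing vfsmount without ever calling fill_super(),
and the kernel-internal path always building a new superblock without
any lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for illustration; not the kernel's definitions. */
struct super_block {
	int s_active;		/* models the atomic s_active count */
	bool s_umount_held;	/* models holding s_umount for write */
};

struct dentry {
	int d_count;		/* models the dentry reference count */
	struct super_block *d_sb;
};

struct vfsmount {
	struct dentry *mnt_root;
	struct super_block *mnt_sb;
};

/*
 * Hypothetical helper for the !SB_KERNMOUNT case.  The caller passes the
 * existing mount directly (NULL models an ERR_PTR) and the result of the
 * capability check.  fill_super() is never needed on this path.
 */
static struct dentry *mount_existing(struct vfsmount *m, bool capable)
{
	if (!capable)
		return NULL;	/* models ERR_PTR(-EPERM) */
	if (!m)
		return NULL;	/* models propagating the caller's error */
	assert(m->mnt_sb && m->mnt_root);
	m->mnt_sb->s_active++;			/* atomic_inc(&sb->s_active) */
	m->mnt_sb->s_umount_held = true;	/* down_write(&sb->s_umount) */
	m->mnt_root->d_count++;			/* dget(m->mnt_root) */
	return m->mnt_root;
}

/*
 * Hypothetical helper for the SB_KERNMOUNT case.  No search through
 * existing superblocks: a new one is always set up and filled in.
 * Cleanup on fill_super() failure is omitted in this model.
 */
static struct dentry *mount_fresh(struct super_block *sb, struct dentry *root,
				  int (*fill_super)(struct super_block *,
						    struct dentry *))
{
	sb->s_active = 1;
	sb->s_umount_held = true;
	if (fill_super(sb, root) != 0)
		return NULL;
	root->d_count++;
	return root;
}

/* Trivial fill_super stand-in used to exercise mount_fresh(). */
static int fill_ok(struct super_block *sb, struct dentry *root)
{
	root->d_sb = sb;
	return 0;
}
```

In this model the permission check lives only in the helper that serves
userspace mounts, which mirrors the !(flags & SB_KERNMOUNT) condition in
the quoted patch.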
Eric W. Biederman March 24, 2018, 4:12 p.m.
Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Fri, Mar 23, 2018 at 04:41:40PM -0500, Eric W. Biederman wrote:
>
>>  struct dentry *mount_ns(struct file_system_type *fs_type,
>>  	int flags, void *data, void *ns, struct user_namespace *user_ns,
>> +	struct vfsmount *(*ns_to_mnt)(void *ns),
>>  	int (*fill_super)(struct super_block *, void *, int))
>>  {
>>  	struct super_block *sb;
>> -
>> +	int (*test_super)(struct super_block *, void *) = ns_test_super;
>>  	/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
>>  	 * over the namespace.
>>  	 */
>>  	if (!(flags & SB_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
>>  		return ERR_PTR(-EPERM);
>>  
>> -	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
>> -			 user_ns, ns);
>> +	if (ns_to_mnt) {
>> +		test_super = NULL;
>> +		if (!(flags & SB_KERNMOUNT)) {
>> +			struct vfsmount *m = ns_to_mnt(ns);
>> +			if (IS_ERR(m))
>> +				return ERR_CAST(m);
>> +			atomic_inc(&m->mnt_sb->s_active);
>> +			down_write(&m->mnt_sb->s_umount);
>> +			return dget(m->mnt_root);
>
> This is completely wrong.  Look:
> 	* SB_KERNMOUNT and !SB_KERNMOUNT cases are almost entirely isolated;
> completely so once that ns_to_mnt becomes unconditionally non-NULL.  
> 	* in !SB_KERNMOUNT passing ns_to_mnt() is pointless - you might as
> well pass existing vfsmount (or ERR_PTR()) and use _that_.  fill_super()
> is not used at all in that case.
> 	* in SB_KERNMOUNT ns_to_mnt serves only as a flag, eventually
> constant true.
>
> So let's split it in two helpers and give them sane arguments.

Everything I look at with multiple helpers feels even worse to me.
The above has the advantage that it is the minimal change to fix the
regression.  So I am not worried about code correctness.

I keep wondering whether the long-term intention is to fix sget so it
has an efficient data structure for finding super blocks (like an
rbtree), or to deprecate sget entirely and have everything call
alloc_super and be responsible for its own data structures for
finding existing superblocks.

At this point, since we are not in agreement on a proper fix, I am going
to plan on just queueing up a revert, so that we don't ship 4.16 with
a regression in a permission check.

Eric
Al Viro March 24, 2018, 9:48 p.m.
On Sat, Mar 24, 2018 at 11:12:02AM -0500, Eric W. Biederman wrote:

> > This is completely wrong.  Look:
> > 	* SB_KERNMOUNT and !SB_KERNMOUNT cases are almost entirely isolated;
> > completely so once that ns_to_mnt becomes unconditionally non-NULL.  
> > 	* in !SB_KERNMOUNT passing ns_to_mnt() is pointless - you might as
> > well pass existing vfsmount (or ERR_PTR()) and use _that_.  fill_super()
> > is not used at all in that case.
> > 	* in SB_KERNMOUNT ns_to_mnt serves only as a flag, eventually
> > constant true.
> >
> > So let's split it in two helpers and give them sane arguments.
> 
> Everything I look at with multiple helpers feels even worse to me.
> The above has the advantage that it is the minimal change to fix the
> regression.  So I am not worried about code correctness.

> I keep wondering whether the long-term intention is to fix sget so it
> has an efficient data structure for finding super blocks (like an
> rbtree), or to deprecate sget entirely and have everything call
> alloc_super and be responsible for its own data structures for
> finding existing superblocks.
>
> At this point, since we are not in agreement on a proper fix, I am going
> to plan on just queueing up a revert, so that we don't ship 4.16 with
> a regression in a permission check.

Permission check is trivial to put back in; I'll do that.

FWIW, I don't believe that sget_userns() is a good place for any kind of
universal permission checks.  It's a library helper, not a place everything
must come through when mounting something.  So's mount_ns(), etc.

BTW, will you be at LSF?  I would suggest discussing the architectural
issues there - they are directly related to fsmount() proposals...