[rh8,1/4] fs/ve: add new FS_VE_MOUNT flag to allow mount in container init userns

Submitted by Konstantin Khorenko on March 18, 2021, 5:04 p.m.

Details

Message ID 20210318170409.903280-1-khorenko@virtuozzo.com
State New
Series "Series without cover letter"

Commit Message

Konstantin Khorenko March 18, 2021, 5:04 p.m.
This patch is a part of vz7 commit 4e8e69eb16b1 ("fs/ve: add new
FS_VE_MOUNT flag to allow mount in container init userns")

Some filesystems may be mounted only in the init userns in the
mainstream/RH kernel. Some of those we would still like to mount in
Containers (e.g. NFS, overlayfs), i.e. in a non-init userns.

We do check whether a particular filesystem is virtualized enough (or
implement the missing virtualization), but we would still like to mimic
mainstream behavior and allow mounting those filesystems only in the
root userns of a Container (not in every nested userns).

Thus introduce a new fs_flag, FS_VE_MOUNT, which allows mounting the FS
in the root userns of a Container. Filesystems that have neither
FS_USERNS_MOUNT nor FS_VE_MOUNT now require capable(CAP_SYS_ADMIN) in
sget_userns().

https://jira.sw.ru/browse/PSBM-121284

Fixes: f6264f72dc29 ("ve/fs: check mount SYS_ADMIN permission in current VE")

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 fs/super.c         | 5 ++++-
 include/linux/fs.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)


diff --git a/fs/super.c b/fs/super.c
index 24ac1e93f8a4..a7de90fc2d74 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -523,7 +523,10 @@  struct super_block *sget_userns(struct file_system_type *type,
 
 	if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) &&
 	    !(type->fs_flags & FS_USERNS_MOUNT) &&
-	    !ve_capable(CAP_SYS_ADMIN))
+	    !capable(CAP_SYS_ADMIN) &&
+	    /* FS_VE_MOUNT allows mount in container init userns */
+	    !((type->fs_flags & FS_VE_MOUNT) &&
+	       ve_capable(CAP_SYS_ADMIN)))
 		return ERR_PTR(-EPERM);
 retry:
 	spin_lock(&sb_lock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7427c0579771..544d7fa3ca58 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2221,6 +2221,7 @@  struct file_system_type {
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 
 #define FS_VIRTUALIZED		64	/* Can mount this fstype inside ve */
+#define FS_VE_MOUNT		128	/* Can be mounted in VE init userns */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);

Comments

Pavel Tikhomirov March 19, 2021, 7:57 a.m.
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

Everything looks good. Note, though, that besides nfs and ext4 we
probably have more filesystems that will need the same change in the
future: devtmpfs, autofs, binfmt_misc, devpts, fuse, nfsd, proc, ramfs,
sysfs, xfs, mqueue, shmem and rpcpipefs (at least all of them need to
be checked).

Konstantin Khorenko March 19, 2021, 8:03 a.m.
On 03/19/2021 10:57 AM, Pavel Tikhomirov wrote:
> Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>
> Everything looks good. Note, though, that besides nfs and ext4 we
> probably have more filesystems that will need the same change in the
> future: devtmpfs, autofs, binfmt_misc, devpts, fuse, nfsd, proc, ramfs,
> sysfs, xfs, mqueue, shmem and rpcpipefs (at least all of them need to
> be checked).

Sure we do, and I've already filed a task to review all of them later:
https://jira.sw.ru/browse/PSBM-127322

Thank you for the review!
