[v3,1/4] fs, net: Standardize on file_receive helper to move fds across processes

Submitted by Kees Cook on June 11, 2020, 2:59 a.m.

Details

Message ID 202006101953.899EFB53@keescook
State New
Series "Add seccomp notifier ioctl that enables adding fds"
Headers show

Commit Message

Kees Cook June 11, 2020, 2:59 a.m.
On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> As an aside, all of this junk should be dropped:
> +	ret = get_user(size, &uaddfd->size);
> +	if (ret)
> +		return ret;
> +
> +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> +	if (ret)
> +		return ret;
> 
> and the size member of the seccomp_notif_addfd struct. I brought this up 
> off-list with Tycho that ioctls have the size of the struct embedded in them. We 
> should just use that. The ioctl definition is based on this[2]:
> #define _IOC(dir,type,nr,size) \
> 	(((dir)  << _IOC_DIRSHIFT) | \
> 	 ((type) << _IOC_TYPESHIFT) | \
> 	 ((nr)   << _IOC_NRSHIFT) | \
> 	 ((size) << _IOC_SIZESHIFT))
> 
> 
> We should just use copy_from_user for now. In the future, we can either 
> introduce new ioctl names for new structs, or extract the size dynamically from 
> the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.

Yeah, that seems reasonable. Here's the diff for that part:

Patch hide | download patch | download mbox

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 7b6028b399d8..98bf19b4e086 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -118,7 +118,6 @@  struct seccomp_notif_resp {
 
 /**
  * struct seccomp_notif_addfd
- * @size: The size of the seccomp_notif_addfd datastructure
  * @id: The ID of the seccomp notification
  * @flags: SECCOMP_ADDFD_FLAG_*
  * @srcfd: The local fd number
@@ -126,7 +125,6 @@  struct seccomp_notif_resp {
  * @newfd_flags: The O_* flags the remote FD should have applied
  */
 struct seccomp_notif_addfd {
-	__u64 size;
 	__u64 id;
 	__u32 flags;
 	__u32 srcfd;
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 3c913f3b8451..00cbdad6c480 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1297,14 +1297,9 @@  static long seccomp_notify_addfd(struct seccomp_filter *filter,
 	struct seccomp_notif_addfd addfd;
 	struct seccomp_knotif *knotif;
 	struct seccomp_kaddfd kaddfd;
-	u64 size;
 	int ret;
 
-	ret = get_user(size, &uaddfd->size);
-	if (ret)
-		return ret;
-
-	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
+	ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
 	if (ret)
 		return ret;
 

> 
> ----
> +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
> +						struct seccomp_notif_addfd)
> 
> Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on 
> the documentation in ioctl.h -- "_IOW means userland is writing and kernel is 
> reading."

Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
is wrong too, yes? Tycho, Christian, how disruptive would this be to
fix? (Perhaps support both and deprecate the IOR version at some point
in the future?)

Diff for just addfd's change:

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 7b6028b399d8..98bf19b4e086 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -146,7 +144,7 @@  struct seccomp_notif_addfd {
 						struct seccomp_notif_resp)
 #define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOR(2, __u64)
 /* On success, the return value is the remote process's added fd number */
-#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
+#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3,	\
 						struct seccomp_notif_addfd)
 
 #endif /* _UAPI_LINUX_SECCOMP_H */

Comments

Sargun Dhillon June 11, 2020, 4:41 a.m.
On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> 
> Yeah, that seems reasonable. Here's the diff for that part:
> 
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
>  
>  /**
>   * struct seccomp_notif_addfd
> - * @size: The size of the seccomp_notif_addfd datastructure
>   * @id: The ID of the seccomp notification
>   * @flags: SECCOMP_ADDFD_FLAG_*
>   * @srcfd: The local fd number
> @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
>   * @newfd_flags: The O_* flags the remote FD should have applied
>   */
>  struct seccomp_notif_addfd {
> -	__u64 size;
>  	__u64 id;
>  	__u32 flags;
>  	__u32 srcfd;
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3c913f3b8451..00cbdad6c480 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
>  	struct seccomp_notif_addfd addfd;
>  	struct seccomp_knotif *knotif;
>  	struct seccomp_kaddfd kaddfd;
> -	u64 size;
>  	int ret;
>  
> -	ret = get_user(size, &uaddfd->size);
> -	if (ret)
> -		return ret;
> -
> -	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> +	ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
>  	if (ret)
>  		return ret;
>  
> 
Looks good to me. If we ever change the size of this struct, we can do the work 
then to copy_struct_from_user.

> > 
> > ----
> > +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
> > +						struct seccomp_notif_addfd)
> > 
> > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on 
> > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is 
> > reading."
> 
> Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> is wrong too, yes? Tycho, Christian, how disruptive would this be to
> fix? (Perhaps support both and deprecate the IOR version at some point
> in the future?)
I think at a minimum we should change the uapi, and accept both (for now). Maybe 
a pr_warn_once telling people not to use the old one.

I can do the patch, if you want. 
> 
> Diff for just addfd's change:
> 
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -146,7 +144,7 @@ struct seccomp_notif_addfd {
>  						struct seccomp_notif_resp)
>  #define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOR(2, __u64)
>  /* On success, the return value is the remote process's added fd number */
> -#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
> +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3,	\
>  						struct seccomp_notif_addfd)
>  
>  #endif /* _UAPI_LINUX_SECCOMP_H */
> 
> -- 
> Kees Cook
Looks good. Thank you.
Christian Brauner June 11, 2020, 9:19 a.m.
On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > As an aside, all of this junk should be dropped:
> > +	ret = get_user(size, &uaddfd->size);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > +	if (ret)
> > +		return ret;
> > 
> > and the size member of the seccomp_notif_addfd struct. I brought this up 
> > off-list with Tycho that ioctls have the size of the struct embedded in them. We 
> > should just use that. The ioctl definition is based on this[2]:
> > #define _IOC(dir,type,nr,size) \
> > 	(((dir)  << _IOC_DIRSHIFT) | \
> > 	 ((type) << _IOC_TYPESHIFT) | \
> > 	 ((nr)   << _IOC_NRSHIFT) | \
> > 	 ((size) << _IOC_SIZESHIFT))
> > 
> > 
> > We should just use copy_from_user for now. In the future, we can either 
> > introduce new ioctl names for new structs, or extract the size dynamically from 
> > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> 
> Yeah, that seems reasonable. Here's the diff for that part:
> 
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
>  
>  /**
>   * struct seccomp_notif_addfd
> - * @size: The size of the seccomp_notif_addfd datastructure
>   * @id: The ID of the seccomp notification
>   * @flags: SECCOMP_ADDFD_FLAG_*
>   * @srcfd: The local fd number
> @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
>   * @newfd_flags: The O_* flags the remote FD should have applied
>   */
>  struct seccomp_notif_addfd {
> -	__u64 size;
>  	__u64 id;
>  	__u32 flags;
>  	__u32 srcfd;
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3c913f3b8451..00cbdad6c480 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
>  	struct seccomp_notif_addfd addfd;
>  	struct seccomp_knotif *knotif;
>  	struct seccomp_kaddfd kaddfd;
> -	u64 size;
>  	int ret;
>  
> -	ret = get_user(size, &uaddfd->size);
> -	if (ret)
> -		return ret;
> -
> -	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> +	ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
>  	if (ret)
>  		return ret;
>  
> 
> > 
> > ----
> > +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
> > +						struct seccomp_notif_addfd)
> > 
> > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on 
> > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is 
> > reading."
> 
> Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> is wrong too, yes? Tycho, Christian, how disruptive would this be to
> fix? (Perhaps support both and deprecate the IOR version at some point
> in the future?)

We have custom defines in our source code, i.e.
#define SECCOMP_IOCTL_NOTIF_ID_VALID  SECCOMP_IOR(2, __u64)
so ideally we'd have a SECCOMP_IOCTL_NOTIF_ID_VALID_V2

Does that sound ok?

Christian
Christian Brauner June 11, 2020, 10:01 a.m.
On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > As an aside, all of this junk should be dropped:
> > +	ret = get_user(size, &uaddfd->size);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > +	if (ret)
> > +		return ret;
> > 
> > and the size member of the seccomp_notif_addfd struct. I brought this up 
> > off-list with Tycho that ioctls have the size of the struct embedded in them. We 
> > should just use that. The ioctl definition is based on this[2]:
> > #define _IOC(dir,type,nr,size) \
> > 	(((dir)  << _IOC_DIRSHIFT) | \
> > 	 ((type) << _IOC_TYPESHIFT) | \
> > 	 ((nr)   << _IOC_NRSHIFT) | \
> > 	 ((size) << _IOC_SIZESHIFT))
> > 
> > 
> > We should just use copy_from_user for now. In the future, we can either 
> > introduce new ioctl names for new structs, or extract the size dynamically from 
> > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> 
> Yeah, that seems reasonable. Here's the diff for that part:

Why does it matter that the ioctl() has the size of the struct embedded
within? Afaik, the kernel itself doesn't do anything with that size. It
merely checks that the size is not pathological and it does so at
compile time.

#ifdef __CHECKER__
#define _IOC_TYPECHECK(t) (sizeof(t))
#else
/* provoke compile error for invalid uses of size argument */
extern unsigned int __invalid_size_argument_for_IOC;
#define _IOC_TYPECHECK(t) \
	((sizeof(t) == sizeof(t[1]) && \
	  sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
	  sizeof(t) : __invalid_size_argument_for_IOC)
#endif

The size itself is not verified at runtime. copy_struct_from_user()
still makes sense at least if we're going to allow expanding the struct
in the future.

Leaving that aside, the proposed direction here seems to mean that any
change to the struct itself will immediately mean a new ioctl() but
afaict, that also means a new struct. Since when you simply extend the
struct for the sake of the new ioctl you also change the size for the
ioctl.

Sure, you can simply treat the struct coming through the old ioctl as
being "capped" by e.g. hiding the size as suggested but then the gain
by having two separate ioctls is 0 compared to simply versioning the
struct with an explicit size member since the size encoded in the ioctl
and the actual size of the struct don't line up anymore which is the
only plus I can see for relying on _IOC_SIZE(). All this manages to do
then is to make it more annoying for userspace since they now need to
maintain multiple ioctls(). And if you have - however unlikely - say
three different ioctls all to be used with a different struct size of
the same struct I now need to know which ioctl() goes with which size of
the struct (I guess you could append the size to the ioctl name?
*shudder*). If you have the size in the struct itself you don't need to
care about any of that.
Maybe I'm not making sense or I misunderstand what's going on though.

Christian

> 
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
>  
>  /**
>   * struct seccomp_notif_addfd
> - * @size: The size of the seccomp_notif_addfd datastructure
>   * @id: The ID of the seccomp notification
>   * @flags: SECCOMP_ADDFD_FLAG_*
>   * @srcfd: The local fd number
> @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
>   * @newfd_flags: The O_* flags the remote FD should have applied
>   */
>  struct seccomp_notif_addfd {
> -	__u64 size;
>  	__u64 id;
>  	__u32 flags;
>  	__u32 srcfd;
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3c913f3b8451..00cbdad6c480 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
>  	struct seccomp_notif_addfd addfd;
>  	struct seccomp_knotif *knotif;
>  	struct seccomp_kaddfd kaddfd;
> -	u64 size;
>  	int ret;
>  
> -	ret = get_user(size, &uaddfd->size);
> -	if (ret)
> -		return ret;
> -
> -	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> +	ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
>  	if (ret)
>  		return ret;
>  
> 
> > 
> > ----
> > +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
> > +						struct seccomp_notif_addfd)
> > 
> > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on 
> > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is 
> > reading."
> 
> Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> is wrong too, yes? Tycho, Christian, how disruptive would this be to
> fix? (Perhaps support both and deprecate the IOR version at some point
> in the future?)
> 
> Diff for just addfd's change:
> 
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 7b6028b399d8..98bf19b4e086 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -146,7 +144,7 @@ struct seccomp_notif_addfd {
>  						struct seccomp_notif_resp)
>  #define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOR(2, __u64)
>  /* On success, the return value is the remote process's added fd number */
> -#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
> +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3,	\
>  						struct seccomp_notif_addfd)
>  
>  #endif /* _UAPI_LINUX_SECCOMP_H */
> 
> -- 
> Kees Cook
Sargun Dhillon June 11, 2020, 10:39 a.m.
On Thu, Jun 11, 2020 at 11:19:42AM +0200, Christian Brauner wrote:
> On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > As an aside, all of this junk should be dropped:
> > > +	ret = get_user(size, &uaddfd->size);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > +	if (ret)
> > > +		return ret;
> > > 
> > > and the size member of the seccomp_notif_addfd struct. I brought this up 
> > > off-list with Tycho that ioctls have the size of the struct embedded in them. We 
> > > should just use that. The ioctl definition is based on this[2]:
> > > #define _IOC(dir,type,nr,size) \
> > > 	(((dir)  << _IOC_DIRSHIFT) | \
> > > 	 ((type) << _IOC_TYPESHIFT) | \
> > > 	 ((nr)   << _IOC_NRSHIFT) | \
> > > 	 ((size) << _IOC_SIZESHIFT))
> > > 
> > > 
> > > We should just use copy_from_user for now. In the future, we can either 
> > > introduce new ioctl names for new structs, or extract the size dynamically from 
> > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > 
> > Yeah, that seems reasonable. Here's the diff for that part:
> > 
> > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> > index 7b6028b399d8..98bf19b4e086 100644
> > --- a/include/uapi/linux/seccomp.h
> > +++ b/include/uapi/linux/seccomp.h
> > @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
> >  
> >  /**
> >   * struct seccomp_notif_addfd
> > - * @size: The size of the seccomp_notif_addfd datastructure
> >   * @id: The ID of the seccomp notification
> >   * @flags: SECCOMP_ADDFD_FLAG_*
> >   * @srcfd: The local fd number
> > @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
> >   * @newfd_flags: The O_* flags the remote FD should have applied
> >   */
> >  struct seccomp_notif_addfd {
> > -	__u64 size;
> >  	__u64 id;
> >  	__u32 flags;
> >  	__u32 srcfd;
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index 3c913f3b8451..00cbdad6c480 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
> >  	struct seccomp_notif_addfd addfd;
> >  	struct seccomp_knotif *knotif;
> >  	struct seccomp_kaddfd kaddfd;
> > -	u64 size;
> >  	int ret;
> >  
> > -	ret = get_user(size, &uaddfd->size);
> > -	if (ret)
> > -		return ret;
> > -
> > -	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > +	ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
> >  	if (ret)
> >  		return ret;
> >  
> > 
> > > 
> > > ----
> > > +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
> > > +						struct seccomp_notif_addfd)
> > > 
> > > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on 
> > > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is 
> > > reading."
> > 
> > Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> > is wrong too, yes? Tycho, Christian, how disruptive would this be to
> > fix? (Perhaps support both and deprecate the IOR version at some point
> > in the future?)
> 
> We have custom defines in our source code, i.e.
> #define SECCOMP_IOCTL_NOTIF_ID_VALID  SECCOMP_IOR(2, __u64)
> so ideally we'd have a SECCOMP_IOCTL_NOTIF_ID_VALID_V2
> 
> Does that sound ok?
> 
> Christian
Why not change the public API in seccomp.h to:
#define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOW(2, __u64)

And then in seccomp.c:
#define SECCOMP_IOCTL_NOTIF_ID_VALID_OLD	SECCOMP_IOR(2, __u64)
static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
				 unsigned long arg)
{
	struct seccomp_filter *filter = file->private_data;
	void __user *buf = (void __user *)arg;

	switch (cmd) {
	case SECCOMP_IOCTL_NOTIF_RECV:
		return seccomp_notify_recv(filter, buf);
	case SECCOMP_IOCTL_NOTIF_SEND:
		return seccomp_notify_send(filter, buf);
	case SECCOMP_IOCTL_NOTIF_ID_VALID_OLD:
		pr_warn_once("Detected usage of legacy (incorrect) version of seccomp notifier notif_id_valid ioctl\n");
	case SECCOMP_IOCTL_NOTIF_ID_VALID:
		return seccomp_notify_id_valid(filter, buf);
	default:
		return -EINVAL;
	}
}
---- 

So, both will work fine, and whenevery anyone recompiles, or picks up new 
headers, they will start calling the "right" one without a code change, and
we wont break any userspace.
Sargun Dhillon June 11, 2020, 11:06 a.m.
On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > As an aside, all of this junk should be dropped:
> > > +	ret = get_user(size, &uaddfd->size);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > +	if (ret)
> > > +		return ret;
> > > 
> > > and the size member of the seccomp_notif_addfd struct. I brought this up 
> > > off-list with Tycho that ioctls have the size of the struct embedded in them. We 
> > > should just use that. The ioctl definition is based on this[2]:
> > > #define _IOC(dir,type,nr,size) \
> > > 	(((dir)  << _IOC_DIRSHIFT) | \
> > > 	 ((type) << _IOC_TYPESHIFT) | \
> > > 	 ((nr)   << _IOC_NRSHIFT) | \
> > > 	 ((size) << _IOC_SIZESHIFT))
> > > 
> > > 
> > > We should just use copy_from_user for now. In the future, we can either 
> > > introduce new ioctl names for new structs, or extract the size dynamically from 
> > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > 
> > Yeah, that seems reasonable. Here's the diff for that part:
> 
> Why does it matter that the ioctl() has the size of the struct embedded
> within? Afaik, the kernel itself doesn't do anything with that size. It
> merely checks that the size is not pathological and it does so at
> compile time.
> 
> #ifdef __CHECKER__
> #define _IOC_TYPECHECK(t) (sizeof(t))
> #else
> /* provoke compile error for invalid uses of size argument */
> extern unsigned int __invalid_size_argument_for_IOC;
> #define _IOC_TYPECHECK(t) \
> 	((sizeof(t) == sizeof(t[1]) && \
> 	  sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> 	  sizeof(t) : __invalid_size_argument_for_IOC)
> #endif
> 
> The size itself is not verified at runtime. copy_struct_from_user()
> still makes sense at least if we're going to allow expanding the struct
> in the future.
Right, but if we simply change our headers and extend the struct, it will break 
all existing programs compiled against those headers. In order to avoid that, if 
we intend on extending this struct by appending to it, we need to have a 
backwards compatibility mechanism. Just having copy_struct_from_user isn't 
enough. The data structure either must be fixed size, or we need a way to handle 
multiple ioctl numbers derived from headers with different sized struct arguments

The two approaches I see are:
1. use more indirection. This has previous art in drm[1]. That's look
something like this:

struct seccomp_notif_addfd_ptr {
	__u64 size;
	__u64 addr;
}

... And then it'd be up to us to dereference the addr and copy struct from user.

2. Expose one ioctl to the user, many internally

e.g., public api:

struct seccomp_notif {
	__u64 id;
	__u64 pid;
	struct seccomp_data;
	__u64 fancy_new_field;
}

#define SECCOMP_IOCTL_NOTIF_RECV	SECCOMP_IOWR(0, struct seccomp_notif)

internally:
struct seccomp_notif_v1 {
	__u64 id;
	__u64 pid;
	struct seccomp_data;
}

struct seccomp_notif_v2 {
	__u64 id;
	__u64 pid;
	struct seccomp_data;
	__u64 fancy_new_field;
}

and we can switch like this:
	switch (cmd) {
	/* for example. We actually have to do this for any struct we intend to 
	 * extend to get proper backwards compatibility
	 */
	case SECCOMP_IOWR(0, struct seccomp_notif_v1)
		return seccomp_notify_recv(filter, buf, sizeof(struct seccomp_notif_v1));
	case SECCOMP_IOWR(0, struct seccomp_notif_v2)
		return seccomp_notify_recv(filter, buf, sizeof(struct seccomp_notif_v3));
...
	case SECCOMP_IOCTL_NOTIF_SEND:
		return seccomp_notify_send(filter, buf);
	case SECCOMP_IOCTL_NOTIF_ID_VALID:
		return seccomp_notify_id_valid(filter, buf);
	default:
		return -EINVAL;
	}

This has the downside that programs compiled against more modern kernel headers 
will break on older kernels.

3. We can take the approach you suggested.

#define UNSIZED(cmd)	(cmd & ~(_IOC_SIZEMASK << _IOC_SIZESHIFT)
static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
				 unsigned long arg)
{
	struct seccomp_filter *filter = file->private_data;
	void __user *buf = (void __user *)arg;
	int size = _IOC_SIZE(cmd);
	cmd = UNSIZED(cmd);

	switch (cmd) {
	/* for example. We actually have to do this for any struct we intend to 
	 * extend to get proper backwards compatibility
	 */
	case UNSIZED(SECCOMP_IOCTL_NOTIF_RECV):
		return seccomp_notify_recv(filter, buf, size);
...
	case SECCOMP_IOCTL_NOTIF_SEND:
		return seccomp_notify_send(filter, buf);
	case SECCOMP_IOCTL_NOTIF_ID_VALID:
		return seccomp_notify_id_valid(filter, buf);
	default:
		return -EINVAL;
	}
}

> 
> Leaving that aside, the proposed direction here seems to mean that any
> change to the struct itself will immediately mean a new ioctl() but
> afaict, that also means a new struct. Since when you simply extend the
> struct for the sake of the new ioctl you also change the size for the
> ioctl.
> 
> Sure, you can simply treat the struct coming through the old ioctl as
> being "capped" by e.g. hiding the size as suggested but then the gain
> by having two separate ioctls is 0 compared to simply versioning the
> struct with an explicit size member since the size encoded in the ioctl
> and the actual size of the struct don't line up anymore which is the
> only plus I can see for relying on _IOC_SIZE(). All this manages to do
> then is to make it more annoying for userspace since they now need to
> maintain multiple ioctls(). And if you have - however unlikely - say
> three different ioctls all to be used with a different struct size of
> the same struct I now need to know which ioctl() goes with which size of
> the struct (I guess you could append the size to the ioctl name?
> *shudder*). If you have the size in the struct itself you don't need to
> care about any of that.
> Maybe I'm not making sense or I misunderstand what's going on though.
> 
> Christian
> 
I don't understand why userspace has to have any knowledge of this. As soon as 
we add the code above, and we use copy_struct_from_user based on _that_ size,
userspace will get free upgrades. If they are compiling against an older header
than the kernel, size will return a smaller number, and thus we will zero
out our trailing bits, and if their number is bigger, we just check their
bits are appropriately zeroed.

This approach would be forwards-and-backwards compatible.

There's a little bit of prior art here as well [2]. The approach is that
we effectively do the thing we had earlier with passing a size with
copy_struct_from_user, but instead of the size being embedded in the struct,
it's embedded in the ioctl command itself.


[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/drm/radeon_drm.h?h=v5.7#n831
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firewire/core-cdev.c?id=v5.7#n1621
Christian Brauner June 11, 2020, 2:42 p.m.
On Thu, Jun 11, 2020 at 11:06:31AM +0000, Sargun Dhillon wrote:
> On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > As an aside, all of this junk should be dropped:
> > > > +	ret = get_user(size, &uaddfd->size);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > +	if (ret)
> > > > +		return ret;
> > > > 
> > > > and the size member of the seccomp_notif_addfd struct. I brought this up 
> > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We 
> > > > should just use that. The ioctl definition is based on this[2]:
> > > > #define _IOC(dir,type,nr,size) \
> > > > 	(((dir)  << _IOC_DIRSHIFT) | \
> > > > 	 ((type) << _IOC_TYPESHIFT) | \
> > > > 	 ((nr)   << _IOC_NRSHIFT) | \
> > > > 	 ((size) << _IOC_SIZESHIFT))
> > > > 
> > > > 
> > > > We should just use copy_from_user for now. In the future, we can either 
> > > > introduce new ioctl names for new structs, or extract the size dynamically from 
> > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > > 
> > > Yeah, that seems reasonable. Here's the diff for that part:
> > 
> > Why does it matter that the ioctl() has the size of the struct embedded
> > within? Afaik, the kernel itself doesn't do anything with that size. It
> > merely checks that the size is not pathological and it does so at
> > compile time.
> > 
> > #ifdef __CHECKER__
> > #define _IOC_TYPECHECK(t) (sizeof(t))
> > #else
> > /* provoke compile error for invalid uses of size argument */
> > extern unsigned int __invalid_size_argument_for_IOC;
> > #define _IOC_TYPECHECK(t) \
> > 	((sizeof(t) == sizeof(t[1]) && \
> > 	  sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > 	  sizeof(t) : __invalid_size_argument_for_IOC)
> > #endif
> > 
> > The size itself is not verified at runtime. copy_struct_from_user()
> > still makes sense at least if we're going to allow expanding the struct
> > in the future.
> Right, but if we simply change our headers and extend the struct, it will break 
> all existing programs compiled against those headers. In order to avoid that, if 
> we intend on extending this struct by appending to it, we need to have a 
> backwards compatibility mechanism. Just having copy_struct_from_user isn't 
> enough. The data structure either must be fixed size, or we need a way to handle 
> multiple ioctl numbers derived from headers with different sized struct arguments
> 
> The two approaches I see are:
> 1. use more indirection. This has previous art in drm[1]. That's look
> something like this:
> 
> struct seccomp_notif_addfd_ptr {
> 	__u64 size;
> 	__u64 addr;
> }
> 
> ... And then it'd be up to us to dereference the addr and copy struct from user.

Which isn't great but could do.

> 
> 2. Expose one ioctl to the user, many internally
> 
> e.g., public api:
> 
> struct seccomp_notif {
> 	__u64 id;
> 	__u64 pid;
> 	struct seccomp_data;
> 	__u64 fancy_new_field;
> }
> 
> #define SECCOMP_IOCTL_NOTIF_RECV	SECCOMP_IOWR(0, struct seccomp_notif)
> 
> internally:
> struct seccomp_notif_v1 {
> 	__u64 id;
> 	__u64 pid;
> 	struct seccomp_data;
> }
> 
> struct seccomp_notif_v2 {
> 	__u64 id;
> 	__u64 pid;
> 	struct seccomp_data;
> 	__u64 fancy_new_field;
> }
> 
> and we can switch like this:
> 	switch (cmd) {
> 	/* for example. We actually have to do this for any struct we intend to 
> 	 * extend to get proper backwards compatibility
> 	 */
> 	case SECCOMP_IOWR(0, struct seccomp_notif_v1)
> 		return seccomp_notify_recv(filter, buf, sizeof(struct seccomp_notif_v1));
> 	case SECCOMP_IOWR(0, struct seccomp_notif_v2)
> 		return seccomp_notify_recv(filter, buf, sizeof(struct seccomp_notif_v3));
> ...
> 	case SECCOMP_IOCTL_NOTIF_SEND:
> 		return seccomp_notify_send(filter, buf);
> 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
> 		return seccomp_notify_id_valid(filter, buf);
> 	default:
> 		return -EINVAL;
> 	}
> 
> This has the downside that programs compiled against more modern kernel headers 
> will break on older kernels.
> 
> 3. We can take the approach you suggested.
> 
> #define UNSIZED(cmd)	(cmd & ~(_IOC_SIZEMASK << _IOC_SIZESHIFT)
> static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
> 				 unsigned long arg)
> {
> 	struct seccomp_filter *filter = file->private_data;
> 	void __user *buf = (void __user *)arg;
> 	int size = _IOC_SIZE(cmd);
> 	cmd = UNSIZED(cmd);
> 
> 	switch (cmd) {
> 	/* for example. We actually have to do this for any struct we intend to 
> 	 * extend to get proper backwards compatibility
> 	 */
> 	case UNSIZED(SECCOMP_IOCTL_NOTIF_RECV):
> 		return seccomp_notify_recv(filter, buf, size);
> ...
> 	case SECCOMP_IOCTL_NOTIF_SEND:
> 		return seccomp_notify_send(filter, buf);
> 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
> 		return seccomp_notify_id_valid(filter, buf);
> 	default:
> 		return -EINVAL;
> 	}
> }
> 
> > 
> > Leaving that aside, the proposed direction here seems to mean that any
> > change to the struct itself will immediately mean a new ioctl() but
> > afaict, that also means a new struct. Since when you simply extend the
> > struct for the sake of the new ioctl you also change the size for the
> > ioctl.
> > 
> > Sure, you can simply treat the struct coming through the old ioctl as
> > being "capped" by e.g. hiding the size as suggested but then the gain
> > by having two separate ioctls is 0 compared to simply versioning the
> > struct with an explicit size member since the size encoded in the ioctl
> > and the actual size of the struct don't line up anymore which is the
> > only plus I can see for relying on _IOC_SIZE(). All this manages to do
> > then is to make it more annoying for userspace since they now need to
> > maintain multiple ioctls(). And if you have - however unlikely - say
> > three different ioctls all to be used with a different struct size of
> > the same struct I now need to know which ioctl() goes with which size of
> > the struct (I guess you could append the size to the ioctl name?
> > *shudder*). If you have the size in the struct itself you don't need to
> > care about any of that.
> > Maybe I'm not making sense or I misunderstand what's going on though.
> > 
> > Christian
> > 
> I don't understand why userspace has to have any knowledge of this. As soon as 
> we add the code above, and we use copy_struct_from_user based on _that_ size,

Hm, which code exactly?

In the previous mail the only thing proposed was to switch to a simple
copy_from_user() which effectively bars us from extending the
seccomp_addfd struct which this is about, right? At that point, the only
option then becomes to either introduce a new ioctl() and a new struct
or to go for the hack in e.g. 3. (Afaiu, 2. is not working anymore
because we break userspace as soon as we append "fancy_new_field" to the
struct because it changes the ioctl() unless I'm missing something.)

Let me maybe rephrase: I'd prefer we merge something for addfd that is
extensible with minimal burden on userspace. 
But if we are fine with saying "we don't care, let's just use
copy_from_user() for addfd and if we extend we add a new struct + ioctl"
then ok, sure. But I would prefer to keep dealing with new structs +
ioctls (Look at the end of btrfs.h [1] unlikely to be a problem for us,
but still.) as little as possible because that will be more churn in
userspace code than I'd prefer.

So either [1] or - since none of the generic extensibility seems to be
particularly nice - we bite the bullet and just add a:

__u64 reserved[4]

field and hope that this will carry us for a long time (probably will
for quite a long time) and defer the new ioctl() problem.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/btrfs.h

> userspace will get free upgrades. If they are compiling against an older header
> than the kernel, size will return a smaller number, and thus we will zero
> out our trailing bits, and if their number is bigger, we just check their
> bits are appropriately zeroed.
> 
> This approach would be forwards-and-backwards compatible.
> 
> There's a little bit of prior art here as well [2]. The approach is that
> we effectively do the thing we had earlier with passing a size with
> copy_struct_from_user, but instead of the size being embedded in the struct,
> it's embedded in the ioctl command itself.

That looks super sketchy. :D

Christian
David Laight June 11, 2020, 2:56 p.m.
From: Sargun Dhillon
> Sent: 11 June 2020 12:07
> Subject: Re: [PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across processes
> 
> On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > As an aside, all of this junk should be dropped:
> > > > +	ret = get_user(size, &uaddfd->size);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > +	if (ret)
> > > > +		return ret;
> > > >
> > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > should just use that. The ioctl definition is based on this[2]:
> > > > #define _IOC(dir,type,nr,size) \
> > > > 	(((dir)  << _IOC_DIRSHIFT) | \
> > > > 	 ((type) << _IOC_TYPESHIFT) | \
> > > > 	 ((nr)   << _IOC_NRSHIFT) | \
> > > > 	 ((size) << _IOC_SIZESHIFT))
> > > >
> > > >
> > > > We should just use copy_from_user for now. In the future, we can either
> > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > >
> > > Yeah, that seems reasonable. Here's the diff for that part:
> >
> > Why does it matter that the ioctl() has the size of the struct embedded
> > within? Afaik, the kernel itself doesn't do anything with that size. It
> > merely checks that the size is not pathological and it does so at
> > compile time.
> >
> > #ifdef __CHECKER__
> > #define _IOC_TYPECHECK(t) (sizeof(t))
> > #else
> > /* provoke compile error for invalid uses of size argument */
> > extern unsigned int __invalid_size_argument_for_IOC;
> > #define _IOC_TYPECHECK(t) \
> > 	((sizeof(t) == sizeof(t[1]) && \
> > 	  sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > 	  sizeof(t) : __invalid_size_argument_for_IOC)
> > #endif
> >
> > The size itself is not verified at runtime. copy_struct_from_user()
> > still makes sense at least if we're going to allow expanding the struct
> > in the future.
> Right, but if we simply change our headers and extend the struct, it will break
> all existing programs compiled against those headers. In order to avoid that, if
> we intend on extending this struct by appending to it, we need to have a
> backwards compatibility mechanism. Just having copy_struct_from_user isn't
> enough. The data structure either must be fixed size, or we need a way to handle
> multiple ioctl numbers derived from headers with different sized struct arguments
> 
> The two approaches I see are:
> 1. use more indirection. This has previous art in drm[1]. That's look
> something like this:
> 
> struct seccomp_notif_addfd_ptr {
> 	__u64 size;
> 	__u64 addr;
> }
> 
> ... And then it'd be up to us to dereference the addr and copy struct from user.

Do not go down that route. It isn't worth the pain.

You should also assume that userspace might have a compile-time check
on the buffer length (I've written one - not hard) and that the kernel
might (in the future - or on a BSD kernel) be doing the user copies
for you.

Also, if you change the structure you almost certainly need to
change the name of the ioctl cmd as well as its value.
Otherwise a recompiled program will pass the new cmd value (and
hopefully the right sized buffer) but it won't have initialised
the buffer properly.
This is likely to lead to unexpected behaviour.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Kees Cook June 11, 2020, 11:23 p.m.
On Thu, Jun 11, 2020 at 10:39:23AM +0000, Sargun Dhillon wrote:
> On Thu, Jun 11, 2020 at 11:19:42AM +0200, Christian Brauner wrote:
> > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > As an aside, all of this junk should be dropped:
> > > > +	ret = get_user(size, &uaddfd->size);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > +	if (ret)
> > > > +		return ret;
> > > > 
> > > > and the size member of the seccomp_notif_addfd struct. I brought this up 
> > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We 
> > > > should just use that. The ioctl definition is based on this[2]:
> > > > #define _IOC(dir,type,nr,size) \
> > > > 	(((dir)  << _IOC_DIRSHIFT) | \
> > > > 	 ((type) << _IOC_TYPESHIFT) | \
> > > > 	 ((nr)   << _IOC_NRSHIFT) | \
> > > > 	 ((size) << _IOC_SIZESHIFT))
> > > > 
> > > > 
> > > > We should just use copy_from_user for now. In the future, we can either 
> > > > introduce new ioctl names for new structs, or extract the size dynamically from 
> > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > > 
> > > Yeah, that seems reasonable. Here's the diff for that part:
> > > 
> > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> > > index 7b6028b399d8..98bf19b4e086 100644
> > > --- a/include/uapi/linux/seccomp.h
> > > +++ b/include/uapi/linux/seccomp.h
> > > @@ -118,7 +118,6 @@ struct seccomp_notif_resp {
> > >  
> > >  /**
> > >   * struct seccomp_notif_addfd
> > > - * @size: The size of the seccomp_notif_addfd datastructure
> > >   * @id: The ID of the seccomp notification
> > >   * @flags: SECCOMP_ADDFD_FLAG_*
> > >   * @srcfd: The local fd number
> > > @@ -126,7 +125,6 @@ struct seccomp_notif_resp {
> > >   * @newfd_flags: The O_* flags the remote FD should have applied
> > >   */
> > >  struct seccomp_notif_addfd {
> > > -	__u64 size;
> > >  	__u64 id;
> > >  	__u32 flags;
> > >  	__u32 srcfd;
> > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > > index 3c913f3b8451..00cbdad6c480 100644
> > > --- a/kernel/seccomp.c
> > > +++ b/kernel/seccomp.c
> > > @@ -1297,14 +1297,9 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
> > >  	struct seccomp_notif_addfd addfd;
> > >  	struct seccomp_knotif *knotif;
> > >  	struct seccomp_kaddfd kaddfd;
> > > -	u64 size;
> > >  	int ret;
> > >  
> > > -	ret = get_user(size, &uaddfd->size);
> > > -	if (ret)
> > > -		return ret;
> > > -
> > > -	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > +	ret = copy_from_user(&addfd, uaddfd, sizeof(addfd));
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > 
> > > > 
> > > > ----
> > > > +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOR(3,	\
> > > > +						struct seccomp_notif_addfd)
> > > > 
> > > > Lastly, what I believe to be a small mistake, it should be SECCOMP_IOW, based on 
> > > > the documentation in ioctl.h -- "_IOW means userland is writing and kernel is 
> > > > reading."
> > > 
> > > Oooooh. Yeah; good catch. Uhm, that means SECCOMP_IOCTL_NOTIF_ID_VALID
> > > is wrong too, yes? Tycho, Christian, how disruptive would this be to
> > > fix? (Perhaps support both and deprecate the IOR version at some point
> > > in the future?)
> > 
> > We have custom defines in our source code, i.e.
> > #define SECCOMP_IOCTL_NOTIF_ID_VALID  SECCOMP_IOR(2, __u64)
> > so ideally we'd have a SECCOMP_IOCTL_NOTIF_ID_VALID_V2
> > 
> > Does that sound ok?
> > 
> > Christian
> Why not change the public API in seccomp.h to:
> #define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOW(2, __u64)
> 
> And then in seccomp.c:
> #define SECCOMP_IOCTL_NOTIF_ID_VALID_OLD	SECCOMP_IOR(2, __u64)
> static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
> 				 unsigned long arg)
> {
> 	struct seccomp_filter *filter = file->private_data;
> 	void __user *buf = (void __user *)arg;
> 
> 	switch (cmd) {
> 	case SECCOMP_IOCTL_NOTIF_RECV:
> 		return seccomp_notify_recv(filter, buf);
> 	case SECCOMP_IOCTL_NOTIF_SEND:
> 		return seccomp_notify_send(filter, buf);
> 	case SECCOMP_IOCTL_NOTIF_ID_VALID_OLD:
> 		pr_warn_once("Detected usage of legacy (incorrect) version of seccomp notifier notif_id_valid ioctl\n");
> 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
> 		return seccomp_notify_id_valid(filter, buf);
> 	default:
> 		return -EINVAL;
> 	}
> }
> ---- 
> 
> So, both will work fine, and whenevery anyone recompiles, or picks up new 
> headers, they will start calling the "right" one without a code change, and
> we wont break any userspace.

Yeah, that's what I'd prefer here.
Kees Cook June 11, 2020, 11:49 p.m.
On Thu, Jun 11, 2020 at 02:56:22PM +0000, David Laight wrote:
> From: Sargun Dhillon
> > Sent: 11 June 2020 12:07
> > Subject: Re: [PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across processes
> > 
> > On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > > As an aside, all of this junk should be dropped:
> > > > > +	ret = get_user(size, &uaddfd->size);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > >
> > > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > > should just use that. The ioctl definition is based on this[2]:
> > > > > #define _IOC(dir,type,nr,size) \
> > > > > 	(((dir)  << _IOC_DIRSHIFT) | \
> > > > > 	 ((type) << _IOC_TYPESHIFT) | \
> > > > > 	 ((nr)   << _IOC_NRSHIFT) | \
> > > > > 	 ((size) << _IOC_SIZESHIFT))
> > > > >
> > > > >
> > > > > We should just use copy_from_user for now. In the future, we can either
> > > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > > >
> > > > Yeah, that seems reasonable. Here's the diff for that part:
> > >
> > > Why does it matter that the ioctl() has the size of the struct embedded
> > > within? Afaik, the kernel itself doesn't do anything with that size. It
> > > merely checks that the size is not pathological and it does so at
> > > compile time.
> > >
> > > #ifdef __CHECKER__
> > > #define _IOC_TYPECHECK(t) (sizeof(t))
> > > #else
> > > /* provoke compile error for invalid uses of size argument */
> > > extern unsigned int __invalid_size_argument_for_IOC;
> > > #define _IOC_TYPECHECK(t) \
> > > 	((sizeof(t) == sizeof(t[1]) && \
> > > 	  sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > > 	  sizeof(t) : __invalid_size_argument_for_IOC)
> > > #endif
> > >
> > > The size itself is not verified at runtime. copy_struct_from_user()
> > > still makes sense at least if we're going to allow expanding the struct
> > > in the future.
> > Right, but if we simply change our headers and extend the struct, it will break
> > all existing programs compiled against those headers. In order to avoid that, if
> > we intend on extending this struct by appending to it, we need to have a
> > backwards compatibility mechanism. Just having copy_struct_from_user isn't
> > enough. The data structure either must be fixed size, or we need a way to handle
> > multiple ioctl numbers derived from headers with different sized struct arguments
> > 
> > The two approaches I see are:
> > 1. use more indirection. This has previous art in drm[1]. That's look
> > something like this:
> > 
> > struct seccomp_notif_addfd_ptr {
> > 	__u64 size;
> > 	__u64 addr;
> > }
> > 
> > ... And then it'd be up to us to dereference the addr and copy struct from user.
> 
> Do not go down that route. It isn't worth the pain.
> 
> You should also assume that userspace might have a compile-time check
> on the buffer length (I've written one - not hard) and that the kernel
> might (in the future - or on a BSD kernel) be doing the user copies
> for you.
> 
> Also, if you change the structure you almost certainly need to
> change the name of the ioctl cmd as well as its value.
> Otherwise a recompiled program will pass the new cmd value (and
> hopefully the right sized buffer) but it won't have initialised
> the buffer properly.
> This is likely to lead to unexpected behaviour.

Hmmm.

So, while initially I thought Sargun's observation about ioctl's fixed
struct size was right, I think I've been swayed to Christian's view
(which is supported by the long tail of struct size pain we've seen in
other APIs).

Doing a separate ioctl for each structure version seems like the "old
solution" now that we've got EA syscalls. So, I'd like to keep the size
and copy_struct_from_user().

Which leaves us with the question of how to deal with the ioctl
numbering. As we've seen, there is no actual enforcement of direction
nor size, so to that end, while we could provide the hints about both, I
guess we just don't need to. To that end, perhaps _IO() is best:

#define SECCOMP_IOCTL_NOTIF_ADDFD       SECCOMP_IO(3)

Alternatively, we could use a size of either 0, 8(u64), or -1, and then
use IORW() so we _also_ won't paint ourselves into a corner if we ever
want to write something back to userspace in the structure:

#define SECCOMP_IOCTL_EA(nr) _IOC(_IOC_READ|_IOC_WRITE,SECCOMP_IOC_MAGIC,(nr),0)
#define SECCOMP_IOCTL_NOTIF_ADDFD       SECCOMP_IOCTL_EA(3)

or

#define SECCOMP_IOCTL_EA(nr) _IOC(_IOC_READ|_IOC_WRITE,SECCOMP_IOC_MAGIC,(nr),8)
#define SECCOMP_IOCTL_NOTIF_ADDFD       SECCOMP_IOCTL_EA(3)

or

#define SECCOMP_IOCTL_EA(nr) _IOC(_IOC_READ|_IOC_WRITE,SECCOMP_IOC_MAGIC,(nr),_IOC_SIZEMASK)
#define SECCOMP_IOCTL_NOTIF_ADDFD       SECCOMP_IOCTL_EA(3)

I think I prefer the last one.
Kees Cook June 12, 2020, 6:58 a.m.
On Thu, Jun 11, 2020 at 04:49:37PM -0700, Kees Cook wrote:
> I think I prefer the last one.

Here's where I am with things:

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=devel/seccomp/addfd/v3.3

If we can agree on the ioctl numbering solution, I can actually send the
series for email review...
David Laight June 12, 2020, 8:36 a.m.
From: Kees Cook
> Sent: 12 June 2020 00:50
> > From: Sargun Dhillon
> > > Sent: 11 June 2020 12:07
> > > Subject: Re: [PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across
> processes
> > >
> > > On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > > > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > > > As an aside, all of this junk should be dropped:
> > > > > > +	ret = get_user(size, &uaddfd->size);
> > > > > > +	if (ret)
> > > > > > +		return ret;
> > > > > > +
> > > > > > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > > > +	if (ret)
> > > > > > +		return ret;
> > > > > >
> > > > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > > > should just use that. The ioctl definition is based on this[2]:
> > > > > > #define _IOC(dir,type,nr,size) \
> > > > > > 	(((dir)  << _IOC_DIRSHIFT) | \
> > > > > > 	 ((type) << _IOC_TYPESHIFT) | \
> > > > > > 	 ((nr)   << _IOC_NRSHIFT) | \
> > > > > > 	 ((size) << _IOC_SIZESHIFT))
> > > > > >
> > > > > >
> > > > > > We should just use copy_from_user for now. In the future, we can either
> > > > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > > > >
> > > > > Yeah, that seems reasonable. Here's the diff for that part:
> > > >
> > > > Why does it matter that the ioctl() has the size of the struct embedded
> > > > within? Afaik, the kernel itself doesn't do anything with that size. It
> > > > merely checks that the size is not pathological and it does so at
> > > > compile time.
> > > >
> > > > #ifdef __CHECKER__
> > > > #define _IOC_TYPECHECK(t) (sizeof(t))
> > > > #else
> > > > /* provoke compile error for invalid uses of size argument */
> > > > extern unsigned int __invalid_size_argument_for_IOC;
> > > > #define _IOC_TYPECHECK(t) \
> > > > 	((sizeof(t) == sizeof(t[1]) && \
> > > > 	  sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > > > 	  sizeof(t) : __invalid_size_argument_for_IOC)
> > > > #endif
> > > >
> > > > The size itself is not verified at runtime. copy_struct_from_user()
> > > > still makes sense at least if we're going to allow expanding the struct
> > > > in the future.
> > > Right, but if we simply change our headers and extend the struct, it will break
> > > all existing programs compiled against those headers. In order to avoid that, if
> > > we intend on extending this struct by appending to it, we need to have a
> > > backwards compatibility mechanism. Just having copy_struct_from_user isn't
> > > enough. The data structure either must be fixed size, or we need a way to handle
> > > multiple ioctl numbers derived from headers with different sized struct arguments
> > >
> > > The two approaches I see are:
> > > 1. use more indirection. This has previous art in drm[1]. That's look
> > > something like this:
> > >
> > > struct seccomp_notif_addfd_ptr {
> > > 	__u64 size;
> > > 	__u64 addr;
> > > }
> > >
> > > ... And then it'd be up to us to dereference the addr and copy struct from user.
> >
> > Do not go down that route. It isn't worth the pain.
> >
> > You should also assume that userspace might have a compile-time check
> > on the buffer length (I've written one - not hard) and that the kernel
> > might (in the future - or on a BSD kernel) be doing the user copies
> > for you.
> >
> > Also, if you change the structure you almost certainly need to
> > change the name of the ioctl cmd as well as its value.
> > Otherwise a recompiled program will pass the new cmd value (and
> > hopefully the right sized buffer) but it won't have initialised
> > the buffer properly.
> > This is likely to lead to unexpected behaviour.
> 
> Hmmm.
> 
> So, while initially I thought Sargun's observation about ioctl's fixed
> struct size was right, I think I've been swayed to Christian's view
> (which is supported by the long tail of struct size pain we've seen in
> other APIs).
> 
> Doing a separate ioctl for each structure version seems like the "old
> solution" now that we've got EA syscalls. So, I'd like to keep the size
> and copy_struct_from_user().

If the size is variable then why not get the application to fill
in the size of the structure it is sending at the time of the ioctl.

So you'd have:
#define xxx_IOCTL_17(param) _IOCW('X', 17, sizeof *(param))

The application code would then do:
	ioctl(fd, xxx_IOCTL_17(arg), arg);

The kernel code can either choose to have specific 'case'
for each size, or mask off the length bits and do the
length check later.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Sargun Dhillon June 12, 2020, 10:46 a.m.
On Fri, Jun 12, 2020 at 08:36:03AM +0000, David Laight wrote:
> From: Kees Cook
> > Sent: 12 June 2020 00:50
> > > From: Sargun Dhillon
> > > > Sent: 11 June 2020 12:07
> > > > Subject: Re: [PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across
> > processes
> > > >
> > > > On Thu, Jun 11, 2020 at 12:01:14PM +0200, Christian Brauner wrote:
> > > > > On Wed, Jun 10, 2020 at 07:59:55PM -0700, Kees Cook wrote:
> > > > > > On Wed, Jun 10, 2020 at 08:12:38AM +0000, Sargun Dhillon wrote:
> > > > > > > As an aside, all of this junk should be dropped:
> > > > > > > +	ret = get_user(size, &uaddfd->size);
> > > > > > > +	if (ret)
> > > > > > > +		return ret;
> > > > > > > +
> > > > > > > +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > > > > > > +	if (ret)
> > > > > > > +		return ret;
> > > > > > >
> > > > > > > and the size member of the seccomp_notif_addfd struct. I brought this up
> > > > > > > off-list with Tycho that ioctls have the size of the struct embedded in them. We
> > > > > > > should just use that. The ioctl definition is based on this[2]:
> > > > > > > #define _IOC(dir,type,nr,size) \
> > > > > > > 	(((dir)  << _IOC_DIRSHIFT) | \
> > > > > > > 	 ((type) << _IOC_TYPESHIFT) | \
> > > > > > > 	 ((nr)   << _IOC_NRSHIFT) | \
> > > > > > > 	 ((size) << _IOC_SIZESHIFT))
> > > > > > >
> > > > > > >
> > > > > > > We should just use copy_from_user for now. In the future, we can either
> > > > > > > introduce new ioctl names for new structs, or extract the size dynamically from
> > > > > > > the ioctl (and mask it out on the switch statement in seccomp_notify_ioctl.
> > > > > >
> > > > > > Yeah, that seems reasonable. Here's the diff for that part:
> > > > >
> > > > > Why does it matter that the ioctl() has the size of the struct embedded
> > > > > within? Afaik, the kernel itself doesn't do anything with that size. It
> > > > > merely checks that the size is not pathological and it does so at
> > > > > compile time.
> > > > >
> > > > > #ifdef __CHECKER__
> > > > > #define _IOC_TYPECHECK(t) (sizeof(t))
> > > > > #else
> > > > > /* provoke compile error for invalid uses of size argument */
> > > > > extern unsigned int __invalid_size_argument_for_IOC;
> > > > > #define _IOC_TYPECHECK(t) \
> > > > > 	((sizeof(t) == sizeof(t[1]) && \
> > > > > 	  sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
> > > > > 	  sizeof(t) : __invalid_size_argument_for_IOC)
> > > > > #endif
> > > > >
> > > > > The size itself is not verified at runtime. copy_struct_from_user()
> > > > > still makes sense at least if we're going to allow expanding the struct
> > > > > in the future.
> > > > Right, but if we simply change our headers and extend the struct, it will break
> > > > all existing programs compiled against those headers. In order to avoid that, if
> > > > we intend on extending this struct by appending to it, we need to have a
> > > > backwards compatibility mechanism. Just having copy_struct_from_user isn't
> > > > enough. The data structure either must be fixed size, or we need a way to handle
> > > > multiple ioctl numbers derived from headers with different sized struct arguments
> > > >
> > > > The two approaches I see are:
> > > > 1. use more indirection. This has previous art in drm[1]. That's look
> > > > something like this:
> > > >
> > > > struct seccomp_notif_addfd_ptr {
> > > > 	__u64 size;
> > > > 	__u64 addr;
> > > > }
> > > >
> > > > ... And then it'd be up to us to dereference the addr and copy struct from user.
> > >
> > > Do not go down that route. It isn't worth the pain.
> > >
> > > You should also assume that userspace might have a compile-time check
> > > on the buffer length (I've written one - not hard) and that the kernel
> > > might (in the future - or on a BSD kernel) be doing the user copies
> > > for you.
> > >
> > > Also, if you change the structure you almost certainly need to
> > > change the name of the ioctl cmd as well as its value.
> > > Otherwise a recompiled program will pass the new cmd value (and
> > > hopefully the right sized buffer) but it won't have initialised
> > > the buffer properly.
> > > This is likely to lead to unexpected behaviour.
Why do you say this? Assuming people are just pulling in <linux/seccomp.h>
they will get both the ioctl number, and the struct. The one case where
I can see things going wrong is languages which implement their own struct
packing / ioctls and wouldn't get the updated # because it's hard coded.


> > 
> > Hmmm.
> > 
> > So, while initially I thought Sargun's observation about ioctl's fixed
> > struct size was right, I think I've been swayed to Christian's view
> > (which is supported by the long tail of struct size pain we've seen in
> > other APIs).
> > 
> > Doing a separate ioctl for each structure version seems like the "old
> > solution" now that we've got EA syscalls. So, I'd like to keep the size
> > and copy_struct_from_user().
> 
> If the size is variable then why not get the application to fill
> in the size of the structure it is sending at the time of the ioctl.
> 
> So you'd have:
> #define xxx_IOCTL_17(param) _IOCW('X', 17, sizeof *(param))
> 
> The application code would then do:
> 	ioctl(fd, xxx_IOCTL_17(arg), arg);
> 
> The kernel code can either choose to have specific 'case'
> for each size, or mask off the length bits and do the
> length check later.
> 
> 	David
> 
> 
My suggest, written out (no idea if this code actually works), is as follows:

ioctl.h:
/* This needs to be added */
#define IOCDIR_MASK	(_IOC_DIRMASK << _IOC_DIRSHIFT)


seccomp.h:

struct struct seccomp_notif_addfd {
	__u64 fd;
	...
}

/* or IOW? */
#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOWR(3, struct seccomp_notif_addfd)

seccomp.c:
static long seccomp_notify_addfd(struct seccomp_filter *filter,
				 struct seccomp_notif_addfd __user *uaddfd int size)
{
	struct seccomp_notif_addfd addfd;
	int ret;

	if (size < 32)
		return -EINVAL;
	if (size > PAGE_SIZE)
		return -E2BIG;

	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
	if (ret)
		return ret;

	...
}

/* Mask out size */
#define SIZE_MASK(cmd)	(~IOCSIZE_MASK & cmd)

/* Mask out direction */
#define DIR_MASK(cmd)	(~IOCDIR_MASK & cmd)

static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
				 unsigned long arg)
{
	struct seccomp_filter *filter = file->private_data;
	void __user *buf = (void __user *)arg;

	/* Fixed size ioctls. Can be converted later on? */
	switch (cmd) {
	case SECCOMP_IOCTL_NOTIF_RECV:
		return seccomp_notify_recv(filter, buf);
	case SECCOMP_IOCTL_NOTIF_SEND:
		return seccomp_notify_send(filter, buf);
	case SECCOMP_IOCTL_NOTIF_ID_VALID:
		return seccomp_notify_id_valid(filter, buf);
	}

	/* Probably should make some nicer macros here */
	switch (SIZE_MASK(DIR_MASK(cmd))) {
	case SIZE_MASK(DIR_MASK(SECCOMP_IOCTL_NOTIF_ADDFD)):
		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
	default:
		return -EINVAL;
	}
}

--------

What boxes does this tick?
* Forwards (and backwards) compatibility
* Applies to existing commands
* Command can be extended without requiring new ioctl to be defined
* It well accomodates the future where we want to have a kernel
  helper copy the structures from userspace

The fact that the size of the argument struct, and the ioctl are defined in the 
same header gives us the ability to "cheat", and for the argument size to be 
included / embedded for free in the command passed to ioctl. In turn, this
gives us two benefits. First, it means we don't have to copy from user twice,
and can just do it all in one shot since the size is passed with the syscall
arguments. Second, it means that the user does not have to do the following:

seccomp_notif_addfd addfd = {};
addfd.size = sizeof(struct seccomp_notif_addfd)

Because sizeof(struct seccomp_notif_addfd) is embedded in 
SECCOMP_IOCTL_NOTIF_ADDFD based on the same headers they plucked the struct out of.
Kees Cook June 12, 2020, 3:13 p.m.
On Fri, Jun 12, 2020 at 10:46:30AM +0000, Sargun Dhillon wrote:
> My suggest, written out (no idea if this code actually works), is as follows:
> 
> ioctl.h:
> /* This needs to be added */
> #define IOCDIR_MASK	(_IOC_DIRMASK << _IOC_DIRSHIFT)

This exists already:

#define _IOC_DIRMASK    ((1 << _IOC_DIRBITS)-1)

> 
> 
> seccomp.h:
> 
> struct struct seccomp_notif_addfd {
> 	__u64 fd;
> 	...
> }
> 
> /* or IOW? */
> #define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOWR(3, struct seccomp_notif_addfd)
> 
> seccomp.c:
> static long seccomp_notify_addfd(struct seccomp_filter *filter,
> 				 struct seccomp_notif_addfd __user *uaddfd int size)
> {
> 	struct seccomp_notif_addfd addfd;
> 	int ret;
> 
> 	if (size < 32)
> 		return -EINVAL;
> 	if (size > PAGE_SIZE)
> 		return -E2BIG;

(Tanget: what was the reason for copy_struct_from_user() not including
the min/max check? I have a memory of Al objecting to having an
"internal" limit?)

> 
> 	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> 	if (ret)
> 		return ret;
> 
> 	...
> }
> 
> /* Mask out size */
> #define SIZE_MASK(cmd)	(~IOCSIZE_MASK & cmd)
> 
> /* Mask out direction */
> #define DIR_MASK(cmd)	(~IOCDIR_MASK & cmd)
> 
> static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
> 				 unsigned long arg)
> {
> 	struct seccomp_filter *filter = file->private_data;
> 	void __user *buf = (void __user *)arg;
> 
> 	/* Fixed size ioctls. Can be converted later on? */
> 	switch (cmd) {
> 	case SECCOMP_IOCTL_NOTIF_RECV:
> 		return seccomp_notify_recv(filter, buf);
> 	case SECCOMP_IOCTL_NOTIF_SEND:
> 		return seccomp_notify_send(filter, buf);
> 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
> 		return seccomp_notify_id_valid(filter, buf);
> 	}
> 
> 	/* Probably should make some nicer macros here */
> 	switch (SIZE_MASK(DIR_MASK(cmd))) {
> 	case SIZE_MASK(DIR_MASK(SECCOMP_IOCTL_NOTIF_ADDFD)):

Ah yeah, I like this because of what you mention below: it's forward
compat too. (I'd just use the ioctl masks directly...)

	switch (cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))

> 		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));

I really like that this ends up having the same construction as a
standard EA syscall: the size is part of the syscall arguments.

> 	default:
> 		return -EINVAL;
> 	}
> }
> 
> --------
> 
> What boxes does this tick?
> * Forwards (and backwards) compatibility
> * Applies to existing commands
> * Command can be extended without requiring new ioctl to be defined

(Technically, a new one is always redefined, but it's automatic in that
the kernel needs to do nothing.)

> * It well accomodates the future where we want to have a kernel
>   helper copy the structures from userspace

Yeah, this is a good solution.

> The fact that the size of the argument struct, and the ioctl are defined in the 
> same header gives us the ability to "cheat", and for the argument size to be 
> included / embedded for free in the command passed to ioctl. In turn, this
> gives us two benefits. First, it means we don't have to copy from user twice,
> and can just do it all in one shot since the size is passed with the syscall
> arguments. Second, it means that the user does not have to do the following:
> 
> seccomp_notif_addfd addfd = {};
> addfd.size = sizeof(struct seccomp_notif_addfd)
> 
> Because sizeof(struct seccomp_notif_addfd) is embedded in 
> SECCOMP_IOCTL_NOTIF_ADDFD based on the same headers they plucked the struct out of.

Cool. I will do more patch reworking! ;)
David Laight June 12, 2020, 3:55 p.m.
From: Kees Cook
> Sent: 12 June 2020 16:13
...
> > 	/* Fixed size ioctls. Can be converted later on? */
> > 	switch (cmd) {
> > 	case SECCOMP_IOCTL_NOTIF_RECV:
> > 		return seccomp_notify_recv(filter, buf);
> > 	case SECCOMP_IOCTL_NOTIF_SEND:
> > 		return seccomp_notify_send(filter, buf);
> > 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
> > 		return seccomp_notify_id_valid(filter, buf);
> > 	}
> >
> > 	/* Probably should make some nicer macros here */
> > 	switch (SIZE_MASK(DIR_MASK(cmd))) {
> > 	case SIZE_MASK(DIR_MASK(SECCOMP_IOCTL_NOTIF_ADDFD)):
> 
> Ah yeah, I like this because of what you mention below: it's forward
> compat too. (I'd just use the ioctl masks directly...)
> 
> 	switch (cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))

Since you need the same mask on the case labels I think
I'd define a helper just across the switch statement:

#define M(cmd) ((cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))
	switch (M(cmd)) {
	case M(SECCOMP_IOCTL_NOTIF_RECV):
	...
	}
#undef M

It is probably wrong to mask off DIRMASK.
But you might need to add extra case labels for
the broken one(s).

Prior to worries about indirect jumps you could
get a dense set of case label and faster code.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Christian Brauner June 12, 2020, 6:28 p.m.
On Fri, Jun 12, 2020 at 08:13:25AM -0700, Kees Cook wrote:
> On Fri, Jun 12, 2020 at 10:46:30AM +0000, Sargun Dhillon wrote:
> > My suggest, written out (no idea if this code actually works), is as follows:
> > 
> > ioctl.h:
> > /* This needs to be added */
> > #define IOCDIR_MASK	(_IOC_DIRMASK << _IOC_DIRSHIFT)
> 
> This exists already:
> 
> #define _IOC_DIRMASK    ((1 << _IOC_DIRBITS)-1)
> 
> > 
> > 
> > seccomp.h:
> > 
> > struct struct seccomp_notif_addfd {
> > 	__u64 fd;
> > 	...
> > }
> > 
> > /* or IOW? */
> > #define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOWR(3, struct seccomp_notif_addfd)
> > 
> > seccomp.c:
> > static long seccomp_notify_addfd(struct seccomp_filter *filter,
> > 				 struct seccomp_notif_addfd __user *uaddfd int size)
> > {
> > 	struct seccomp_notif_addfd addfd;
> > 	int ret;
> > 
> > 	if (size < 32)
> > 		return -EINVAL;
> > 	if (size > PAGE_SIZE)
> > 		return -E2BIG;
> 
> (Tanget: what was the reason for copy_struct_from_user() not including
> the min/max check? I have a memory of Al objecting to having an
> "internal" limit?)

Al didn't want the PAGE_SIZE limit in there because there's nothing
inherently wrong with copying insane amounts of memory.

(Another tangent. I've asked this on Twitter not too long ago: do we
have stats how long copy_from_user()/copy_struct_from_user() takes with
growing struct/memory size? I'd be really interested in this. I have a
feeling that clone3()'s and - having had a chat with David Howells -
openat2()'s structs will continue to grow for a while... and I'd really
like to have some numbers on when copy_struct_from_user() becomes
costly or how costly it becomes.)

> 
> > 
> > 	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> > 	if (ret)
> > 		return ret;
> > 
> > 	...
> > }
> > 
> > /* Mask out size */
> > #define SIZE_MASK(cmd)	(~IOCSIZE_MASK & cmd)
> > 
> > /* Mask out direction */
> > #define DIR_MASK(cmd)	(~IOCDIR_MASK & cmd)
> > 
> > static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
> > 				 unsigned long arg)
> > {
> > 	struct seccomp_filter *filter = file->private_data;
> > 	void __user *buf = (void __user *)arg;
> > 
> > 	/* Fixed size ioctls. Can be converted later on? */
> > 	switch (cmd) {
> > 	case SECCOMP_IOCTL_NOTIF_RECV:
> > 		return seccomp_notify_recv(filter, buf);
> > 	case SECCOMP_IOCTL_NOTIF_SEND:
> > 		return seccomp_notify_send(filter, buf);
> > 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
> > 		return seccomp_notify_id_valid(filter, buf);
> > 	}
> > 
> > 	/* Probably should make some nicer macros here */
> > 	switch (SIZE_MASK(DIR_MASK(cmd))) {
> > 	case SIZE_MASK(DIR_MASK(SECCOMP_IOCTL_NOTIF_ADDFD)):
> 
> Ah yeah, I like this because of what you mention below: it's forward
> compat too. (I'd just use the ioctl masks directly...)
> 
> 	switch (cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))
> 
> > 		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
> 
> I really like that this ends up having the same construction as a
> standard EA syscall: the size is part of the syscall arguments.

This is basically what I had proposed in my previous mail, right?

Christian
Kees Cook June 12, 2020, 6:38 p.m.
On Fri, Jun 12, 2020 at 08:28:16PM +0200, Christian Brauner wrote:
> Al didn't want the PAGE_SIZE limit in there because there's nothing
> inherently wrong with copying insane amounts of memory.

Right, ok.

> (Another tangent. I've asked this on Twitter not too long ago: do we
> have stats how long copy_from_user()/copy_struct_from_user() takes with
> growing struct/memory size? I'd be really interested in this. I have a
> feeling that clone3()'s and - having had a chat with David Howells -
> openat2()'s structs will continue to grow for a while... and I'd really
> like to have some numbers on when copy_struct_from_user() becomes
> costly or how costly it becomes.)

How long it takes? It should be basically the same, the costs should be
mostly in switching memory protections, etc. I wouldn't imagine how many
bytes being copied would matter much here, given the sub-page sizes.

> > Ah yeah, I like this because of what you mention below: it's forward
> > compat too. (I'd just use the ioctl masks directly...)
> > 
> > 	switch (cmd & ~(_IOC_SIZEMASK | _IOC_DIRMASK))
> > 
> > > 		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
> > 
> > I really like that this ends up having the same construction as a
> > standard EA syscall: the size is part of the syscall arguments.
> 
> This is basically what I had proposed in my previous mail, right?

I guess I missed it! Well, then I think we're all in agreement? :)
Christian Brauner June 12, 2020, 6:42 p.m.
On Fri, Jun 12, 2020 at 11:38:33AM -0700, Kees Cook wrote:
> On Fri, Jun 12, 2020 at 08:28:16PM +0200, Christian Brauner wrote:
> > Al didn't want the PAGE_SIZE limit in there because there's nothing
> > inherently wrong with copying insane amounts of memory.
> 
> Right, ok.
> 
> > (Another tangent. I've asked this on Twitter not too long ago: do we
> > have stats how long copy_from_user()/copy_struct_from_user() takes with
> > growing struct/memory size? I'd be really interested in this. I have a
> > feeling that clone3()'s and - having had a chat with David Howells -
> > openat2()'s structs will continue to grow for a while... and I'd really
> > like to have some numbers on when copy_struct_from_user() becomes
> > costly or how costly it becomes.)
> 
> How long it takes? It should be basically the same, the costs should be
> mostly in switching memory protections, etc. I wouldn't imagine how many
> bytes being copied would matter much here, given the sub-page sizes.

This makes me _very_ happy.

Christian
David Laight June 15, 2020, 8:27 a.m.
From: Christian Brauner
> Sent: 12 June 2020 19:28
...
> > > 	if (size < 32)
> > > 		return -EINVAL;
> > > 	if (size > PAGE_SIZE)
> > > 		return -E2BIG;
> >
> > (Tanget: what was the reason for copy_struct_from_user() not including
> > the min/max check? I have a memory of Al objecting to having an
> > "internal" limit?)
> 
> Al didn't want the PAGE_SIZE limit in there because there's nothing
> inherently wrong with copying insane amounts of memory.

The problem is really allowing a user process to allocate
unbounded blocks of memory, not the copy itself.

The limit for IOW() etc is 16k - not a problem.
If a 32bit size is set to just under 4GB so you really want
to allocate 4GB of memory then find the request is garbage.
Seems like a nice DoS attack.
A 64bit size can be worse.

Potentially the limit should be in memdup_user() itself.
And possibly an extra parameter giving a per-call lower? limit.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)