[v7,4/6] files: add a replace_fd_files() function

Submitted by Tycho Andersen on Sept. 27, 2018, 3:11 p.m.

Details

Message ID 20180927151119.9989-5-tycho@tycho.ws
State New
Series "seccomp trap to userspace"
Headers show

Commit Message

Tycho Andersen Sept. 27, 2018, 3:11 p.m.
Similar to fd_install/__fd_install, we want to be able to replace an fd of
an arbitrary struct files_struct, not just current's. We'll use this in the
next patch to implement the seccomp ioctl that allows inserting fds into a
stopped process' context.

v7: new in v7

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: "Serge E. Hallyn" <serge@hallyn.com>
CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Tyler Hicks <tyhicks@canonical.com>
CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
---
 fs/file.c            | 22 +++++++++++++++-------
 include/linux/file.h |  8 ++++++++
 2 files changed, 23 insertions(+), 7 deletions(-)

Patch hide | download patch | download mbox

diff --git a/fs/file.c b/fs/file.c
index 7ffd6e9d103d..3b3c5aadaadb 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -850,24 +850,32 @@  __releases(&files->file_lock)
 }
 
 int replace_fd(unsigned fd, struct file *file, unsigned flags)
+{
+	return replace_fd_task(current, fd, file, flags);
+}
+
+/*
+ * Same warning as __alloc_fd()/__fd_install() here.
+ */
+int replace_fd_task(struct task_struct *task, unsigned fd,
+		    struct file *file, unsigned flags)
 {
 	int err;
-	struct files_struct *files = current->files;
 
 	if (!file)
-		return __close_fd(files, fd);
+		return __close_fd(task->files, fd);
 
-	if (fd >= rlimit(RLIMIT_NOFILE))
+	if (fd >= task_rlimit(task, RLIMIT_NOFILE))
 		return -EBADF;
 
-	spin_lock(&files->file_lock);
-	err = expand_files(files, fd);
+	spin_lock(&task->files->file_lock);
+	err = expand_files(task->files, fd);
 	if (unlikely(err < 0))
 		goto out_unlock;
-	return do_dup2(files, file, fd, flags);
+	return do_dup2(task->files, file, fd, flags);
 
 out_unlock:
-	spin_unlock(&files->file_lock);
+	spin_unlock(&task->files->file_lock);
 	return err;
 }
 
diff --git a/include/linux/file.h b/include/linux/file.h
index 6b2fb032416c..f94277fee038 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -11,6 +11,7 @@ 
 #include <linux/posix_types.h>
 
 struct file;
+struct task_struct;
 
 extern void fput(struct file *);
 
@@ -79,6 +80,13 @@  static inline void fdput_pos(struct fd f)
 
 extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
 extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
+/*
+ * Warning! This is only safe if you know the owner of the files_struct is
+ * stopped outside syscall context. It's a very bad idea to use this unless you
+ * have similar guarantees in your code.
+ */
+extern int replace_fd_task(struct task_struct *task, unsigned fd,
+			   struct file *file, unsigned flags);
 extern void set_close_on_exec(unsigned int fd, int flag);
 extern bool get_close_on_exec(unsigned int fd);
 extern int get_unused_fd_flags(unsigned flags);

Comments

Jann Horn via Containers Sept. 27, 2018, 4:49 p.m.
On Thu, Sep 27, 2018 at 5:11 PM Tycho Andersen <tycho@tycho.ws> wrote:
> Similar to fd_install/__fd_install, we want to be able to replace an fd of
> an arbitrary struct files_struct, not just current's. We'll use this in the
> next patch to implement the seccomp ioctl that allows inserting fds into a
> stopped process' context.
[...]
> diff --git a/fs/file.c b/fs/file.c
> index 7ffd6e9d103d..3b3c5aadaadb 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -850,24 +850,32 @@ __releases(&files->file_lock)
>  }
>
>  int replace_fd(unsigned fd, struct file *file, unsigned flags)
> +{
> +       return replace_fd_task(current, fd, file, flags);
> +}
> +
> +/*
> + * Same warning as __alloc_fd()/__fd_install() here.
> + */
> +int replace_fd_task(struct task_struct *task, unsigned fd,
> +                   struct file *file, unsigned flags)
>  {
>         int err;
> -       struct files_struct *files = current->files;

Why did you remove this? You could just do s/current/task/ instead, right?

>         if (!file)
> -               return __close_fd(files, fd);
> +               return __close_fd(task->files, fd);
>
> -       if (fd >= rlimit(RLIMIT_NOFILE))
> +       if (fd >= task_rlimit(task, RLIMIT_NOFILE))
>                 return -EBADF;
>
> -       spin_lock(&files->file_lock);
> -       err = expand_files(files, fd);
> +       spin_lock(&task->files->file_lock);
> +       err = expand_files(task->files, fd);
>         if (unlikely(err < 0))
>                 goto out_unlock;
> -       return do_dup2(files, file, fd, flags);
> +       return do_dup2(task->files, file, fd, flags);
>
>  out_unlock:
> -       spin_unlock(&files->file_lock);
> +       spin_unlock(&task->files->file_lock);
>         return err;
>  }
>
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 6b2fb032416c..f94277fee038 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -11,6 +11,7 @@
>  #include <linux/posix_types.h>
>
>  struct file;
> +struct task_struct;
>
>  extern void fput(struct file *);
>
> @@ -79,6 +80,13 @@ static inline void fdput_pos(struct fd f)
>
>  extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
>  extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
> +/*
> + * Warning! This is only safe if you know the owner of the files_struct is
> + * stopped outside syscall context. It's a very bad idea to use this unless you
> + * have similar guarantees in your code.
> + */
> +extern int replace_fd_task(struct task_struct *task, unsigned fd,
> +                          struct file *file, unsigned flags);

I think Linux kernel coding style is normally to have comments on the
implementations of functions, not in the headers? Maybe replace the
warning above the implemenation of replace_fd_task() with this
comment.
Tycho Andersen Sept. 27, 2018, 6:04 p.m.
On Thu, Sep 27, 2018 at 06:49:02PM +0200, Jann Horn wrote:
> On Thu, Sep 27, 2018 at 5:11 PM Tycho Andersen <tycho@tycho.ws> wrote:
> > Similar to fd_install/__fd_install, we want to be able to replace an fd of
> > an arbitrary struct files_struct, not just current's. We'll use this in the
> > next patch to implement the seccomp ioctl that allows inserting fds into a
> > stopped process' context.
> [...]
> > diff --git a/fs/file.c b/fs/file.c
> > index 7ffd6e9d103d..3b3c5aadaadb 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -850,24 +850,32 @@ __releases(&files->file_lock)
> >  }
> >
> >  int replace_fd(unsigned fd, struct file *file, unsigned flags)
> > +{
> > +       return replace_fd_task(current, fd, file, flags);
> > +}
> > +
> > +/*
> > + * Same warning as __alloc_fd()/__fd_install() here.
> > + */
> > +int replace_fd_task(struct task_struct *task, unsigned fd,
> > +                   struct file *file, unsigned flags)
> >  {
> >         int err;
> > -       struct files_struct *files = current->files;
> 
> Why did you remove this? You could just do s/current/task/ instead, right?

No reason, probably just flailing around trying to figure out what
exactly I wanted. I'll make the change, thanks.

> >         if (!file)
> > -               return __close_fd(files, fd);
> > +               return __close_fd(task->files, fd);
> >
> > -       if (fd >= rlimit(RLIMIT_NOFILE))
> > +       if (fd >= task_rlimit(task, RLIMIT_NOFILE))
> >                 return -EBADF;
> >
> > -       spin_lock(&files->file_lock);
> > -       err = expand_files(files, fd);
> > +       spin_lock(&task->files->file_lock);
> > +       err = expand_files(task->files, fd);
> >         if (unlikely(err < 0))
> >                 goto out_unlock;
> > -       return do_dup2(files, file, fd, flags);
> > +       return do_dup2(task->files, file, fd, flags);
> >
> >  out_unlock:
> > -       spin_unlock(&files->file_lock);
> > +       spin_unlock(&task->files->file_lock);
> >         return err;
> >  }
> >
> > diff --git a/include/linux/file.h b/include/linux/file.h
> > index 6b2fb032416c..f94277fee038 100644
> > --- a/include/linux/file.h
> > +++ b/include/linux/file.h
> > @@ -11,6 +11,7 @@
> >  #include <linux/posix_types.h>
> >
> >  struct file;
> > +struct task_struct;
> >
> >  extern void fput(struct file *);
> >
> > @@ -79,6 +80,13 @@ static inline void fdput_pos(struct fd f)
> >
> >  extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
> >  extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
> > +/*
> > + * Warning! This is only safe if you know the owner of the files_struct is
> > + * stopped outside syscall context. It's a very bad idea to use this unless you
> > + * have similar guarantees in your code.
> > + */
> > +extern int replace_fd_task(struct task_struct *task, unsigned fd,
> > +                          struct file *file, unsigned flags);
> 
> I think Linux kernel coding style is normally to have comments on the
> implementations of functions, not in the headers? Maybe replace the
> warning above the implemenation of replace_fd_task() with this
> comment.

Will do.

Cheers,

Tycho
Kees Cook Sept. 27, 2018, 9:59 p.m.
On Thu, Sep 27, 2018 at 8:11 AM, Tycho Andersen <tycho@tycho.ws> wrote:
> Similar to fd_install/__fd_install, we want to be able to replace an fd of
> an arbitrary struct files_struct, not just current's. We'll use this in the
> next patch to implement the seccomp ioctl that allows inserting fds into a
> stopped process' context.
>
> v7: new in v7
>
> Signed-off-by: Tycho Andersen <tycho@tycho.ws>
> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> CC: Kees Cook <keescook@chromium.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Eric W. Biederman <ebiederm@xmission.com>
> CC: "Serge E. Hallyn" <serge@hallyn.com>
> CC: Christian Brauner <christian.brauner@ubuntu.com>
> CC: Tyler Hicks <tyhicks@canonical.com>
> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
> ---
>  fs/file.c            | 22 +++++++++++++++-------
>  include/linux/file.h |  8 ++++++++
>  2 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/fs/file.c b/fs/file.c
> index 7ffd6e9d103d..3b3c5aadaadb 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -850,24 +850,32 @@ __releases(&files->file_lock)
>  }
>
>  int replace_fd(unsigned fd, struct file *file, unsigned flags)
> +{
> +       return replace_fd_task(current, fd, file, flags);
> +}
> +
> +/*
> + * Same warning as __alloc_fd()/__fd_install() here.
> + */
> +int replace_fd_task(struct task_struct *task, unsigned fd,
> +                   struct file *file, unsigned flags)
>  {
>         int err;
> -       struct files_struct *files = current->files;

Same feedback as Jann: on a purely "smaller diff" note, this could
just be s/current/task/ here and all the other s/files/task->files/
would go away...

>
>         if (!file)
> -               return __close_fd(files, fd);
> +               return __close_fd(task->files, fd);
>
> -       if (fd >= rlimit(RLIMIT_NOFILE))
> +       if (fd >= task_rlimit(task, RLIMIT_NOFILE))
>                 return -EBADF;
>
> -       spin_lock(&files->file_lock);
> -       err = expand_files(files, fd);
> +       spin_lock(&task->files->file_lock);
> +       err = expand_files(task->files, fd);
>         if (unlikely(err < 0))
>                 goto out_unlock;
> -       return do_dup2(files, file, fd, flags);
> +       return do_dup2(task->files, file, fd, flags);
>
>  out_unlock:
> -       spin_unlock(&files->file_lock);
> +       spin_unlock(&task->files->file_lock);
>         return err;
>  }
>
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 6b2fb032416c..f94277fee038 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -11,6 +11,7 @@
>  #include <linux/posix_types.h>
>
>  struct file;
> +struct task_struct;
>
>  extern void fput(struct file *);
>
> @@ -79,6 +80,13 @@ static inline void fdput_pos(struct fd f)
>
>  extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
>  extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
> +/*
> + * Warning! This is only safe if you know the owner of the files_struct is
> + * stopped outside syscall context. It's a very bad idea to use this unless you
> + * have similar guarantees in your code.
> + */
> +extern int replace_fd_task(struct task_struct *task, unsigned fd,
> +                          struct file *file, unsigned flags);

Perhaps call this __replace_fd() to indicate the "please don't use
this unless you're very sure"ness of it?

>  extern void set_close_on_exec(unsigned int fd, int flag);
>  extern bool get_close_on_exec(unsigned int fd);
>  extern int get_unused_fd_flags(unsigned flags);
> --
> 2.17.1
>

If I can get an Ack from Al, that would be very nice. :)

-Kees
Kees Cook Sept. 28, 2018, 2:20 a.m.
On Thu, Sep 27, 2018 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Sep 27, 2018 at 8:11 AM, Tycho Andersen <tycho@tycho.ws> wrote:
>> Similar to fd_install/__fd_install, we want to be able to replace an fd of
>> an arbitrary struct files_struct, not just current's. We'll use this in the
>> next patch to implement the seccomp ioctl that allows inserting fds into a
>> stopped process' context.
>>
>> v7: new in v7
>>
>> Signed-off-by: Tycho Andersen <tycho@tycho.ws>
>> CC: Alexander Viro <viro@zeniv.linux.org.uk>
>> CC: Kees Cook <keescook@chromium.org>
>> CC: Andy Lutomirski <luto@amacapital.net>
>> CC: Oleg Nesterov <oleg@redhat.com>
>> CC: Eric W. Biederman <ebiederm@xmission.com>
>> CC: "Serge E. Hallyn" <serge@hallyn.com>
>> CC: Christian Brauner <christian.brauner@ubuntu.com>
>> CC: Tyler Hicks <tyhicks@canonical.com>
>> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
>> ---
>>  fs/file.c            | 22 +++++++++++++++-------
>>  include/linux/file.h |  8 ++++++++
>>  2 files changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/file.c b/fs/file.c
>> index 7ffd6e9d103d..3b3c5aadaadb 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -850,24 +850,32 @@ __releases(&files->file_lock)
>>  }
>>
>>  int replace_fd(unsigned fd, struct file *file, unsigned flags)
>> +{
>> +       return replace_fd_task(current, fd, file, flags);
>> +}
>> +
>> +/*
>> + * Same warning as __alloc_fd()/__fd_install() here.
>> + */
>> +int replace_fd_task(struct task_struct *task, unsigned fd,
>> +                   struct file *file, unsigned flags)
>>  {
>>         int err;
>> -       struct files_struct *files = current->files;
>
> Same feedback as Jann: on a purely "smaller diff" note, this could
> just be s/current/task/ here and all the other s/files/task->files/
> would go away...
>
>>
>>         if (!file)
>> -               return __close_fd(files, fd);
>> +               return __close_fd(task->files, fd);
>>
>> -       if (fd >= rlimit(RLIMIT_NOFILE))
>> +       if (fd >= task_rlimit(task, RLIMIT_NOFILE))
>>                 return -EBADF;
>>
>> -       spin_lock(&files->file_lock);
>> -       err = expand_files(files, fd);
>> +       spin_lock(&task->files->file_lock);
>> +       err = expand_files(task->files, fd);
>>         if (unlikely(err < 0))
>>                 goto out_unlock;
>> -       return do_dup2(files, file, fd, flags);
>> +       return do_dup2(task->files, file, fd, flags);
>>
>>  out_unlock:
>> -       spin_unlock(&files->file_lock);
>> +       spin_unlock(&task->files->file_lock);
>>         return err;
>>  }
>>
>> diff --git a/include/linux/file.h b/include/linux/file.h
>> index 6b2fb032416c..f94277fee038 100644
>> --- a/include/linux/file.h
>> +++ b/include/linux/file.h
>> @@ -11,6 +11,7 @@
>>  #include <linux/posix_types.h>
>>
>>  struct file;
>> +struct task_struct;
>>
>>  extern void fput(struct file *);
>>
>> @@ -79,6 +80,13 @@ static inline void fdput_pos(struct fd f)
>>
>>  extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
>>  extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
>> +/*
>> + * Warning! This is only safe if you know the owner of the files_struct is
>> + * stopped outside syscall context. It's a very bad idea to use this unless you
>> + * have similar guarantees in your code.
>> + */
>> +extern int replace_fd_task(struct task_struct *task, unsigned fd,
>> +                          struct file *file, unsigned flags);
>
> Perhaps call this __replace_fd() to indicate the "please don't use
> this unless you're very sure"ness of it?
>
>>  extern void set_close_on_exec(unsigned int fd, int flag);
>>  extern bool get_close_on_exec(unsigned int fd);
>>  extern int get_unused_fd_flags(unsigned flags);
>> --
>> 2.17.1
>>
>
> If I can get an Ack from Al, that would be very nice. :)

In out-of-band feedback from Al, he's pointed out a much cleaner
approach: do the work on the "current" side. i.e. current is stopped
in __seccomp_filter in the case SECCOMP_RET_USER_NOTIFY. Instead of
having the ioctl-handing process doing the work, have it done on the
other side. This may cause some additional complexity on the ioctl
return path, but it solves both this problem and the "ptrace attach"
issue: have the work delayed until "current" gets caught by seccomp.

-Kees
Jann Horn via Containers Sept. 28, 2018, 2:46 a.m.
On Fri, Sep 28, 2018 at 4:20 AM Kees Cook <keescook@chromium.org> wrote:
> On Thu, Sep 27, 2018 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Thu, Sep 27, 2018 at 8:11 AM, Tycho Andersen <tycho@tycho.ws> wrote:
> >> Similar to fd_install/__fd_install, we want to be able to replace an fd of
> >> an arbitrary struct files_struct, not just current's. We'll use this in the
> >> next patch to implement the seccomp ioctl that allows inserting fds into a
> >> stopped process' context.
> >>
> >> v7: new in v7
> >>
> >> Signed-off-by: Tycho Andersen <tycho@tycho.ws>
> >> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> >> CC: Kees Cook <keescook@chromium.org>
> >> CC: Andy Lutomirski <luto@amacapital.net>
> >> CC: Oleg Nesterov <oleg@redhat.com>
> >> CC: Eric W. Biederman <ebiederm@xmission.com>
> >> CC: "Serge E. Hallyn" <serge@hallyn.com>
> >> CC: Christian Brauner <christian.brauner@ubuntu.com>
> >> CC: Tyler Hicks <tyhicks@canonical.com>
> >> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
> >> ---
> >>  fs/file.c            | 22 +++++++++++++++-------
> >>  include/linux/file.h |  8 ++++++++
> >>  2 files changed, 23 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/fs/file.c b/fs/file.c
> >> index 7ffd6e9d103d..3b3c5aadaadb 100644
> >> --- a/fs/file.c
> >> +++ b/fs/file.c
> >> @@ -850,24 +850,32 @@ __releases(&files->file_lock)
> >>  }
> >>
> >>  int replace_fd(unsigned fd, struct file *file, unsigned flags)
> >> +{
> >> +       return replace_fd_task(current, fd, file, flags);
> >> +}
> >> +
> >> +/*
> >> + * Same warning as __alloc_fd()/__fd_install() here.
> >> + */
> >> +int replace_fd_task(struct task_struct *task, unsigned fd,
> >> +                   struct file *file, unsigned flags)
> >>  {
> >>         int err;
> >> -       struct files_struct *files = current->files;
> >
> > Same feedback as Jann: on a purely "smaller diff" note, this could
> > just be s/current/task/ here and all the other s/files/task->files/
> > would go away...
> >
> >>
> >>         if (!file)
> >> -               return __close_fd(files, fd);
> >> +               return __close_fd(task->files, fd);
> >>
> >> -       if (fd >= rlimit(RLIMIT_NOFILE))
> >> +       if (fd >= task_rlimit(task, RLIMIT_NOFILE))
> >>                 return -EBADF;
> >>
> >> -       spin_lock(&files->file_lock);
> >> -       err = expand_files(files, fd);
> >> +       spin_lock(&task->files->file_lock);
> >> +       err = expand_files(task->files, fd);
> >>         if (unlikely(err < 0))
> >>                 goto out_unlock;
> >> -       return do_dup2(files, file, fd, flags);
> >> +       return do_dup2(task->files, file, fd, flags);
> >>
> >>  out_unlock:
> >> -       spin_unlock(&files->file_lock);
> >> +       spin_unlock(&task->files->file_lock);
> >>         return err;
> >>  }
> >>
> >> diff --git a/include/linux/file.h b/include/linux/file.h
> >> index 6b2fb032416c..f94277fee038 100644
> >> --- a/include/linux/file.h
> >> +++ b/include/linux/file.h
> >> @@ -11,6 +11,7 @@
> >>  #include <linux/posix_types.h>
> >>
> >>  struct file;
> >> +struct task_struct;
> >>
> >>  extern void fput(struct file *);
> >>
> >> @@ -79,6 +80,13 @@ static inline void fdput_pos(struct fd f)
> >>
> >>  extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
> >>  extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
> >> +/*
> >> + * Warning! This is only safe if you know the owner of the files_struct is
> >> + * stopped outside syscall context. It's a very bad idea to use this unless you
> >> + * have similar guarantees in your code.
> >> + */
> >> +extern int replace_fd_task(struct task_struct *task, unsigned fd,
> >> +                          struct file *file, unsigned flags);
> >
> > Perhaps call this __replace_fd() to indicate the "please don't use
> > this unless you're very sure"ness of it?
> >
> >>  extern void set_close_on_exec(unsigned int fd, int flag);
> >>  extern bool get_close_on_exec(unsigned int fd);
> >>  extern int get_unused_fd_flags(unsigned flags);
> >> --
> >> 2.17.1
> >>
> >
> > If I can get an Ack from Al, that would be very nice. :)
>
> In out-of-band feedback from Al, he's pointed out a much cleaner
> approach: do the work on the "current" side. i.e. current is stopped
> in __seccomp_filter in the case SECCOMP_RET_USER_NOTIFY. Instead of
> having the ioctl-handing process doing the work, have it done on the
> other side. This may cause some additional complexity on the ioctl
> return path, but it solves both this problem and the "ptrace attach"
> issue: have the work delayed until "current" gets caught by seccomp.

Can you elaborate on this? Are you saying you want to, for every file
descriptor that should be transferred, put a reference to the file
into the kernel's seccomp notification data structure, wake up the
task that's waiting for a reply, let the task install an fd, send back
a response on whether installing the FD worked, and then return that
response back to the container manager process? That sounds
like a pretty complicated dance that I'd prefer to avoid.
Tycho Andersen Sept. 28, 2018, 5:23 a.m.
On Thu, Sep 27, 2018 at 07:20:50PM -0700, Kees Cook wrote:
> On Thu, Sep 27, 2018 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Thu, Sep 27, 2018 at 8:11 AM, Tycho Andersen <tycho@tycho.ws> wrote:
> >> Similar to fd_install/__fd_install, we want to be able to replace an fd of
> >> an arbitrary struct files_struct, not just current's. We'll use this in the
> >> next patch to implement the seccomp ioctl that allows inserting fds into a
> >> stopped process' context.
> >>
> >> v7: new in v7
> >>
> >> Signed-off-by: Tycho Andersen <tycho@tycho.ws>
> >> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> >> CC: Kees Cook <keescook@chromium.org>
> >> CC: Andy Lutomirski <luto@amacapital.net>
> >> CC: Oleg Nesterov <oleg@redhat.com>
> >> CC: Eric W. Biederman <ebiederm@xmission.com>
> >> CC: "Serge E. Hallyn" <serge@hallyn.com>
> >> CC: Christian Brauner <christian.brauner@ubuntu.com>
> >> CC: Tyler Hicks <tyhicks@canonical.com>
> >> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
> >> ---
> >>  fs/file.c            | 22 +++++++++++++++-------
> >>  include/linux/file.h |  8 ++++++++
> >>  2 files changed, 23 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/fs/file.c b/fs/file.c
> >> index 7ffd6e9d103d..3b3c5aadaadb 100644
> >> --- a/fs/file.c
> >> +++ b/fs/file.c
> >> @@ -850,24 +850,32 @@ __releases(&files->file_lock)
> >>  }
> >>
> >>  int replace_fd(unsigned fd, struct file *file, unsigned flags)
> >> +{
> >> +       return replace_fd_task(current, fd, file, flags);
> >> +}
> >> +
> >> +/*
> >> + * Same warning as __alloc_fd()/__fd_install() here.
> >> + */
> >> +int replace_fd_task(struct task_struct *task, unsigned fd,
> >> +                   struct file *file, unsigned flags)
> >>  {
> >>         int err;
> >> -       struct files_struct *files = current->files;
> >
> > Same feedback as Jann: on a purely "smaller diff" note, this could
> > just be s/current/task/ here and all the other s/files/task->files/
> > would go away...
> >
> >>
> >>         if (!file)
> >> -               return __close_fd(files, fd);
> >> +               return __close_fd(task->files, fd);
> >>
> >> -       if (fd >= rlimit(RLIMIT_NOFILE))
> >> +       if (fd >= task_rlimit(task, RLIMIT_NOFILE))
> >>                 return -EBADF;
> >>
> >> -       spin_lock(&files->file_lock);
> >> -       err = expand_files(files, fd);
> >> +       spin_lock(&task->files->file_lock);
> >> +       err = expand_files(task->files, fd);
> >>         if (unlikely(err < 0))
> >>                 goto out_unlock;
> >> -       return do_dup2(files, file, fd, flags);
> >> +       return do_dup2(task->files, file, fd, flags);
> >>
> >>  out_unlock:
> >> -       spin_unlock(&files->file_lock);
> >> +       spin_unlock(&task->files->file_lock);
> >>         return err;
> >>  }
> >>
> >> diff --git a/include/linux/file.h b/include/linux/file.h
> >> index 6b2fb032416c..f94277fee038 100644
> >> --- a/include/linux/file.h
> >> +++ b/include/linux/file.h
> >> @@ -11,6 +11,7 @@
> >>  #include <linux/posix_types.h>
> >>
> >>  struct file;
> >> +struct task_struct;
> >>
> >>  extern void fput(struct file *);
> >>
> >> @@ -79,6 +80,13 @@ static inline void fdput_pos(struct fd f)
> >>
> >>  extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
> >>  extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
> >> +/*
> >> + * Warning! This is only safe if you know the owner of the files_struct is
> >> + * stopped outside syscall context. It's a very bad idea to use this unless you
> >> + * have similar guarantees in your code.
> >> + */
> >> +extern int replace_fd_task(struct task_struct *task, unsigned fd,
> >> +                          struct file *file, unsigned flags);
> >
> > Perhaps call this __replace_fd() to indicate the "please don't use
> > this unless you're very sure"ness of it?
> >
> >>  extern void set_close_on_exec(unsigned int fd, int flag);
> >>  extern bool get_close_on_exec(unsigned int fd);
> >>  extern int get_unused_fd_flags(unsigned flags);
> >> --
> >> 2.17.1
> >>
> >
> > If I can get an Ack from Al, that would be very nice. :)
> 
> In out-of-band feedback from Al, he's pointed out a much cleaner
> approach: do the work on the "current" side. i.e. current is stopped
> in __seccomp_filter in the case SECCOMP_RET_USER_NOTIFY. Instead of
> having the ioctl-handing process doing the work, have it done on the
> other side. This may cause some additional complexity on the ioctl
> return path, but it solves both this problem and the "ptrace attach"
> issue: have the work delayed until "current" gets caught by seccomp.

So this is pretty much what we had in v6 (a one fd version, but the
idea is the same). The biggest issue is that in the case of e.g.
socketpair(), the fd values need to be written somewhere in the task's
memory, which means they need to be known before the response is sent.
If we have to wait until we're back in the task's context to install
them, we can't know the fd values.

V6 implementation: https://lkml.org/lkml/2018/9/6/773

Tycho