[v4,1/4] seccomp: add a return code to trap to userspace

Submitted by Jann Horn via Containers on June 22, 2018, 10:27 p.m.

Details

Message ID CAG48ez3gobTL5mUnZKjhLedotZx49nGrxYKKud5_7+512PaOFw@mail.gmail.com
State New
Series "seccomp trap to userspace"
Headers show

Commit Message

Jann Horn via Containers June 22, 2018, 10:27 p.m.
On Fri, Jun 22, 2018 at 11:51 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Jun 22, 2018 at 11:09 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > One possible extra issue: IIRC /proc/.../mem uses FOLL_FORCE, which is not what we want here.

Uuugh, I forgot about that.

> > How about just adding an explicit “read/write the seccomp-trapped task’s memory” primitive?  That should be easier than a “open mem fd” primitive.
>
> Uuugh. Can we avoid adding another "read/write remote process memory"
> interface? The point of this series was to provide a lightweight
> approach to what should normally be possible via the existing
> seccomp+ptrace interface. I do like Jann's context idea, but I agree
> with Andy: it can't be a handle to /proc/$pid/mem, since it's
> FOLL_FORCE. Is there any other kind of process context id we can use
> for this instead of pid? There was once an idea of pid-fd but it never
> landed... This would let us get rid of the "id" in the structure too.
> And if that existed, we could make process_vm_*v() safer too (taking a
> pid-fd instead of a pid).

Or make a duplicate of /proc/$pid/mem that only differs in whether it
sets FOLL_FORCE? The code is basically already there... something like
this:

Patch hide | download patch | download mbox

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 80aa42506b8b..e8a6a63046da 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -762,6 +762,8 @@  static int mem_open(struct inode *inode, struct file *file)
        return ret;
 }

+static const struct file_operations proc_mem_operations;
+
 static ssize_t mem_rw(struct file *file, char __user *buf,
                        size_t count, loff_t *ppos, int write)
 {
@@ -782,7 +784,10 @@  static ssize_t mem_rw(struct file *file, char __user *buf,
        if (!mmget_not_zero(mm))
                goto free;

-       flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
+       flags = (write ? FOLL_WRITE : 0);
+       if (file->f_op == &proc_mem_operations) {
+               flags |= FOLL_FORCE;
+       }

        while (count > 0) {
                int this_len = min_t(int, count, PAGE_SIZE);
@@ -861,6 +866,14 @@  static const struct file_operations proc_mem_operations = {
        .release        = mem_release,
 };

+static const struct file_operations proc_mem_noforce_operations = {
+       .llseek         = mem_lseek,
+       .read           = mem_read,
+       .write          = mem_write,
+       .open           = mem_open,
+       .release        = mem_release,
+};
+
 static int environ_open(struct inode *inode, struct file *file)
 {
        return __mem_open(inode, file, PTRACE_MODE_READ);
@@ -2916,6 +2929,7 @@  static const struct pid_entry tgid_base_stuff[] = {
        REG("numa_maps",  S_IRUGO, proc_pid_numa_maps_operations),
 #endif
        REG("mem",        S_IRUSR|S_IWUSR, proc_mem_operations),
+       REG("mem_noforce", S_IRUSR|S_IWUSR, proc_mem_noforce_operations),
        LNK("cwd",        proc_cwd_link),
        LNK("root",       proc_root_link),
        LNK("exe",        proc_exe_link),
@@ -3302,6 +3316,7 @@  static const struct pid_entry tid_base_stuff[] = {
        REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations),
 #endif
        REG("mem",       S_IRUSR|S_IWUSR, proc_mem_operations),
+       REG("mem_noforce",S_IRUSR|S_IWUSR, proc_mem_noforce_operations),
        LNK("cwd",       proc_cwd_link),
        LNK("root",      proc_root_link),
        LNK("exe",       proc_exe_link),

Comments

Tycho Andersen June 26, 2018, 1:32 a.m.
On Sat, Jun 23, 2018 at 12:27:43AM +0200, Jann Horn wrote:
> On Fri, Jun 22, 2018 at 11:51 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Fri, Jun 22, 2018 at 11:09 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > > One possible extra issue: IIRC /proc/.../mem uses FOLL_FORCE, which is not what we want here.
> 
> Uuugh, I forgot about that.
> 
> > > How about just adding an explicit “read/write the seccomp-trapped task’s memory” primitive?  That should be easier than a “open mem fd” primitive.
> >
> > Uuugh. Can we avoid adding another "read/write remote process memory"
> > interface? The point of this series was to provide a lightweight
> > approach to what should normally be possible via the existing
> > seccomp+ptrace interface. I do like Jann's context idea, but I agree
> > with Andy: it can't be a handle to /proc/$pid/mem, since it's
> > FOLL_FORCE. Is there any other kind of process context id we can use
> > for this instead of pid? There was once an idea of pid-fd but it never
> > landed... This would let us get rid of the "id" in the structure too.
> > And if that existed, we could make process_vm_*v() safer too (taking a
> > pid-fd instead of a pid).
> 
> Or make a duplicate of /proc/$pid/mem that only differs in whether it
> sets FOLL_FORCE? The code is basically already there... something like
> this:

But we want more than just memory access, I think. rootfs access, ns
fds, etc. all seem like they might be useful, and racy to open.

I guess I see two options: use the existing id and add something to
seccomp() to ask if it's still valid or independent of this patchset
add some kind of pid id :\

Tycho
Andy Lutomirski June 26, 2018, 2 a.m.
> On Jun 25, 2018, at 6:32 PM, Tycho Andersen <tycho@tycho.ws> wrote:
> 
>> On Sat, Jun 23, 2018 at 12:27:43AM +0200, Jann Horn wrote:
>>> On Fri, Jun 22, 2018 at 11:51 PM Kees Cook <keescook@chromium.org> wrote:
>>> 
>>>> On Fri, Jun 22, 2018 at 11:09 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> One possible extra issue: IIRC /proc/.../mem uses FOLL_FORCE, which is not what we want here.
>> 
>> Uuugh, I forgot about that.
>> 
>>>> How about just adding an explicit “read/write the seccomp-trapped task’s memory” primitive?  That should be easier than a “open mem fd” primitive.
>>> 
>>> Uuugh. Can we avoid adding another "read/write remote process memory"
>>> interface? The point of this series was to provide a lightweight
>>> approach to what should normally be possible via the existing
>>> seccomp+ptrace interface. I do like Jann's context idea, but I agree
>>> with Andy: it can't be a handle to /proc/$pid/mem, since it's
>>> FOLL_FORCE. Is there any other kind of process context id we can use
>>> for this instead of pid? There was once an idea of pid-fd but it never
>>> landed... This would let us get rid of the "id" in the structure too.
>>> And if that existed, we could make process_vm_*v() safer too (taking a
>>> pid-fd instead of a pid).
>> 
>> Or make a duplicate of /proc/$pid/mem that only differs in whether it
>> sets FOLL_FORCE? The code is basically already there... something like
>> this:
> 
> But we want more than just memory access, I think. rootfs access, ns
> fds, etc. all seem like they might be useful, and racy to open.
> 
> I guess I see two options: use the existing id and add something to
> seccomp() to ask if it's still valid or independent of this patchset
> add some kind of pid id :\
> 

I think we use the existing id / cookie / whatever and ask seccomp, or new syscalls, to do the requested operation. This is because we know the target task is in a very special stopping point. As a result, a seccomp-specific mechanism can do RCU-less fd modifications against a single-threaded target, can muck with things like struct cred, etc, while a more general interface can’t.

It might be nice to add a syscall with flags such that it could be used on ptrace-stopped targets later on. Something like:

access_remote_task(int fd, u64 id, u32 type, ...)

Where type is 16 bits of “id and fd is from seccomp” and 16 bits of “write memory” or such.