Message ID | 20170118040159.4751-1-asarai@suse.de |
---|---|
State | New |
Series | "procfs: change the owner of non-dumpable and writeable files" |
Headers | show |
diff --git a/fs/proc/base.c b/fs/proc/base.c index ca651ac00660..ebabb12f4536 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1729,6 +1729,7 @@ int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) return -ENOENT; } if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || + (inode->i_mode == (S_IFREG|S_IRUGO|S_IWUSR)) || task_dumpable(task)) { cred = __task_cred(task); stat->uid = cred->euid; @@ -1770,6 +1771,7 @@ int pid_revalidate(struct dentry *dentry, unsigned int flags) if (task) { if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || + (inode->i_mode == (S_IFREG|S_IRUGO|S_IWUSR)) || task_dumpable(task)) { rcu_read_lock(); cred = __task_cred(task); @@ -2394,7 +2396,7 @@ static int proc_pident_instantiate(struct inode *dir, return -ENOENT; } -static struct dentry *proc_pident_lookup(struct inode *dir, +static struct dentry *proc_pident_lookup(struct inode *dir, struct dentry *dentry, const struct pid_entry *ents, unsigned int nents) @@ -2536,7 +2538,7 @@ static const struct pid_entry attr_dir_stuff[] = { static int proc_attr_dir_readdir(struct file *file, struct dir_context *ctx) { - return proc_pident_readdir(file, ctx, + return proc_pident_readdir(file, ctx, attr_dir_stuff, ARRAY_SIZE(attr_dir_stuff)); }
On Tue, Jan 17, 2017 at 8:01 PM, Aleksa Sarai <asarai@suse.de> wrote: > In order to protect against ptrace(2) and similar attacks on container > runtimes when they join namespaces, many runtimes set mm->dumpable to > SUID_DUMP_DISABLE. However, doing this means that attempting to set up > an unprivileged user namespace will fail because an unprivileged process > can no longer access /proc/self/{setgroups,{uid,gid}_map} for the > container process (which is the same uid as the runtime process). > > Fix this by changing pid_getattr to *also* change the owner of regular > files that have a mode of 0644 (when the process is not dumpable). This > ensures that the important /proc/[pid]/... files mentioned above are > properly accessible by a container runtime in a rootless container > context. > > The most blantant issue is that a non-dumpable process in a rootless > container context is unable to open /proc/self/setgroups, because it > doesn't own the file. This changes a lot more than just setgroups, doesn't it? This bypasses the task_dumpable check for all kinds of things. Though, I expect the has_pid_permissions() check to be the harder one to pass. Why does has_pid_permissions() succeed in the case you've given? -Kees > int main(void) > { > prctl(PR_SET_DUMPABLE, 0, 0, 0, 0); > unshare(CLONE_NEWUSER); > > /* This will fail. */ > int fd = open("/proc/self/setgroups", O_WRONLY); > if (fd < 0) > abort(); > > return 0; > } > > Cc: dev@opencontainers.org > Cc: containers@lists.linux-foundation.org > Signed-off-by: Aleksa Sarai <asarai@suse.de> > --- > fs/proc/base.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/fs/proc/base.c b/fs/proc/base.c > index ca651ac00660..ebabb12f4536 100644 > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -1729,6 +1729,7 @@ int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > return -ENOENT; > } > if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || > + (inode->i_mode == (S_IFREG|S_IRUGO|S_IWUSR)) || > task_dumpable(task)) { > cred = __task_cred(task); > stat->uid = cred->euid; > @@ -1770,6 +1771,7 @@ int pid_revalidate(struct dentry *dentry, unsigned int flags) > > if (task) { > if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || > + (inode->i_mode == (S_IFREG|S_IRUGO|S_IWUSR)) || > task_dumpable(task)) { > rcu_read_lock(); > cred = __task_cred(task); > @@ -2394,7 +2396,7 @@ static int proc_pident_instantiate(struct inode *dir, > return -ENOENT; > } > > -static struct dentry *proc_pident_lookup(struct inode *dir, > +static struct dentry *proc_pident_lookup(struct inode *dir, > struct dentry *dentry, > const struct pid_entry *ents, > unsigned int nents) > @@ -2536,7 +2538,7 @@ static const struct pid_entry attr_dir_stuff[] = { > > static int proc_attr_dir_readdir(struct file *file, struct dir_context *ctx) > { > - return proc_pident_readdir(file, ctx, > + return proc_pident_readdir(file, ctx, > attr_dir_stuff, ARRAY_SIZE(attr_dir_stuff)); > } > > -- > 2.11.0 >
>> In order to protect against ptrace(2) and similar attacks on container >> runtimes when they join namespaces, many runtimes set mm->dumpable to >> SUID_DUMP_DISABLE. However, doing this means that attempting to set up >> an unprivileged user namespace will fail because an unprivileged process >> can no longer access /proc/self/{setgroups,{uid,gid}_map} for the >> container process (which is the same uid as the runtime process). >> >> Fix this by changing pid_getattr to *also* change the owner of regular >> files that have a mode of 0644 (when the process is not dumpable). This >> ensures that the important /proc/[pid]/... files mentioned above are >> properly accessible by a container runtime in a rootless container >> context. >> >> The most blantant issue is that a non-dumpable process in a rootless >> container context is unable to open /proc/self/setgroups, because it >> doesn't own the file. > > This changes a lot more than just setgroups, doesn't it? This bypasses > the task_dumpable check for all kinds of things. Yeah. I can special case /proc/self/setgroups as well as uid_map, gid_map and the other *specific* things that runC needs. But ultimately I think we should come up with agreement on what things should always appear to be owned by the process's user. > Though, I expect the > has_pid_permissions() check to be the harder one to pass. Why does > has_pid_permissions() succeed in the case you've given? Because the group id of the container process is the same as the parent process, so is_group_p() will succeed. Also hide_pid_min is set such that it will work in either case.
Cc Eric On Wed 18-01-17 15:01:59, Aleksa Sarai wrote: > In order to protect against ptrace(2) and similar attacks on container > runtimes when they join namespaces, many runtimes set mm->dumpable to > SUID_DUMP_DISABLE. However, doing this means that attempting to set up > an unprivileged user namespace will fail because an unprivileged process > can no longer access /proc/self/{setgroups,{uid,gid}_map} for the > container process (which is the same uid as the runtime process). > > Fix this by changing pid_getattr to *also* change the owner of regular > files that have a mode of 0644 (when the process is not dumpable). This > ensures that the important /proc/[pid]/... files mentioned above are > properly accessible by a container runtime in a rootless container > context. > > The most blantant issue is that a non-dumpable process in a rootless > container context is unable to open /proc/self/setgroups, because it > doesn't own the file. > > int main(void) > { > prctl(PR_SET_DUMPABLE, 0, 0, 0, 0); > unshare(CLONE_NEWUSER); > > /* This will fail. */ > int fd = open("/proc/self/setgroups", O_WRONLY); > if (fd < 0) > abort(); > > return 0; > } I do agree that failing to open anything in /proc/self/ is more than unexepcted! I cannot judge the patch but my gut feeling tells me that the fix should be somewhere in the open handler. One nit below > > Cc: dev@opencontainers.org > Cc: containers@lists.linux-foundation.org > Signed-off-by: Aleksa Sarai <asarai@suse.de> > --- > fs/proc/base.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/fs/proc/base.c b/fs/proc/base.c > index ca651ac00660..ebabb12f4536 100644 > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -1729,6 +1729,7 @@ int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) > return -ENOENT; > } > if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || > + (inode->i_mode == (S_IFREG|S_IRUGO|S_IWUSR)) || > task_dumpable(task)) { > cred = __task_cred(task); > stat->uid = cred->euid; > @@ -1770,6 +1771,7 @@ int pid_revalidate(struct dentry *dentry, unsigned int flags) > > if (task) { > if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) || > + (inode->i_mode == (S_IFREG|S_IRUGO|S_IWUSR)) || > task_dumpable(task)) { > rcu_read_lock(); > cred = __task_cred(task); > @@ -2394,7 +2396,7 @@ static int proc_pident_instantiate(struct inode *dir, > return -ENOENT; > } > > -static struct dentry *proc_pident_lookup(struct inode *dir, > +static struct dentry *proc_pident_lookup(struct inode *dir, > struct dentry *dentry, > const struct pid_entry *ents, > unsigned int nents) > @@ -2536,7 +2538,7 @@ static const struct pid_entry attr_dir_stuff[] = { > > static int proc_attr_dir_readdir(struct file *file, struct dir_context *ctx) > { > - return proc_pident_readdir(file, ctx, > + return proc_pident_readdir(file, ctx, > attr_dir_stuff, ARRAY_SIZE(attr_dir_stuff)); > } this two are just whitespace noise
>> In order to protect against ptrace(2) and similar attacks on container >> runtimes when they join namespaces, many runtimes set mm->dumpable to >> SUID_DUMP_DISABLE. However, doing this means that attempting to set up >> an unprivileged user namespace will fail because an unprivileged process >> can no longer access /proc/self/{setgroups,{uid,gid}_map} for the >> container process (which is the same uid as the runtime process). >> >> Fix this by changing pid_getattr to *also* change the owner of regular >> files that have a mode of 0644 (when the process is not dumpable). This >> ensures that the important /proc/[pid]/... files mentioned above are >> properly accessible by a container runtime in a rootless container >> context. >> >> The most blantant issue is that a non-dumpable process in a rootless >> container context is unable to open /proc/self/setgroups, because it >> doesn't own the file. >> >> int main(void) >> { >> prctl(PR_SET_DUMPABLE, 0, 0, 0, 0); >> unshare(CLONE_NEWUSER); >> >> /* This will fail. */ >> int fd = open("/proc/self/setgroups", O_WRONLY); >> if (fd < 0) >> abort(); >> >> return 0; >> } > > I do agree that failing to open anything in /proc/self/ is more than > unexepcted! I cannot judge the patch but my gut feeling tells me that > the fix should be somewhere in the open handler. Maybe that would suffice as a more specific fix (for the special case of /proc/self), but the fact that none of the users and groups are correctly set in /proc/[pid] will cause issues for runC and other container runtimes (because they don't go through /proc/self -- it's accessing /proc/[pid] from another process). Though I get the feeling that the *correct* fix would be to remove the conditional and *always* change the owner -- maybe I'm missing something but I can't think of the security issue that this code currently fixes (since all of the important permission checks are *in addition* to the generic_permission used for /proc/self/..., which use ptrace_may_access).
In order to protect against ptrace(2) and similar attacks on container runtimes when they join namespaces, many runtimes set mm->dumpable to SUID_DUMP_DISABLE. However, doing this means that attempting to set up an unprivileged user namespace will fail because an unprivileged process can no longer access /proc/self/{setgroups,{uid,gid}_map} for the container process (which is the same uid as the runtime process). Fix this by changing pid_getattr to *also* change the owner of regular files that have a mode of 0644 (when the process is not dumpable). This ensures that the important /proc/[pid]/... files mentioned above are properly accessible by a container runtime in a rootless container context. The most blantant issue is that a non-dumpable process in a rootless container context is unable to open /proc/self/setgroups, because it doesn't own the file. int main(void) { prctl(PR_SET_DUMPABLE, 0, 0, 0, 0); unshare(CLONE_NEWUSER); /* This will fail. */ int fd = open("/proc/self/setgroups", O_WRONLY); if (fd < 0) abort(); return 0; } Cc: dev@opencontainers.org Cc: containers@lists.linux-foundation.org Signed-off-by: Aleksa Sarai <asarai@suse.de> --- fs/proc/base.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)