[REVIEW,v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks

Submitted by Eric W. Biederman on Oct. 27, 2016, 3:54 p.m.

Details

Message ID 87d1ilrdmt.fsf_-_@xmission.com
State New
Series "mm: Add a user_ns owner to mm_struct and fix ptrace permission checks"
Headers show

Commit Message

Eric W. Biederman Oct. 27, 2016, 3:54 p.m.
During exec dumpable is cleared if the file that is being executed is
not readable by the user executing the file.  A bug in
ptrace_may_access allows reading the file if the executable happens to
enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).

This problem is fixed with only necessary userspace breakage by adding
a user namespace owner to mm_struct, captured at the time of exec, so
it is clear in which user namespace CAP_SYS_PTRACE must be present in
to be able to safely give read permission to the executable.

The function ptrace_may_access is modified to verify that the ptracer
has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
This ensures that if the task changes it's cred into a subordinate
user namespace it does not become ptraceable.

The function ptrace_attach is modified to only set PT_PTRACE_CAP when
CAP_SYS_PTRACE is held over task->mm->user_ns.  The intent of
PT_PTRACE_CAP is to be a flag to note that whatever permission changes
the task might go through the tracer has sufficient permissions for
it not to be an issue.  task->cred->user_ns is always the same
as or descendent of mm->user_ns.  Which guarantees that having
CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
credentials.

Cc: stable@vger.kernel.org
Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Jann or anyone who looked at my last version of this that I thought
was ready to go, I have change the test in ptrace_may_access from:
	if (!mm ||
	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
	     !ptrace_has_cap(mm->user_ns, mode)))
	    return -EPERM;
to:
	if (mm &&
	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
	     !ptrace_has_cap(mm->user_ns, mode)))
	    return -EPERM;

As enforcing the dumpablity check without an mm caused a regression for
Cyrill (he could not read task->exit code of a zomebie even with a full
set of caps).

Further I have moved the setting of PT_PTRACE_CAP up in ptrace_attach
so that I can easily use the mm->user_ns.

I can't imagine either of these changes making a practical difference
to anyone but I am calling them out in case someone can.

 include/linux/mm_types.h |  1 +
 kernel/fork.c            |  9 ++++++---
 kernel/ptrace.c          | 26 +++++++++++---------------
 mm/init-mm.c             |  2 ++
 4 files changed, 20 insertions(+), 18 deletions(-)

Patch hide | download patch | download mbox

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a8acedf4b7d..08d947fc4c59 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -473,6 +473,7 @@  struct mm_struct {
 	 */
 	struct task_struct __rcu *owner;
 #endif
+	struct user_namespace *user_ns;
 
 	/* store ref to file /proc/<pid>/exe symlink points to */
 	struct file __rcu *exe_file;
diff --git a/kernel/fork.c b/kernel/fork.c
index 623259fc794d..fd85c68c2791 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -742,7 +742,8 @@  static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
 #endif
 }
 
-static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
+static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
+	struct user_namespace *user_ns)
 {
 	mm->mmap = NULL;
 	mm->mm_rb = RB_ROOT;
@@ -782,6 +783,7 @@  static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 	if (init_new_context(p, mm))
 		goto fail_nocontext;
 
+	mm->user_ns = get_user_ns(user_ns);
 	return mm;
 
 fail_nocontext:
@@ -827,7 +829,7 @@  struct mm_struct *mm_alloc(void)
 		return NULL;
 
 	memset(mm, 0, sizeof(*mm));
-	return mm_init(mm, current);
+	return mm_init(mm, current, current_user_ns());
 }
 
 /*
@@ -842,6 +844,7 @@  void __mmdrop(struct mm_struct *mm)
 	destroy_context(mm);
 	mmu_notifier_mm_destroy(mm);
 	check_mm(mm);
+	put_user_ns(mm->user_ns);
 	free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1123,7 +1126,7 @@  static struct mm_struct *dup_mm(struct task_struct *tsk)
 
 	memcpy(mm, oldmm, sizeof(*mm));
 
-	if (!mm_init(mm, tsk))
+	if (!mm_init(mm, tsk, mm->user_ns))
 		goto fail_nomem;
 
 	err = dup_mmap(mm, oldmm);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 2a99027312a6..44a25a1e6e83 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -220,7 +220,7 @@  static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
 static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 {
 	const struct cred *cred = current_cred(), *tcred;
-	int dumpable = 0;
+	struct mm_struct *mm;
 	kuid_t caller_uid;
 	kgid_t caller_gid;
 
@@ -271,16 +271,11 @@  static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	return -EPERM;
 ok:
 	rcu_read_unlock();
-	smp_rmb();
-	if (task->mm)
-		dumpable = get_dumpable(task->mm);
-	rcu_read_lock();
-	if (dumpable != SUID_DUMP_USER &&
-	    !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
-		rcu_read_unlock();
-		return -EPERM;
-	}
-	rcu_read_unlock();
+	mm = task->mm;
+	if (mm &&
+	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
+	     !ptrace_has_cap(mm->user_ns, mode)))
+	    return -EPERM;
 
 	return security_ptrace_access_check(task, mode);
 }
@@ -331,6 +326,11 @@  static int ptrace_attach(struct task_struct *task, long request,
 
 	task_lock(task);
 	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+	if (!retval) {
+		struct mm_struct *mm = task->mm;
+		if (mm && ns_capable(mm->user_ns, CAP_SYS_PTRACE))
+			flags |= PT_PTRACE_CAP;
+	}
 	task_unlock(task);
 	if (retval)
 		goto unlock_creds;
@@ -344,10 +344,6 @@  static int ptrace_attach(struct task_struct *task, long request,
 
 	if (seize)
 		flags |= PT_SEIZED;
-	rcu_read_lock();
-	if (ns_capable(__task_cred(task)->user_ns, CAP_SYS_PTRACE))
-		flags |= PT_PTRACE_CAP;
-	rcu_read_unlock();
 	task->ptrace = flags;
 
 	__ptrace_link(task, current);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index a56a851908d2..975e49f00f34 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -6,6 +6,7 @@ 
 #include <linux/cpumask.h>
 
 #include <linux/atomic.h>
+#include <linux/user_namespace.h>
 #include <asm/pgtable.h>
 #include <asm/mmu.h>
 
@@ -21,5 +22,6 @@  struct mm_struct init_mm = {
 	.mmap_sem	= __RWSEM_INITIALIZER(init_mm.mmap_sem),
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
+	.user_ns	= &init_user_ns,
 	INIT_MM_CONTEXT(init_mm)
 };

Comments

Kees Cook Oct. 27, 2016, 9:27 p.m.
On Thu, Oct 27, 2016 at 8:54 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> During exec dumpable is cleared if the file that is being executed is
> not readable by the user executing the file.  A bug in
> ptrace_may_access allows reading the file if the executable happens to
> enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
> unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
>
> This problem is fixed with only necessary userspace breakage by adding
> a user namespace owner to mm_struct, captured at the time of exec, so
> it is clear in which user namespace CAP_SYS_PTRACE must be present in
> to be able to safely give read permission to the executable.
>
> The function ptrace_may_access is modified to verify that the ptracer
> has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
> This ensures that if the task changes it's cred into a subordinate
> user namespace it does not become ptraceable.
>
> The function ptrace_attach is modified to only set PT_PTRACE_CAP when
> CAP_SYS_PTRACE is held over task->mm->user_ns.  The intent of
> PT_PTRACE_CAP is to be a flag to note that whatever permission changes
> the task might go through the tracer has sufficient permissions for
> it not to be an issue.  task->cred->user_ns is always the same
> as or descendent of mm->user_ns.  Which guarantees that having
> CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
> credentials.
>
> Cc: stable@vger.kernel.org
> Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Thanks!

Acked-by: Kees Cook <keescook@chromium.org>

-Kees
Cyrill Gorcunov Oct. 27, 2016, 9:39 p.m.
On Thu, Oct 27, 2016 at 10:54:34AM -0500, Eric W. Biederman wrote:
> 
> 
> I can't imagine either of these changes making a practical difference
> to anyone but I am calling them out in case someone can.
> 
>  include/linux/mm_types.h |  1 +
>  kernel/fork.c            |  9 ++++++---
>  kernel/ptrace.c          | 26 +++++++++++---------------
>  mm/init-mm.c             |  2 ++
>  4 files changed, 20 insertions(+), 18 deletions(-)

Thanks a huge, Eric! And really sorry for delay in response,
I managed to miss this quite important mail for me in mail
storm. Gonna test it and will write you the results. Overall looks
great, but better be sure and run the tests.

Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cyrill Gorcunov Oct. 27, 2016, 10:34 p.m.
On Fri, Oct 28, 2016 at 12:39:18AM +0300, Cyrill Gorcunov wrote:
> On Thu, Oct 27, 2016 at 10:54:34AM -0500, Eric W. Biederman wrote:
> > 
> > 
> > I can't imagine either of these changes making a practical difference
> > to anyone but I am calling them out in case someone can.
> > 
> >  include/linux/mm_types.h |  1 +
> >  kernel/fork.c            |  9 ++++++---
> >  kernel/ptrace.c          | 26 +++++++++++---------------
> >  mm/init-mm.c             |  2 ++
> >  4 files changed, 20 insertions(+), 18 deletions(-)
> 
> Thanks a huge, Eric! And really sorry for delay in response,
> I managed to miss this quite important mail for me in mail
> storm. Gonna test it and will write you the results. Overall looks
> great, but better be sure and run the tests.
> 
> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>

Eric, on which kernel the patch is on top of?
It doesn't apply on linux-next for some reason.

 | Date:   Thu Oct 27 14:21:59 2016 +1100
 | 
 |     Add linux-next specific files for 20161027
 |     
 |     Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

I applied it on Linus' master and tests passed fine
(but they were passing fine even without the patch,
 only linux-next failed).
Eric W. Biederman Oct. 28, 2016, 2:22 a.m.
Cyrill Gorcunov <gorcunov@gmail.com> writes:

> On Fri, Oct 28, 2016 at 12:39:18AM +0300, Cyrill Gorcunov wrote:
>> On Thu, Oct 27, 2016 at 10:54:34AM -0500, Eric W. Biederman wrote:
>> > 
>> > 
>> > I can't imagine either of these changes making a practical difference
>> > to anyone but I am calling them out in case someone can.
>> > 
>> >  include/linux/mm_types.h |  1 +
>> >  kernel/fork.c            |  9 ++++++---
>> >  kernel/ptrace.c          | 26 +++++++++++---------------
>> >  mm/init-mm.c             |  2 ++
>> >  4 files changed, 20 insertions(+), 18 deletions(-)
>> 
>> Thanks a huge, Eric! And really sorry for delay in response,
>> I managed to miss this quite important mail for me in mail
>> storm. Gonna test it and will write you the results. Overall looks
>> great, but better be sure and run the tests.
>> 
>> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
>
> Eric, on which kernel the patch is on top of?
> It doesn't apply on linux-next for some reason.
>
>  | Date:   Thu Oct 27 14:21:59 2016 +1100
>  | 
>  |     Add linux-next specific files for 20161027
>  |     
>  |     Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
>
> I applied it on Linus' master and tests passed fine
> (but they were passing fine even without the patch,
>  only linux-next failed).

Odd.  I don't think I have taken the old version out of
linux-next yet.   So you can probably revert the old version out of
linux-next and apply this one.  All of my development at this point is
against v4.9-rc1.

I suspect you will find my last version on top of against v4.9-rc1 will
pass.  Since my tree is only one deep and I don't think anyone except
linux-next is based on it, I plan to drop and readd this patch.
Especially since it is candidate for backporting.

Eric