[v8,1/2] seccomp: add a return code to trap to userspace

Submitted by Tycho Andersen on Oct. 30, 2018, 3:54 p.m.

Details

Message ID 20181030155403.GC7343@cisco
State New
Series "seccomp trap to userspace"

Commit Message

Tycho Andersen Oct. 30, 2018, 3:54 p.m.
On Tue, Oct 30, 2018 at 04:02:54PM +0100, Oleg Nesterov wrote:
> On 10/29, Tycho Andersen wrote:
> >
> > +static long seccomp_notify_recv(struct seccomp_filter *filter,
> > +				void __user *buf)
> > +{
> > +	struct seccomp_knotif *knotif = NULL, *cur;
> > +	struct seccomp_notif unotif;
> > +	ssize_t ret;
> > +
> > +	memset(&unotif, 0, sizeof(unotif));
> > +
> > +	ret = down_interruptible(&filter->notif->request);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	mutex_lock(&filter->notify_lock);
> > +	list_for_each_entry(cur, &filter->notif->notifications, list) {
> > +		if (cur->state == SECCOMP_NOTIFY_INIT) {
> > +			knotif = cur;
> > +			break;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * If we didn't find a notification, it could be that the task was
> > +	 * interrupted by a fatal signal between the time we were woken and
> > +	 * when we were able to acquire the rw lock.
> > +	 *
> > +	 * This is the place where we handle the extra high semaphore count
> > +	 * mentioned in seccomp_do_user_notification().
> > +	 */
> > +	if (!knotif) {
> > +		ret = -ENOENT;
> > +		goto out;
> > +	}
> > +
> > +	unotif.id = knotif->id;
> > +	unotif.pid = task_pid_vnr(knotif->task);
> > +	if (knotif->signaled)
> > +		unotif.flags |= SECCOMP_NOTIF_FLAG_SIGNALED;
> > +	unotif.data = *(knotif->data);
> 
> Tycho, I forgot everything about seccomp, most probably I am wrong but let me
> ask anyway.
> 
> __seccomp_filter(SECCOMP_RET_TRACE) does
> 
> 		/*
> 		 * Recheck the syscall, since it may have changed. This
> 		 * intentionally uses a NULL struct seccomp_data to force
> 		 * a reload of all registers. This does not goto skip since
> 		 * a skip would have already been reported.
> 		 */
> 		if (__seccomp_filter(this_syscall, NULL, true))
> 			return -1;
> 
> and the next seccomp_run_filters() can return SECCOMP_RET_USER_NOTIF, right?
> seccomp_do_user_notification() doesn't check recheck_after_trace and it simply
> does n.data = sd.
> 
> Doesn't this mean that "unotif.data = *(knotif->data)" can hit NULL ?
> 
> seccomp_run_filters() does populate_seccomp_data() in this case, but this
> won't affect "seccomp_data *sd" passed to seccomp_do_user_notification().

Oof, yes, you're right. Seems like there are no other users of sd in
__seccomp_filter(). Seems to me like we can just do the
populate_seccomp_data() one level higher in __seccomp_filter()?

Tycho


From 9e0f75ea51a2c328567910df3122a236ebeccab0 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho@tycho.ws>
Date: Tue, 30 Oct 2018 09:51:14 -0600
Subject: [PATCH] seccomp: hoist struct seccomp_data recalculation higher

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
---
 kernel/seccomp.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)


diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 4c5fb6ced4cd..1525cb753ad2 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -257,7 +257,6 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 static u32 seccomp_run_filters(const struct seccomp_data *sd,
 			       struct seccomp_filter **match)
 {
-	struct seccomp_data sd_local;
 	u32 ret = SECCOMP_RET_ALLOW;
 	/* Make sure cross-thread synced filter points somewhere sane. */
 	struct seccomp_filter *f =
@@ -267,11 +266,6 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 	if (unlikely(WARN_ON(f == NULL)))
 		return SECCOMP_RET_KILL_PROCESS;
 
-	if (!sd) {
-		populate_seccomp_data(&sd_local);
-		sd = &sd_local;
-	}
-
 	/*
 	 * All filters in the list are evaluated and the lowest BPF return
 	 * value always takes priority (ignoring the DATA).
@@ -821,6 +815,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 	u32 filter_ret, action;
 	struct seccomp_filter *match = NULL;
 	int data;
+	struct seccomp_data sd_local;
 
 	/*
 	 * Make sure that any changes to mode from another thread have
@@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 	 */
 	rmb();
 
+	if (!sd) {
+		populate_seccomp_data(&sd_local);
+		sd = &sd_local;
+	}
+
 	filter_ret = seccomp_run_filters(sd, &match);
 	data = filter_ret & SECCOMP_RET_DATA;
 	action = filter_ret & SECCOMP_RET_ACTION_FULL;
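To see why the recheck path could hand a NULL pointer to the notifier, and why hoisting the fallback fixes it, here is a minimal userspace model. It is a sketch, not kernel code: `model_data`, `filter_old()`, `filter_new()` and the other names are hypothetical stand-ins for struct seccomp_data, __seccomp_filter() and seccomp_run_filters(), and the listener's read is done synchronously inside `model_user_notif()` rather than by a separate task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct seccomp_data and the knotif entry. */
struct model_data {
	int nr;
};

struct model_notif {
	const struct model_data *data;   /* models knotif->data */
};

static int current_nr = 42;   /* pretend syscall-number register */
static int last_saw_null;     /* listener found a NULL data pointer */
static int last_nr = -1;      /* syscall number the listener copied */

/* Models populate_seccomp_data(): reload the "registers". */
static void model_populate(struct model_data *sd)
{
	sd->nr = current_nr;
}

/*
 * Models seccomp_do_user_notification() plus a listener doing
 * seccomp_notify_recv(). The notifying task waits here for a reply,
 * so whatever sd points at must stay valid until this returns.
 */
static void model_user_notif(const struct model_data *sd)
{
	struct model_notif n = { .data = sd };   /* n.data = sd */

	last_saw_null = (n.data == NULL);
	if (!last_saw_null)
		last_nr = n.data->nr;   /* unotif.data = *(knotif->data) */
	/* else: the real code would dereference NULL here */
}

/* Before the patch: only the filter runner builds a private copy. */
static void run_filters_old(const struct model_data *sd)
{
	struct model_data sd_local;

	if (!sd) {
		model_populate(&sd_local);
		sd = &sd_local;
	}
	(void)sd->nr;   /* the BPF programs see valid data... */
}

static void filter_old(const struct model_data *sd)
{
	run_filters_old(sd);
	model_user_notif(sd);   /* ...but the notifier still gets NULL */
}

/* After the patch: the runner may assume it always gets valid data... */
static void run_filters_new(const struct model_data *sd)
{
	assert(sd != NULL);
	(void)sd->nr;
}

/* ...because the NULL fallback is hoisted into the caller. */
static void filter_new(const struct model_data *sd)
{
	struct model_data sd_local;

	if (!sd) {
		model_populate(&sd_local);
		sd = &sd_local;
	}
	run_filters_new(sd);
	model_user_notif(sd);   /* sd_local outlives the whole wait */
}
```

Calling `filter_old(NULL)`, which is what the SECCOMP_RET_TRACE recheck does, leaves `last_saw_null` set, whereas `filter_new(NULL)` hands the listener a populated copy (`last_nr == 42`). Putting `sd_local` in the caller's frame is what makes it safe to pass on: that frame stays alive until the notifier returns. The `if (!sd)` check itself still has to stay, since top-level callers may pass NULL as well (Oleg later points out that emulate_vsyscall() calls secure_computing(NULL)).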

Comments

Oleg Nesterov Oct. 30, 2018, 4:27 p.m.
On 10/30, Tycho Andersen wrote:
>
> @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>  	 */
>  	rmb();
>  
> +	if (!sd) {
> +		populate_seccomp_data(&sd_local);
> +		sd = &sd_local;
> +	}
> +

To me it would be cleaner to remove the "if (!sd)" check; case(SECCOMP_RET_TRACE)
in __seccomp_filter() can simply do populate_seccomp_data(&sd_local) unconditionally
and pass &sd_local to __seccomp_filter().

Oleg.
Oleg Nesterov Oct. 30, 2018, 4:39 p.m.
On 10/30, Oleg Nesterov wrote:
>
> On 10/30, Tycho Andersen wrote:
> >
> > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
> >  	 */
> >  	rmb();
> >
> > +	if (!sd) {
> > +		populate_seccomp_data(&sd_local);
> > +		sd = &sd_local;
> > +	}
> > +
>
> To me it would be cleaner to remove the "if (!sd)" check; case(SECCOMP_RET_TRACE)
> in __seccomp_filter() can simply do populate_seccomp_data(&sd_local) unconditionally
> and pass &sd_local to __seccomp_filter().

Ah, please ignore, emulate_vsyscall() does secure_computing(NULL).

Btw, why doesn't __seccomp_filter() return a boolean?

Or at least, why can't case(SECCOMP_RET_TRACE) simply do

	return __seccomp_filter(this_syscall, NULL, true);

?

Oleg.
Tycho Andersen Oct. 30, 2018, 5:21 p.m.
On Tue, Oct 30, 2018 at 05:39:26PM +0100, Oleg Nesterov wrote:
> On 10/30, Oleg Nesterov wrote:
> >
> > On 10/30, Tycho Andersen wrote:
> > >
> > > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
> > >  	 */
> > >  	rmb();
> > >
> > > +	if (!sd) {
> > > +		populate_seccomp_data(&sd_local);
> > > +		sd = &sd_local;
> > > +	}
> > > +
> >
> > To me it would be cleaner to remove the "if (!sd)" check; case(SECCOMP_RET_TRACE)
> > in __seccomp_filter() can simply do populate_seccomp_data(&sd_local) unconditionally
> > and pass &sd_local to __seccomp_filter().
> 
> Ah, please ignore, emulate_vsyscall() does secure_computing(NULL).
> 
> Btw, why doesn't __seccomp_filter() return a boolean?
> 
> Or at least, why can't case(SECCOMP_RET_TRACE) simply do
> 
> 	return __seccomp_filter(this_syscall, NULL, true);
> 
> ?

Yeah, at least the second one definitely makes sense. I can add that
as a patch in the next version of this series unless Kees does it
before.

Thanks for your help, Oleg!

Tycho
Kees Cook Oct. 30, 2018, 9:32 p.m.
On Tue, Oct 30, 2018 at 10:21 AM, Tycho Andersen <tycho@tycho.ws> wrote:
> On Tue, Oct 30, 2018 at 05:39:26PM +0100, Oleg Nesterov wrote:
>> On 10/30, Oleg Nesterov wrote:
>> >
>> > On 10/30, Tycho Andersen wrote:
>> > >
>> > > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>> > >    */
>> > >   rmb();
>> > >
>> > > + if (!sd) {
>> > > +         populate_seccomp_data(&sd_local);
>> > > +         sd = &sd_local;
>> > > + }
>> > > +
>> >
>> > To me it would be cleaner to remove the "if (!sd)" check; case(SECCOMP_RET_TRACE)
>> > in __seccomp_filter() can simply do populate_seccomp_data(&sd_local) unconditionally
>> > and pass &sd_local to __seccomp_filter().
>>
>> Ah, please ignore, emulate_vsyscall() does secure_computing(NULL).

Right.

>>
>> Btw, why doesn't __seccomp_filter() return a boolean?

Because it was wrapped by __secure_computing(). *shrug* The common
convention in the kernel is to return an int, with 0 meaning OK.

>> Or at least, why can't case(SECCOMP_RET_TRACE) simply do
>>
>>       return __seccomp_filter(this_syscall, NULL, true);
>>
>> ?
>
> Yeah, at least the second one definitely makes sense. I can add that
> as a patch in the next version of this series unless Kees does it
> before.

I'd like to avoid changing the return value of __secure_computing() to
just avoid having to touch all the callers. And I'd prefer not to
change __seccomp_filter() to a bool, since I'd like the return values
to be consistent through the call chain.

I find the existing code more readable than a single-line return, just
because it's very explicit. I don't want to have to think any harder
when reading seccomp. ;)

-Kees
Kees Cook Oct. 30, 2018, 9:38 p.m.
On Tue, Oct 30, 2018 at 8:54 AM, Tycho Andersen <tycho@tycho.ws> wrote:
> On Tue, Oct 30, 2018 at 04:02:54PM +0100, Oleg Nesterov wrote:
>> On 10/29, Tycho Andersen wrote:
>> >
>> > [...]
>> > +   unotif.data = *(knotif->data);
>>
>> Tycho, I forgot everything about seccomp, most probably I am wrong but let me
>> ask anyway.
>>
>> __seccomp_filter(SECCOMP_RET_TRACE) does
>>
>>               /*
>>                * Recheck the syscall, since it may have changed. This
>>                * intentionally uses a NULL struct seccomp_data to force
>>                * a reload of all registers. This does not goto skip since
>>                * a skip would have already been reported.
>>                */
>>               if (__seccomp_filter(this_syscall, NULL, true))
>>                       return -1;
>>
>> and the next seccomp_run_filters() can return SECCOMP_RET_USER_NOTIF, right?
>> seccomp_do_user_notification() doesn't check recheck_after_trace and it simply
>> does n.data = sd.
>>
>> Doesn't this mean that "unotif.data = *(knotif->data)" can hit NULL ?
>>
>> seccomp_run_filters() does populate_seccomp_data() in this case, but this
>> won't affect "seccomp_data *sd" passed to seccomp_do_user_notification().

Woo, yeah, good catch. :)

> Oof, yes, you're right. Seems like there are no other users of sd in
> __seccomp_filter(). Seems to me like we can just do the
> populate_seccomp_data() one level higher in __seccomp_filter()?

Agreed.

>
> Tycho
>
>
> Subject: [PATCH] seccomp: hoist struct seccomp_data recalculation higher
> [...]

Looks good to me, yes.
Oleg Nesterov Oct. 31, 2018, 1:04 p.m.
On 10/30, Kees Cook wrote:
>
> I'd like to avoid changing the return value of __secure_computing() to
> just avoid having to touch all the callers. And I'd prefer not to
> change __seccomp_filter() to a bool, since I'd like the return values
> to be consistent through the call chain.

Sure, please forget.

> I find the existing code more readable than a single-line return, just
> because it's very explicit. I don't want to have to think any harder
> when reading seccomp. ;)

Heh ;) Again, please forget, this is cosmetic.

But I simply can't resist. I asked this question exactly because I was
confused by these 2 lines:

		if (__seccomp_filter(this_syscall, NULL, true))
			return -1;

		return 0;

to me it looks as if we need to filter out some non-zero return values and
turn them into -1. I had to spend some time (and think harder ;) to verify
that this is just the recursive call and nothing more.

Never mind, please ignore.

Oleg.
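To make Oleg's point concrete, here is a small standalone sketch (hypothetical names, not kernel code) comparing the explicit `if (...) return -1; return 0;` form with the one-line `return __seccomp_filter(...)` form. It assumes, as the thread implies, that __seccomp_filter() only ever returns 0 or -1. Under that contract the two forms agree; they diverge only for a callee that returns some other value, which is exactly the case the explicit form made Oleg stop and check for.

```c
#include <assert.h>

/*
 * Hypothetical callee following the contract assumed above for
 * __seccomp_filter(): 0 = let the syscall run, -1 = skip it.
 */
static int model_filter(int allow)
{
	return allow ? 0 : -1;
}

/* Hypothetical callee that breaks the contract by returning 1. */
static int model_odd_filter(int allow)
{
	(void)allow;
	return 1;
}

/* Explicit form, as kept in the SECCOMP_RET_TRACE case:
 * any non-zero result is collapsed to -1. */
static int recheck_explicit(int (*filter)(int), int arg)
{
	if (filter(arg))
		return -1;

	return 0;
}

/* One-line form: the callee's result is passed through unchanged. */
static int recheck_direct(int (*filter)(int), int arg)
{
	return filter(arg);
}

/* Under the 0/-1 contract the two forms must always agree. */
static void check_equivalence(int allow)
{
	assert(recheck_explicit(model_filter, allow) ==
	       recheck_direct(model_filter, allow));
}
```

With `model_filter` both forms return the same value for every input, so Kees's preference for the explicit form is purely a readability call; only a callee like `model_odd_filter` would make them behave differently.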