[3/8] seccomp: Introduce SECCOMP_PIN_ARCHITECTURE

Submitted by Kees Cook on June 16, 2020, 7:49 a.m.

Details

Message ID 20200616074934.1600036-4-keescook@chromium.org
State New
Series "seccomp: Implement constant action bitmaps"

Commit Message

Kees Cook June 16, 2020, 7:49 a.m.
For systems that provide multiple syscall maps based on architectures
(e.g. AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 via CONFIG_COMPAT), allow
a fast way to pin the process to a specific syscall mapping, instead of
needing to generate all filters with an architecture check as the first
filter action.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/seccomp.h      |  3 +++
 include/uapi/linux/seccomp.h |  1 +
 kernel/seccomp.c             | 37 ++++++++++++++++++++++++++++++++++--
 3 files changed, 39 insertions(+), 2 deletions(-)

Patch

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index babcd6c02d09..6525ddec177a 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -30,6 +30,9 @@  struct seccomp_filter;
  */
 struct seccomp {
 	int mode;
+#ifdef CONFIG_COMPAT
+	u32 arch;
+#endif
 	atomic_t filter_count;
 	struct seccomp_filter *filter;
 };
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index c1735455bc53..84e89bb201ae 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -16,6 +16,7 @@ 
 #define SECCOMP_SET_MODE_FILTER		1
 #define SECCOMP_GET_ACTION_AVAIL	2
 #define SECCOMP_GET_NOTIF_SIZES		3
+#define SECCOMP_PIN_ARCHITECTURE	4
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC		(1UL << 0)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index a319700c04c4..43edf53c2d84 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -268,9 +268,16 @@  static u32 seccomp_run_filters(const struct seccomp_data *sd,
 			       struct seccomp_filter **match)
 {
 	u32 ret = SECCOMP_RET_ALLOW;
+	struct seccomp_filter *f;
+
+#ifdef CONFIG_COMPAT
+	/* Block mismatched architectures. */
+	if (current->seccomp.arch && current->seccomp.arch != sd->arch)
+		return SECCOMP_RET_KILL_PROCESS;
+#endif
+
 	/* Make sure cross-thread synced filter points somewhere sane. */
-	struct seccomp_filter *f =
-			READ_ONCE(current->seccomp.filter);
+	f = READ_ONCE(current->seccomp.filter);
 
 	/* Ensure unexpected behavior doesn't result in failing open. */
 	if (WARN_ON(f == NULL))
@@ -478,6 +485,11 @@  static inline void seccomp_sync_threads(unsigned long flags)
 		if (task_no_new_privs(caller))
 			task_set_no_new_privs(thread);
 
+#ifdef CONFIG_COMPAT
+		/* Copy any pinned architecture. */
+		thread->seccomp.arch = caller->seccomp.arch;
+#endif
+
 		/*
 		 * Opt the other thread into seccomp if needed.
 		 * As threads are considered to be trust-realm
@@ -1456,6 +1468,20 @@  static long seccomp_get_notif_sizes(void __user *usizes)
 	return 0;
 }
 
+static long seccomp_pin_architecture(void)
+{
+#ifdef CONFIG_COMPAT
+	u32 arch = syscall_get_arch(current);
+
+	/* How did you even get here? */
+	if (current->seccomp.arch && current->seccomp.arch != arch)
+		return -EBUSY;
+
+	current->seccomp.arch = arch;
+#endif
+	return 0;
+}
+
 /* Common entry point for both prctl and syscall. */
 static long do_seccomp(unsigned int op, unsigned int flags,
 		       void __user *uargs)
@@ -1477,6 +1503,13 @@  static long do_seccomp(unsigned int op, unsigned int flags,
 			return -EINVAL;
 
 		return seccomp_get_notif_sizes(uargs);
+	case SECCOMP_PIN_ARCHITECTURE:
+		if (flags != 0)
+			return -EINVAL;
+		if (uargs != NULL)
+			return -EINVAL;
+
+		return seccomp_pin_architecture();
 	default:
 		return -EINVAL;
 	}
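
The pin semantics in the hunks above can be modeled in user space. This is only a sketch of the logic as posted, not kernel code: the `model_` names and `MODEL_ARCH_*` macros are hypothetical, and the architecture values are copied from the AUDIT_ARCH_* definitions in <linux/audit.h>.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; values mirror AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386. */
#define MODEL_ARCH_X86_64	0xC000003Eu
#define MODEL_ARCH_I386		0x40000003u

/* Stand-in for the new field in struct seccomp; 0 means "not pinned". */
struct model_seccomp {
	uint32_t arch;
};

/* Mirrors seccomp_pin_architecture(): pinning twice to the same
 * architecture succeeds, pinning to a different one yields -EBUSY. */
static long model_pin_architecture(struct model_seccomp *s, uint32_t cur_arch)
{
	if (s->arch && s->arch != cur_arch)
		return -EBUSY;

	s->arch = cur_arch;
	return 0;
}

/* Mirrors the new do_seccomp() case: flags must be 0 and uargs NULL. */
static long model_do_pin(struct model_seccomp *s, uint32_t cur_arch,
			 unsigned int flags, const void *uargs)
{
	if (flags != 0)
		return -EINVAL;
	if (uargs != NULL)
		return -EINVAL;

	return model_pin_architecture(s, cur_arch);
}
```

In this model a second call from the same architecture is a no-op, so the operation is idempotent; only a call from a different architecture is refused.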

Comments

Andy Lutomirski June 16, 2020, 4:56 p.m.
On Tue, Jun 16, 2020 at 12:49 AM Kees Cook <keescook@chromium.org> wrote:
>
> For systems that provide multiple syscall maps based on architectures
> (e.g. AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 via CONFIG_COMPAT), allow
> a fast way to pin the process to a specific syscall mapping, instead of
> needing to generate all filters with an architecture check as the first
> filter action.

Can you allow specification of the reject action?  I can see people
wanting TRAP instead, for example.
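
For context on this request: without pinning, the architecture check is the first few instructions of the filter itself, so the filter author picks the reject action. A minimal sketch of such a prologue with the reject action as a parameter follows. `emit_arch_check` is a hypothetical helper name; the macros, struct seccomp_data, and the constants come from the kernel's UAPI headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Write a three-instruction architecture check into prog[0..2] and
 * return the number of instructions written. The rest of the filter
 * is expected to follow immediately after the prologue.
 */
static size_t emit_arch_check(struct sock_filter *prog, uint32_t arch,
			      uint32_t reject_action)
{
	size_t n = 0;

	/* A = seccomp_data.arch */
	prog[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			offsetof(struct seccomp_data, arch));
	/* If A == arch, skip the next instruction; otherwise fall through. */
	prog[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
			arch, 1, 0);
	/* Mismatched architecture: return the caller-chosen action. */
	prog[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
			reject_action);

	return n;
}
```

Passing SECCOMP_RET_TRAP as `reject_action` gives the behavior asked for here, while SECCOMP_RET_KILL_PROCESS matches what the patch hard-codes.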

Jann Horn via Containers June 17, 2020, 3:25 p.m.
On Tue, Jun 16, 2020 at 9:49 AM Kees Cook <keescook@chromium.org> wrote:
> For systems that provide multiple syscall maps based on architectures
> (e.g. AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 via CONFIG_COMPAT), allow
> a fast way to pin the process to a specific syscall mapping, instead of
> needing to generate all filters with an architecture check as the first
> filter action.

This seems reasonable; but can we maybe also add X86-specific handling
for that X32 mess? AFAIK there are four ways to do syscalls with
AUDIT_ARCH_X86_64:

1. normal x86-64 syscall, X32 bit unset (native case)
2. normal x86-64 syscall, X32 bit set (for X32 code calling syscalls
with no special X32 version)
3. x32-specific syscall, X32 bit unset (never happens legitimately)
4. x32-specific syscall, X32 bit set (for X32 code calling syscalls
with special X32 version)

(I got this wrong when I wrote the notes on x32 in the seccomp manpage...)

Can we add a flag for AUDIT_ARCH_X86_64 that says either "I want
native x86-64" (enforcing case 1) or "I want X32" (enforcing case 2 or
4, and in case 2 checking that the syscall has no X32 equivalent)? (Of
course, if the kernel is built without X32 support, we can leave out
these extra checks.)

> +static long seccomp_pin_architecture(void)
> +{
> +#ifdef CONFIG_COMPAT
> +       u32 arch = syscall_get_arch(current);
> +
> +       /* How did you even get here? */
> +       if (current->seccomp.arch && current->seccomp.arch != arch)
> +               return -EBUSY;
> +
> +       current->seccomp.arch = arch;
> +#endif
> +       return 0;
> +}

Are you intentionally writing this such that SECCOMP_PIN_ARCHITECTURE
only has an effect once you've installed a filter, and propagation to
other threads happens when a filter is installed with TSYNC? I guess
that is a possible way to design the API, but it seems like something
that should at least be pointed out explicitly.
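
The behavior described in this question can be made concrete with a small user-space model of the patch as posted. It is a sketch under simplifying assumptions: all `model_` and `MODEL_` names are hypothetical, filter evaluation is reduced to "allow", and the numeric values are copied from the SECCOMP_MODE_*, SECCOMP_RET_*, and AUDIT_ARCH_* definitions.

```c
#include <stdint.h>

#define MODEL_MODE_DISABLED	0		/* SECCOMP_MODE_DISABLED */
#define MODEL_MODE_FILTER	2		/* SECCOMP_MODE_FILTER */
#define MODEL_RET_KILL_PROCESS	0x80000000u	/* SECCOMP_RET_KILL_PROCESS */
#define MODEL_RET_ALLOW		0x7fff0000u	/* SECCOMP_RET_ALLOW */
#define MODEL_ARCH_X86_64	0xC000003Eu	/* AUDIT_ARCH_X86_64 */
#define MODEL_ARCH_I386		0x40000003u	/* AUDIT_ARCH_I386 */

struct model_task {
	int mode;
	uint32_t pinned_arch;	/* 0 means "not pinned" */
};

/* The pin is checked in seccomp_run_filters(), which is only reached
 * in filter mode, so a pinned task without a filter is not restricted. */
static uint32_t model_syscall_entry(const struct model_task *t,
				    uint32_t sd_arch)
{
	if (t->mode != MODEL_MODE_FILTER)
		return MODEL_RET_ALLOW;
	if (t->pinned_arch && t->pinned_arch != sd_arch)
		return MODEL_RET_KILL_PROCESS;
	return MODEL_RET_ALLOW;	/* the real code would now run the filters */
}

/* Installing a filter switches the task into filter mode. */
static void model_install_filter(struct model_task *t)
{
	t->mode = MODEL_MODE_FILTER;
}

/* What the seccomp_sync_threads() hunk adds: the pin is copied to a
 * sibling thread only as part of a TSYNC filter installation. */
static void model_sync_thread(const struct model_task *caller,
			      struct model_task *thread)
{
	thread->pinned_arch = caller->pinned_arch;
	thread->mode = MODEL_MODE_FILTER;
}
```

In this model, a task that pins and never installs a filter can still make syscalls from any architecture, and a sibling thread only inherits the pin through the sync step.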

Andy Lutomirski June 17, 2020, 3:29 p.m.
On Wed, Jun 17, 2020 at 8:25 AM Jann Horn <jannh@google.com> wrote:
>
> On Tue, Jun 16, 2020 at 9:49 AM Kees Cook <keescook@chromium.org> wrote:
> > For systems that provide multiple syscall maps based on architectures
> > (e.g. AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 via CONFIG_COMPAT), allow
> > a fast way to pin the process to a specific syscall mapping, instead of
> > needing to generate all filters with an architecture check as the first
> > filter action.
>
> This seems reasonable; but can we maybe also add X86-specific handling
> for that X32 mess? AFAIK there are four ways to do syscalls with
> AUDIT_ARCH_X86_64:

You're out of date :)  I fixed the mess.

commit 6365b842aae4490ebfafadfc6bb27a6d3cc54757
Author: Andy Lutomirski <luto@kernel.org>
Date:   Wed Jul 3 13:34:04 2019 -0700

    x86/syscalls: Split the x32 syscalls into their own table



>
> 1. normal x86-64 syscall, X32 bit unset (native case)
> 2. normal x86-64 syscall, X32 bit set (for X32 code calling syscalls
> with no special X32 version)

Returns -ENOSYS now if an x32 version was supposed to be used.

> 3. x32-specific syscall, X32 bit unset (never happens legitimately)

Returns -ENOSYS now.

> 4. x32-specific syscall, X32 bit set (for X32 code calling syscalls
> with special X32 version)
>
> (I got this wrong when I wrote the notes on x32 in the seccomp manpage...)
>
> Can we add a flag for AUDIT_ARCH_X86_64 that says either "I want
> native x86-64" (enforcing case 1) or "I want X32" (enforcing case 2 or
> 4, and in case 2 checking that the syscall has no X32 equivalent)? (Of
> course, if the kernel is built without X32 support, we can leave out
> these extra checks.)

No extra checks needed.  Trying to do a syscall with a wrongly-encoded
x32 nr just generates -ENOSYS now.

Henceforth, all new syscalls will have the same number for native and
x32 and will differ only in the presence of the x32 bit.

--Andy
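
The rule for new syscalls stated above (same number for native and x32, distinguished only by the x32 bit) can be illustrated by splitting a raw syscall number into the marker bit and the base number. This is a sketch: `X32_SYSCALL_BIT` mirrors the value of __X32_SYSCALL_BIT from the x86 <asm/unistd.h>, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as __X32_SYSCALL_BIT in the x86 UAPI headers. */
#define X32_SYSCALL_BIT	0x40000000u

struct decoded_nr {
	bool x32;	/* true if the x32 bit was set */
	uint32_t base;	/* syscall number with the x32 bit cleared */
};

/* Split a raw AUDIT_ARCH_X86_64 syscall number into its two parts. */
static struct decoded_nr decode_syscall_nr(uint32_t nr)
{
	struct decoded_nr d;

	d.x32 = (nr & X32_SYSCALL_BIT) != 0;
	d.base = nr & ~X32_SYSCALL_BIT;
	return d;
}
```

A filter that wants to reject x32 entirely only needs to test the bit; it does not need a table of x32-specific numbers for syscalls added after the split.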

Jann Horn via Containers June 17, 2020, 3:31 p.m.
On Wed, Jun 17, 2020 at 5:30 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Jun 17, 2020 at 8:25 AM Jann Horn <jannh@google.com> wrote:
> >
> > On Tue, Jun 16, 2020 at 9:49 AM Kees Cook <keescook@chromium.org> wrote:
> > > For systems that provide multiple syscall maps based on architectures
> > > (e.g. AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 via CONFIG_COMPAT), allow
> > > a fast way to pin the process to a specific syscall mapping, instead of
> > > needing to generate all filters with an architecture check as the first
> > > filter action.
> >
> > This seems reasonable; but can we maybe also add X86-specific handling
> > for that X32 mess? AFAIK there are four ways to do syscalls with
> > AUDIT_ARCH_X86_64:
>
> You're out of date :)  I fixed the mess.
>
> commit 6365b842aae4490ebfafadfc6bb27a6d3cc54757
> Author: Andy Lutomirski <luto@kernel.org>
> Date:   Wed Jul 3 13:34:04 2019 -0700
>
>     x86/syscalls: Split the x32 syscalls into their own table

Oooooh, beautiful. Thank you very much for that.