[vz7] x86,ia32: Restore 32bit personality

Submitted by Kirill Gorkunov on Nov. 10, 2017, 3:25 p.m.

Details

Message ID 20171110152511.GA15821@uranus
State New
Series "x86,ia32: Restore 32bit personality"
Headers show

Commit Message

Kirill Gorkunov Nov. 10, 2017, 3:25 p.m.
When restoring compatible applications (ie running in ia32
mode) we have to restore thread flags and mm context at
least otherwise compat_alloc_user_space may allocate
values from old_rsp remembered at last 64 bit syscall
leading to EFAULT.

Note the vanilla kernel already switched to pt_regs::sp
for this sake but backporting the patch is not an
option due to its size.

In the patch we simply make sure that the task is in
container and has no TIF_IA32 before thus when CRIU
call for rt_sigreturn via int80 gate we adjust the
flags needed.

 | [root@pcs7 ~]# vzctl exec b3a8161f-50e2-4072-ab0a-93baa9ac20d4 cat /home/criu/test/zdtm/static/aio01.out
 | 15:08:15.525:   201: tail=2, head=128, nr=255
 | 15:08:26.424:   201: tail=2, head=128, nr=255
 | 15:08:26.425:   201: tail=3, head=128, nr=255
 | 15:08:26.425:   201: PASS

Previously this test failed because after the
restore we call for io_submit in the test and
kernel tries to allocate helper structure to
convert arguments from compat to native mode
with compat_alloc_user_space and failed.

https://jira.sw.ru/browse/PSBM-76965

CC: Andrei Vagin <avagin@openvz.org>
CC: Dmitry Safonov <0x7f454c46@gmail.com>
CC: Pavel Emelyanov <xemul@virtuozzo.com>
CC: Kirill Tkhai <ktkhai@virtuozzo.com>
CC: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 arch/x86/ia32/ia32_signal.c |   53 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

Patch hide | download patch | download mbox

Index: linux-pcs7.git/arch/x86/ia32/ia32_signal.c
===================================================================
--- linux-pcs7.git.orig/arch/x86/ia32/ia32_signal.c
+++ linux-pcs7.git/arch/x86/ia32/ia32_signal.c
@@ -35,6 +35,10 @@ 
 #include <asm/sys_ia32.h>
 #include <asm/smap.h>
 
+#ifdef CONFIG_VE
+# include <linux/ve.h>
+#endif
+
 void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact)
 {
 	/* Don't leak in-kernel non-uapi flags to user-space */
@@ -258,6 +262,53 @@  badframe:
 	return 0;
 }
 
+#ifdef CONFIG_VE
+extern int force_personality32;
+
+static void ve_set_personality_ia32(struct pt_regs *regs)
+{
+	if (ve_is_super(current->task_ve) ||
+	    test_thread_flag(TIF_IA32))
+		return;
+
+	/*
+	 * TIF_IA32 flag is used by the kernel to figure out
+	 * the task address space at least but more importantly
+	 * in compat_alloc_user_space when an application is
+	 * doing syscalls like io_submit and etc. So when
+	 * we restore such process via rt_sigreturn syscall
+	 * from inside of native mode with int80 gate help
+	 * we have to restore the TIF_IA32 and ia32_compat
+	 * at least.
+	 *
+	 * NOTE 1: this applies to applications running inside
+	 * VE only. The vanilla kernel alreasy throwing off
+	 * old_rsp symbol but the patch is too big to merge
+	 * inside.
+	 *
+	 * NOTE 2: this is pretty valid to call int80 on
+	 * regular x86-64 bit application with rt_sigreturn
+	 * as syscall number, but it's rather an exception
+	 * than common practice (except CRIU of course).
+	 */
+	if (regs->cs == __USER32_CS) {
+		/*
+		 * It's close to set_personality_ia32
+		 * but we don't want to change orig_ax.
+		 */
+		set_thread_flag(TIF_ADDR32);
+		set_thread_flag(TIF_IA32);
+		clear_thread_flag(TIF_X32);
+		current->personality |= force_personality32;
+		current_thread_info()->status |= TS_COMPAT;
+		if (current->mm)
+			current->mm->context.ia32_compat = TIF_IA32;
+	}
+}
+#else
+static void ve_set_personality_ia32(struct pt_regs *regs) { }
+#endif
+
 asmlinkage long sys32_rt_sigreturn(void)
 {
 	struct pt_regs *regs = current_pt_regs();
@@ -277,6 +328,8 @@  asmlinkage long sys32_rt_sigreturn(void)
 	if (ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext, &ax))
 		goto badframe;
 
+	ve_set_personality_ia32(regs);
+
 	if (compat_restore_altstack(&frame->uc.uc_stack))
 		goto badframe;
 

Comments

Kirill Gorkunov Nov. 10, 2017, 4:21 p.m.
On Fri, Nov 10, 2017 at 04:04:01PM +0000, Dmitry Safonov wrote:
> > +       if (regs->cs == __USER32_CS) {
> > +               /*
> > +                * It's close to set_personality_ia32
> > +                * but we don't want to change orig_ax.
> > +                */
> > +               set_thread_flag(TIF_ADDR32);
> > +               set_thread_flag(TIF_IA32);
> > +               clear_thread_flag(TIF_X32);
> > +               current->personality |= force_personality32;
> > +               current_thread_info()->status |= TS_COMPAT;
> > +               if (current->mm)
> > +                       current->mm->context.ia32_compat = TIF_IA32;
> > +       }
> 
> I see. The thing is that setting this thread-flag may be considered
> racy from the kernel side.

Well, to be fair indeed there is no lock and potential place for
a race, but since we're inside syscall handler I don't think
we hit any problem. I'll revisit locking, thanks! And note the
only purpose for this patch is to support criu since normal
programs don't usually do such things.

> So, the way mainstream accepted ia32 C/R is that this TIF_IA32
> will not be switched from userspace. Instead, this flag is considered
> for removing. And the task from kernel perspective shouldn't differ
> with exception for which syscall they are doing (ia32/x32/64-bit).
> While I've removed the major users (like signal ABI format delivery,
> ptrace and etc), there are a couple of minor users left like oprofile,
> uprobes - and the way they use this flag is not critical for their work.
> 
> So, it looks like this difference between vz and ms has slipped my
> mind. I don't mind your change, but have you tried something less
> intrusive? Something like the attached patch.

Yes, but I can't fetch the vanilla patches because they are a
way too big. While I like the idea of your patch we have to
do somthing with do_signal as well

		case -ERESTART_RESTARTBLOCK:
			regs->ax = NR_restart_syscall;
			regs->ip -= 2;
			break;

where

define NR_restart_syscall	\
	test_thread_flag(TIF_IA32) ? __NR_ia32_restart_syscall : __NR_restart_syscall

So that TIF_IA32 more intrusive than might be see in first glance.
Still thanks a lot! I'll take more precisely on Mon maybe extending
your patch will be less code. Thanks!