[RFC] compel: Save thread registers before executing parasite code

Submitted by Michael Holzheu on July 19, 2017, 6:54 p.m.

Details

Message ID 20170719205408.79c41ee0@TP-holzheu
State New
Series "compel: Save thread registers before executing parasite code"
Headers show

Commit Message

Michael Holzheu July 19, 2017, 6:54 p.m.
For dumping threads we execute parasite code before we collect the
floating point registers with get_task_regs().

The following describes the code path were this is done:

 dump_task_threads()
     for (i = 0; i < item->nr_threads; i++)
         dump_task_thread()
           parasite_dump_thread_seized()
             compel_prepare_thread()
                prepare_thread()

                   ### Get general purpose registers ###
                   ptrace_get_regs(ctx->regs)

             ### Parasite code is executed in thread ###
             compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD)

             compel_get_thread_regs()

                 ### Get FP and VX registers ###
                 get_task_regs()

On s390 gcc uses floating point registers to cache general purpose
registers. Therefore we use the -msoft-float option to ensure that
floating point registers are not clobbered in compel_run_in_thread().

There might be a better option:

First save the thread registers and then call the parasite code. This
allows us to remove the -msoft-float on s390.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 Makefile.compel           |  4 ++--
 criu/parasite-syscall.c   | 11 +++++------
 criu/pie/Makefile         |  7 -------
 criu/pie/Makefile.library |  6 ------
 4 files changed, 7 insertions(+), 21 deletions(-)

Patch hide | download patch | download mbox

diff --git a/Makefile.compel b/Makefile.compel
index 5c854e3..d200cb8 100644
--- a/Makefile.compel
+++ b/Makefile.compel
@@ -77,6 +77,6 @@  compel-install-targets	+= $(compel-plugins)
 # commit "S/390: Fix 64 bit sibcall".
 #
 ifeq ($(ARCH),s390)
-CFLAGS += -msoft-float -fno-optimize-sibling-calls
-HOSTCFLAGS += -msoft-float -fno-optimize-sibling-calls
+CFLAGS += -fno-optimize-sibling-calls
+HOSTCFLAGS += -fno-optimize-sibling-calls
 endif
diff --git a/criu/parasite-syscall.c b/criu/parasite-syscall.c
index 69dba60..76e943e 100644
--- a/criu/parasite-syscall.c
+++ b/criu/parasite-syscall.c
@@ -198,6 +198,11 @@  int parasite_dump_thread_seized(struct parasite_ctl *ctl, const struct pstree_it
 
 	tc->has_blk_sigset = true;
 	memcpy(&tc->blk_sigset, compel_thread_sigmask(tctl), sizeof(k_rtsigset_t));
+	ret = compel_get_thread_regs(tctl, save_task_regs, core);
+	if (ret) {
+		pr_err("Can't obtain regs for thread %d\n", pid);
+		goto err_rth;
+	}
 
 	ret = compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD);
 	if (ret) {
@@ -211,12 +216,6 @@  int parasite_dump_thread_seized(struct parasite_ctl *ctl, const struct pstree_it
 		goto err_rth;
 	}
 
-	ret = compel_get_thread_regs(tctl, save_task_regs, core);
-	if (ret) {
-		pr_err("Can't obtain regs for thread %d\n", pid);
-		goto err_rth;
-	}
-
 	compel_release_thread(tctl);
 
 	*parasite_tid = args->tid;
diff --git a/criu/pie/Makefile b/criu/pie/Makefile
index 76c3535..73d95d5 100644
--- a/criu/pie/Makefile
+++ b/criu/pie/Makefile
@@ -16,13 +16,6 @@  ifeq ($(SRCARCH),arm)
 	ccflags-y	+= -marm
 endif
 
-# We assume that compel code does not change floating point registers.
-# On s390 gcc uses fprs to cache gprs. Therefore disable floating point
-# with -msoft-float.
-ifeq ($(SRCARCH),s390)
-	ccflags-y	+= -msoft-float
-endif
-
 asflags-y	+= -D__ASSEMBLY__
 
 LDS		:= compel/arch/$(SRCARCH)/scripts/compel-pack.lds.S
diff --git a/criu/pie/Makefile.library b/criu/pie/Makefile.library
index ceadc1d..f589333 100644
--- a/criu/pie/Makefile.library
+++ b/criu/pie/Makefile.library
@@ -41,9 +41,3 @@  ccflags-y		+= $(COMPEL_UAPI_INCLUDES)
 ifeq ($(SRCARCH),arm)
 	ccflags-y	+= -marm
 endif
-# We assume that compel code does not change floating point registers.
-# On s390 gcc uses fprs to cache gprs. Therefore disable floating point
-# with -msoft-float.
-ifeq ($(SRCARCH),s390)
-	ccflags-y	+= -msoft-float
-endif

Comments

Dmitry Safonov July 19, 2017, 8:05 p.m.
2017-07-19 21:54 GMT+03:00 Michael Holzheu <holzheu@linux.vnet.ibm.com>:
> For dumping threads we execute parasite code before we collect the
> floating point registers with get_task_regs().
>
> The following describes the code path were this is done:
>
>  dump_task_threads()
>      for (i = 0; i < item->nr_threads; i++)
>          dump_task_thread()
>            parasite_dump_thread_seized()
>              compel_prepare_thread()
>                 prepare_thread()
>
>                    ### Get general purpose registers ###
>                    ptrace_get_regs(ctx->regs)
>
>              ### Parasite code is executed in thread ###
>              compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD)
>
>              compel_get_thread_regs()
>
>                  ### Get FP and VX registers ###
>                  get_task_regs()
>
> On s390 gcc uses floating point registers to cache general purpose
> registers. Therefore we use the -msoft-float option to ensure that
> floating point registers are not clobbered in compel_run_in_thread().
>
> There might be a better option:
>
> First save the thread registers and then call the parasite code. This
> allows us to remove the -msoft-float on s390.
>
> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> ---
>  Makefile.compel           |  4 ++--
>  criu/parasite-syscall.c   | 11 +++++------
>  criu/pie/Makefile         |  7 -------
>  criu/pie/Makefile.library |  6 ------
>  4 files changed, 7 insertions(+), 21 deletions(-)
>
> diff --git a/Makefile.compel b/Makefile.compel
> index 5c854e3..d200cb8 100644
> --- a/Makefile.compel
> +++ b/Makefile.compel
> @@ -77,6 +77,6 @@ compel-install-targets        += $(compel-plugins)
>  # commit "S/390: Fix 64 bit sibcall".
>  #
>  ifeq ($(ARCH),s390)
> -CFLAGS += -msoft-float -fno-optimize-sibling-calls
> -HOSTCFLAGS += -msoft-float -fno-optimize-sibling-calls
> +CFLAGS += -fno-optimize-sibling-calls
> +HOSTCFLAGS += -fno-optimize-sibling-calls
>  endif
> diff --git a/criu/parasite-syscall.c b/criu/parasite-syscall.c
> index 69dba60..76e943e 100644
> --- a/criu/parasite-syscall.c
> +++ b/criu/parasite-syscall.c
> @@ -198,6 +198,11 @@ int parasite_dump_thread_seized(struct parasite_ctl *ctl, const struct pstree_it
>
>         tc->has_blk_sigset = true;
>         memcpy(&tc->blk_sigset, compel_thread_sigmask(tctl), sizeof(k_rtsigset_t));
> +       ret = compel_get_thread_regs(tctl, save_task_regs, core);
> +       if (ret) {
> +               pr_err("Can't obtain regs for thread %d\n", pid);
> +               goto err_rth;
> +       }

That *looks* not very correct by the following reason:

ucontext_extended has fpu/vxr registers on it, so for the *task*
they will be restored on parasite's CMD_FINI.
And you get them before running parasite in a task - that's ok.

But from the point of *thread*'s view - you dump them here,
run parasite inside thread, which can change fpu/vxr, but only
restore generic registers after (parasite_run()=>ptrace_set_regs()).

That means that dumping thread's core may corrupt fpu/vxr, which
will break as the result --leave-running and err-path recovery curing
for the parasite.

Please, also correct comment near ptrace_get_regs() for s390.

>
>         ret = compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD);
>         if (ret) {
> @@ -211,12 +216,6 @@ int parasite_dump_thread_seized(struct parasite_ctl *ctl, const struct pstree_it
>                 goto err_rth;
>         }
>
> -       ret = compel_get_thread_regs(tctl, save_task_regs, core);
> -       if (ret) {
> -               pr_err("Can't obtain regs for thread %d\n", pid);
> -               goto err_rth;
> -       }
> -
>         compel_release_thread(tctl);
>
>         *parasite_tid = args->tid;
> diff --git a/criu/pie/Makefile b/criu/pie/Makefile
> index 76c3535..73d95d5 100644
> --- a/criu/pie/Makefile
> +++ b/criu/pie/Makefile
> @@ -16,13 +16,6 @@ ifeq ($(SRCARCH),arm)
>         ccflags-y       += -marm
>  endif
>
> -# We assume that compel code does not change floating point registers.
> -# On s390 gcc uses fprs to cache gprs. Therefore disable floating point
> -# with -msoft-float.
> -ifeq ($(SRCARCH),s390)
> -       ccflags-y       += -msoft-float
> -endif
> -
>  asflags-y      += -D__ASSEMBLY__
>
>  LDS            := compel/arch/$(SRCARCH)/scripts/compel-pack.lds.S
> diff --git a/criu/pie/Makefile.library b/criu/pie/Makefile.library
> index ceadc1d..f589333 100644
> --- a/criu/pie/Makefile.library
> +++ b/criu/pie/Makefile.library
> @@ -41,9 +41,3 @@ ccflags-y             += $(COMPEL_UAPI_INCLUDES)
>  ifeq ($(SRCARCH),arm)
>         ccflags-y       += -marm
>  endif
> -# We assume that compel code does not change floating point registers.
> -# On s390 gcc uses fprs to cache gprs. Therefore disable floating point
> -# with -msoft-float.
> -ifeq ($(SRCARCH),s390)
> -       ccflags-y       += -msoft-float
> -endif
Dmitry Safonov July 19, 2017, 8:20 p.m.
2017-07-19 23:05 GMT+03:00 Dmitry Safonov <0x7f454c46@gmail.com>:
> 2017-07-19 21:54 GMT+03:00 Michael Holzheu <holzheu@linux.vnet.ibm.com>:
>> For dumping threads we execute parasite code before we collect the
>> floating point registers with get_task_regs().
>>
>> The following describes the code path were this is done:
>>
>>  dump_task_threads()
>>      for (i = 0; i < item->nr_threads; i++)
>>          dump_task_thread()
>>            parasite_dump_thread_seized()
>>              compel_prepare_thread()
>>                 prepare_thread()
>>
>>                    ### Get general purpose registers ###
>>                    ptrace_get_regs(ctx->regs)
>>
>>              ### Parasite code is executed in thread ###
>>              compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD)
>>
>>              compel_get_thread_regs()
>>
>>                  ### Get FP and VX registers ###
>>                  get_task_regs()
>>
>> On s390 gcc uses floating point registers to cache general purpose
>> registers. Therefore we use the -msoft-float option to ensure that
>> floating point registers are not clobbered in compel_run_in_thread().
>>
>> There might be a better option:
>>
>> First save the thread registers and then call the parasite code. This
>> allows us to remove the -msoft-float on s390.
>>
>> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
>> ---
>>  Makefile.compel           |  4 ++--
>>  criu/parasite-syscall.c   | 11 +++++------
>>  criu/pie/Makefile         |  7 -------
>>  criu/pie/Makefile.library |  6 ------
>>  4 files changed, 7 insertions(+), 21 deletions(-)
>>
>> diff --git a/Makefile.compel b/Makefile.compel
>> index 5c854e3..d200cb8 100644
>> --- a/Makefile.compel
>> +++ b/Makefile.compel
>> @@ -77,6 +77,6 @@ compel-install-targets        += $(compel-plugins)
>>  # commit "S/390: Fix 64 bit sibcall".
>>  #
>>  ifeq ($(ARCH),s390)
>> -CFLAGS += -msoft-float -fno-optimize-sibling-calls
>> -HOSTCFLAGS += -msoft-float -fno-optimize-sibling-calls
>> +CFLAGS += -fno-optimize-sibling-calls
>> +HOSTCFLAGS += -fno-optimize-sibling-calls
>>  endif
>> diff --git a/criu/parasite-syscall.c b/criu/parasite-syscall.c
>> index 69dba60..76e943e 100644
>> --- a/criu/parasite-syscall.c
>> +++ b/criu/parasite-syscall.c
>> @@ -198,6 +198,11 @@ int parasite_dump_thread_seized(struct parasite_ctl *ctl, const struct pstree_it
>>
>>         tc->has_blk_sigset = true;
>>         memcpy(&tc->blk_sigset, compel_thread_sigmask(tctl), sizeof(k_rtsigset_t));
>> +       ret = compel_get_thread_regs(tctl, save_task_regs, core);
>> +       if (ret) {
>> +               pr_err("Can't obtain regs for thread %d\n", pid);
>> +               goto err_rth;
>> +       }
>
> That *looks* not very correct by the following reason:
>
> ucontext_extended has fpu/vxr registers on it, so for the *task*
> they will be restored on parasite's CMD_FINI.
> And you get them before running parasite in a task - that's ok.
>
> But from the point of *thread*'s view - you dump them here,
> run parasite inside thread, which can change fpu/vxr, but only
> restore generic registers after (parasite_run()=>ptrace_set_regs()).
>
> That means that dumping thread's core may corrupt fpu/vxr, which
> will break as the result --leave-running and err-path recovery curing
> for the parasite.
>
> Please, also correct comment near ptrace_get_regs() for s390.

And if you want to verify that all works as expected, I would suggest
to add fpu03 s390-version test having as an example x86 versions
test/zdtm/static/{fpu00.c,fpu01.c}.

Also might be worth if the test would have a couple of threads, to
check the mentioned issue.

>
>>
>>         ret = compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD);
>>         if (ret) {
>> @@ -211,12 +216,6 @@ int parasite_dump_thread_seized(struct parasite_ctl *ctl, const struct pstree_it
>>                 goto err_rth;
>>         }
>>
>> -       ret = compel_get_thread_regs(tctl, save_task_regs, core);
>> -       if (ret) {
>> -               pr_err("Can't obtain regs for thread %d\n", pid);
>> -               goto err_rth;
>> -       }
>> -
>>         compel_release_thread(tctl);
>>
>>         *parasite_tid = args->tid;
>> diff --git a/criu/pie/Makefile b/criu/pie/Makefile
>> index 76c3535..73d95d5 100644
>> --- a/criu/pie/Makefile
>> +++ b/criu/pie/Makefile
>> @@ -16,13 +16,6 @@ ifeq ($(SRCARCH),arm)
>>         ccflags-y       += -marm
>>  endif
>>
>> -# We assume that compel code does not change floating point registers.
>> -# On s390 gcc uses fprs to cache gprs. Therefore disable floating point
>> -# with -msoft-float.
>> -ifeq ($(SRCARCH),s390)
>> -       ccflags-y       += -msoft-float
>> -endif
>> -
>>  asflags-y      += -D__ASSEMBLY__
>>
>>  LDS            := compel/arch/$(SRCARCH)/scripts/compel-pack.lds.S
>> diff --git a/criu/pie/Makefile.library b/criu/pie/Makefile.library
>> index ceadc1d..f589333 100644
>> --- a/criu/pie/Makefile.library
>> +++ b/criu/pie/Makefile.library
>> @@ -41,9 +41,3 @@ ccflags-y             += $(COMPEL_UAPI_INCLUDES)
>>  ifeq ($(SRCARCH),arm)
>>         ccflags-y       += -marm
>>  endif
>> -# We assume that compel code does not change floating point registers.
>> -# On s390 gcc uses fprs to cache gprs. Therefore disable floating point
>> -# with -msoft-float.
>> -ifeq ($(SRCARCH),s390)
>> -       ccflags-y       += -msoft-float
>> -endif
Michael Holzheu July 20, 2017, 11:09 a.m.
Am Wed, 19 Jul 2017 23:05:38 +0300
schrieb Dmitry Safonov <0x7f454c46@gmail.com>:

> 2017-07-19 21:54 GMT+03:00 Michael Holzheu <holzheu@linux.vnet.ibm.com>:
> > For dumping threads we execute parasite code before we collect the
> > floating point registers with get_task_regs().
> >
> > The following describes the code path were this is done:
> >
> >  dump_task_threads()
> >      for (i = 0; i < item->nr_threads; i++)
> >          dump_task_thread()
> >            parasite_dump_thread_seized()
> >              compel_prepare_thread()
> >                 prepare_thread()
> >
> >                    ### Get general purpose registers ###
> >                    ptrace_get_regs(ctx->regs)
> >
> >              ### Parasite code is executed in thread ###
> >              compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD)
> >
> >              compel_get_thread_regs()
> >
> >                  ### Get FP and VX registers ###
> >                  get_task_regs()
> >
> > On s390 gcc uses floating point registers to cache general purpose
> > registers. Therefore we use the -msoft-float option to ensure that
> > floating point registers are not clobbered in compel_run_in_thread().
> >
> > There might be a better option:
> >
> > First save the thread registers and then call the parasite code. This
> > allows us to remove the -msoft-float on s390.
> >
> > Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> > ---
> >  Makefile.compel           |  4 ++--
> >  criu/parasite-syscall.c   | 11 +++++------
> >  criu/pie/Makefile         |  7 -------
> >  criu/pie/Makefile.library |  6 ------
> >  4 files changed, 7 insertions(+), 21 deletions(-)
> >
> > diff --git a/Makefile.compel b/Makefile.compel
> > index 5c854e3..d200cb8 100644
> > --- a/Makefile.compel
> > +++ b/Makefile.compel
> > @@ -77,6 +77,6 @@ compel-install-targets        += $(compel-plugins)
> >  # commit "S/390: Fix 64 bit sibcall".
> >  #
> >  ifeq ($(ARCH),s390)
> > -CFLAGS += -msoft-float -fno-optimize-sibling-calls
> > -HOSTCFLAGS += -msoft-float -fno-optimize-sibling-calls
> > +CFLAGS += -fno-optimize-sibling-calls
> > +HOSTCFLAGS += -fno-optimize-sibling-calls
> >  endif
> > diff --git a/criu/parasite-syscall.c b/criu/parasite-syscall.c
> > index 69dba60..76e943e 100644
> > --- a/criu/parasite-syscall.c
> > +++ b/criu/parasite-syscall.c
> > @@ -198,6 +198,11 @@ int parasite_dump_thread_seized(struct parasite_ctl *ctl, const struct pstree_it
> >
> >         tc->has_blk_sigset = true;
> >         memcpy(&tc->blk_sigset, compel_thread_sigmask(tctl), sizeof(k_rtsigset_t));
> > +       ret = compel_get_thread_regs(tctl, save_task_regs, core);
> > +       if (ret) {
> > +               pr_err("Can't obtain regs for thread %d\n", pid);
> > +               goto err_rth;
> > +       }
> 
> That *looks* not very correct by the following reason:
> 
> ucontext_extended has fpu/vxr registers on it, so for the *task*
> they will be restored on parasite's CMD_FINI.
> And you get them before running parasite in a task - that's ok.
> 
> But from the point of *thread*'s view - you dump them here,
> run parasite inside thread, which can change fpu/vxr, but only
> restore generic registers after (parasite_run()=>ptrace_set_regs()).
> 
> That means that dumping thread's core may corrupt fpu/vxr, which
> will break as the result --leave-running and err-path recovery curing
> for the parasite.

You are right, I forgot --leave-running and the error path.

So the current CRIU design is that the parasite code must not change
floating point and VX registers?

Are you sure that on the other architectures FP and VX registers are
not changed by auto-vectorization or other fancy optimization stuff
with -O2?
 
> Please, also correct comment near ptrace_get_regs() for s390.

You mean, if I provide an updated patch that removes soft-float?

Michael
Dmitry Safonov July 20, 2017, 2 p.m.
2017-07-20 14:09 GMT+03:00 Michael Holzheu <holzheu@linux.vnet.ibm.com>:
> Am Wed, 19 Jul 2017 23:05:38 +0300
> schrieb Dmitry Safonov <0x7f454c46@gmail.com>:
>
>> 2017-07-19 21:54 GMT+03:00 Michael Holzheu <holzheu@linux.vnet.ibm.com>:
>> > For dumping threads we execute parasite code before we collect the
>> > floating point registers with get_task_regs().
>> >
>> > The following describes the code path were this is done:
>> >
>> >  dump_task_threads()
>> >      for (i = 0; i < item->nr_threads; i++)
>> >          dump_task_thread()
>> >            parasite_dump_thread_seized()
>> >              compel_prepare_thread()
>> >                 prepare_thread()
>> >
>> >                    ### Get general purpose registers ###
>> >                    ptrace_get_regs(ctx->regs)
>> >
>> >              ### Parasite code is executed in thread ###
>> >              compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD)
>> >
>> >              compel_get_thread_regs()
>> >
>> >                  ### Get FP and VX registers ###
>> >                  get_task_regs()
>> >
>> > On s390 gcc uses floating point registers to cache general purpose
>> > registers. Therefore we use the -msoft-float option to ensure that
>> > floating point registers are not clobbered in compel_run_in_thread().
>> >
>> > There might be a better option:
>> >
>> > First save the thread registers and then call the parasite code. This
>> > allows us to remove the -msoft-float on s390.
>> >
>> > Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
>> > ---
>> >  Makefile.compel           |  4 ++--
>> >  criu/parasite-syscall.c   | 11 +++++------
>> >  criu/pie/Makefile         |  7 -------
>> >  criu/pie/Makefile.library |  6 ------
>> >  4 files changed, 7 insertions(+), 21 deletions(-)
>> >
>> > diff --git a/Makefile.compel b/Makefile.compel
>> > index 5c854e3..d200cb8 100644
>> > --- a/Makefile.compel
>> > +++ b/Makefile.compel
>> > @@ -77,6 +77,6 @@ compel-install-targets        += $(compel-plugins)
>> >  # commit "S/390: Fix 64 bit sibcall".
>> >  #
>> >  ifeq ($(ARCH),s390)
>> > -CFLAGS += -msoft-float -fno-optimize-sibling-calls
>> > -HOSTCFLAGS += -msoft-float -fno-optimize-sibling-calls
>> > +CFLAGS += -fno-optimize-sibling-calls
>> > +HOSTCFLAGS += -fno-optimize-sibling-calls
>> >  endif
>> > diff --git a/criu/parasite-syscall.c b/criu/parasite-syscall.c
>> > index 69dba60..76e943e 100644
>> > --- a/criu/parasite-syscall.c
>> > +++ b/criu/parasite-syscall.c
>> > @@ -198,6 +198,11 @@ int parasite_dump_thread_seized(struct parasite_ctl *ctl, const struct pstree_it
>> >
>> >         tc->has_blk_sigset = true;
>> >         memcpy(&tc->blk_sigset, compel_thread_sigmask(tctl), sizeof(k_rtsigset_t));
>> > +       ret = compel_get_thread_regs(tctl, save_task_regs, core);
>> > +       if (ret) {
>> > +               pr_err("Can't obtain regs for thread %d\n", pid);
>> > +               goto err_rth;
>> > +       }
>>
>> That *looks* not very correct by the following reason:
>>
>> ucontext_extended has fpu/vxr registers on it, so for the *task*
>> they will be restored on parasite's CMD_FINI.
>> And you get them before running parasite in a task - that's ok.
>>
>> But from the point of *thread*'s view - you dump them here,
>> run parasite inside thread, which can change fpu/vxr, but only
>> restore generic registers after (parasite_run()=>ptrace_set_regs()).
>>
>> That means that dumping thread's core may corrupt fpu/vxr, which
>> will break as the result --leave-running and err-path recovery curing
>> for the parasite.
>
> You are right, I forgot --leave-running and the error path.
>
> So the current CRIU design is that the parasite code must not change
> floating point and VX registers?

Well, you can change them, but you should restore their content on curing.
That looks not that hard - providing arch-dependent function
compel_set_thread_regs() that will restore fpu/vx registers
(with the comment that they're used by parasite) and call it after
compel_run_in_thread() - don't forget to restore them also on err-path
if compel_run_in_thread() failed.

> Are you sure that on the other architectures FP and VX registers are
> not changed by auto-vectorization or other fancy optimization stuff
> with -O2?

No. The best test coverage, for now, is for x86_64 - the other archs
do not get as much attention and care, so there might be errors.

>> Please, also correct comment near ptrace_get_regs() for s390.
>
> You mean, if I provide an updated patch that removes soft-float?

Yes.