test/others/rpc test cases

Submitted by Dmitry Safonov on May 16, 2018, 4:16 p.m.

Details

Message ID CAJwJo6Ydvm6-o+R3GGN0imM_JwzMh0XuJ-_1oGjVRG7w22m8cQ@mail.gmail.com
State New
Series "test/others/rpc test cases"
Headers show

Commit Message

Dmitry Safonov May 16, 2018, 4:16 p.m.
2018-05-16 16:52 GMT+01:00 Adrian Reber <adrian@lisas.de>:
> Hello Andrei,
>
> we quickly chatted about the tests in test/others/rpc and I had a look
> at it.
>
> One test (errno.py) dumps itself, keeps on running and tries to restore
> itself. This should (and does fail) as the PID is already in use by the
> test case. The test case wants to to test the error handling of 'criu
> service'. But CRIU just kills all processes which it failed to restore
> in criu/cr-restore.c (around line 2240 : kill(vpid(pi), SIGKILL);).
>
> As the test case is killed by CRIU it counts as a failure. I guess this
> used to work at some point in time, but with 3.8.1 it does not.
>
> I am not sure what the right solution is. Should CRIU kill a process if
> restore fails, probably to make sure that no half restored process still
> exists. Maybe the test case does not make sense anymore...
>
> Do you have any thoughts on this? Is this a CRIU bug or a wrong test case?

It looks to me like a valid test. CRIU shouldn't kill existing tasks with
reused PID. Do we run this errno.py on Travis?

I haven't even compile tested, but something like this might fix?

--->8---

Patch hide | download patch | download mbox

--- a/criu/cr-restore.c
+++ b/criu/cr-restore.c
@@ -1510,6 +1510,7 @@  static int do_fork_with_pid(struct pstree_item
*item, struct ns_id *pid_ns, stru
        if (pid < 0) {
                pr_perror("Can't clone()");
                ret = -1;
+               hlp_pid->ns[i].virt = -1;
                goto unblock;
        }

Comments

Adrian Reber May 16, 2018, 6:54 p.m.
On Wed, May 16, 2018 at 05:16:12PM +0100, Dmitry Safonov wrote:
> 2018-05-16 16:52 GMT+01:00 Adrian Reber <adrian@lisas.de>:
> > we quickly chatted about the tests in test/others/rpc and I had a look
> > at it.
> >
> > One test (errno.py) dumps itself, keeps on running and tries to restore
> > itself. This should (and does fail) as the PID is already in use by the
> > test case. The test case wants to to test the error handling of 'criu
> > service'. But CRIU just kills all processes which it failed to restore
> > in criu/cr-restore.c (around line 2240 : kill(vpid(pi), SIGKILL);).
> >
> > As the test case is killed by CRIU it counts as a failure. I guess this
> > used to work at some point in time, but with 3.8.1 it does not.
> >
> > I am not sure what the right solution is. Should CRIU kill a process if
> > restore fails, probably to make sure that no half restored process still
> > exists. Maybe the test case does not make sense anymore...
> >
> > Do you have any thoughts on this? Is this a CRIU bug or a wrong test case?
> 
> It looks to me like a valid test. CRIU shouldn't kill existing tasks with
> reused PID. Do we run this errno.py on Travis?
> 
> I haven't even compile tested, but something like this might fix?
> 
> --->8---
> --- a/criu/cr-restore.c
> +++ b/criu/cr-restore.c
> @@ -1510,6 +1510,7 @@ static int do_fork_with_pid(struct pstree_item
> *item, struct ns_id *pid_ns, stru
>         if (pid < 0) {
>                 pr_perror("Can't clone()");
>                 ret = -1;
> +               hlp_pid->ns[i].virt = -1;
>                 goto unblock;
>         }
> 
> 
> -- 

No, the test case is still killed by CRIU. That is the end of the
restore log file (with additional printouts from my testing):

(00.005773) PID: real 25375 virt 25374
(00.005811) Error (criu/cr-restore.c:1869): Pid 25375 do not match expected 25374 (task 25374)
(00.005863) Error (criu/cr-restore.c:2392): Failed to wait inprogress tasks
(00.005869) Warn  (criu/cr-restore.c:2561): About to kill 25374
(00.007245) Error (criu/namespaces.c:1604): Spurious pid ns helper: pid=25375
(00.007256) Error (criu/cr-restore.c:1629): 25375 exited, status=1
(00.007266) Error (criu/cr-restore.c:2571): Restoring FAILED.


		Adrian
Andrey Vagin May 23, 2018, 9:26 p.m.
On Wed, May 16, 2018 at 05:16:12PM +0100, Dmitry Safonov wrote:
> 2018-05-16 16:52 GMT+01:00 Adrian Reber <adrian@lisas.de>:
> > Hello Andrei,
> >
> > we quickly chatted about the tests in test/others/rpc and I had a look
> > at it.
> >
> > One test (errno.py) dumps itself, keeps on running and tries to restore
> > itself. This should (and does fail) as the PID is already in use by the
> > test case. The test case wants to to test the error handling of 'criu
> > service'. But CRIU just kills all processes which it failed to restore
> > in criu/cr-restore.c (around line 2240 : kill(vpid(pi), SIGKILL);).
> >
> > As the test case is killed by CRIU it counts as a failure. I guess this
> > used to work at some point in time, but with 3.8.1 it does not.
> >
> > I am not sure what the right solution is. Should CRIU kill a process if
> > restore fails, probably to make sure that no half restored process still
> > exists. Maybe the test case does not make sense anymore...
> >
> > Do you have any thoughts on this? Is this a CRIU bug or a wrong test case?
> 
> It looks to me like a valid test. CRIU shouldn't kill existing tasks with
> reused PID. Do we run this errno.py on Travis?
> 
> I haven't even compile tested, but something like this might fix?
> 
> --->8---
> --- a/criu/cr-restore.c
> +++ b/criu/cr-restore.c
> @@ -1510,6 +1510,7 @@ static int do_fork_with_pid(struct pstree_item
> *item, struct ns_id *pid_ns, stru
>         if (pid < 0) {
>                 pr_perror("Can't clone()");
>                 ret = -1;
> +               hlp_pid->ns[i].virt = -1;

If we have three processes A -> B -> C. PID-s of B, C have been reused.
With you patch, virt for B will be set to -1, but a process with the C
pid will be killed.

I think something like this should fix a problem:
@@ -2556,8 +2556,8 @@ out_kill:
                struct pstree_item *pi;
 
                for_each_pstree_item(pi)
-                       if (vpid(pi) > 0)
-                               kill(vpid(pi), SIGKILL);
+                       if (pi->pid->real > 0)
+                               kill(pi->pid->real, SIGKILL);
        }
 
 out:


>                 goto unblock;
>         }
> 
> 
> -- 
>              Dmitry
Dmitry Safonov May 23, 2018, 9:31 p.m.
2018-05-23 22:26 GMT+01:00 Andrei Vagin <avagin@virtuozzo.com>:
> On Wed, May 16, 2018 at 05:16:12PM +0100, Dmitry Safonov wrote:
>> 2018-05-16 16:52 GMT+01:00 Adrian Reber <adrian@lisas.de>:
>> > Hello Andrei,
>> >
>> > we quickly chatted about the tests in test/others/rpc and I had a look
>> > at it.
>> >
>> > One test (errno.py) dumps itself, keeps on running and tries to restore
>> > itself. This should (and does fail) as the PID is already in use by the
>> > test case. The test case wants to to test the error handling of 'criu
>> > service'. But CRIU just kills all processes which it failed to restore
>> > in criu/cr-restore.c (around line 2240 : kill(vpid(pi), SIGKILL);).
>> >
>> > As the test case is killed by CRIU it counts as a failure. I guess this
>> > used to work at some point in time, but with 3.8.1 it does not.
>> >
>> > I am not sure what the right solution is. Should CRIU kill a process if
>> > restore fails, probably to make sure that no half restored process still
>> > exists. Maybe the test case does not make sense anymore...
>> >
>> > Do you have any thoughts on this? Is this a CRIU bug or a wrong test case?
>>
>> It looks to me like a valid test. CRIU shouldn't kill existing tasks with
>> reused PID. Do we run this errno.py on Travis?
>>
>> I haven't even compile tested, but something like this might fix?
>>
>> --->8---
>> --- a/criu/cr-restore.c
>> +++ b/criu/cr-restore.c
>> @@ -1510,6 +1510,7 @@ static int do_fork_with_pid(struct pstree_item
>> *item, struct ns_id *pid_ns, stru
>>         if (pid < 0) {
>>                 pr_perror("Can't clone()");
>>                 ret = -1;
>> +               hlp_pid->ns[i].virt = -1;
>
> If we have three processes A -> B -> C. PID-s of B, C have been reused.
> With you patch, virt for B will be set to -1, but a process with the C
> pid will be killed.
>
> I think something like this should fix a problem:

Yeah, probably - I haven't looked at the issue closely, only glanced
and that diff looked meaningful, altrough it had a chance to be comple
and utter garbage.

> @@ -2556,8 +2556,8 @@ out_kill:
>                 struct pstree_item *pi;
>
>                 for_each_pstree_item(pi)
> -                       if (vpid(pi) > 0)
> -                               kill(vpid(pi), SIGKILL);
> +                       if (pi->pid->real > 0)
> +                               kill(pi->pid->real, SIGKILL);

Nice!

Thanks,
             Dmitry