spfs: Main process wakes and kills its children on exit

Submitted by Kirill Tkhai on Jan. 23, 2018, 9:41 a.m.

Details

Message ID 151670048861.16661.13744998895733500828.stgit@localhost.localdomain
State New
Series "spfs: Main process wakes and kills its children on exit"
Headers show

Commit Message

Kirill Tkhai Jan. 23, 2018, 9:41 a.m.
Stanislav Kinsburskiy says:

"SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
 The idea of the option is simple: force SPFS manager to exit, when it has some
 SPFS processes among its children (i.e. spfs was mounted at least once),
 but all these processes have exited for whatever reason (which usually happens
 when restore has failed and spfs mounts where unmounted).
 Although it works in overall (main SPFS manager process exits), its children
 (responsible to SPFS replacement) may wait on FUTEX for "release" command
 for corresponding SPFS mount and thus never stop until they are killed".

1 spfs-manager
2   \_ spfs
3   \_ spfs-manager
4   \_ spfs
5   \_ spfs-manager

2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
The patch makes spfs-manager 1 kill 3 in case of 2 exited.

https://jira.sw.ru/browse/PSBM-80055

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 manager/context.c |    4 ++++
 manager/spfs.c    |    1 +
 2 files changed, 5 insertions(+)

Patch hide | download patch | download mbox

diff --git a/manager/context.c b/manager/context.c
index 1eb37c9..4464a23 100644
--- a/manager/context.c
+++ b/manager/context.c
@@ -53,6 +53,9 @@  static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
 		/* SPFS master was killed. We need to release the reference */
 		spfs_release_mnt(info);
 
+	if (killed || WEXITSTATUS(status))
+		kill(info->replacer, SIGKILL);
+
 	info->dead = true;
 	del_spfs_info(ctx->spfs_mounts, info);
 
@@ -88,6 +91,7 @@  static void sigchld_handler(int signal, siginfo_t *siginfo, void *data)
 				 * corresponding fd.
 				 */
 				spfs_release_mnt(info);
+			info->replacer = -1;
 		} else {
 			info = find_spfs_by_pid(ctx->spfs_mounts, pid);
 			if (info) {
diff --git a/manager/spfs.c b/manager/spfs.c
index 3e0f667..0b94587 100644
--- a/manager/spfs.c
+++ b/manager/spfs.c
@@ -39,6 +39,7 @@  int create_spfs_info(const char *id,
 		pr_err("failed to allocate\n");
 		return -ENOMEM;
 	}
+	info->replacer = -1;
 
 	err = init_mount_info(&info->mnt, id, mountpoint, ns_mountpoint);
 	if (err)

Comments

Kirill Tkhai Jan. 23, 2018, 9:47 a.m.
Missed header. Actually it's v2

On 23.01.2018 12:41, Kirill Tkhai wrote:
> Stanislav Kinsburskiy says:
> 
> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>  The idea of the option is simple: force SPFS manager to exit, when it has some
>  SPFS processes among its children (i.e. spfs was mounted at least once),
>  but all these processes have exited for whatever reason (which usually happens
>  when restore has failed and spfs mounts where unmounted).
>  Although it works in overall (main SPFS manager process exits), its children
>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>  for corresponding SPFS mount and thus never stop until they are killed".
> 
> 1 spfs-manager
> 2   \_ spfs
> 3   \_ spfs-manager
> 4   \_ spfs
> 5   \_ spfs-manager
> 
> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
> 
> https://jira.sw.ru/browse/PSBM-80055
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  manager/context.c |    4 ++++
>  manager/spfs.c    |    1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/manager/context.c b/manager/context.c
> index 1eb37c9..4464a23 100644
> --- a/manager/context.c
> +++ b/manager/context.c
> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>  		/* SPFS master was killed. We need to release the reference */
>  		spfs_release_mnt(info);
>  
> +	if (killed || WEXITSTATUS(status))
> +		kill(info->replacer, SIGKILL);
> +
>  	info->dead = true;
>  	del_spfs_info(ctx->spfs_mounts, info);
>  
> @@ -88,6 +91,7 @@ static void sigchld_handler(int signal, siginfo_t *siginfo, void *data)
>  				 * corresponding fd.
>  				 */
>  				spfs_release_mnt(info);
> +			info->replacer = -1;
>  		} else {
>  			info = find_spfs_by_pid(ctx->spfs_mounts, pid);
>  			if (info) {
> diff --git a/manager/spfs.c b/manager/spfs.c
> index 3e0f667..0b94587 100644
> --- a/manager/spfs.c
> +++ b/manager/spfs.c
> @@ -39,6 +39,7 @@ int create_spfs_info(const char *id,
>  		pr_err("failed to allocate\n");
>  		return -ENOMEM;
>  	}
> +	info->replacer = -1;
>  
>  	err = init_mount_info(&info->mnt, id, mountpoint, ns_mountpoint);
>  	if (err)
>
Stanislav Kinsburskiy Jan. 23, 2018, 9:48 a.m.
Please, see a couple of nits below

23.01.2018 10:41, Kirill Tkhai пишет:
> Stanislav Kinsburskiy says:
> 
> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>  The idea of the option is simple: force SPFS manager to exit, when it has some
>  SPFS processes among its children (i.e. spfs was mounted at least once),
>  but all these processes have exited for whatever reason (which usually happens
>  when restore has failed and spfs mounts where unmounted).
>  Although it works in overall (main SPFS manager process exits), its children
>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>  for corresponding SPFS mount and thus never stop until they are killed".
> 
> 1 spfs-manager
> 2   \_ spfs
> 3   \_ spfs-manager
> 4   \_ spfs
> 5   \_ spfs-manager
> 
> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
> 
> https://jira.sw.ru/browse/PSBM-80055
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  manager/context.c |    4 ++++
>  manager/spfs.c    |    1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/manager/context.c b/manager/context.c
> index 1eb37c9..4464a23 100644
> --- a/manager/context.c
> +++ b/manager/context.c
> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>  		/* SPFS master was killed. We need to release the reference */
>  		spfs_release_mnt(info);
>  
> +	if (killed || WEXITSTATUS(status))
> +		kill(info->replacer, SIGKILL);
> +

There is "if (killed)" check above.
Could you please move this hunk there?

>  	info->dead = true;
>  	del_spfs_info(ctx->spfs_mounts, info);
>  
> @@ -88,6 +91,7 @@ static void sigchld_handler(int signal, siginfo_t *siginfo, void *data)
>  				 * corresponding fd.
>  				 */
>  				spfs_release_mnt(info);
> +			info->replacer = -1;
>  		} else {
>  			info = find_spfs_by_pid(ctx->spfs_mounts, pid);
>  			if (info) {
> diff --git a/manager/spfs.c b/manager/spfs.c
> index 3e0f667..0b94587 100644
> --- a/manager/spfs.c
> +++ b/manager/spfs.c
> @@ -39,6 +39,7 @@ int create_spfs_info(const char *id,
>  		pr_err("failed to allocate\n");
>  		return -ENOMEM;
>  	}
> +	info->replacer = -1;
>  

Could you, please, move it down to the end of the function where unconditional actions take place?

>  	err = init_mount_info(&info->mnt, id, mountpoint, ns_mountpoint);
>  	if (err)
>
Kirill Tkhai Jan. 23, 2018, 9:51 a.m.
On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
> Please, see a couple of nits below
> 
> 23.01.2018 10:41, Kirill Tkhai пишет:
>> Stanislav Kinsburskiy says:
>>
>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>  but all these processes have exited for whatever reason (which usually happens
>>  when restore has failed and spfs mounts where unmounted).
>>  Although it works in overall (main SPFS manager process exits), its children
>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>  for corresponding SPFS mount and thus never stop until they are killed".
>>
>> 1 spfs-manager
>> 2   \_ spfs
>> 3   \_ spfs-manager
>> 4   \_ spfs
>> 5   \_ spfs-manager
>>
>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>
>> https://jira.sw.ru/browse/PSBM-80055
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  manager/context.c |    4 ++++
>>  manager/spfs.c    |    1 +
>>  2 files changed, 5 insertions(+)
>>
>> diff --git a/manager/context.c b/manager/context.c
>> index 1eb37c9..4464a23 100644
>> --- a/manager/context.c
>> +++ b/manager/context.c
>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>  		/* SPFS master was killed. We need to release the reference */
>>  		spfs_release_mnt(info);
>>  
>> +	if (killed || WEXITSTATUS(status))
>> +		kill(info->replacer, SIGKILL);
>> +
> 
> There is "if (killed)" check above.
> Could you please move this hunk there?

There is logical OR (||) in the hunk. How should I move it to unconditional check?

>>  	info->dead = true;
>>  	del_spfs_info(ctx->spfs_mounts, info);
>>  
>> @@ -88,6 +91,7 @@ static void sigchld_handler(int signal, siginfo_t *siginfo, void *data)
>>  				 * corresponding fd.
>>  				 */
>>  				spfs_release_mnt(info);
>> +			info->replacer = -1;
>>  		} else {
>>  			info = find_spfs_by_pid(ctx->spfs_mounts, pid);
>>  			if (info) {
>> diff --git a/manager/spfs.c b/manager/spfs.c
>> index 3e0f667..0b94587 100644
>> --- a/manager/spfs.c
>> +++ b/manager/spfs.c
>> @@ -39,6 +39,7 @@ int create_spfs_info(const char *id,
>>  		pr_err("failed to allocate\n");
>>  		return -ENOMEM;
>>  	}
>> +	info->replacer = -1;
>>  
> 
> Could you, please, move it down to the end of the function where unconditional actions take place?

No, problem

>>  	err = init_mount_info(&info->mnt, id, mountpoint, ns_mountpoint);
>>  	if (err)
>>
Stanislav Kinsburskiy Jan. 23, 2018, 10:07 a.m.
23.01.2018 10:51, Kirill Tkhai пишет:
> On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
>> Please, see a couple of nits below
>>
>> 23.01.2018 10:41, Kirill Tkhai пишет:
>>> Stanislav Kinsburskiy says:
>>>
>>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>>  but all these processes have exited for whatever reason (which usually happens
>>>  when restore has failed and spfs mounts where unmounted).
>>>  Although it works in overall (main SPFS manager process exits), its children
>>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>>  for corresponding SPFS mount and thus never stop until they are killed".
>>>
>>> 1 spfs-manager
>>> 2   \_ spfs
>>> 3   \_ spfs-manager
>>> 4   \_ spfs
>>> 5   \_ spfs-manager
>>>
>>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>>
>>> https://jira.sw.ru/browse/PSBM-80055
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>> ---
>>>  manager/context.c |    4 ++++
>>>  manager/spfs.c    |    1 +
>>>  2 files changed, 5 insertions(+)
>>>
>>> diff --git a/manager/context.c b/manager/context.c
>>> index 1eb37c9..4464a23 100644
>>> --- a/manager/context.c
>>> +++ b/manager/context.c
>>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>>  		/* SPFS master was killed. We need to release the reference */
>>>  		spfs_release_mnt(info);
>>>  
>>> +	if (killed || WEXITSTATUS(status))
>>> +		kill(info->replacer, SIGKILL);
>>> +
>>
>> There is "if (killed)" check above.
>> Could you please move this hunk there?
> 
> There is logical OR (||) in the hunk. How should I move it to unconditional check?
> 

Ah, ok. Then let the check be as it is.
Could you please add warning message for this kill with the process pid being killed and the reason why (spfs was killed of exited with error)?
Kirill Tkhai Jan. 23, 2018, 10:08 a.m.
On 23.01.2018 13:07, Stanislav Kinsburskiy wrote:
> 
> 
> 23.01.2018 10:51, Kirill Tkhai пишет:
>> On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
>>> Please, see a couple of nits below
>>>
>>> 23.01.2018 10:41, Kirill Tkhai пишет:
>>>> Stanislav Kinsburskiy says:
>>>>
>>>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>>>  but all these processes have exited for whatever reason (which usually happens
>>>>  when restore has failed and spfs mounts where unmounted).
>>>>  Although it works in overall (main SPFS manager process exits), its children
>>>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>>>  for corresponding SPFS mount and thus never stop until they are killed".
>>>>
>>>> 1 spfs-manager
>>>> 2   \_ spfs
>>>> 3   \_ spfs-manager
>>>> 4   \_ spfs
>>>> 5   \_ spfs-manager
>>>>
>>>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>>>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>>>
>>>> https://jira.sw.ru/browse/PSBM-80055
>>>>
>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>> ---
>>>>  manager/context.c |    4 ++++
>>>>  manager/spfs.c    |    1 +
>>>>  2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/manager/context.c b/manager/context.c
>>>> index 1eb37c9..4464a23 100644
>>>> --- a/manager/context.c
>>>> +++ b/manager/context.c
>>>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>>>  		/* SPFS master was killed. We need to release the reference */
>>>>  		spfs_release_mnt(info);
>>>>  
>>>> +	if (killed || WEXITSTATUS(status))
>>>> +		kill(info->replacer, SIGKILL);
>>>> +
>>>
>>> There is "if (killed)" check above.
>>> Could you please move this hunk there?
>>
>> There is logical OR (||) in the hunk. How should I move it to unconditional check?
>>
> 
> Ah, ok. Then let the check be as it is.
> Could you please add warning message for this kill with the process pid being killed and the reason why (spfs was killed of exited with error)?

Ok
Kirill Tkhai Jan. 23, 2018, 10:09 a.m.
On 23.01.2018 13:07, Stanislav Kinsburskiy wrote:
> 
> 
> 23.01.2018 10:51, Kirill Tkhai пишет:
>> On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
>>> Please, see a couple of nits below
>>>
>>> 23.01.2018 10:41, Kirill Tkhai пишет:
>>>> Stanislav Kinsburskiy says:
>>>>
>>>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>>>  but all these processes have exited for whatever reason (which usually happens
>>>>  when restore has failed and spfs mounts where unmounted).
>>>>  Although it works in overall (main SPFS manager process exits), its children
>>>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>>>  for corresponding SPFS mount and thus never stop until they are killed".
>>>>
>>>> 1 spfs-manager
>>>> 2   \_ spfs
>>>> 3   \_ spfs-manager
>>>> 4   \_ spfs
>>>> 5   \_ spfs-manager
>>>>
>>>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>>>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>>>
>>>> https://jira.sw.ru/browse/PSBM-80055
>>>>
>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>> ---
>>>>  manager/context.c |    4 ++++
>>>>  manager/spfs.c    |    1 +
>>>>  2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/manager/context.c b/manager/context.c
>>>> index 1eb37c9..4464a23 100644
>>>> --- a/manager/context.c
>>>> +++ b/manager/context.c
>>>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>>>  		/* SPFS master was killed. We need to release the reference */
>>>>  		spfs_release_mnt(info);
>>>>  
>>>> +	if (killed || WEXITSTATUS(status))
>>>> +		kill(info->replacer, SIGKILL);
>>>> +
>>>
>>> There is "if (killed)" check above.
>>> Could you please move this hunk there?
>>
>> There is logical OR (||) in the hunk. How should I move it to unconditional check?
>>
> 
> Ah, ok. Then let the check be as it is.
> Could you please add warning message for this kill with the process pid being killed and the reason why (spfs was killed of exited with error)?

Maybe we should call spfs_release_mnt() in case of exit status != 0 too?
Stanislav Kinsburskiy Jan. 23, 2018, 10:14 a.m.
23.01.2018 11:09, Kirill Tkhai пишет:
> On 23.01.2018 13:07, Stanislav Kinsburskiy wrote:
>>
>>
>> 23.01.2018 10:51, Kirill Tkhai пишет:
>>> On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
>>>> Please, see a couple of nits below
>>>>
>>>> 23.01.2018 10:41, Kirill Tkhai пишет:
>>>>> Stanislav Kinsburskiy says:
>>>>>
>>>>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>>>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>>>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>>>>  but all these processes have exited for whatever reason (which usually happens
>>>>>  when restore has failed and spfs mounts where unmounted).
>>>>>  Although it works in overall (main SPFS manager process exits), its children
>>>>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>>>>  for corresponding SPFS mount and thus never stop until they are killed".
>>>>>
>>>>> 1 spfs-manager
>>>>> 2   \_ spfs
>>>>> 3   \_ spfs-manager
>>>>> 4   \_ spfs
>>>>> 5   \_ spfs-manager
>>>>>
>>>>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>>>>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>>>>
>>>>> https://jira.sw.ru/browse/PSBM-80055
>>>>>
>>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>>> ---
>>>>>  manager/context.c |    4 ++++
>>>>>  manager/spfs.c    |    1 +
>>>>>  2 files changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/manager/context.c b/manager/context.c
>>>>> index 1eb37c9..4464a23 100644
>>>>> --- a/manager/context.c
>>>>> +++ b/manager/context.c
>>>>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>>>>  		/* SPFS master was killed. We need to release the reference */
>>>>>  		spfs_release_mnt(info);
>>>>>  
>>>>> +	if (killed || WEXITSTATUS(status))
>>>>> +		kill(info->replacer, SIGKILL);
>>>>> +
>>>>
>>>> There is "if (killed)" check above.
>>>> Could you please move this hunk there?
>>>
>>> There is logical OR (||) in the hunk. How should I move it to unconditional check?
>>>
>>
>> Ah, ok. Then let the check be as it is.
>> Could you please add warning message for this kill with the process pid being killed and the reason why (spfs was killed of exited with error)?
> 
> Maybe we should call spfs_release_mnt() in case of exit status != 0 too?
> 

Well, this makes sense.
If spfs exited with non-zero result (either it was killed or exited due to some error), then there is no need in replacer. And there is no need in the reference.
So, probably this check you proposed should be used for both spfs_release_mnt(info) and kill(info->replacer, SIGKILL).
Kirill Tkhai Jan. 23, 2018, 10:17 a.m.
On 23.01.2018 13:14, Stanislav Kinsburskiy wrote:
> 
> 
> 23.01.2018 11:09, Kirill Tkhai пишет:
>> On 23.01.2018 13:07, Stanislav Kinsburskiy wrote:
>>>
>>>
>>> 23.01.2018 10:51, Kirill Tkhai пишет:
>>>> On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
>>>>> Please, see a couple of nits below
>>>>>
>>>>> 23.01.2018 10:41, Kirill Tkhai пишет:
>>>>>> Stanislav Kinsburskiy says:
>>>>>>
>>>>>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>>>>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>>>>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>>>>>  but all these processes have exited for whatever reason (which usually happens
>>>>>>  when restore has failed and spfs mounts where unmounted).
>>>>>>  Although it works in overall (main SPFS manager process exits), its children
>>>>>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>>>>>  for corresponding SPFS mount and thus never stop until they are killed".
>>>>>>
>>>>>> 1 spfs-manager
>>>>>> 2   \_ spfs
>>>>>> 3   \_ spfs-manager
>>>>>> 4   \_ spfs
>>>>>> 5   \_ spfs-manager
>>>>>>
>>>>>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>>>>>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>>>>>
>>>>>> https://jira.sw.ru/browse/PSBM-80055
>>>>>>
>>>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>>>> ---
>>>>>>  manager/context.c |    4 ++++
>>>>>>  manager/spfs.c    |    1 +
>>>>>>  2 files changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/manager/context.c b/manager/context.c
>>>>>> index 1eb37c9..4464a23 100644
>>>>>> --- a/manager/context.c
>>>>>> +++ b/manager/context.c
>>>>>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>>>>>  		/* SPFS master was killed. We need to release the reference */
>>>>>>  		spfs_release_mnt(info);
>>>>>>  
>>>>>> +	if (killed || WEXITSTATUS(status))
>>>>>> +		kill(info->replacer, SIGKILL);
>>>>>> +
>>>>>
>>>>> There is "if (killed)" check above.
>>>>> Could you please move this hunk there?
>>>>
>>>> There is logical OR (||) in the hunk. How should I move it to unconditional check?
>>>>
>>>
>>> Ah, ok. Then let the check be as it is.
>>> Could you please add warning message for this kill with the process pid being killed and the reason why (spfs was killed of exited with error)?
>>
>> Maybe we should call spfs_release_mnt() in case of exit status != 0 too?
>>
> 
> Well, this makes sense.
> If spfs exited with non-zero result (either it was killed or exited due to some error), then there is no need in replacer. And there is no need in the reference.
> So, probably this check you proposed should be used for both spfs_release_mnt(info) and kill(info->replacer, SIGKILL).

Are you OK with this change sent in the only patch, or should we use one more patch?
Stanislav Kinsburskiy Jan. 23, 2018, 10:18 a.m.
23.01.2018 11:17, Kirill Tkhai пишет:
> On 23.01.2018 13:14, Stanislav Kinsburskiy wrote:
>>
>>
>> 23.01.2018 11:09, Kirill Tkhai пишет:
>>> On 23.01.2018 13:07, Stanislav Kinsburskiy wrote:
>>>>
>>>>
>>>> 23.01.2018 10:51, Kirill Tkhai пишет:
>>>>> On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
>>>>>> Please, see a couple of nits below
>>>>>>
>>>>>> 23.01.2018 10:41, Kirill Tkhai пишет:
>>>>>>> Stanislav Kinsburskiy says:
>>>>>>>
>>>>>>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>>>>>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>>>>>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>>>>>>  but all these processes have exited for whatever reason (which usually happens
>>>>>>>  when restore has failed and spfs mounts where unmounted).
>>>>>>>  Although it works in overall (main SPFS manager process exits), its children
>>>>>>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>>>>>>  for corresponding SPFS mount and thus never stop until they are killed".
>>>>>>>
>>>>>>> 1 spfs-manager
>>>>>>> 2   \_ spfs
>>>>>>> 3   \_ spfs-manager
>>>>>>> 4   \_ spfs
>>>>>>> 5   \_ spfs-manager
>>>>>>>
>>>>>>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>>>>>>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>>>>>>
>>>>>>> https://jira.sw.ru/browse/PSBM-80055
>>>>>>>
>>>>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>>>>> ---
>>>>>>>  manager/context.c |    4 ++++
>>>>>>>  manager/spfs.c    |    1 +
>>>>>>>  2 files changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/manager/context.c b/manager/context.c
>>>>>>> index 1eb37c9..4464a23 100644
>>>>>>> --- a/manager/context.c
>>>>>>> +++ b/manager/context.c
>>>>>>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>>>>>>  		/* SPFS master was killed. We need to release the reference */
>>>>>>>  		spfs_release_mnt(info);
>>>>>>>  
>>>>>>> +	if (killed || WEXITSTATUS(status))
>>>>>>> +		kill(info->replacer, SIGKILL);
>>>>>>> +
>>>>>>
>>>>>> There is "if (killed)" check above.
>>>>>> Could you please move this hunk there?
>>>>>
>>>>> There is logical OR (||) in the hunk. How should I move it to unconditional check?
>>>>>
>>>>
>>>> Ah, ok. Then let the check be as it is.
>>>> Could you please add warning message for this kill with the process pid being killed and the reason why (spfs was killed of exited with error)?
>>>
>>> Maybe we should call spfs_release_mnt() in case of exit status != 0 too?
>>>
>>
>> Well, this makes sense.
>> If spfs exited with non-zero result (either it was killed or exited due to some error), then there is no need in replacer. And there is no need in the reference.
>> So, probably this check you proposed should be used for both spfs_release_mnt(info) and kill(info->replacer, SIGKILL).
> 
> Are you OK with this change sent in the only patch, or should we use one more patch?
> 

I think one patch is OK. Only mention the change explicitly in the patch description, please.
Kirill Tkhai Jan. 23, 2018, 10:19 a.m.
On 23.01.2018 13:18, Stanislav Kinsburskiy wrote:
> 
> 
> 23.01.2018 11:17, Kirill Tkhai пишет:
>> On 23.01.2018 13:14, Stanislav Kinsburskiy wrote:
>>>
>>>
>>> 23.01.2018 11:09, Kirill Tkhai пишет:
>>>> On 23.01.2018 13:07, Stanislav Kinsburskiy wrote:
>>>>>
>>>>>
>>>>> 23.01.2018 10:51, Kirill Tkhai пишет:
>>>>>> On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
>>>>>>> Please, see a couple of nits below
>>>>>>>
>>>>>>> 23.01.2018 10:41, Kirill Tkhai пишет:
>>>>>>>> Stanislav Kinsburskiy says:
>>>>>>>>
>>>>>>>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>>>>>>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>>>>>>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>>>>>>>  but all these processes have exited for whatever reason (which usually happens
>>>>>>>>  when restore has failed and spfs mounts where unmounted).
>>>>>>>>  Although it works in overall (main SPFS manager process exits), its children
>>>>>>>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>>>>>>>  for corresponding SPFS mount and thus never stop until they are killed".
>>>>>>>>
>>>>>>>> 1 spfs-manager
>>>>>>>> 2   \_ spfs
>>>>>>>> 3   \_ spfs-manager
>>>>>>>> 4   \_ spfs
>>>>>>>> 5   \_ spfs-manager
>>>>>>>>
>>>>>>>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>>>>>>>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>>>>>>>
>>>>>>>> https://jira.sw.ru/browse/PSBM-80055
>>>>>>>>
>>>>>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>>>>>> ---
>>>>>>>>  manager/context.c |    4 ++++
>>>>>>>>  manager/spfs.c    |    1 +
>>>>>>>>  2 files changed, 5 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/manager/context.c b/manager/context.c
>>>>>>>> index 1eb37c9..4464a23 100644
>>>>>>>> --- a/manager/context.c
>>>>>>>> +++ b/manager/context.c
>>>>>>>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>>>>>>>  		/* SPFS master was killed. We need to release the reference */
>>>>>>>>  		spfs_release_mnt(info);
>>>>>>>>  
>>>>>>>> +	if (killed || WEXITSTATUS(status))
>>>>>>>> +		kill(info->replacer, SIGKILL);
>>>>>>>> +
>>>>>>>
>>>>>>> There is "if (killed)" check above.
>>>>>>> Could you please move this hunk there?
>>>>>>
>>>>>> There is logical OR (||) in the hunk. How should I move it to unconditional check?
>>>>>>
>>>>>
>>>>> Ah, ok. Then let the check be as it is.
>>>>> Could you please add warning message for this kill with the process pid being killed and the reason why (spfs was killed of exited with error)?
>>>>
>>>> Maybe we should call spfs_release_mnt() in case of exit status != 0 too?
>>>>
>>>
>>> Well, this makes sense.
>>> If spfs exited with non-zero result (either it was killed or exited due to some error), then there is no need in replacer. And there is no need in the reference.
>>> So, probably this check you proposed should be used for both spfs_release_mnt(info) and kill(info->replacer, SIGKILL).
>>
>> Are you OK with this change sent in the only patch, or should we use one more patch?
>>
> 
> I think one patch is OK. Only mention the change explicitly in the patch description, please.

Ok, then what about spfs_cleanup_env(info, killed)? It also has killed argument