criu: tty -- Restore tty params synchronously

Submitted by Cyrill Gorcunov on April 21, 2016, 2:10 p.m.

Details

Message ID 1461247846-20266-1-git-send-email-gorcunov@virtuozzo.com
State Rejected
Series "criu: tty -- Restore tty params synchronously"
Headers show

Patch hide | download patch | download mbox

diff --git a/criu/tty.c b/criu/tty.c
index 68d7ba3..430a2c8 100644
--- a/criu/tty.c
+++ b/criu/tty.c
@@ -786,7 +786,7 @@  static int restore_tty_params(int fd, struct tty_info *info)
 		winsize_copy(&p.w, info->tie->winsize);
 	}
 
-	return userns_call(do_restore_tty_parms, UNS_ASYNC, &p, sizeof(p), fd);
+	return userns_call(do_restore_tty_parms, 0, &p, sizeof(p), fd);
 }
 
 static int pty_open_slaves(struct tty_info *info)

Comments

Kirill Gorkunov April 21, 2016, 2:13 p.m.
On Thu, Apr 21, 2016 at 05:10:46PM +0300, Cyrill Gorcunov wrote:
> Async call to do_restore_tty_parms may have a side effect -- the caller
> doesn't wait for its completion and continue processing tty associated
> with the fd. userns_call carries own copy of file descriptor and if
> main code in tty fails then both slave and master may be closed
> and only one file descriptor remains: the one which userns carries,
> but the terminal already dead, so when we call do_restore_tty_parms
> on it we get -EIO.
> 
> Thus simply run params restoration on peers in sync'ed way when we
> know terminal peers both exist and valid.
> 
> https://jira.sw.ru/browse/PSBM-46382
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>

Actually there might be other side effects as well. For example
in the bug above (which please rip off from the commit message
when be merging):

...
(01.346469) Error (tty.c:751): tty: Can't set tty params on 1156: Input/output error
(01.346475) Error (namespaces.c:1238): uns: Async call failed. Exiting
...
Pavel Emelianov April 21, 2016, 2:22 p.m.
On 04/21/2016 05:10 PM, Cyrill Gorcunov wrote:
> Async call to do_restore_tty_parms may have a side effect -- the caller
> doesn't wait for its completion and continue processing tty associated
> with the fd. userns_call carries own copy of file descriptor and if
> main code in tty fails then both slave and master may be closed
> and only one file descriptor remains: the one which userns carries,
> but the terminal already dead, so when we call do_restore_tty_parms
> on it we get -EIO.

But the restore fails anyway, so we do we care?

> Thus simply run params restoration on peers in sync'ed way when we
> know terminal peers both exist and valid.
> 
> https://jira.sw.ru/browse/PSBM-46382
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> ---
>  criu/tty.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/criu/tty.c b/criu/tty.c
> index 68d7ba3..430a2c8 100644
> --- a/criu/tty.c
> +++ b/criu/tty.c
> @@ -786,7 +786,7 @@ static int restore_tty_params(int fd, struct tty_info *info)
>  		winsize_copy(&p.w, info->tie->winsize);
>  	}
>  
> -	return userns_call(do_restore_tty_parms, UNS_ASYNC, &p, sizeof(p), fd);
> +	return userns_call(do_restore_tty_parms, 0, &p, sizeof(p), fd);
>  }
>  
>  static int pty_open_slaves(struct tty_info *info)
>
Cyrill Gorcunov April 21, 2016, 2:32 p.m.
On Thu, Apr 21, 2016 at 05:22:21PM +0300, Pavel Emelyanov wrote:
> On 04/21/2016 05:10 PM, Cyrill Gorcunov wrote:
> > Async call to do_restore_tty_parms may have a side effect -- the caller
> > doesn't wait for its completion and continue processing tty associated
> > with the fd. userns_call carries own copy of file descriptor and if
> > main code in tty fails then both slave and master may be closed
> > and only one file descriptor remains: the one which userns carries,
> > but the terminal already dead, so when we call do_restore_tty_parms
> > on it we get -EIO.
> 
> But the restore fails anyway, so we do we care?

Yes we care. The restoration of params must be done on alive terminals.
Both peers must be alive when we do such action. The idea behind of
restoring terminal params relies on this fact. Such races really
hard to catch :/
Pavel Emelianov April 21, 2016, 5:48 p.m.
On 04/21/2016 05:32 PM, Cyrill Gorcunov wrote:
> On Thu, Apr 21, 2016 at 05:22:21PM +0300, Pavel Emelyanov wrote:
>> On 04/21/2016 05:10 PM, Cyrill Gorcunov wrote:
>>> Async call to do_restore_tty_parms may have a side effect -- the caller
>>> doesn't wait for its completion and continue processing tty associated
>>> with the fd. userns_call carries own copy of file descriptor and if
>>> main code in tty fails then both slave and master may be closed
>>> and only one file descriptor remains: the one which userns carries,
>>> but the terminal already dead, so when we call do_restore_tty_parms
>>> on it we get -EIO.
>>
>> But the restore fails anyway, so we do we care?
> 
> Yes we care. The restoration of params must be done on alive terminals.

OK, but if terminal is dead, then the restore is failing. So why not
have one more failure in logs?

> Both peers must be alive when we do such action. The idea behind of
> restoring terminal params relies on this fact. Such races really
> hard to catch :/
> .
>
Cyrill Gorcunov April 21, 2016, 6:02 p.m.
On Thu, Apr 21, 2016 at 08:48:49PM +0300, Pavel Emelyanov wrote:
> >>
> >> But the restore fails anyway, so we do we care?
> > 
> > Yes we care. The restoration of params must be done on alive terminals.
> 
> OK, but if terminal is dead, then the restore is failing. So why not
> have one more failure in logs?

Because in this case we don't control when real async ioctl is called,
as far as I understand we may hit it earlier and hide real case of
problem.

After all your questions, I started doubting if the changelog is correct.
For sure the reason of failure is async ioctl call over terminal
which is not alive (since with the patch container passes this point
of restore) but maybe I've overlooked something. Gimme some time...
Cyrill Gorcunov April 28, 2016, 10:18 a.m.
On Thu, Apr 21, 2016 at 9:02 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On Thu, Apr 21, 2016 at 08:48:49PM +0300, Pavel Emelyanov wrote:
>> >>
>> >> But the restore fails anyway, so we do we care?
>> >
>> > Yes we care. The restoration of params must be done on alive terminals.
>>
>> OK, but if terminal is dead, then the restore is failing. So why not
>> have one more failure in logs?
>
> Because in this case we don't control when real async ioctl is called,
> as far as I understand we may hit it earlier and hide real case of
> problem.
>
> After all your questions, I started doubting if the changelog is correct.
> For sure the reason of failure is async ioctl call over terminal
> which is not alive (since with the patch container passes this point
> of restore) but maybe I've overlooked something. Gimme some time...

You know, we may fail inbetween, having restore-tty-params request in the fly.
Imagine

pty_open_unpaired_slave
pty_open_slaves
  ...
  fd = open_tty_reg()
  restore_tty_params --> sends request
  send_fd_to_peer(fd) -> it fails
  close_safe(&fd);
  return -1;
now pty_open_unpaired_slave will close the master but then in-flight
request may start and will exit with error.
Moreover I think we should not call userns_call if there is no p.has
set. I will optimize it on top.
So please merge this patch.
Pavel Emelianov May 4, 2016, 7:22 a.m.
On 04/28/2016 01:18 PM, Cyrill Gorcunov wrote:
> On Thu, Apr 21, 2016 at 9:02 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>> On Thu, Apr 21, 2016 at 08:48:49PM +0300, Pavel Emelyanov wrote:
>>>>>
>>>>> But the restore fails anyway, so we do we care?
>>>>
>>>> Yes we care. The restoration of params must be done on alive terminals.
>>>
>>> OK, but if terminal is dead, then the restore is failing. So why not
>>> have one more failure in logs?
>>
>> Because in this case we don't control when real async ioctl is called,
>> as far as I understand we may hit it earlier and hide real case of
>> problem.
>>
>> After all your questions, I started doubting if the changelog is correct.
>> For sure the reason of failure is async ioctl call over terminal
>> which is not alive (since with the patch container passes this point
>> of restore) but maybe I've overlooked something. Gimme some time...
> 
> You know, we may fail inbetween, having restore-tty-params request in the fly.
> Imagine
> 
> pty_open_unpaired_slave
> pty_open_slaves
>   ...
>   fd = open_tty_reg()
>   restore_tty_params --> sends request
>   send_fd_to_peer(fd) -> it fails
>   close_safe(&fd);
>   return -1;
> now pty_open_unpaired_slave will close the master but then in-flight
> request may start and will exit with error.

Still, send_fd_to_peer() failed and we exit with error anyway.

> Moreover I think we should not call userns_call if there is no p.has
> set. I will optimize it on top.
> So please merge this patch.
> .
>
Kirill Gorkunov May 4, 2016, 7:36 a.m.
On Wed, May 04, 2016 at 10:22:17AM +0300, Pavel Emelyanov wrote:
> > 
> > You know, we may fail inbetween, having restore-tty-params request in the fly.
> > Imagine
> > 
> > pty_open_unpaired_slave
> > pty_open_slaves
> >   ...
> >   fd = open_tty_reg()
> >   restore_tty_params --> sends request
> >   send_fd_to_peer(fd) -> it fails
> >   close_safe(&fd);
> >   return -1;
> > now pty_open_unpaired_slave will close the master but then in-flight
> > request may start and will exit with error.
> 
> Still, send_fd_to_peer() failed and we exit with error anyway.

It doesn't matter. It's bloody wrong calling the function which
_implies_ that tty is alive on dead tty. That's the point. Either
we should call it in sync way, either should provide own completions
on tty side (which is the same as use sync flag). Frankly, Pavel,
this cause unexpected problems for no gain.
Pavel Emelianov May 4, 2016, 2:39 p.m.
On 05/04/2016 10:36 AM, Cyrill Gorcunov wrote:
> On Wed, May 04, 2016 at 10:22:17AM +0300, Pavel Emelyanov wrote:
>>>
>>> You know, we may fail inbetween, having restore-tty-params request in the fly.
>>> Imagine
>>>
>>> pty_open_unpaired_slave
>>> pty_open_slaves
>>>   ...
>>>   fd = open_tty_reg()
>>>   restore_tty_params --> sends request
>>>   send_fd_to_peer(fd) -> it fails
>>>   close_safe(&fd);
>>>   return -1;
>>> now pty_open_unpaired_slave will close the master but then in-flight
>>> request may start and will exit with error.
>>
>> Still, send_fd_to_peer() failed and we exit with error anyway.
> 
> It doesn't matter. It's bloody wrong calling the function which
> _implies_ that tty is alive on dead tty. That's the point.

If (!) this is done on error paths, then it's ... affordable. Doing
synchronous usernsd calls is worse.

> Either
> we should call it in sync way, either should provide own completions
> on tty side (which is the same as use sync flag). Frankly, Pavel,
> this cause unexpected problems for no gain.
> .
>
Kirill Gorkunov May 4, 2016, 2:52 p.m.
On Wed, May 04, 2016 at 05:39:37PM +0300, Pavel Emelyanov wrote:
> >>> now pty_open_unpaired_slave will close the master but then in-flight
> >>> request may start and will exit with error.
> >>
> >> Still, send_fd_to_peer() failed and we exit with error anyway.
> > 
> > It doesn't matter. It's bloody wrong calling the function which
> > _implies_ that tty is alive on dead tty. That's the point.
> 
> If (!) this is done on error paths, then it's ... affordable. Doing
> synchronous usernsd calls is worse.

OK, up to you.