[rh7] : ve: Provide interface for current tty inheritance

Submitted by Kirill Gorkunov on Feb. 15, 2018, 6:54 p.m.

Details

Message ID 20180215185439.GA12701@uranus.lan
State New
Series ": ve: Provide interface for current tty inheritance"
Headers show

Commit Message

Kirill Gorkunov Feb. 15, 2018, 6:54 p.m.
When fork() is called the current controlling terminal is inherited
by a child process. But in criu we fork all process first and then
restore their files, thus if terminal is opened in some children
its reference get lost. We refuse to checkpoint such configurations
at the moment in criu itself.

So to be able to restore this kind of container we need a way to
propagate controlling terminal to children processes, and here
is an interface "ve.ctty" entry on toplevel ve cgroup. One have
to pass pid of the donor task in first position followed by a series
of recipient pids.

https://jira.sw.ru/browse/PSBM-76490

CC: Andrey Vagin <avagin@virtuozzo.com>
CC: Andrey Ryabinin <aryabinin@virtuozzo.com>
CC: Konstantin Khorenko <khorenko@virtuozzo.com>
CC: "Denis V. Lunev" <den@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
---
Guys, take a look please. Note the patch on its own is useless
because CRIU support needed as well. I already have a working
version of patches for CRIU, but they are ~300 lines of code
so I'm trying to shrink them because it's too much.

 kernel/ve/ve.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

Patch hide | download patch | download mbox

Index: linux-pcs7.git/kernel/ve/ve.c
===================================================================
--- linux-pcs7.git.orig/kernel/ve/ve.c
+++ linux-pcs7.git/kernel/ve/ve.c
@@ -37,6 +37,7 @@ 
 #include <linux/fs_struct.h>
 #include <linux/task_work.h>
 #include <linux/ctype.h>
+#include <linux/tty.h>
 
 #include <uapi/linux/vzcalluser.h>
 #include <linux/vziptable_defs.h>
@@ -1214,6 +1215,7 @@  enum {
 	VE_CF_NETNS_NR,
 	VE_CF_NETIF_MAX_NR,
 	VE_CF_NETIF_NR,
+	VE_CF_CTTY,
 };
 
 static int ve_ts_read(struct cgroup *cg, struct cftype *cft, struct seq_file *m)
@@ -1393,6 +1395,94 @@  static int ve_write_running_u64(struct c
 	return _ve_write_u64(cg, cft, value, 1);
 }
 
+static int ve_write_ctty(struct cgroup *cg, struct cftype *cft, const char *buffer)
+{
+	struct task_struct *tsk_from, *tsk_to;
+	struct tty_struct *tty_from, *tty_to;
+	pid_t pid_from, pid_to;
+	unsigned long flags;
+	char *pids;
+	int ret;
+
+	/*
+	 * Buffer format is the following
+	 *
+	 * 	pid_from pid_to pid_to ...
+	 *
+	 * where pid_to are pids to propagate
+	 * current terminal into.
+	 */
+
+	pids = skip_spaces(buffer);
+	if (sscanf(pids, "%d", &pid_from) != 1)
+		return -EINVAL;
+	pids = strchr(pids, ' ');
+	if (!pids)
+		return -EINVAL;
+	pids = skip_spaces(pids);
+
+	rcu_read_lock();
+	tsk_from = find_task_by_vpid(pid_from);
+	if (tsk_from)
+		get_task_struct(tsk_from);
+	rcu_read_unlock();
+
+	if (!tsk_from)
+		return -ESRCH;
+
+	spin_lock_irqsave(&tsk_from->sighand->siglock, flags);
+	tty_from = tty_kref_get(tsk_from->signal->tty);
+	spin_unlock_irqrestore(&tsk_from->sighand->siglock, flags);
+
+	if (!tty_from) {
+		ret = -ENOTTY;
+		goto out;
+	}
+
+	ret = 0;
+	while (pids && *pids) {
+		if (sscanf(pids, "%d", &pid_to) != 1) {
+			ret = -EINVAL;
+			goto out;
+		}
+		pids = strchr(pids, ' ');
+		if (pids)
+			pids = skip_spaces(pids);
+
+		rcu_read_lock();
+		tsk_to = find_task_by_vpid(pid_to);
+		if (tsk_to)
+			get_task_struct(tsk_to);
+		rcu_read_unlock();
+
+		if (!tsk_to) {
+			ret = -ESRCH;
+			goto out;
+		}
+
+		if (tsk_from->task_ve == tsk_to->task_ve) {
+			spin_lock_irqsave(&tsk_to->sighand->siglock, flags);
+			tty_to = tsk_to->signal->tty;
+			if (!tty_to)
+				tsk_to->signal->tty = tty_kref_get(tty_from);
+			else
+				ret = -EBUSY;
+			spin_unlock_irqrestore(&tsk_to->sighand->siglock, flags);
+		} else
+			ret = -EINVAL;
+
+		put_task_struct(tsk_to);
+
+		if (ret)
+			goto out;
+	}
+
+out:
+	tty_kref_put(tty_from);
+	put_task_struct(tsk_from);
+	return ret;
+}
+
 static struct cftype ve_cftypes[] = {
 	{
 		.name			= "state",
@@ -1496,6 +1586,12 @@  static struct cftype ve_cftypes[] = {
 		.read_u64		= ve_read_u64,
 		.private		= VE_CF_NETIF_NR,
 	},
+	{
+		.name			= "ctty",
+		.flags			= CFTYPE_ONLY_ON_ROOT,
+		.write_string		= ve_write_ctty,
+		.private		= VE_CF_CTTY,
+	},
 	{ }
 };
 

Comments

Andrey Vagin Feb. 15, 2018, 10:18 p.m.
On Thu, Feb 15, 2018 at 09:54:39PM +0300, Cyrill Gorcunov wrote:
> When fork() is called the current controlling terminal is inherited
> by a child process. But in criu we fork all process first and then
> restore their files, thus if terminal is opened in some children
> its reference get lost. We refuse to checkpoint such configurations
> at the moment in criu itself.
> 
> So to be able to restore this kind of container we need a way to
> propagate controlling terminal to children processes, and here
> is an interface "ve.ctty" entry on toplevel ve cgroup. One have
> to pass pid of the donor task in first position followed by a series
> of recipient pids.

Cyrill, you chose a very weird interface. I think we can add an ioctl to
set a current terminal for a task. And I think we have a good chance to
push it into the upstream kernle.


Thanks,
Andrei


> 
> https://jira.sw.ru/browse/PSBM-76490
> 
> CC: Andrey Vagin <avagin@virtuozzo.com>
> CC: Andrey Ryabinin <aryabinin@virtuozzo.com>
> CC: Konstantin Khorenko <khorenko@virtuozzo.com>
> CC: "Denis V. Lunev" <den@virtuozzo.com>
> Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> ---
> Guys, take a look please. Note the patch on its own is useless
> because CRIU support needed as well. I already have a working
> version of patches for CRIU, but they are ~300 lines of code
> so I'm trying to shrink them because it's too much.
> 
>  kernel/ve/ve.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 96 insertions(+)
> 
> Index: linux-pcs7.git/kernel/ve/ve.c
> ===================================================================
> --- linux-pcs7.git.orig/kernel/ve/ve.c
> +++ linux-pcs7.git/kernel/ve/ve.c
> @@ -37,6 +37,7 @@
>  #include <linux/fs_struct.h>
>  #include <linux/task_work.h>
>  #include <linux/ctype.h>
> +#include <linux/tty.h>
>  
>  #include <uapi/linux/vzcalluser.h>
>  #include <linux/vziptable_defs.h>
> @@ -1214,6 +1215,7 @@ enum {
>  	VE_CF_NETNS_NR,
>  	VE_CF_NETIF_MAX_NR,
>  	VE_CF_NETIF_NR,
> +	VE_CF_CTTY,
>  };
>  
>  static int ve_ts_read(struct cgroup *cg, struct cftype *cft, struct seq_file *m)
> @@ -1393,6 +1395,94 @@ static int ve_write_running_u64(struct c
>  	return _ve_write_u64(cg, cft, value, 1);
>  }
>  
> +static int ve_write_ctty(struct cgroup *cg, struct cftype *cft, const char *buffer)
> +{
> +	struct task_struct *tsk_from, *tsk_to;
> +	struct tty_struct *tty_from, *tty_to;
> +	pid_t pid_from, pid_to;
> +	unsigned long flags;
> +	char *pids;
> +	int ret;
> +
> +	/*
> +	 * Buffer format is the following
> +	 *
> +	 * 	pid_from pid_to pid_to ...
> +	 *
> +	 * where pid_to are pids to propagate
> +	 * current terminal into.
> +	 */
> +
> +	pids = skip_spaces(buffer);
> +	if (sscanf(pids, "%d", &pid_from) != 1)
> +		return -EINVAL;
> +	pids = strchr(pids, ' ');
> +	if (!pids)
> +		return -EINVAL;
> +	pids = skip_spaces(pids);
> +
> +	rcu_read_lock();
> +	tsk_from = find_task_by_vpid(pid_from);
> +	if (tsk_from)
> +		get_task_struct(tsk_from);
> +	rcu_read_unlock();
> +
> +	if (!tsk_from)
> +		return -ESRCH;
> +
> +	spin_lock_irqsave(&tsk_from->sighand->siglock, flags);
> +	tty_from = tty_kref_get(tsk_from->signal->tty);
> +	spin_unlock_irqrestore(&tsk_from->sighand->siglock, flags);
> +
> +	if (!tty_from) {
> +		ret = -ENOTTY;
> +		goto out;
> +	}
> +
> +	ret = 0;
> +	while (pids && *pids) {
> +		if (sscanf(pids, "%d", &pid_to) != 1) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +		pids = strchr(pids, ' ');
> +		if (pids)
> +			pids = skip_spaces(pids);
> +
> +		rcu_read_lock();
> +		tsk_to = find_task_by_vpid(pid_to);
> +		if (tsk_to)
> +			get_task_struct(tsk_to);
> +		rcu_read_unlock();
> +
> +		if (!tsk_to) {
> +			ret = -ESRCH;
> +			goto out;
> +		}
> +
> +		if (tsk_from->task_ve == tsk_to->task_ve) {
> +			spin_lock_irqsave(&tsk_to->sighand->siglock, flags);
> +			tty_to = tsk_to->signal->tty;
> +			if (!tty_to)
> +				tsk_to->signal->tty = tty_kref_get(tty_from);
> +			else
> +				ret = -EBUSY;
> +			spin_unlock_irqrestore(&tsk_to->sighand->siglock, flags);
> +		} else
> +			ret = -EINVAL;
> +
> +		put_task_struct(tsk_to);
> +
> +		if (ret)
> +			goto out;
> +	}
> +
> +out:
> +	tty_kref_put(tty_from);
> +	put_task_struct(tsk_from);
> +	return ret;
> +}
> +
>  static struct cftype ve_cftypes[] = {
>  	{
>  		.name			= "state",
> @@ -1496,6 +1586,12 @@ static struct cftype ve_cftypes[] = {
>  		.read_u64		= ve_read_u64,
>  		.private		= VE_CF_NETIF_NR,
>  	},
> +	{
> +		.name			= "ctty",
> +		.flags			= CFTYPE_ONLY_ON_ROOT,
> +		.write_string		= ve_write_ctty,
> +		.private		= VE_CF_CTTY,
> +	},
>  	{ }
>  };
>
Kirill Gorkunov Feb. 16, 2018, 7:27 a.m.
On Thu, Feb 15, 2018 at 02:18:27PM -0800, Andrey Vagin wrote:
> On Thu, Feb 15, 2018 at 09:54:39PM +0300, Cyrill Gorcunov wrote:
> > When fork() is called the current controlling terminal is inherited
> > by a child process. But in criu we fork all process first and then
> > restore their files, thus if terminal is opened in some children
> > its reference get lost. We refuse to checkpoint such configurations
> > at the moment in criu itself.
> > 
> > So to be able to restore this kind of container we need a way to
> > propagate controlling terminal to children processes, and here
> > is an interface "ve.ctty" entry on toplevel ve cgroup. One have
> > to pass pid of the donor task in first position followed by a series
> > of recipient pids.
> 
> Cyrill, you chose a very weird interface. I think we can add an ioctl to
> set a current terminal for a task. And I think we have a good chance to
> push it into the upstream kernle.

This kind of interface is for several reasons:

1) We should be able to setup ctty for a number of tasks in one
   call, I can easily generate any number of forks with ctty propagated,
   you think calling ioctl for every of it is a good idea?

   Instead we collect all pids in one place and process inside one call.

2) The cgroup interface is chosen to not interfere with any constants from
   vanilla kernel, so that we can easily rebase when needed.

But thinking more I agree that pushing something similar to vanilla might
be a good idea, the vanilla criu needs this ability too. I think maybe
instead of ioctl we could use some netlink interface?

Again, I think passing all pids needed in one call is better than doing
per-task ioctl, no?