cgroups: Rework "devices" controller properties restoration

Submitted by Cyrill Gorcunov on Sept. 1, 2016, 1:41 p.m.

Details

Message ID 1472737286-13649-1-git-send-email-gorcunov@virtuozzo.com
State Rejected
Series "cgroups: Rework "devices" controller properties restoration"
Headers show

Commit Message

Cyrill Gorcunov Sept. 1, 2016, 1:41 p.m.
Currently if there are several subcgroups in devices container
we're trying to deny any device by default first before we're
setting up allowed device from the image. That's is prohibited
by kernel because if the cgroup has an active parent noone can
add a rule to deny devices. The logic in kernel is simple: once
devices are denied from top level, all descendant cgroups are
automatically propagated with such limitation.

Thus what we need is to setup "deny" rule on toplevel cgroup.
Thus here is a proposal

 - in restore_special_props pass a level number so we won't try
   to deny devices on children which already have this rule
 - write global deny rule _iif_ it's a fresh cgroup created,
   if cgroup is already existing we assume the user already
   configured it

Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
---

Guys, take a look please, this is on top of our vz7 CRIU but
I'll portforward it upon review.

 criu/cgroup.c | 86 ++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 62 insertions(+), 24 deletions(-)

Patch hide | download patch | download mbox

diff --git a/criu/cgroup.c b/criu/cgroup.c
index e316197..7517009 100644
--- a/criu/cgroup.c
+++ b/criu/cgroup.c
@@ -1319,14 +1319,18 @@  static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
 				if (special)
 					continue;
 
-				/* The devices cgroup must be restored in a
+				/*
+				 * The devices cgroup must be restored in a
 				 * special way: only the contents of
 				 * devices.list can be read, and it is a
 				 * whitelist of all the devices the cgroup is
-				 * allowed to create. To re-creat this
-				 * whitelist, we first deny everything via
-				 * devices.deny, and then write the list back
-				 * into devices.allow.
+				 * allowed to create.
+				 *
+				 * If the cgroup already exist then we assume
+				 * the default "deny" strategy already set by
+				 * a caller, if it doesn't then our "special"
+				 * properties handling routine will deny
+				 * everything on toplevel cgroup.
 				 *
 				 * Further, we must have a write() call for
 				 * each line, because the kernel only parses
@@ -1346,22 +1350,7 @@  static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
 					 * this stuff.
 					 */
 					pe->perms->mode = 0200;
-
-					pe->name = "devices.deny";
-					pe->value = "a";
-					ret = restore_cgroup_prop(e->properties[j], path, off2);
-					pe->name = old_name;
-					pe->value = old_val;
-
-					if (ret < 0)
-						return -1;
-
-					pe->name = xstrdup("devices.allow");
-					if (!pe->name) {
-						pe->name = old_name;
-						return -1;
-					}
-					xfree(old_name);
+					pe->name = "devices.allow";
 
 					pos = pe->value;
 					while (*pos) {
@@ -1374,6 +1363,7 @@  static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
 						}
 						pos += offset;
 					}
+					pe->name = old_name;
 					pe->value = old_val;
 
 				} else if (restore_cgroup_prop(e->properties[j], path, off2) < 0) {
@@ -1411,12 +1401,58 @@  int prepare_cgroup_properties(void)
 	return 0;
 }
 
-static int restore_special_props(char *paux, size_t off, CgroupDirEntry *e)
+static int restore_special_props(char *paux, size_t off, CgroupDirEntry *e, bool toplevel)
 {
 	int i, j;
 
 	pr_info("Restore special props\n");
 
+	/*
+	 * For freshly created toplevel cgroup we need to
+	 * take care of devices list separately: deny everything
+	 * by default and setup allowed devices. Upon children
+	 * creation the list get propagated.
+	 */
+	if (toplevel) {
+		for (j = 0; j < e->n_properties; j++) {
+			CgroupPropEntry *prop = e->properties[j];
+			char *old_val, *old_name, *pos;
+			int ret;
+
+			if (strcmp(prop->name, "device.list"))
+				continue;
+
+			prop->perms->mode = 0200;
+
+			old_val = prop->value;
+			old_name = prop->name;
+
+			prop->name = "devices.deny";
+			prop->value = "a";
+			ret = restore_cgroup_prop(prop, paux, off);
+			prop->value = old_val;
+
+			if (ret < 0) {
+				prop->name = old_name;
+				return -1;
+			}
+
+			prop->name = "devices.allow";
+			pos = prop->value;
+			while (*pos) {
+				int offset = next_device_entry(pos);
+				prop->value = pos;
+				ret = restore_cgroup_prop(prop, paux, off);
+				if (ret < 0) {
+					prop->value = old_val;
+					return -1;
+				}
+				pos += offset;
+			}
+			prop->value = old_val;
+		}
+	}
+
 	for (i = 0; special_props[i]; i++) {
 		const char *name = special_props[i];
 
@@ -1490,8 +1526,10 @@  static int prepare_cgroup_dirs(char **controllers, int n_controllers, char *paux
 				return -1;
 
 			for (j = 0; j < n_controllers; j++) {
-				if (!strcmp(controllers[j], "cpuset") || !strcmp(controllers[j], "memory")) {
-					if (restore_special_props(paux, off2, e) < 0) {
+				if (!strcmp(controllers[j], "cpuset") ||
+				    !strcmp(controllers[j], "memory") ||
+				    !strcmp(controllers[j], "devices")) {
+					if (restore_special_props(paux, off2, e, i == 0) < 0) {
 						pr_err("Restoring special cpuset props failed!\n");
 						return -1;
 					}

Comments

Kirill Gorkunov Sept. 1, 2016, 2:01 p.m.
On Thu, Sep 01, 2016 at 04:41:26PM +0300, Cyrill Gorcunov wrote:
> Currently if there are several subcgroups in devices container
> we're trying to deny any device by default first before we're
> setting up allowed device from the image. That's is prohibited
> by kernel because if the cgroup has an active parent noone can
> add a rule to deny devices. The logic in kernel is simple: once
> devices are denied from top level, all descendant cgroups are
> automatically propagated with such limitation.
> 
> Thus what we need is to setup "deny" rule on toplevel cgroup.
> Thus here is a proposal
> 
>  - in restore_special_props pass a level number so we won't try
>    to deny devices on children which already have this rule
>  - write global deny rule _iif_ it's a fresh cgroup created,
>    if cgroup is already existing we assume the user already
>    configured it
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> ---
> 
> Guys, take a look please, this is on top of our vz7 CRIU but
> I'll portforward it upon review.

Sidenote: of course the routine of parsing device.list and
restore props should be merged into separate helper to not
duplicate the code.
Tycho Andersen Sept. 12, 2016, 2:59 p.m.
Hi Cyrill,

Sorry for the wait :)

On Thu, Sep 01, 2016 at 04:41:26PM +0300, Cyrill Gorcunov wrote:
> Currently if there are several subcgroups in devices container
> we're trying to deny any device by default first before we're
> setting up allowed device from the image. That's is prohibited
> by kernel because if the cgroup has an active parent noone can
> add a rule to deny devices. The logic in kernel is simple: once
> devices are denied from top level, all descendant cgroups are
> automatically propagated with such limitation.
> 
> Thus what we need is to setup "deny" rule on toplevel cgroup.
> Thus here is a proposal
> 
>  - in restore_special_props pass a level number so we won't try
>    to deny devices on children which already have this rule
>  - write global deny rule _iif_ it's a fresh cgroup created,
>    if cgroup is already existing we assume the user already
>    configured it

I think this is still wrong, because if we have the situation of:

/foo/devices.list: c *:* m
/foo/bar/devices.list: c 1:3 m

this will restore it incorrectly; we need to write a deny rule, and
then only allow 'c 1:3 m' in /foo/bar. If we don't write a deny, we'll
end up with both 'c *:* m' and 'c 1:3 m':

root@hopstrocity:/sys/fs/cgroup/devices/foo# cat devices.list 
c *:* m
root@hopstrocity:/sys/fs/cgroup/devices/foo# mkdir bar
root@hopstrocity:/sys/fs/cgroup/devices/foo# cd bar/
root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo a > devices.deny 
root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo 'c 1:3 m' > devices.allow 
root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# cat devices.list 
c 1:3 m
root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# cd ..
root@hopstrocity:/sys/fs/cgroup/devices/foo# rmdir bar
root@hopstrocity:/sys/fs/cgroup/devices/foo# mkdir bar
root@hopstrocity:/sys/fs/cgroup/devices/foo# cd bar/
root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo 'c 1:3 m' > devices.allow 
root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# cat devices.list 
c *:* m
c 1:3 m

So I think we have to write the 'a' in devices.deny all the time to
get the right result. I suppose we could do some smart checking to
figure out whether the parent is the same as the current cgroup to
avoid writing anything at all, but that's the only case where we don't
need to write it.

I think the right thing to do is in the patchset I sent. The reason the test
doesn't try to restore it is because the cgroup already exists
(because we don't have a way to delete it in the test harness right
now). I'll see about sending a patch for that in the series I sent as
well.

Tycho

> Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> ---
> 
> Guys, take a look please, this is on top of our vz7 CRIU but
> I'll portforward it upon review.
> 
>  criu/cgroup.c | 86 ++++++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 62 insertions(+), 24 deletions(-)
> 
> diff --git a/criu/cgroup.c b/criu/cgroup.c
> index e316197..7517009 100644
> --- a/criu/cgroup.c
> +++ b/criu/cgroup.c
> @@ -1319,14 +1319,18 @@ static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
>  				if (special)
>  					continue;
>  
> -				/* The devices cgroup must be restored in a
> +				/*
> +				 * The devices cgroup must be restored in a
>  				 * special way: only the contents of
>  				 * devices.list can be read, and it is a
>  				 * whitelist of all the devices the cgroup is
> -				 * allowed to create. To re-creat this
> -				 * whitelist, we first deny everything via
> -				 * devices.deny, and then write the list back
> -				 * into devices.allow.
> +				 * allowed to create.
> +				 *
> +				 * If the cgroup already exist then we assume
> +				 * the default "deny" strategy already set by
> +				 * a caller, if it doesn't then our "special"
> +				 * properties handling routine will deny
> +				 * everything on toplevel cgroup.
>  				 *
>  				 * Further, we must have a write() call for
>  				 * each line, because the kernel only parses
> @@ -1346,22 +1350,7 @@ static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
>  					 * this stuff.
>  					 */
>  					pe->perms->mode = 0200;
> -
> -					pe->name = "devices.deny";
> -					pe->value = "a";
> -					ret = restore_cgroup_prop(e->properties[j], path, off2);
> -					pe->name = old_name;
> -					pe->value = old_val;
> -
> -					if (ret < 0)
> -						return -1;
> -
> -					pe->name = xstrdup("devices.allow");
> -					if (!pe->name) {
> -						pe->name = old_name;
> -						return -1;
> -					}
> -					xfree(old_name);
> +					pe->name = "devices.allow";
>  
>  					pos = pe->value;
>  					while (*pos) {
> @@ -1374,6 +1363,7 @@ static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
>  						}
>  						pos += offset;
>  					}
> +					pe->name = old_name;
>  					pe->value = old_val;
>  
>  				} else if (restore_cgroup_prop(e->properties[j], path, off2) < 0) {
> @@ -1411,12 +1401,58 @@ int prepare_cgroup_properties(void)
>  	return 0;
>  }
>  
> -static int restore_special_props(char *paux, size_t off, CgroupDirEntry *e)
> +static int restore_special_props(char *paux, size_t off, CgroupDirEntry *e, bool toplevel)
>  {
>  	int i, j;
>  
>  	pr_info("Restore special props\n");
>  
> +	/*
> +	 * For freshly created toplevel cgroup we need to
> +	 * take care of devices list separately: deny everything
> +	 * by default and setup allowed devices. Upon children
> +	 * creation the list get propagated.
> +	 */
> +	if (toplevel) {
> +		for (j = 0; j < e->n_properties; j++) {
> +			CgroupPropEntry *prop = e->properties[j];
> +			char *old_val, *old_name, *pos;
> +			int ret;
> +
> +			if (strcmp(prop->name, "device.list"))
> +				continue;
> +
> +			prop->perms->mode = 0200;
> +
> +			old_val = prop->value;
> +			old_name = prop->name;
> +
> +			prop->name = "devices.deny";
> +			prop->value = "a";
> +			ret = restore_cgroup_prop(prop, paux, off);
> +			prop->value = old_val;
> +
> +			if (ret < 0) {
> +				prop->name = old_name;
> +				return -1;
> +			}
> +
> +			prop->name = "devices.allow";
> +			pos = prop->value;
> +			while (*pos) {
> +				int offset = next_device_entry(pos);
> +				prop->value = pos;
> +				ret = restore_cgroup_prop(prop, paux, off);
> +				if (ret < 0) {
> +					prop->value = old_val;
> +					return -1;
> +				}
> +				pos += offset;
> +			}
> +			prop->value = old_val;
> +		}
> +	}
> +
>  	for (i = 0; special_props[i]; i++) {
>  		const char *name = special_props[i];
>  
> @@ -1490,8 +1526,10 @@ static int prepare_cgroup_dirs(char **controllers, int n_controllers, char *paux
>  				return -1;
>  
>  			for (j = 0; j < n_controllers; j++) {
> -				if (!strcmp(controllers[j], "cpuset") || !strcmp(controllers[j], "memory")) {
> -					if (restore_special_props(paux, off2, e) < 0) {
> +				if (!strcmp(controllers[j], "cpuset") ||
> +				    !strcmp(controllers[j], "memory") ||
> +				    !strcmp(controllers[j], "devices")) {
> +					if (restore_special_props(paux, off2, e, i == 0) < 0) {
>  						pr_err("Restoring special cpuset props failed!\n");
>  						return -1;
>  					}
> -- 
> 2.7.4
>
Cyrill Gorcunov Sept. 12, 2016, 3:54 p.m.
On Mon, Sep 12, 2016 at 08:59:32AM -0600, Tycho Andersen wrote:
> Hi Cyrill,
> 
> Sorry for the wait :)
> 
> On Thu, Sep 01, 2016 at 04:41:26PM +0300, Cyrill Gorcunov wrote:
> > Currently if there are several subcgroups in devices container
> > we're trying to deny any device by default first before we're
> > setting up allowed device from the image. That's is prohibited
> > by kernel because if the cgroup has an active parent noone can
> > add a rule to deny devices. The logic in kernel is simple: once
> > devices are denied from top level, all descendant cgroups are
> > automatically propagated with such limitation.
> > 
> > Thus what we need is to setup "deny" rule on toplevel cgroup.
> > Thus here is a proposal
> > 
> >  - in restore_special_props pass a level number so we won't try
> >    to deny devices on children which already have this rule
> >  - write global deny rule _iif_ it's a fresh cgroup created,
> >    if cgroup is already existing we assume the user already
> >    configured it
> 
> I think this is still wrong, because if we have the situation of:
> 
> /foo/devices.list: c *:* m
> /foo/bar/devices.list: c 1:3 m
> 
> this will restore it incorrectly; we need to write a deny rule, and
> then only allow 'c 1:3 m' in /foo/bar. If we don't write a deny, we'll
> end up with both 'c *:* m' and 'c 1:3 m':
> 
> root@hopstrocity:/sys/fs/cgroup/devices/foo# cat devices.list 
> c *:* m
> root@hopstrocity:/sys/fs/cgroup/devices/foo# mkdir bar
> root@hopstrocity:/sys/fs/cgroup/devices/foo# cd bar/
> root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo a > devices.deny 
> root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo 'c 1:3 m' > devices.allow 
> root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# cat devices.list 
> c 1:3 m
> root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# cd ..
> root@hopstrocity:/sys/fs/cgroup/devices/foo# rmdir bar
> root@hopstrocity:/sys/fs/cgroup/devices/foo# mkdir bar
> root@hopstrocity:/sys/fs/cgroup/devices/foo# cd bar/
> root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo 'c 1:3 m' > devices.allow 
> root@hopstrocity:/sys/fs/cgroup/devices/foo/bar# cat devices.list 
> c *:* m
> c 1:3 m

OK, thanks! I'll try your patches and report the results ;)