[RFC] crns.py: New attempt to have --unshare option

Submitted by Pavel Emelianov on Aug. 2, 2016, 4:48 p.m.

Details

Message ID 57A0CEFA.8020002@virtuozzo.com
State Rejected
Series "crns.py: New attempt to have --unshare option"

Commit Message

Hi.

The existing --unshare option has one nasty problem: after restore it's
almost impossible to dump the restored tasks a second time, since the
mount namespace is not criu's own (it was cloned from the host's) and the
user is unlikely to want it dumped :) So the second dump would somehow have
to be told that the namespaces are not to be taken into account. This
complicates things, so here's another attempt at doing unshared
dump/restore differently.

The idea is to have a script that starts CRIU in the needed namespaces,
used like this:

crns.py <nsid> <criu command>

For restore, the needed namespaces are created (the mount ns is equipped
with a new /proc and the pid ns is left with a "fake" init process). For
dump, the needed namespaces are taken from the task we want to dump and
joined with setns().

Restore usage is simple, just as we want it:

scripts/crns.py - <regular criu command>
                ^ this dash stands for "create new namespaces"

The dump CLI, however, is somewhat tricky. First, all paths (including the
path to criu) must be absolute, because after setns() into the mount ns the
cwd and root of the spawned criu are reset. Second, the pid to dump must be
specified as it's seen from the target namespace, not from the current one.
And third, the pid to take the namespaces from is the one seen from the
current namespace %) So the dump CLI looks like this:

scripts/crns.py $(pidof task) /usr/bin/criu dump -t $(vpidof task) -D $(realpath dir) ...
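
The `vpidof` in the command above is presumably a placeholder rather than an existing tool. A minimal sketch of how the three rules could be wrapped in a helper, assuming a kernel that exposes the `NSpid:` line in /proc/<pid>/status (Linux 4.1 and later). The function names `parse_nspid`, `innermost_pid` and `build_dump_argv` are hypothetical and not part of the patch:

```python
import os


def parse_nspid(status_text):
    """Return the PIDs from the NSpid line, outermost namespace first.

    An empty list means the line is absent (older kernel).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def innermost_pid(host_pid):
    """PID of the task as seen from its own (innermost) pid namespace."""
    with open("/proc/%d/status" % host_pid) as f:
        pids = parse_nspid(f.read())
    if not pids:
        raise RuntimeError("no NSpid line in /proc/%d/status" % host_pid)
    return pids[-1]


def build_dump_argv(crns, criu, host_pid, vpid, images_dir, extra=()):
    """Build the crns.py dump command line with absolute paths only.

    host_pid is the pid seen from the current namespace (used to pick the
    namespaces); vpid is the pid seen from the target namespace (passed
    to criu via -t).
    """
    return [
        os.path.realpath(crns), str(host_pid),
        os.path.realpath(criu), "dump",
        "-t", str(vpid),
        "-D", os.path.realpath(images_dir),
    ] + list(extra)
```

With this, a caller would look up `vpid = innermost_pid(host_pid)` and pass the result of `build_dump_argv(...)` to something like `os.execv()`. Taking the last NSpid entry assumes the task's own pid namespace is the one being joined, which is what /proc/<pid>/ns/pid refers to.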

Suggestions on how to make the CLI simpler are welcome :)

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>

---

Patch

diff --git a/scripts/crns.py b/scripts/crns.py
new file mode 100755
index 0000000..6a890c7
--- /dev/null
+++ b/scripts/crns.py
@@ -0,0 +1,149 @@ 
+#!/usr/bin/env python
+import ctypes
+import ctypes.util
+import errno
+import sys
+import os
+
+# <sched.h> constants for unshare
+CLONE_NEWNS = 0x00020000
+CLONE_NEWPID = 0x20000000
+
+# <sys/mount.h> - constants for mount
+MS_REC = 16384
+MS_PRIVATE = 1 << 18
+MS_SLAVE = 1 << 19
+
+# Load libc bindings
+_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
+
+try:
+	_unshare = _libc.unshare
+except AttributeError:
+	raise OSError(errno.EINVAL, "unshare is not supported on this platform")
+else:
+	_unshare.argtypes = [ ctypes.c_int ]
+	_unshare.restype = ctypes.c_int
+
+try:
+	_setns = _libc.setns
+except AttributeError:
+	raise OSError(errno.EINVAL, "setns is not supported on this platform")
+else:
+	_setns.argtypes = [ ctypes.c_int, ctypes.c_int ]
+	_setns.restype = ctypes.c_int
+
+try:
+	_mount = _libc.mount
+except AttributeError:
+	raise OSError(errno.EINVAL, "mount is not supported on this platform")
+else:
+	_mount.argtypes = [
+		ctypes.c_char_p,
+		ctypes.c_char_p,
+		ctypes.c_char_p,
+		ctypes.c_ulong,
+		ctypes.c_void_p
+	]
+	_mount.restype = ctypes.c_int
+
+try:
+	_umount = _libc.umount
+except AttributeError:
+	raise OSError(errno.EINVAL, "umount is not supported on this platform")
+else:
+	_umount.argtypes = [ ctypes.c_char_p ]
+	_umount.restype = ctypes.c_int
+
+ns_pid = sys.argv[1]
+
+if ns_pid == '-':
+	# Unshare pid and mount namespaces
+	if _unshare(CLONE_NEWNS | CLONE_NEWPID) != 0:
+		_errno = ctypes.get_errno()
+		raise OSError(_errno, errno.errorcode[_errno])
+
+	(r_pipe, w_pipe) = os.pipe()
+
+	# Spawn the init
+	if os.fork() == 0:
+		os.close(r_pipe)
+
+		# Stop mount propagation to the host, then mount a new /proc
+		if _mount(None, "/", None, MS_SLAVE|MS_REC, None) != 0:
+			_errno = ctypes.get_errno()
+			raise OSError(_errno, errno.errorcode[_errno])
+
+		if _mount('proc', '/proc', 'proc', 0, None) != 0:
+			_errno = ctypes.get_errno()
+			raise OSError(_errno, errno.errorcode[_errno])
+
+		# Spawn CRIU binary
+		criu_pid = os.fork()
+		if criu_pid == 0:
+			os.execl(sys.argv[2], *sys.argv[2:])
+			raise OSError(errno.ENOENT, "No such command")
+
+		while True:
+			try:
+				(pid, status) = os.wait()
+				if pid == criu_pid:
+					break
+			except OSError:
+				status = -251
+				break
+
+		os.write(w_pipe, "%d" % status)
+		os.close(w_pipe)
+
+		if status != 0:
+			sys.exit(status)
+
+		while True:
+			try:
+				os.wait()
+			except OSError:
+				break
+
+		sys.exit(0)
+
+	# Wait for CRIU to exit and report the status back
+	os.close(w_pipe)
+	status = os.read(r_pipe, 1024)
+	if not status.isdigit():
+		status_i = -252
+	else:
+		status_i = int(status)
+
+	sys.exit(status_i)
+else:
+	# Join pid and mount namespaces
+	ns_fd = os.open('/proc/%s/ns/pid' % ns_pid, os.O_RDONLY)
+	if _setns(ns_fd, 0) != 0:
+		_errno = ctypes.get_errno()
+		raise OSError(_errno, errno.errorcode[_errno])
+	os.close(ns_fd)
+
+	ns_fd = os.open('/proc/%s/ns/mnt' % ns_pid, os.O_RDONLY)
+	if _setns(ns_fd, 0) != 0:
+		_errno = ctypes.get_errno()
+		raise OSError(_errno, errno.errorcode[_errno])
+	os.close(ns_fd)
+
+	# Spawn CRIU binary
+	criu_pid = os.fork()
+	if criu_pid == 0:
+		os.execl(sys.argv[2], *sys.argv[2:])
+		raise OSError(errno.ENOENT, "No such command")
+
+	# Wait for CRIU to exit and report the status back
+	while True:
+		try:
+			(pid, status) = os.wait()
+			if pid == criu_pid:
+				break
+		except OSError:
+			status = -251
+			break
+
+	sys.exit(status)
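
A note on the exit status: both branches hand the raw value returned by os.wait() to sys.exit(). That value is the encoded wait status rather than the exit code, so, as far as I can tell, a criu that exits with code 1 yields status 256, which the kernel truncates to exit code 0. A minimal sketch of decoding it first. The helper name `exit_code_from_wait_status` is hypothetical, and the 128 + signal mapping is only the usual shell convention, not something the patch prescribes:

```python
import os


def exit_code_from_wait_status(status):
    """Turn an os.wait() status into a value suitable for sys.exit()."""
    if os.WIFEXITED(status):
        # Normal termination: the exit code lives in the high byte.
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        # Killed by a signal: follow the shell convention of 128 + signo.
        return 128 + os.WTERMSIG(status)
    # Anything else (e.g. a stopped child) is reported as a generic failure.
    return 1
```

The sentinel values the script uses when os.wait() itself fails (-251, -252) would need separate handling, since they are not wait statuses.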