[CRIU] p.haul: increase dump and predump timeouts for Virtuozzo containers

Submitted by Nikita Spiridonov on June 15, 2016, 7:04 a.m.

Details

Message ID 1465974288-399623-1-git-send-email-nspiridonov@virtuozzo.com
State New
Series "p.haul: increase dump and predump timeouts for Virtuozzo containers"

Commit Message

Nikita Spiridonov June 15, 2016, 7:04 a.m.
By default, the timeout for freezing all processes in a container is
10 seconds. For containers with a large number of processes, freezing
can take longer, so increase the timeout to 180 seconds.

Signed-off-by: Nikita Spiridonov <nspiridonov@virtuozzo.com>
---
 phaul/p_haul_vz.py |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

Patch

diff --git a/phaul/p_haul_vz.py b/phaul/p_haul_vz.py
index 97c2bec..a0ba306 100644
--- a/phaul/p_haul_vz.py
+++ b/phaul/p_haul_vz.py
@@ -140,11 +140,16 @@  class p_haul_type:
 			# Increase ghost-limit up to 50Mb
 			req.opts.ghost_limit = 50 << 20
 
-		# Specify freezer cgroup for both predump and dump requests
+		# Specify both predump and dump specific options
 		if req.type == pycriu.rpc.PRE_DUMP or req.type == pycriu.rpc.DUMP:
+
+			# Specify freezer cgroup
 			req.opts.freeze_cgroup = \
 				"/sys/fs/cgroup/freezer/{0}/".format(self._ctid)
 
+			# Increase timeout up to 180 seconds
+			req.opts.timeout = 180
+
 	def root_task_pid(self):
 		path = "/var/run/ve/{0}.init.pid".format(self._ctid)
 		with open(path) as pidfile:
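
To try the effect of the hunk above without a CRIU installation, the following is a minimal standalone sketch. It is not p.haul code: `adjust_criu_req`, `make_req`, the string constants standing in for `pycriu.rpc.PRE_DUMP` and `pycriu.rpc.DUMP`, and the container ID "101" are all hypothetical, and `SimpleNamespace` (Python 3.3+) replaces the real protobuf request object.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the pycriu.rpc request type constants.
PRE_DUMP, DUMP, RESTORE = "PRE_DUMP", "DUMP", "RESTORE"

# Value chosen in this patch; the commit message gives 10 seconds as the default.
FREEZE_TIMEOUT_SEC = 180


def adjust_criu_req(req, ctid):
    """Set the options shared by predump and dump requests."""
    if req.type in (PRE_DUMP, DUMP):
        # Freeze the container through its freezer cgroup.
        req.opts.freeze_cgroup = "/sys/fs/cgroup/freezer/{0}/".format(ctid)
        # Give containers with many processes more time to freeze.
        req.opts.timeout = FREEZE_TIMEOUT_SEC
    return req


def make_req(req_type):
    """Build a bare request object (stand-in for the protobuf message)."""
    return SimpleNamespace(type=req_type, opts=SimpleNamespace())


dump_req = adjust_criu_req(make_req(DUMP), "101")
restore_req = adjust_criu_req(make_req(RESTORE), "101")

print(dump_req.opts.timeout)                 # 180
print(hasattr(restore_req.opts, "timeout"))  # False
```

Only predump and dump requests get the freezer cgroup and the longer timeout; other request types are left untouched, which mirrors the condition in the patch.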

Comments

Andrei Vagin June 21, 2016, 3:31 a.m.
On Wed, Jun 15, 2016 at 11:04:48AM +0400, Nikita Spiridonov wrote:
> By default, the timeout for freezing all processes in a container is
> 10 seconds. For containers with a large number of processes, freezing
> can take longer, so increase the timeout to 180 seconds.

Do we really want to migrate a container with such a long downtime?


Nikita Spiridonov June 21, 2016, 9:41 a.m.
On Mon, 2016-06-20 at 20:31 -0700, Andrew Vagin wrote:
> On Wed, Jun 15, 2016 at 11:04:48AM +0400, Nikita Spiridonov wrote:
> > By default, the timeout for freezing all processes in a container is
> > 10 seconds. For containers with a large number of processes, freezing
> > can take longer, so increase the timeout to 180 seconds.
> 
> Do we really want to migrate a container with such a long downtime?
> 

Yes, as far as I'm concerned. At the moment this fix is needed for our
internal stress tests (which fail due to the freeze timeout). I picked
180 seconds on Cyrill's advice.

Cyrill Gorcunov June 21, 2016, 4:26 p.m.
On Tue, Jun 21, 2016 at 01:41:36PM +0400, Nikita Spiridonov wrote:
> On Mon, 2016-06-20 at 20:31 -0700, Andrew Vagin wrote:
> > On Wed, Jun 15, 2016 at 11:04:48AM +0400, Nikita Spiridonov wrote:
> > > By default, the timeout for freezing all processes in a container is
> > > 10 seconds. For containers with a large number of processes, freezing
> > > can take longer, so increase the timeout to 180 seconds.
> > 
> > Do we really want to migrate a container with such a long downtime?
> > 
> 
> Yes, as far as I'm concerned. At the moment this fix is needed for our
> internal stress tests (which fail due to the freeze timeout). I picked
> 180 seconds on Cyrill's advice.

Interestingly, the timeout does not always happen. I once hit a
problem where the time on the node jumped forward and our default
10-second time limit predictably expired, interrupting the checkpoint
procedure. So I think 180 seconds is a good choice, because if
freezing the processes takes longer than that, we are definitely in
trouble.
Pavel Emelyanov July 15, 2016, 5:41 p.m.
Applied