early_init: Force to unlimit files on early init

Submitted by Cyrill Gorcunov on March 12, 2019, 5:29 p.m.

Details

Message ID 20190312172925.4822-1-gorcunov@gmail.com
State New
Series "early_init: Force to unlimit files on early init"
Headers show

Commit Message

Cyrill Gorcunov March 12, 2019, 5:29 p.m.
From: Cyrill Gorcunov <gorcunov@virtuozzo.com>

It is so because when we checkpoint containers with huge number
of files opened we get

 | (00.659733) Error (criu/util.c:403): Can't open 57950: Too many open files
 | (00.659741) Error (criu/proc_parse.c:2399): Can't open 57950/task on procfs: Too many open files
 | (00.675635) Error (criu/cr-dump.c:2111): Dumping FAILED.
 | Failed to checkpoint the Container

simply because we're opening more files for criu itself. This mostly
reverts a148e87469a36dc75e263259e39ed30336343fc8

Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
---
 criu/cr-restore.c | 32 --------------------------------
 criu/crtools.c    | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 32 deletions(-)

Patch hide | download patch | download mbox

diff --git a/criu/cr-restore.c b/criu/cr-restore.c
index af2ca2921d00..a6cb8f1b3e07 100644
--- a/criu/cr-restore.c
+++ b/criu/cr-restore.c
@@ -2327,42 +2327,10 @@  int prepare_dummy_task_state(struct pstree_item *pi)
 	return 0;
 }
 
-static void rlimit_unlimit_nofile_self(void)
-{
-	struct rlimit new;
-
-	new.rlim_cur = kdat.sysctl_nr_open;
-	new.rlim_max = kdat.sysctl_nr_open;
-
-	if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
-		pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
-		return;
-	} else
-		pr_debug("rlimit: RLIMIT_NOFILE unlimited for self\n");
-	service_fd_rlim_cur = kdat.sysctl_nr_open;
-}
-
 int cr_restore_tasks(void)
 {
 	int ret = -1;
 
-	/*
-	 * Service fd engine implies that file descriptors
-	 * used won't be borrowed by the rest of the code
-	 * and default 1024 limit is not enough for high
-	 * loaded test/containers. Thus use kdat engine
-	 * to fetch current system level limit for numbers
-	 * of files allowed to open up and lift up own
-	 * limits.
-	 *
-	 * Note we have to do it before the service fd
-	 * get inited and we dont exit with errors here
-	 * because in worst scenario where clash of fd
-	 * happen we simply exit with explicit error
-	 * during real action stage.
-	 */
-	rlimit_unlimit_nofile_self();
-
 	if (cr_plugin_init(CR_PLUGIN_STAGE__RESTORE))
 		return -1;
 
diff --git a/criu/crtools.c b/criu/crtools.c
index 3f64000c1e87..5da09beaa75f 100644
--- a/criu/crtools.c
+++ b/criu/crtools.c
@@ -51,8 +51,41 @@ 
 #include "sysctl.h"
 #include "img-remote.h"
 
+static void rlimit_unlimit_nofile_self(void)
+{
+	struct rlimit new;
+
+	new.rlim_cur = kdat.sysctl_nr_open;
+	new.rlim_max = kdat.sysctl_nr_open;
+
+	if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
+		pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
+		return;
+	} else
+		pr_debug("rlimit: RLIMIT_NOFILE unlimited for self\n");
+	service_fd_rlim_cur = kdat.sysctl_nr_open;
+}
+
 static int early_init(void)
 {
+	/*
+	 * Service fd engine implies that file descritprs
+	 * used won't be borrowed by the rest of the code
+	 * and default 1024 limit is not enough for high
+	 * loaded test/containers. Thus use kdat engine
+	 * to fetch current system level limit for numbers
+	 * of files allowed to open up and lift up own
+	 * limits.
+	 *
+	 * Note we have to do it before the service fd
+	 * get inited and we dont exit with errors here
+	 * because in worst scenario where clash of fd
+	 * happen we simply exit with explicit error
+	 * during real action stage.
+	 */
+	if (!kerndat_files_stat(true))
+		rlimit_unlimit_nofile_self();
+
 	if (init_service_fd())
 		return 1;
 

Comments

Mike Rapoport March 12, 2019, 8:35 p.m.
On Tue, Mar 12, 2019 at 08:29:25PM +0300, Cyrill Gorcunov wrote:
> From: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> 
> It is so because when we checkpoint containers with huge number
> of files opened we get
> 
>  | (00.659733) Error (criu/util.c:403): Can't open 57950: Too many open files
>  | (00.659741) Error (criu/proc_parse.c:2399): Can't open 57950/task on procfs: Too many open files
>  | (00.675635) Error (criu/cr-dump.c:2111): Dumping FAILED.
>  | Failed to checkpoint the Container
> 
> simply because we're opening more files for criu itself. This mostly
> reverts a148e87469a36dc75e263259e39ed30336343fc8
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> ---
>  criu/cr-restore.c | 32 --------------------------------
>  criu/crtools.c    | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 32 deletions(-)

...
 
> diff --git a/criu/crtools.c b/criu/crtools.c
> index 3f64000c1e87..5da09beaa75f 100644
> --- a/criu/crtools.c
> +++ b/criu/crtools.c
> @@ -51,8 +51,41 @@
>  #include "sysctl.h"
>  #include "img-remote.h"
>  
> +static void rlimit_unlimit_nofile_self(void)

Nit: does '_self' suffix have some real meaning? Can we just drop it?
Same for the pr_{perror,debug}

> +{
> +	struct rlimit new;
> +
> +	new.rlim_cur = kdat.sysctl_nr_open;
> +	new.rlim_max = kdat.sysctl_nr_open;
> +
> +	if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
> +		pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");

                          ^ rlimit

> +		return;
> +	} else
> +		pr_debug("rlimit: RLIMIT_NOFILE unlimited for self\n");
> +	service_fd_rlim_cur = kdat.sysctl_nr_open;
> +}
> +
>  static int early_init(void)
>  {
> +	/*
> +	 * Service fd engine implies that file descritprs
> +	 * used won't be borrowed by the rest of the code
> +	 * and default 1024 limit is not enough for high
> +	 * loaded test/containers. Thus use kdat engine
> +	 * to fetch current system level limit for numbers
> +	 * of files allowed to open up and lift up own
> +	 * limits.
> +	 *
> +	 * Note we have to do it before the service fd
> +	 * get inited and we dont exit with errors here
> +	 * because in worst scenario where clash of fd
> +	 * happen we simply exit with explicit error
> +	 * during real action stage.
> +	 */
> +	if (!kerndat_files_stat(true))
> +		rlimit_unlimit_nofile_self();
> +
>  	if (init_service_fd())
>  		return 1;
>  
> -- 
> 2.20.1
> 
> _______________________________________________
> CRIU mailing list
> CRIU@openvz.org
> https://lists.openvz.org/mailman/listinfo/criu
>
Cyrill Gorcunov March 12, 2019, 8:43 p.m.
On Tue, Mar 12, 2019 at 10:35:17PM +0200, Mike Rapoport wrote:
> On Tue, Mar 12, 2019 at 08:29:25PM +0300, Cyrill Gorcunov wrote:
> > From: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> > 
> > It is so because when we checkpoint containers with huge number
> > of files opened we get
> > 
> >  | (00.659733) Error (criu/util.c:403): Can't open 57950: Too many open files
> >  | (00.659741) Error (criu/proc_parse.c:2399): Can't open 57950/task on procfs: Too many open files
> >  | (00.675635) Error (criu/cr-dump.c:2111): Dumping FAILED.
> >  | Failed to checkpoint the Container
> > 
> > simply because we're opening more files for criu itself. This mostly
> > reverts a148e87469a36dc75e263259e39ed30336343fc8
> > 
> > Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> > ---
> >  criu/cr-restore.c | 32 --------------------------------
> >  criu/crtools.c    | 33 +++++++++++++++++++++++++++++++++
> >  2 files changed, 33 insertions(+), 32 deletions(-)
> 
> ...
>  
> > diff --git a/criu/crtools.c b/criu/crtools.c
> > index 3f64000c1e87..5da09beaa75f 100644
> > --- a/criu/crtools.c
> > +++ b/criu/crtools.c
> > @@ -51,8 +51,41 @@
> >  #include "sysctl.h"
> >  #include "img-remote.h"
> >  
> > +static void rlimit_unlimit_nofile_self(void)
> 
> Nit: does '_self' suffix have some real meaning? Can we just drop it?
> Same for the pr_{perror,debug}

It means that we're unlimiting _own_ resources not
some other pid.
 
> > +{
> > +	struct rlimit new;
> > +
> > +	new.rlim_cur = kdat.sysctl_nr_open;
> > +	new.rlim_max = kdat.sysctl_nr_open;
> > +
> > +	if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
> > +		pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
> 
>                           ^ rlimit
> 

? Unfortunately our logger doesn't allow to setup temp prefixes,
so I put rlimit here for easier greps.
Mike Rapoport March 12, 2019, 8:51 p.m.
On Tue, Mar 12, 2019 at 11:43:02PM +0300, Cyrill Gorcunov wrote:
> On Tue, Mar 12, 2019 at 10:35:17PM +0200, Mike Rapoport wrote:
> > On Tue, Mar 12, 2019 at 08:29:25PM +0300, Cyrill Gorcunov wrote:
> > > From: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> > > 
> > > It is so because when we checkpoint containers with huge number
> > > of files opened we get
> > > 
> > >  | (00.659733) Error (criu/util.c:403): Can't open 57950: Too many open files
> > >  | (00.659741) Error (criu/proc_parse.c:2399): Can't open 57950/task on procfs: Too many open files
> > >  | (00.675635) Error (criu/cr-dump.c:2111): Dumping FAILED.
> > >  | Failed to checkpoint the Container
> > > 
> > > simply because we're opening more files for criu itself. This mostly
> > > reverts a148e87469a36dc75e263259e39ed30336343fc8
> > > 
> > > Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> > > ---
> > >  criu/cr-restore.c | 32 --------------------------------
> > >  criu/crtools.c    | 33 +++++++++++++++++++++++++++++++++
> > >  2 files changed, 33 insertions(+), 32 deletions(-)
> > 
> > ...
> >  
> > > diff --git a/criu/crtools.c b/criu/crtools.c
> > > index 3f64000c1e87..5da09beaa75f 100644
> > > --- a/criu/crtools.c
> > > +++ b/criu/crtools.c
> > > @@ -51,8 +51,41 @@
> > >  #include "sysctl.h"
> > >  #include "img-remote.h"
> > >  
> > > +static void rlimit_unlimit_nofile_self(void)
> > 
> > Nit: does '_self' suffix have some real meaning? Can we just drop it?
> > Same for the pr_{perror,debug}
> 
> It means that we're unlimiting _own_ resources not
> some other pid.

Well, for me the '_self' was excessive and it was clear enough that a
static function called from the early_init() would limit own resources.
  
> > > +{
> > > +	struct rlimit new;
> > > +
> > > +	new.rlim_cur = kdat.sysctl_nr_open;
> > > +	new.rlim_max = kdat.sysctl_nr_open;
> > > +
> > > +	if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
> > > +		pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
> > 
> >                           ^ rlimit
> > 
> 
> ? Unfortunately our logger doesn't allow to setup temp prefixes,
> so I put rlimit here for easier greps.

There's a typo: rlimi*r* ;-)
Cyrill Gorcunov March 12, 2019, 9 p.m.
On Tue, Mar 12, 2019 at 10:51:32PM +0200, Mike Rapoport wrote:
> On Tue, Mar 12, 2019 at 11:43:02PM +0300, Cyrill Gorcunov wrote:
> > On Tue, Mar 12, 2019 at 10:35:17PM +0200, Mike Rapoport wrote:
> > > On Tue, Mar 12, 2019 at 08:29:25PM +0300, Cyrill Gorcunov wrote:
> > > > From: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> > > > 
> > > > It is so because when we checkpoint containers with huge number
> > > > of files opened we get
> > > > 
> > > >  | (00.659733) Error (criu/util.c:403): Can't open 57950: Too many open files
> > > >  | (00.659741) Error (criu/proc_parse.c:2399): Can't open 57950/task on procfs: Too many open files
> > > >  | (00.675635) Error (criu/cr-dump.c:2111): Dumping FAILED.
> > > >  | Failed to checkpoint the Container
> > > > 
> > > > simply because we're opening more files for criu itself. This mostly
> > > > reverts a148e87469a36dc75e263259e39ed30336343fc8
> > > > 
> > > > Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
> > > > ---
> > > >  criu/cr-restore.c | 32 --------------------------------
> > > >  criu/crtools.c    | 33 +++++++++++++++++++++++++++++++++
> > > >  2 files changed, 33 insertions(+), 32 deletions(-)
> > > 
> > > ...
> > >  
> > > > diff --git a/criu/crtools.c b/criu/crtools.c
> > > > index 3f64000c1e87..5da09beaa75f 100644
> > > > --- a/criu/crtools.c
> > > > +++ b/criu/crtools.c
> > > > @@ -51,8 +51,41 @@
> > > >  #include "sysctl.h"
> > > >  #include "img-remote.h"
> > > >  
> > > > +static void rlimit_unlimit_nofile_self(void)
> > > 
> > > Nit: does '_self' suffix have some real meaning? Can we just drop it?
> > > Same for the pr_{perror,debug}
> > 
> > It means that we're unlimiting _own_ resources not
> > some other pid.
> 
> Well, for me the '_self' was excessive and it was clear enough that a
> static function called from the early_init() would limit own resources.

You know, I'm fine with _self dropped (thought I personally prefer
it to have) but I have no strong objectives. So ok, will drop.

> > > > +	if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
> > > > +		pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
> > > 
> > >                           ^ rlimit
> > > 
> > 
> > ? Unfortunately our logger doesn't allow to setup temp prefixes,
> > so I put rlimit here for easier greps.
> 
> There's a typo: rlimi*r* ;-)

Ah, indeed, sorry :)