[00/12] Introduce custom per-task service fds placement

Submitted by Andrei Vagin on Dec. 27, 2017, 5:14 p.m.

Details

Message ID 20171227171403.GA24365@outlook.office365.com
State New
Headers show

Patch hide | download patch | download mbox

=== Run 1/1 ================ zdtm/static/env00

========================== Run zdtm/static/env00 in h ==========================
Start test
./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
Run criu dump
Unable to kill 36: [Errno 3] No such process
Run criu restore
=[log]=> dump/zdtm/static/env00/36/1/restore.log
------------------------ grep Error ------------------------
(00.016768)     36: Collect fdinfo pid=36 fd=1006 id=0x5
(00.016783)     36: Collect fdinfo pid=36 fd=1007 id=0x5
(00.016798)     36: Collect fdinfo pid=36 fd=1008 id=0x5
(00.016813)     36: Collect fdinfo pid=36 fd=1009 id=0x5
(00.016828)     36: Error (criu/files.c:915): Too big FD number to restore 1010
(00.016864) Error (criu/cr-restore.c:2452): Restoring FAILED.
------------------------ ERROR OVER ------------------------
################# Test zdtm/static/env00 FAIL at CRIU restore ##################
##################################### FAIL #####################################
[root@fc24 criu]# git diff
diff --git a/criu/crtools.c b/criu/crtools.c
index 5f005387d..f18c295d8 100644
--- a/criu/crtools.c
+++ b/criu/crtools.c
@@ -244,7 +244,7 @@  static void rlimit_unlimit_nofile_self(void)
        new.rlim_cur = kdat.sysctl_nr_open;
        new.rlim_max = kdat.sysctl_nr_open;
 
-       if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
+       if (0 && prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
                pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
                return;
        } else
diff --git a/test/zdtm/static/env00.c b/test/zdtm/static/env00.c
index 1feabfa9f..0083d3fa0 100644
--- a/test/zdtm/static/env00.c
+++ b/test/zdtm/static/env00.c
@@ -21,6 +21,15 @@  int main(int argc, char **argv)
                exit(1);
        }
 
+       while (1) {
+               if (dup(1) == -1)
+                       break;
+       }
+       close(10);
+       close(11);
+       close(12);
+       close(13);
+
        test_daemon();
        test_waitsig();
 

Comments

Kirill Tkhai Dec. 27, 2017, 7:30 p.m.
On 27.12.2017 20:14, Andrei Vagin wrote:
> On Tue, Dec 26, 2017 at 06:45:46PM +0300, Kirill Tkhai wrote:
>> Currently, service fds are bounded to rlimit(RLIMIT_NOFILE).
>> This brings problems with memory usage, because service fds
>> forces kernel to allocate big files_struct for every task,
>> and the container consumes much more memory, than it used
>> before dump. In some situations this even may lead to restore
>> fail because of enough memory absence in cgroup or on node.
>>
>> This patchset reworks service fds placement, and places
>> the service fds in custom place for every task in dependence
>> of task's biggest fd used on dump. This reduces kernel-allocated
>> files_struct in significant way, and make container use less
>> memory.
>>
>> The patchset consists of two parts. Patches [1-4/12] reworks
>> ctl tty crutch we used before, and makes ctl tty to restore
>> in generic file engine way. This change is good enough by its
>> own, and it has to be made separately earlier, but now is
>> the time to stop delaying, and to remove all the crutch, because
>> this is need for the second part.
>>
>> The second part of patchset places service fds in dependence
>> of task's used fds as described in start of topic. See patches
>> [5-12/12] for details of that.
>>
>> Patch [12/12] actually makes service fds "per-task placed".
>>
> 
> Where is a test?

I'll be happy to implement a test scenario you'll suggested.
What will you recommend?
 
> [root@fc24 criu]# python test/zdtm.py run -t zdtm/static/env00
> === Run 1/1 ================ zdtm/static/env00
> 
> ========================== Run zdtm/static/env00 in h ==========================
> Start test
> ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
> Run criu dump
> Unable to kill 36: [Errno 3] No such process
> Run criu restore
> =[log]=> dump/zdtm/static/env00/36/1/restore.log
> ------------------------ grep Error ------------------------
> (00.016768)     36: Collect fdinfo pid=36 fd=1006 id=0x5
> (00.016783)     36: Collect fdinfo pid=36 fd=1007 id=0x5
> (00.016798)     36: Collect fdinfo pid=36 fd=1008 id=0x5
> (00.016813)     36: Collect fdinfo pid=36 fd=1009 id=0x5
> (00.016828)     36: Error (criu/files.c:915): Too big FD number to restore 1010
> (00.016864) Error (criu/cr-restore.c:2452): Restoring FAILED.

See below.

> ------------------------ ERROR OVER ------------------------
> ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
> ##################################### FAIL #####################################
> [root@fc24 criu]# git diff
> diff --git a/criu/crtools.c b/criu/crtools.c
> index 5f005387d..f18c295d8 100644
> --- a/criu/crtools.c
> +++ b/criu/crtools.c
> @@ -244,7 +244,7 @@ static void rlimit_unlimit_nofile_self(void)
>         new.rlim_cur = kdat.sysctl_nr_open;
>         new.rlim_max = kdat.sysctl_nr_open;
>  
> -       if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
> +       if (0 && prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {

It's not a valid change, because this unlimited PRLIMIT_NOFILE value
set in the above line is used in init_service_fd() to calculate potentially
possible service fds limit.

So, the check "Too big FD number" just says that you can't restore a fd,
if the system does not allow to create such a number whatever you do.

>                 pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
>                 return;
>         } else
> diff --git a/test/zdtm/static/env00.c b/test/zdtm/static/env00.c
> index 1feabfa9f..0083d3fa0 100644
> --- a/test/zdtm/static/env00.c
> +++ b/test/zdtm/static/env00.c
> @@ -21,6 +21,15 @@ int main(int argc, char **argv)
>                 exit(1);
>         }
>  
> +       while (1) {
> +               if (dup(1) == -1)
> +                       break;
> +       }
> +       close(10);
> +       close(11);
> +       close(12);
> +       close(13);
> +
>         test_daemon();
>         test_waitsig();
>  
> 
>  
>> https://travis-ci.org/tkhai/criu/builds/321817890
>> ---
>>
>> Kirill Tkhai (12):
>>       files: Close ctl tty via generic engine
>>       files: Move prepare_ctl_tty() to criu/tty.c
>>       files: Move CTL_TTY_OFF fixup to generic file engine
>>       files: Kill unused CTL_TTY_OFF leftovers
>>       files: Rename service_fd_rlim_cur to service_fd_base_cur
>>       files: Count inh_fd_max
>>       files: Pass pstree_item argument to clone_service_fd()
>>       files: Close old service fd in clone_service_fd()
>>       files: Do setup_newborn_fds() later
>>       files: Refactor clone_service_fd()
>>       files: Prepare clone_service_fd() for overlaping ranges.
>>       files: Make tasks set their own service_fd_base
>>
>>
>>  criu/cr-restore.c        |   14 +++--
>>  criu/files.c             |   45 +++-------------
>>  criu/include/files.h     |    3 +
>>  criu/include/servicefd.h |    4 -
>>  criu/include/tty.h       |    2 -
>>  criu/include/util.h      |    2 +
>>  criu/tty.c               |  127 +++++++++++++++++++++++++++++++++++++++++++---
>>  criu/util.c              |  101 ++++++++++++++++++++++++++-----------
>>  images/fdinfo.proto      |    1 
>>  9 files changed, 214 insertions(+), 85 deletions(-)
>>
>> --
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Kirill
Andrei Vagin Dec. 28, 2017, 12:13 a.m.
On Wed, Dec 27, 2017 at 10:30:26PM +0300, Kirill Tkhai wrote:
> On 27.12.2017 20:14, Andrei Vagin wrote:
> > On Tue, Dec 26, 2017 at 06:45:46PM +0300, Kirill Tkhai wrote:
> >> Currently, service fds are bounded to rlimit(RLIMIT_NOFILE).
> >> This brings problems with memory usage, because service fds
> >> forces kernel to allocate big files_struct for every task,
> >> and the container consumes much more memory, than it used
> >> before dump. In some situations this even may lead to restore
> >> fail because of enough memory absence in cgroup or on node.
> >>
> >> This patchset reworks service fds placement, and places
> >> the service fds in custom place for every task in dependence
> >> of task's biggest fd used on dump. This reduces kernel-allocated
> >> files_struct in significant way, and make container use less
> >> memory.
> >>
> >> The patchset consists of two parts. Patches [1-4/12] reworks
> >> ctl tty crutch we used before, and makes ctl tty to restore
> >> in generic file engine way. This change is good enough by its
> >> own, and it has to be made separately earlier, but now is
> >> the time to stop delaying, and to remove all the crutch, because
> >> this is need for the second part.
> >>
> >> The second part of patchset places service fds in dependence
> >> of task's used fds as described in start of topic. See patches
> >> [5-12/12] for details of that.
> >>
> >> Patch [12/12] actually makes service fds "per-task placed".
> >>
> > 
> > Where is a test?
> 
> I'll be happy to implement a test scenario you'll suggested.
> What will you recommend?

Create a few processes with different file rlimits, set file descriptors
so that criu has to move srevice fd-s on retore, etc...

>  
> > [root@fc24 criu]# python test/zdtm.py run -t zdtm/static/env00
> > === Run 1/1 ================ zdtm/static/env00
> > 
> > ========================== Run zdtm/static/env00 in h ==========================
> > Start test
> > ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
> > Run criu dump
> > Unable to kill 36: [Errno 3] No such process
> > Run criu restore
> > =[log]=> dump/zdtm/static/env00/36/1/restore.log
> > ------------------------ grep Error ------------------------
> > (00.016768)     36: Collect fdinfo pid=36 fd=1006 id=0x5
> > (00.016783)     36: Collect fdinfo pid=36 fd=1007 id=0x5
> > (00.016798)     36: Collect fdinfo pid=36 fd=1008 id=0x5
> > (00.016813)     36: Collect fdinfo pid=36 fd=1009 id=0x5
> > (00.016828)     36: Error (criu/files.c:915): Too big FD number to restore 1010
> > (00.016864) Error (criu/cr-restore.c:2452): Restoring FAILED.
> 
> See below.
> 
> > ------------------------ ERROR OVER ------------------------
> > ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
> > ##################################### FAIL #####################################
> > [root@fc24 criu]# git diff
> > diff --git a/criu/crtools.c b/criu/crtools.c
> > index 5f005387d..f18c295d8 100644
> > --- a/criu/crtools.c
> > +++ b/criu/crtools.c
> > @@ -244,7 +244,7 @@ static void rlimit_unlimit_nofile_self(void)
> >         new.rlim_cur = kdat.sysctl_nr_open;
> >         new.rlim_max = kdat.sysctl_nr_open;
> >  
> > -       if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
> > +       if (0 && prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
> 
> It's not a valid change, because this unlimited PRLIMIT_NOFILE value
> set in the above line is used in init_service_fd() to calculate potentially
> possible service fds limit.
> 
> So, the check "Too big FD number" just says that you can't restore a fd,
> if the system does not allow to create such a number whatever you do.

Pls, write a comment before prlimit() which explains why do we need
this.

> 
> >                 pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
> >                 return;
> >         } else
> > diff --git a/test/zdtm/static/env00.c b/test/zdtm/static/env00.c
> > index 1feabfa9f..0083d3fa0 100644
> > --- a/test/zdtm/static/env00.c
> > +++ b/test/zdtm/static/env00.c
> > @@ -21,6 +21,15 @@ int main(int argc, char **argv)
> >                 exit(1);
> >         }
> >  
> > +       while (1) {
> > +               if (dup(1) == -1)
> > +                       break;
> > +       }
> > +       close(10);
> > +       close(11);
> > +       close(12);
> > +       close(13);
> > +
> >         test_daemon();
> >         test_waitsig();
> >  
> > 
> >  
> >> https://travis-ci.org/tkhai/criu/builds/321817890
> >> ---
> >>
> >> Kirill Tkhai (12):
> >>       files: Close ctl tty via generic engine
> >>       files: Move prepare_ctl_tty() to criu/tty.c
> >>       files: Move CTL_TTY_OFF fixup to generic file engine
> >>       files: Kill unused CTL_TTY_OFF leftovers
> >>       files: Rename service_fd_rlim_cur to service_fd_base_cur
> >>       files: Count inh_fd_max
> >>       files: Pass pstree_item argument to clone_service_fd()
> >>       files: Close old service fd in clone_service_fd()
> >>       files: Do setup_newborn_fds() later
> >>       files: Refactor clone_service_fd()
> >>       files: Prepare clone_service_fd() for overlaping ranges.
> >>       files: Make tasks set their own service_fd_base
> >>
> >>
> >>  criu/cr-restore.c        |   14 +++--
> >>  criu/files.c             |   45 +++-------------
> >>  criu/include/files.h     |    3 +
> >>  criu/include/servicefd.h |    4 -
> >>  criu/include/tty.h       |    2 -
> >>  criu/include/util.h      |    2 +
> >>  criu/tty.c               |  127 +++++++++++++++++++++++++++++++++++++++++++---
> >>  criu/util.c              |  101 ++++++++++++++++++++++++++-----------
> >>  images/fdinfo.proto      |    1 
> >>  9 files changed, 214 insertions(+), 85 deletions(-)
> >>
> >> --
> >> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> Kirill
Kirill Tkhai Dec. 28, 2017, 7:08 a.m.
On 28.12.2017 03:13, Andrei Vagin wrote:
> On Wed, Dec 27, 2017 at 10:30:26PM +0300, Kirill Tkhai wrote:
>> On 27.12.2017 20:14, Andrei Vagin wrote:
>>> On Tue, Dec 26, 2017 at 06:45:46PM +0300, Kirill Tkhai wrote:
>>>> Currently, service fds are bounded to rlimit(RLIMIT_NOFILE).
>>>> This brings problems with memory usage, because service fds
>>>> forces kernel to allocate big files_struct for every task,
>>>> and the container consumes much more memory, than it used
>>>> before dump. In some situations this even may lead to restore
>>>> fail because of enough memory absence in cgroup or on node.
>>>>
>>>> This patchset reworks service fds placement, and places
>>>> the service fds in custom place for every task in dependence
>>>> of task's biggest fd used on dump. This reduces kernel-allocated
>>>> files_struct in significant way, and make container use less
>>>> memory.
>>>>
>>>> The patchset consists of two parts. Patches [1-4/12] reworks
>>>> ctl tty crutch we used before, and makes ctl tty to restore
>>>> in generic file engine way. This change is good enough by its
>>>> own, and it has to be made separately earlier, but now is
>>>> the time to stop delaying, and to remove all the crutch, because
>>>> this is need for the second part.
>>>>
>>>> The second part of patchset places service fds in dependence
>>>> of task's used fds as described in start of topic. See patches
>>>> [5-12/12] for details of that.
>>>>
>>>> Patch [12/12] actually makes service fds "per-task placed".
>>>>
>>>
>>> Where is a test?
>>
>> I'll be happy to implement a test scenario you'll suggested.
>> What will you recommend?
> 
> Create a few processes with different file rlimits, set file descriptors
> so that criu has to move srevice fd-s on retore, etc...

Yeah, it has a sense.

>>  
>>> [root@fc24 criu]# python test/zdtm.py run -t zdtm/static/env00
>>> === Run 1/1 ================ zdtm/static/env00
>>>
>>> ========================== Run zdtm/static/env00 in h ==========================
>>> Start test
>>> ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
>>> Run criu dump
>>> Unable to kill 36: [Errno 3] No such process
>>> Run criu restore
>>> =[log]=> dump/zdtm/static/env00/36/1/restore.log
>>> ------------------------ grep Error ------------------------
>>> (00.016768)     36: Collect fdinfo pid=36 fd=1006 id=0x5
>>> (00.016783)     36: Collect fdinfo pid=36 fd=1007 id=0x5
>>> (00.016798)     36: Collect fdinfo pid=36 fd=1008 id=0x5
>>> (00.016813)     36: Collect fdinfo pid=36 fd=1009 id=0x5
>>> (00.016828)     36: Error (criu/files.c:915): Too big FD number to restore 1010
>>> (00.016864) Error (criu/cr-restore.c:2452): Restoring FAILED.
>>
>> See below.
>>
>>> ------------------------ ERROR OVER ------------------------
>>> ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
>>> ##################################### FAIL #####################################
>>> [root@fc24 criu]# git diff
>>> diff --git a/criu/crtools.c b/criu/crtools.c
>>> index 5f005387d..f18c295d8 100644
>>> --- a/criu/crtools.c
>>> +++ b/criu/crtools.c
>>> @@ -244,7 +244,7 @@ static void rlimit_unlimit_nofile_self(void)
>>>         new.rlim_cur = kdat.sysctl_nr_open;
>>>         new.rlim_max = kdat.sysctl_nr_open;
>>>  
>>> -       if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
>>> +       if (0 && prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
>>
>> It's not a valid change, because this unlimited PRLIMIT_NOFILE value
>> set in the above line is used in init_service_fd() to calculate potentially
>> possible service fds limit.
>>
>> So, the check "Too big FD number" just says that you can't restore a fd,
>> if the system does not allow to create such a number whatever you do.
> 
> Pls, write a comment before prlimit() which explains why do we need
> this.

OK
 
>>
>>>                 pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
>>>                 return;
>>>         } else
>>> diff --git a/test/zdtm/static/env00.c b/test/zdtm/static/env00.c
>>> index 1feabfa9f..0083d3fa0 100644
>>> --- a/test/zdtm/static/env00.c
>>> +++ b/test/zdtm/static/env00.c
>>> @@ -21,6 +21,15 @@ int main(int argc, char **argv)
>>>                 exit(1);
>>>         }
>>>  
>>> +       while (1) {
>>> +               if (dup(1) == -1)
>>> +                       break;
>>> +       }
>>> +       close(10);
>>> +       close(11);
>>> +       close(12);
>>> +       close(13);
>>> +
>>>         test_daemon();
>>>         test_waitsig();
>>>  
>>>
>>>  
>>>> https://travis-ci.org/tkhai/criu/builds/321817890
>>>> ---
>>>>
>>>> Kirill Tkhai (12):
>>>>       files: Close ctl tty via generic engine
>>>>       files: Move prepare_ctl_tty() to criu/tty.c
>>>>       files: Move CTL_TTY_OFF fixup to generic file engine
>>>>       files: Kill unused CTL_TTY_OFF leftovers
>>>>       files: Rename service_fd_rlim_cur to service_fd_base_cur
>>>>       files: Count inh_fd_max
>>>>       files: Pass pstree_item argument to clone_service_fd()
>>>>       files: Close old service fd in clone_service_fd()
>>>>       files: Do setup_newborn_fds() later
>>>>       files: Refactor clone_service_fd()
>>>>       files: Prepare clone_service_fd() for overlaping ranges.
>>>>       files: Make tasks set their own service_fd_base
>>>>
>>>>
>>>>  criu/cr-restore.c        |   14 +++--
>>>>  criu/files.c             |   45 +++-------------
>>>>  criu/include/files.h     |    3 +
>>>>  criu/include/servicefd.h |    4 -
>>>>  criu/include/tty.h       |    2 -
>>>>  criu/include/util.h      |    2 +
>>>>  criu/tty.c               |  127 +++++++++++++++++++++++++++++++++++++++++++---
>>>>  criu/util.c              |  101 ++++++++++++++++++++++++++-----------
>>>>  images/fdinfo.proto      |    1 
>>>>  9 files changed, 214 insertions(+), 85 deletions(-)
>>>>
>>>> --
>>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>
>> Kirill