[2/2] crit: Anonymize file paths in files.img

Submitted by Harshavardhan Unnibhavi on June 30, 2019, 7:53 a.m.

Details

Message ID 20190630075349.32705-3-hvubfoss@gmail.com
State New
Series "Issue 360: Anonymize image files"

Commit Message

Harshavardhan Unnibhavi June 30, 2019, 7:53 a.m.
File path names are replaced by their corresponding sha1 hash values.
The top level names such as bin, var, usr, lib etc, are kept unchanged.

Resolve Issue #360.

Signed-off-by: Harshavardhan Unnibhavi <hvubfoss@gmail.com>
---
 lib/py/anonymize.py | 72 +++++++++++++++++++++++++++++++++++++++++++++
 lib/py/cli.py       |  4 +++
 2 files changed, 76 insertions(+)
 create mode 100644 lib/py/anonymize.py

Patch

diff --git a/lib/py/anonymize.py b/lib/py/anonymize.py
new file mode 100644
index 00000000..42861696
--- /dev/null
+++ b/lib/py/anonymize.py
@@ -0,0 +1,72 @@ 
+# This file contains methods to anonymize criu images.
+
+# In order to anonymize images three steps are followed:
+#     - decode the binary image to json
+#     - strip the necessary information from the json dict
+#     - encode the json dict back to a binary image, which is now anonymized
+
+# The following contents are being anonymized:
+#     - Paths to files
+
+import hashlib
+
+def files_anon(image):
+    levels = {}
+
+    fname_key = 'reg'
+    checksum  = hashlib.sha1()
+
+    for e in image['entries']:
+        if fname_key in e:
+            f_path = e[fname_key]['name']
+
+        f_path  = f_path.split('/')
+        lev_num = 0
+
+        for i, p in enumerate(f_path):
+            if p == '':
+                continue
+            if lev_num not in levels:
+                levels[lev_num] = {}
+            if p not in levels[lev_num]:
+                if i == 1:
+                    levels[lev_num][p] = p
+                else:
+                    checksum.update(p)
+                    levels[lev_num][p] = checksum.hexdigest()
+            lev_num += 1
+
+    for i, e in enumerate(image['entries']):
+        if fname_key in e:
+            f_path = e[fname_key]['name']
+        
+        if f_path == '/':
+            continue
+        
+        f_path = f_path.split('/')
+        lev_num = 0
+
+        for j, p in enumerate(f_path):
+            if p == '':
+                continue
+            f_path[j] = levels[lev_num][p]
+            lev_num += 1
+        f_path = '/'.join(f_path)
+        image['entries'][i][fname_key]['name'] = f_path
+    
+    return image
+
+anonymizers = {
+    'FILES': files_anon
+}
+
+def anon_handler(image):
+    magic = image['magic']
+
+    if magic != 'FILES':
+        return -1
+    
+    handler  = anonymizers[magic]
+    anon_img = handler(image)
+
+    return anon_img
diff --git a/lib/py/cli.py b/lib/py/cli.py
index fdb24dbe..400c084e 100755
--- a/lib/py/cli.py
+++ b/lib/py/cli.py
@@ -6,6 +6,7 @@  import os
 import glob
 
 import pycriu
+from anonymize import anon_handler
 
 def inf(opts):
 	if opts['in']:
@@ -286,6 +287,9 @@  def anonymize(opts):
 
 		try:
 			img = pycriu.images.load(inf(inf_opts))
+			anon_dict = anon_handler(img)
+			if anon_dict != -1:
+				pycriu.images.dump(anon_dict, outf(inf_opts))
 		except pycriu.images.MagicException as exc:
 			print("Unknown magic %#x.\n"\
 					"Found a raw image" %exc.magic, file=sys.stderr)

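A few details of files_anon() in the patch are worth a sketch. One shared hashlib.sha1() object is updated cumulatively, so a component's digest depends on everything hashed before it; update() needs bytes on Python 3; and an entry without a 'reg' key reuses f_path from the previous iteration. The following is a minimal variant under those observations, not the submitted patch. The helper name anonymize_path is hypothetical, and hashing each component independently (so a name maps to the same digest at any depth) is an assumption rather than something the thread settles.

```python
import hashlib


def anonymize_path(path):
    """Replace every path component except the first with its SHA-1 digest."""
    parts = path.split('/')
    level = 0
    for idx, part in enumerate(parts):
        if part == '':
            continue  # leave leading, trailing and doubled slashes alone
        if level > 0:
            # A fresh hash per component keeps the result independent of
            # the order in which entries are visited.
            parts[idx] = hashlib.sha1(part.encode('utf-8')).hexdigest()
        level += 1
    return '/'.join(parts)


def files_anon(image):
    """Anonymize the names of regular-file entries in a decoded image dict."""
    for entry in image['entries']:
        reg = entry.get('reg')
        if reg is None:
            continue  # not a regular-file entry, nothing to rename
        reg['name'] = anonymize_path(reg['name'])
    return image
```

With this variant a path such as /usr/lib/libc.so keeps its top-level component (usr) and has the two remaining components replaced by 40-character hex digests; the path / is returned unchanged.
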
Comments

Pavel Emelianov July 9, 2019, 12:57 p.m.
On 6/30/19 10:53 AM, Harshavardhan Unnibhavi wrote:
> File path names are replaced by their corresponding sha1 hash values.
> The top level names such as bin, var, usr, lib etc, are kept unchanged.
> 
> Resolve Issue #360.

This looks really good :)

Let's go ahead and try to teach criu-restore to restore the anonymized
images up to this line in restore_task() in criu/pie/restorer.c:

	restore_finish_stage(task_entries_local, CR_STATE_RESTORE_CREDS);

after which the whole restore just aborts and exits.
Harshavardhan Unnibhavi Aug. 11, 2019, 9:30 a.m.
Hi Pavel,

Sorry for the late reply. When we restore the anonymized images (with
file names anonymized), the restore fails because the files
themselves don't exist. I have gone through restore_task() in
criu/pie/restorer.c and found that lines 1553 to 1599 access the
file descriptor from the VMA entry struct.

I have also traced the file access steps that restore follows from
criu/cr-restore.c to criu/files-reg.c. I think the solution would be
to create a new file with the anonymized name. Where should I insert
the code that creates the file if it doesn't exist?

Best,
Harsha

Pavel Emelianov Aug. 14, 2019, 9:47 a.m.
On 8/11/19 12:30 PM, Harshavardhan Unnibhavi wrote:
> Hi Pavel,
> 
> Sorry for the late reply. When we restore the anonymized images(with
> file names anonymized) it fails to restore the task as the files
> themselves don't exist. I have gone through the  criu/pie/restorer.c's
> restore_task() function and have found that lines between 1553 and
> 1599 access the file descriptor from the VMA entry struct.

Of course. The "anonymized" file paths cannot be opened, so you should patch
the criu restore code to make it pretend that the file in question is opened,
but restore some fake file descriptor instead. And abort the restore at the
very end.
 
> I have also traced out the file access steps followed by restore from
> criu/cr-restore.c to criu/files-reg.c. I think the solution would be
> to create a new file with the anonymized name? Where should I insert
> this piece of code,to create a new file if it doesn't exist?

Don't create a file with the anonymized name; report back a fake file
descriptor instead. I'd introduce a "--anon" CLI option to the restore action for this.

-- Pavel
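
The fake-descriptor idea can be illustrated with a short Python sketch. This is not CRIU code (the restore path is C); open_reg_file and its anon parameter are hypothetical names, and using the null device as the stand-in is an assumption made only for illustration.

```python
import os


def open_reg_file(path, flags, anon=False):
    """Open path, or return a stand-in descriptor when anon is set."""
    if anon:
        # The hashed path does not exist on disk, so hand back a real but
        # harmless descriptor instead of failing the open.
        return os.open(os.devnull, os.O_RDONLY)
    return os.open(path, flags)
```

In anon mode the caller still receives a valid descriptor it can pass around and close, which lets the rest of the restore logic run even though the named file is absent.
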
Harshavardhan Unnibhavi Aug. 26, 2019, 5:38 p.m.
Hi Pavel,

On Wed, Aug 14, 2019 at 3:17 PM Pavel Emelianov <xemul@virtuozzo.com> wrote:
>
> On 8/11/19 12:30 PM, Harshavardhan Unnibhavi wrote:
> > Hi Pavel,
> >
> > Sorry for the late reply. When we restore the anonymized images(with
> > file names anonymized) it fails to restore the task as the files
> > themselves don't exist. I have gone through the  criu/pie/restorer.c's
> > restore_task() function and have found that lines between 1553 and
> > 1599 access the file descriptor from the VMA entry struct.
>
> Of course. The "anonymized" file paths cannot be opened, so you should patch
> the criu restore code to make it pretend that the file in question is opened,
> but restore some fake file descriptor instead. And abort the restore at the
> very end.
>
> > I have also traced out the file access steps followed by restore from
> > criu/cr-restore.c to criu/files-reg.c. I think the solution would be
> > to create a new file with the anonymized name? Where should I insert
> > this piece of code,to create a new file if it doesn't exist?
>
> Not create the file with anonymous name, but rather report back a fake file
> descriptor. I'd introduce a "--anon" CLI option to the restore action for this.
I think I will create a patch that includes the following:
1) the --anon option,
2) aborting the restore task just before the call to fork_with_pid()
(cr_restore_tasks() calls restore_root_task(), which calls
fork_with_pid()). This seems to be the endpoint before everything is
read in.

Question: Will the bugs that are reported be reproduced from the
logs (restore.log)?

Best,
Harsha
Pavel Emelianov Aug. 27, 2019, 9:54 a.m.
On 8/26/19 8:38 PM, Harshavardhan Unnibhavi wrote:
> Hi Pavel,
> 
> On Wed, Aug 14, 2019 at 3:17 PM Pavel Emelianov <xemul@virtuozzo.com> wrote:
>>
>> On 8/11/19 12:30 PM, Harshavardhan Unnibhavi wrote:
>>> Hi Pavel,
>>>
>>> Sorry for the late reply. When we restore the anonymized images(with
>>> file names anonymized) it fails to restore the task as the files
>>> themselves don't exist. I have gone through the  criu/pie/restorer.c's
>>> restore_task() function and have found that lines between 1553 and
>>> 1599 access the file descriptor from the VMA entry struct.
>>
>> Of course. The "anonymized" file paths cannot be opened, so you should patch
>> the criu restore code to make it pretend that the file in question is opened,
>> but restore some fake file descriptor instead. And abort the restore at the
>> very end.
>>
>>> I have also traced out the file access steps followed by restore from
>>> criu/cr-restore.c to criu/files-reg.c. I think the solution would be
>>> to create a new file with the anonymized name? Where should I insert
>>> this piece of code,to create a new file if it doesn't exist?
>>
>> Not create the file with anonymous name, but rather report back a fake file
>> descriptor. I'd introduce a "--anon" CLI option to the restore action for this.
> I think I will create a patch which would include the following:
> 1) --anon option,

OK

> 2)abort the restore task just before the call to
> fork_with_pid(cr_restore_tasks()
>  calls restore_root_task which calls fork_with_pid). This seems to be
> the endpoint before
> everything is read in.

Nope, that's too early. We must be sure that all the criu restore code runs OK.
You should instead make the CR_STATE_COMPLETE stage fail on the --anon option.

> Question: Will the bugs that are reported be reproduced from the
> logs(restore.log)?

I don't get the question. Could you elaborate?

-- Pavel
Harshavardhan Unnibhavi Aug. 27, 2019, 6:32 p.m.
Hi,

On Tue, Aug 27, 2019 at 3:24 PM Pavel Emelianov <xemul@virtuozzo.com> wrote:
>
> On 8/26/19 8:38 PM, Harshavardhan Unnibhavi wrote:
> > Hi Pavel,
> >
> > On Wed, Aug 14, 2019 at 3:17 PM Pavel Emelianov <xemul@virtuozzo.com> wrote:
> >>
> >> On 8/11/19 12:30 PM, Harshavardhan Unnibhavi wrote:
> >>> Hi Pavel,
> >>>
> >>> Sorry for the late reply. When we restore the anonymized images(with
> >>> file names anonymized) it fails to restore the task as the files
> >>> themselves don't exist. I have gone through the  criu/pie/restorer.c's
> >>> restore_task() function and have found that lines between 1553 and
> >>> 1599 access the file descriptor from the VMA entry struct.
> >>
> >> Of course. The "anonymized" file paths cannot be opened, so you should patch
> >> the criu restore code to make it pretend that the file in question is opened,
> >> but restore some fake file descriptor instead. And abort the restore at the
> >> very end.
> >>
> >>> I have also traced out the file access steps followed by restore from
> >>> criu/cr-restore.c to criu/files-reg.c. I think the solution would be
> >>> to create a new file with the anonymized name? Where should I insert
> >>> this piece of code,to create a new file if it doesn't exist?
> >>
> >> Not create the file with anonymous name, but rather report back a fake file
> >> descriptor. I'd introduce a "--anon" CLI option to the restore action for this.
> > I think I will create a patch which would include the following:
> > 1) --anon option,
>
> OK
>
> > 2)abort the restore task just before the call to
> > fork_with_pid(cr_restore_tasks()
> >  calls restore_root_task which calls fork_with_pid). This seems to be
> > the endpoint before
> > everything is read in.
>
> Nope, that's too early. We must be sure that all the criu restore code runs OK.
> You should instead make the CR_STATE_COMPLETE stage fail on the --anon option.
Yes, I understand why now.
>
> > Question: Will the bugs that are reported be reproduced from the
> > logs(restore.log)?
>
> I don't get the question, would you elaborate one?
It was just a misunderstanding caused by my incorrect suggestion to abort
before fork_with_pid(). It is clear now.
>
> -- Pavel

Best,
Harsha