[2/3] Issue #360: Anonymize image files

Submitted by Harshavardhan Unnibhavi on June 22, 2019, 9:37 a.m.

Details

Message ID 20190622093732.9024-2-hvubfoss@gmail.com
State New
Series "Series without cover letter"
Headers show

Commit Message

Harshavardhan Unnibhavi June 22, 2019, 9:37 a.m.
This commit changes the pycriu.images.load() function to return magic type of the loaded image file. The reason was to use this type information to call appropriate methods to handle anonymisation.

Signed-off-by: Harshavardhan Unnibhavi <hvubfoss@gmail.com>
---
 lib/py/cli.py           | 17 ++++++++++++++++-
 lib/py/images/images.py |  5 ++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/lib/py/cli.py b/lib/py/cli.py
index d3242f08..17622fd2 100755
--- a/lib/py/cli.py
+++ b/lib/py/cli.py
@@ -273,8 +273,23 @@  def explore_rss(opts):
 
 
 def anonymize(opts):
-	pass
+	try:
+		os.stat(opts['out'])
+	except:
+		os.mkdir(opts['out'])
+	
+	img_files = os.listdir(opts['in'])
+
+	for i in img_files:
+		temp = {'in':os.path.join(opts['in'], i)}
 
+		try:
+			m, img = pycriu.images.load(inf(temp), anon_info = True)
+		except pycriu.images.MagicException as exc:
+			print("Unknown magic %#x.\n"\
+					"Found a raw image, continuing ..."% exc.magic, file=sys.stderr)
+			continue
+		
 
 explorers = { 'ps': explore_ps, 'fds': explore_fds, 'mems': explore_mems, 'rss': explore_rss }
 
diff --git a/lib/py/images/images.py b/lib/py/images/images.py
index 7a9b9da6..a411ffc4 100644
--- a/lib/py/images/images.py
+++ b/lib/py/images/images.py
@@ -528,7 +528,7 @@  def __rhandler(f):
 
 	return m, handler
 
-def load(f, pretty = False, no_payload = False):
+def load(f, pretty = False, no_payload = False, anon_info = False):
 	"""
 	Convert criu image from binary format to dict(json).
 	Takes a file-like object to read criu image from.
@@ -541,6 +541,9 @@  def load(f, pretty = False, no_payload = False):
 	image['magic'] = m
 	image['entries'] = handler.load(f, pretty, no_payload)
 
+	if anon_info:
+		return m, image
+
 	return image
 
 def info(f):

Comments

Pavel Emelianov June 24, 2019, 11:38 a.m.
>  def anonymize(opts):
> -	pass
> +	try:
> +		os.stat(opts['out'])
> +	except:
> +		os.mkdir(opts['out'])
> +	
> +	img_files = os.listdir(opts['in'])
> +
> +	for i in img_files:
> +		temp = {'in':os.path.join(opts['in'], i)}
>  
> +		try:
> +			m, img = pycriu.images.load(inf(temp), anon_info = True)
> +		except pycriu.images.MagicException as exc:
> +			print("Unknown magic %#x.\n"\
> +					"Found a raw image, continuing ..."% exc.magic, file=sys.stderr)
> +			continue
> +		
>  
>  explorers = { 'ps': explore_ps, 'fds': explore_fds, 'mems': explore_mems, 'rss': explore_rss }
>  
> diff --git a/lib/py/images/images.py b/lib/py/images/images.py
> index 7a9b9da6..a411ffc4 100644
> --- a/lib/py/images/images.py
> +++ b/lib/py/images/images.py
> @@ -528,7 +528,7 @@ def __rhandler(f):
>  
>  	return m, handler
>  
> -def load(f, pretty = False, no_payload = False):
> +def load(f, pretty = False, no_payload = False, anon_info = False):
>  	"""
>  	Convert criu image from binary format to dict(json).
>  	Takes a file-like object to read criu image from.
> @@ -541,6 +541,9 @@ def load(f, pretty = False, no_payload = False):
>  	image['magic'] = m
>  	image['entries'] = handler.load(f, pretty, no_payload)
>  
> +	if anon_info:
> +		return m, image

Please, don't do this, as the value of "m" is available as "image['magic']" already, so your above
code line

    m, img = pycriu.images.load(inf(temp), anon_info = True)

would look like

    img = pycriu.images.load(inf(temp))
    m = img['magic']

-- Pavel
Radostin Stoyanov June 24, 2019, 3:12 p.m.
On 22/06/2019 10:37, Harshavardhan Unnibhavi wrote:
> This commit changes the pycriu.images.load() function to return magic type of the loaded image file. The reason was to use this type information to call appropriate methods to handle anonymisation.
Considering Pavel's comment, this commit could be combined with the 
previous one?
> Signed-off-by: Harshavardhan Unnibhavi <hvubfoss@gmail.com>
> ---
>   lib/py/cli.py           | 17 ++++++++++++++++-
>   lib/py/images/images.py |  5 ++++-
>   2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/lib/py/cli.py b/lib/py/cli.py
> index d3242f08..17622fd2 100755
> --- a/lib/py/cli.py
> +++ b/lib/py/cli.py
> @@ -273,8 +273,23 @@ def explore_rss(opts):
>   
>   
>   def anonymize(opts):
> -	pass
> +	try:
> +		os.stat(opts['out'])
> +	except:
> +		os.mkdir(opts['out'])
Here we could call os.makedirs() and handle any errors, something like

try:
     os.makedirs(opts['out'])
except OSError as err:
     if err.errno != os.errno.EEXIST:
         raise

> +	
> +	img_files = os.listdir(opts['in'])
Using glob() will return a list of matching pathnames (image files):

img_files = glob.glob(os.path.join(opts['in'], '*.img'))

> +
> +	for i in img_files:
> +		temp = {'in':os.path.join(opts['in'], i)}
When using glob(), the os.path.join() will not be necessary. For the 
next patch, the file name could be obtained with os.path.basename()
>   
> +		try:
> +			m, img = pycriu.images.load(inf(temp), anon_info = True)
> +		except pycriu.images.MagicException as exc:
> +			print("Unknown magic %#x.\n"\
> +					"Found a raw image, continuing ..."% exc.magic, file=sys.stderr)
please add a space before the % sign
> +			continue
is the 'continue' keyword necessary?
> +		
could you please remove the trailing white-space?
>   
>   explorers = { 'ps': explore_ps, 'fds': explore_fds, 'mems': explore_mems, 'rss': explore_rss }
>   
> diff --git a/lib/py/images/images.py b/lib/py/images/images.py
> index 7a9b9da6..a411ffc4 100644
> --- a/lib/py/images/images.py
> +++ b/lib/py/images/images.py
> @@ -528,7 +528,7 @@ def __rhandler(f):
>   
>   	return m, handler
>   
> -def load(f, pretty = False, no_payload = False):
> +def load(f, pretty = False, no_payload = False, anon_info = False):
>   	"""
>   	Convert criu image from binary format to dict(json).
>   	Takes a file-like object to read criu image from.
> @@ -541,6 +541,9 @@ def load(f, pretty = False, no_payload = False):
>   	image['magic'] = m
>   	image['entries'] = handler.load(f, pretty, no_payload)
>   
> +	if anon_info:
> +		return m, image
> +
>   	return image
>   
>   def info(f):
Harshavardhan Unnibhavi June 25, 2019, 11:36 a.m.
On Mon, Jun 24, 2019 at 5:08 PM Pavel Emelianov <xemul@virtuozzo.com> wrote:
>
>
> >  def anonymize(opts):
> > -     pass
> > +     try:
> > +             os.stat(opts['out'])
> > +     except:
> > +             os.mkdir(opts['out'])
> > +
> > +     img_files = os.listdir(opts['in'])
> > +
> > +     for i in img_files:
> > +             temp = {'in':os.path.join(opts['in'], i)}
> >
> > +             try:
> > +                     m, img = pycriu.images.load(inf(temp), anon_info = True)
> > +             except pycriu.images.MagicException as exc:
> > +                     print("Unknown magic %#x.\n"\
> > +                                     "Found a raw image, continuing ..."% exc.magic, file=sys.stderr)
> > +                     continue
> > +
> >
> >  explorers = { 'ps': explore_ps, 'fds': explore_fds, 'mems': explore_mems, 'rss': explore_rss }
> >
> > diff --git a/lib/py/images/images.py b/lib/py/images/images.py
> > index 7a9b9da6..a411ffc4 100644
> > --- a/lib/py/images/images.py
> > +++ b/lib/py/images/images.py
> > @@ -528,7 +528,7 @@ def __rhandler(f):
> >
> >       return m, handler
> >
> > -def load(f, pretty = False, no_payload = False):
> > +def load(f, pretty = False, no_payload = False, anon_info = False):
> >       """
> >       Convert criu image from binary format to dict(json).
> >       Takes a file-like object to read criu image from.
> > @@ -541,6 +541,9 @@ def load(f, pretty = False, no_payload = False):
> >       image['magic'] = m
> >       image['entries'] = handler.load(f, pretty, no_payload)
> >
> > +     if anon_info:
> > +             return m, image
>
> Please, don't do this, as the value of "m" is available as "image['magic']" already, so your above
> code line
>
>     m, img = pycriu.images.load(inf(temp), anon_info = True)
>
> would look like
>
>     img = pycriu.images.load(inf(temp))
>     m = img['magic']
>
Thanks, I will change it and send a revised v2 patch.
> -- Pavel

-- Harsha
Harshavardhan Unnibhavi June 25, 2019, 1:34 p.m.
On Mon, Jun 24, 2019 at 8:42 PM Radostin Stoyanov <rstoyanov1@gmail.com> wrote:
>
> On 22/06/2019 10:37, Harshavardhan Unnibhavi wrote:
> > This commit changes the pycriu.images.load() function to return magic type of the loaded image file. The reason was to use this type information to call appropriate methods to handle anonymisation.
> Considering Pavel's comment, this commit could be combined with the
> previous one?
Yes that makes sense, I will combine them into one single commit.

> > Signed-off-by: Harshavardhan Unnibhavi <hvubfoss@gmail.com>
> > ---
> >   lib/py/cli.py           | 17 ++++++++++++++++-
> >   lib/py/images/images.py |  5 ++++-
> >   2 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/py/cli.py b/lib/py/cli.py
> > index d3242f08..17622fd2 100755
> > --- a/lib/py/cli.py
> > +++ b/lib/py/cli.py
> > @@ -273,8 +273,23 @@ def explore_rss(opts):
> >
> >
> >   def anonymize(opts):
> > -     pass
> > +     try:
> > +             os.stat(opts['out'])
> > +     except:
> > +             os.mkdir(opts['out'])
> Here we could call os.makedirs() and handle any errors, something like
>
> try:
>      os.makedirs(opts['out'])
> except OSError as err:
>      if err.errno != os.errno.EEXIST:
>          raise
>
Sure I will make the change.

> > +
> > +     img_files = os.listdir(opts['in'])
> Using glob() will return a list of matching pathnames (image files):
>
> img_files = glob.glob(os.path.join(opts['in'], '*.img'))
>
> > +
> > +     for i in img_files:
> > +             temp = {'in':os.path.join(opts['in'], i)}
> When using glob(), the os.path.join() will not be necessary. For the
> next patch, the file name could be obtained with os.path.basename()
> >
> > +             try:
> > +                     m, img = pycriu.images.load(inf(temp), anon_info = True)
> > +             except pycriu.images.MagicException as exc:
> > +                     print("Unknown magic %#x.\n"\
> > +                                     "Found a raw image, continuing ..."% exc.magic, file=sys.stderr)
> please add a space before the % sign
> > +                     continue
> is the 'continue' keyword necessary?
Yes, as the lines after it use the img and m variable. I could
actually move those lines into the try block, and hence the continue
is not absolutely necessary.
> > +
> could you please remove the trailing white-space?
> >
> >   explorers = { 'ps': explore_ps, 'fds': explore_fds, 'mems': explore_mems, 'rss': explore_rss }
> >
> > diff --git a/lib/py/images/images.py b/lib/py/images/images.py
> > index 7a9b9da6..a411ffc4 100644
> > --- a/lib/py/images/images.py
> > +++ b/lib/py/images/images.py
> > @@ -528,7 +528,7 @@ def __rhandler(f):
> >
> >       return m, handler
> >
> > -def load(f, pretty = False, no_payload = False):
> > +def load(f, pretty = False, no_payload = False, anon_info = False):
> >       """
> >       Convert criu image from binary format to dict(json).
> >       Takes a file-like object to read criu image from.
> > @@ -541,6 +541,9 @@ def load(f, pretty = False, no_payload = False):
> >       image['magic'] = m
> >       image['entries'] = handler.load(f, pretty, no_payload)
> >
> > +     if anon_info:
> > +             return m, image
> > +
> >       return image
> >
> >   def info(f):
I should send a version 2 patch right? Or a new set as I will combine
this and the previous one?

Best,
Harsha
Pavel Emelianov June 27, 2019, 10:07 a.m.
>>> @@ -541,6 +541,9 @@ def load(f, pretty = False, no_payload = False):
>>>       image['magic'] = m
>>>       image['entries'] = handler.load(f, pretty, no_payload)
>>>
>>> +     if anon_info:
>>> +             return m, image
>>> +
>>>       return image
>>>
>>>   def info(f):
> I should send a version 2 patch right? Or a new set as I will combine
> this and the previous one?

The new set, please. And when preparing e-mails, make "Subject"-s of them contain
summary of the patch, not an abstract "issue #360: Anonymize image files" phrase :) 
This phrase is OK for the series cover letter.

Like this

[PATCH 0/3] Issue 360: Anonymize image files
 `- [PATCH 1/3] crit: Add 'anonymize' action
 `- [PATCH 2/3] crit: Anonymization skeleton
 `- [PATCH 3/3] crit: Anonymize file paths in reg-files.img

-- Pavel