glob: implement GLOB_TILDE

Submitted by Ismael Luceno on July 23, 2019, 6:33 p.m.

Details

Message ID 20190723183354.31633-1-ismael@iodev.co.uk
State New
Series "glob: implement GLOB_TILDE"
Headers show

Commit Message

Ismael Luceno July 23, 2019, 6:33 p.m.
Signed-off-by: Ismael Luceno <ismael@iodev.co.uk>
---
 include/glob.h   |  1 +
 src/regex/glob.c | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

Patch hide | download patch | download mbox

diff --git a/include/glob.h b/include/glob.h
index 76f6c1c68a23..fc8106b20c5b 100644
--- a/include/glob.h
+++ b/include/glob.h
@@ -30,6 +30,7 @@  void globfree(glob_t *);
 #define GLOB_APPEND   0x20
 #define GLOB_NOESCAPE 0x40
 #define	GLOB_PERIOD   0x80
+#define GLOB_TILDE    0x100
 
 #define GLOB_NOSPACE 1
 #define GLOB_ABORTED 2
diff --git a/src/regex/glob.c b/src/regex/glob.c
index aa1c6a4482ee..77b81f707420 100644
--- a/src/regex/glob.c
+++ b/src/regex/glob.c
@@ -8,6 +8,8 @@ 
 #include <stdlib.h>
 #include <errno.h>
 #include <stddef.h>
+#include <unistd.h>
+#include "../passwd/pwf.h"
 
 struct match
 {
@@ -35,6 +37,30 @@  static int do_glob(char *buf, size_t pos, int type, char *pat, int flags, int (*
 	/* If GLOB_MARK is unused, we don't care about type. */
 	if (!type && !(flags & GLOB_MARK)) type = DT_REG;
 
+	if ((flags & GLOB_TILDE) && *pat == '~') {
+		struct passwd pw, *res;
+		char *buf = NULL;
+		size_t len = 0;
+		char *name_end = strchr(++pat, '/');
+		if (name_end)
+			*name_end = 0;
+		uid_t uid = *pat ? 0 : getuid();
+		int err = __getpw_a(pat, uid, &pw, &buf, &len, &res);
+		if (!err && res) {
+			while (pos < PATH_MAX - 1 && *pw.pw_dir)
+				buf[pos++] = *pw.pw_dir++;
+		}
+		free(buf);
+		if (err)
+			return GLOB_NOSPACE;
+		if (!res)
+			return 0;
+		if (name_end) {
+			pat = name_end;
+			*pat = '/';
+		}
+	}
+
 	/* Special-case the remaining pattern being all slashes, in
 	 * which case we can use caller-passed type if it's a dir. */
 	if (*pat && type!=DT_DIR) type = 0;

Comments

Rich Felker July 23, 2019, 8:07 p.m.
On Tue, Jul 23, 2019 at 08:33:54PM +0200, Ismael Luceno wrote:
> Signed-off-by: Ismael Luceno <ismael@iodev.co.uk>
> ---
>  include/glob.h   |  1 +
>  src/regex/glob.c | 26 ++++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/include/glob.h b/include/glob.h
> index 76f6c1c68a23..fc8106b20c5b 100644
> --- a/include/glob.h
> +++ b/include/glob.h
> @@ -30,6 +30,7 @@ void globfree(glob_t *);
>  #define GLOB_APPEND   0x20
>  #define GLOB_NOESCAPE 0x40
>  #define	GLOB_PERIOD   0x80
> +#define GLOB_TILDE    0x100
>  
>  #define GLOB_NOSPACE 1
>  #define GLOB_ABORTED 2
> diff --git a/src/regex/glob.c b/src/regex/glob.c
> index aa1c6a4482ee..77b81f707420 100644
> --- a/src/regex/glob.c
> +++ b/src/regex/glob.c
> @@ -8,6 +8,8 @@
>  #include <stdlib.h>
>  #include <errno.h>
>  #include <stddef.h>
> +#include <unistd.h>
> +#include "../passwd/pwf.h"
>  
>  struct match
>  {
> @@ -35,6 +37,30 @@ static int do_glob(char *buf, size_t pos, int type, char *pat, int flags, int (*
>  	/* If GLOB_MARK is unused, we don't care about type. */
>  	if (!type && !(flags & GLOB_MARK)) type = DT_REG;
>  
> +	if ((flags & GLOB_TILDE) && *pat == '~') {
> +		struct passwd pw, *res;
> +		char *buf = NULL;
> +		size_t len = 0;
> +		char *name_end = strchr(++pat, '/');
> +		if (name_end)
> +			*name_end = 0;
> +		uid_t uid = *pat ? 0 : getuid();
> +		int err = __getpw_a(pat, uid, &pw, &buf, &len, &res);
> +		if (!err && res) {
> +			while (pos < PATH_MAX - 1 && *pw.pw_dir)
> +				buf[pos++] = *pw.pw_dir++;
> +		}
> +		free(buf);
> +		if (err)
> +			return GLOB_NOSPACE;
> +		if (!res)
> +			return 0;
> +		if (name_end) {
> +			pat = name_end;
> +			*pat = '/';
> +		}
> +	}

This should almost surely go in the top-level glob function rather
than the recusive function, where the original strdup of pat happens.
As you've written it, I think it wrongly applies tilde expansion to
each path component into which recursion happens, but not to ones
where it doesn't, whereas it should only apply to the first component.
Doing this in the recursive function also has the problem of exploding
stack usage.

One thing I don't see is how the code is working at all right now,
because you've shadowed the argument buf with a locally scoped
variable by the same name that's the working space buffer passed to
__getpw_a. The result directory is never copied out into the actual
buf array. One thing to note at this point: it's generally preferable
to use the public interfaces (getpwnam_r) rather than the internal
ones from a different sybsystem of libc (__getpw_a) unless there's a
really compelling reason not to. Here I think it would actually be
easier using the public interface -- you could reuse buf for the
buffer space you pass to getpwnam_r, then memmove from pw_dir (which
lies somewhere inside buf, you don't know where) to the start of buf.

There's also a matter of intended behavor: my understanding is that
plain ~ without a name is supposed to expand to $HOME, not lookup the
caller's passwd entry based on uid (which is broken on setups where
you have multiple login accounts with the same uid). The man page
doesn't document this at all. I looked just now in the glibc manual
and it documents the behavior in terms of getlogin+getpwnam, which is
utterly broken -- it doesn't work at all in the absence of a login
session associated with a terminal (except in musl where we wrongly
just return getenv("LOGNAME"), a known/open bug). and it's also
inconsistent with the standard shell behavior, which is to use $HOME:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_01

So I feel like, if we implement GLOB_TILDE, we should do it the way
that's consistent with the standard meaning of ~, using $HOME, not the
glibc implementation. This is also much more user-friendly, in that it
honors manual override of $HOME.

Rich
Ismael Luceno July 24, 2019, 7:15 a.m.
On 23/Jul/2019 16:07, Rich Felker wrote:
> On Tue, Jul 23, 2019 at 08:33:54PM +0200, Ismael Luceno wrote:
> > Signed-off-by: Ismael Luceno <ismael@iodev.co.uk>
> > ---
> >  include/glob.h   |  1 +
> >  src/regex/glob.c | 26 ++++++++++++++++++++++++++
> >  2 files changed, 27 insertions(+)
> > 
> > diff --git a/include/glob.h b/include/glob.h
> > index 76f6c1c68a23..fc8106b20c5b 100644
> > --- a/include/glob.h
> > +++ b/include/glob.h
> > @@ -30,6 +30,7 @@ void globfree(glob_t *);
> >  #define GLOB_APPEND   0x20
> >  #define GLOB_NOESCAPE 0x40
> >  #define	GLOB_PERIOD   0x80
> > +#define GLOB_TILDE    0x100
> >  
> >  #define GLOB_NOSPACE 1
> >  #define GLOB_ABORTED 2
> > diff --git a/src/regex/glob.c b/src/regex/glob.c
> > index aa1c6a4482ee..77b81f707420 100644
> > --- a/src/regex/glob.c
> > +++ b/src/regex/glob.c
> > @@ -8,6 +8,8 @@
> >  #include <stdlib.h>
> >  #include <errno.h>
> >  #include <stddef.h>
> > +#include <unistd.h>
> > +#include "../passwd/pwf.h"
> >  
> >  struct match
> >  {
> > @@ -35,6 +37,30 @@ static int do_glob(char *buf, size_t pos, int type, char *pat, int flags, int (*
> >  	/* If GLOB_MARK is unused, we don't care about type. */
> >  	if (!type && !(flags & GLOB_MARK)) type = DT_REG;
> >  
> > +	if ((flags & GLOB_TILDE) && *pat == '~') {
> > +		struct passwd pw, *res;
> > +		char *buf = NULL;
> > +		size_t len = 0;
> > +		char *name_end = strchr(++pat, '/');
> > +		if (name_end)
> > +			*name_end = 0;
> > +		uid_t uid = *pat ? 0 : getuid();
> > +		int err = __getpw_a(pat, uid, &pw, &buf, &len, &res);
> > +		if (!err && res) {
> > +			while (pos < PATH_MAX - 1 && *pw.pw_dir)
> > +				buf[pos++] = *pw.pw_dir++;
> > +		}
> > +		free(buf);
> > +		if (err)
> > +			return GLOB_NOSPACE;
> > +		if (!res)
> > +			return 0;
> > +		if (name_end) {
> > +			pat = name_end;
> > +			*pat = '/';
> > +		}
> > +	}
> 
> This should almost surely go in the top-level glob function rather
> than the recusive function, where the original strdup of pat happens.
> As you've written it, I think it wrongly applies tilde expansion to
> each path component into which recursion happens, but not to ones
> where it doesn't, whereas it should only apply to the first component.
> Doing this in the recursive function also has the problem of exploding
> stack usage.  One thing I don't see is how the code is working at all
> right now, because you've shadowed the argument buf with a locally
> scoped variable by the same name that's the working space buffer
> passed to __getpw_a. The result directory is never copied out into the
> actual buf array.

Indeed it doesn't work XD.

I meant it more as a quick try and to see if it was interesting and the
right kind of approach.

Though I forgot to add "RFC" to the subject prefix.

> One thing to note at this point: it's generally preferable to use the
> public interfaces (getpwnam_r) rather than the internal ones from a
> different sybsystem of libc (__getpw_a) unless there's a really
> compelling reason not to. Here I think it would actually be easier
> using the public interface -- you could reuse buf for the buffer space
> you pass to getpwnam_r, then memmove from pw_dir (which lies somewhere
> inside buf, you don't know where) to the start of buf.

The reason I had to use the private interface is to avoid dealing with
the buffer allocation on the side of the glob function.

It would be better to keep it that way.

> There's also a matter of intended behavor: my understanding is that
> plain ~ without a name is supposed to expand to $HOME, not lookup the
> caller's passwd entry based on uid (which is broken on setups where
> you have multiple login accounts with the same uid). The man page
> doesn't document this at all. I looked just now in the glibc manual
> and it documents the behavior in terms of getlogin+getpwnam, which is
> utterly broken -- it doesn't work at all in the absence of a login
> session associated with a terminal (except in musl where we wrongly
> just return getenv("LOGNAME"), a known/open bug). and it's also
> inconsistent with the standard shell behavior, which is to use $HOME:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_01
> 
> So I feel like, if we implement GLOB_TILDE, we should do it the way
> that's consistent with the standard meaning of ~, using $HOME, not the
> glibc implementation. This is also much more user-friendly, in that it
> honors manual override of $HOME.

I'm not entirely comfortable expanding HOME unconditionally, as it could
easily be exploited into a vulnerability for any privileged (e.g.
setuid) process.

Thank you very much for your comments; I will try to come up with a
better version later today.
Rich Felker July 24, 2019, 1 p.m.
On Wed, Jul 24, 2019 at 09:15:22AM +0200, Ismael Luceno wrote:
> On 23/Jul/2019 16:07, Rich Felker wrote:
> > On Tue, Jul 23, 2019 at 08:33:54PM +0200, Ismael Luceno wrote:
> > > Signed-off-by: Ismael Luceno <ismael@iodev.co.uk>
> > > ---
> > >  include/glob.h   |  1 +
> > >  src/regex/glob.c | 26 ++++++++++++++++++++++++++
> > >  2 files changed, 27 insertions(+)
> > > 
> > > diff --git a/include/glob.h b/include/glob.h
> > > index 76f6c1c68a23..fc8106b20c5b 100644
> > > --- a/include/glob.h
> > > +++ b/include/glob.h
> > > @@ -30,6 +30,7 @@ void globfree(glob_t *);
> > >  #define GLOB_APPEND   0x20
> > >  #define GLOB_NOESCAPE 0x40
> > >  #define	GLOB_PERIOD   0x80
> > > +#define GLOB_TILDE    0x100
> > >  
> > >  #define GLOB_NOSPACE 1
> > >  #define GLOB_ABORTED 2
> > > diff --git a/src/regex/glob.c b/src/regex/glob.c
> > > index aa1c6a4482ee..77b81f707420 100644
> > > --- a/src/regex/glob.c
> > > +++ b/src/regex/glob.c
> > > @@ -8,6 +8,8 @@
> > >  #include <stdlib.h>
> > >  #include <errno.h>
> > >  #include <stddef.h>
> > > +#include <unistd.h>
> > > +#include "../passwd/pwf.h"
> > >  
> > >  struct match
> > >  {
> > > @@ -35,6 +37,30 @@ static int do_glob(char *buf, size_t pos, int type, char *pat, int flags, int (*
> > >  	/* If GLOB_MARK is unused, we don't care about type. */
> > >  	if (!type && !(flags & GLOB_MARK)) type = DT_REG;
> > >  
> > > +	if ((flags & GLOB_TILDE) && *pat == '~') {
> > > +		struct passwd pw, *res;
> > > +		char *buf = NULL;
> > > +		size_t len = 0;
> > > +		char *name_end = strchr(++pat, '/');
> > > +		if (name_end)
> > > +			*name_end = 0;
> > > +		uid_t uid = *pat ? 0 : getuid();
> > > +		int err = __getpw_a(pat, uid, &pw, &buf, &len, &res);
> > > +		if (!err && res) {
> > > +			while (pos < PATH_MAX - 1 && *pw.pw_dir)
> > > +				buf[pos++] = *pw.pw_dir++;
> > > +		}
> > > +		free(buf);
> > > +		if (err)
> > > +			return GLOB_NOSPACE;
> > > +		if (!res)
> > > +			return 0;
> > > +		if (name_end) {
> > > +			pat = name_end;
> > > +			*pat = '/';
> > > +		}
> > > +	}
> > 
> > This should almost surely go in the top-level glob function rather
> > than the recusive function, where the original strdup of pat happens.
> > As you've written it, I think it wrongly applies tilde expansion to
> > each path component into which recursion happens, but not to ones
> > where it doesn't, whereas it should only apply to the first component.
> > Doing this in the recursive function also has the problem of exploding
> > stack usage.  One thing I don't see is how the code is working at all
> > right now, because you've shadowed the argument buf with a locally
> > scoped variable by the same name that's the working space buffer
> > passed to __getpw_a. The result directory is never copied out into the
> > actual buf array.
> 
> Indeed it doesn't work XD.
> 
> I meant it more as a quick try and to see if it was interesting and the
> right kind of approach.

It is interesting. Adding this functionality is a tradeoff, since it
does pull in linking of more stuff when glob is used even without
GLOB_TILDE, but since glob already inherently depends on dynamic
allocation it's not a huge issue, and this has already been planned
functionality for a while (it's on the roadmap).

The approach is somewhat right, modulo points in my previous review I
think.

> > One thing to note at this point: it's generally preferable to use the
> > public interfaces (getpwnam_r) rather than the internal ones from a
> > different sybsystem of libc (__getpw_a) unless there's a really
> > compelling reason not to. Here I think it would actually be easier
> > using the public interface -- you could reuse buf for the buffer space
> > you pass to getpwnam_r, then memmove from pw_dir (which lies somewhere
> > inside buf, you don't know where) to the start of buf.
> 
> The reason I had to use the private interface is to avoid dealing with
> the buffer allocation on the side of the glob function.

As noted, I think you can just reuse buf[] and avoid any dynamic
allocation visible in glob (of course it happens internally in
getpwnam_r but that's hidden). The only way it's not going to fit is
if the user's homedir doesn't fit (or barely fits) in PATH_MAX,
meaning they wouldn't be able to access anything in it with absolute
paths to begin with.

> It would be better to keep it that way.

To clarify, the reason to avoid doing this unless there's a really
strong motivation is to avoid cross-component coupling. If for example
someone needed to change the signature of __getpw_a as part of changes
in the pwd implementation, they'd have to figure out how glob is using
it and how to adapt glob to the new change, rather than just focusing
on the subsystem they're working on and familiar with. By using public
interfaces, you avoid that.

> > There's also a matter of intended behavor: my understanding is that
> > plain ~ without a name is supposed to expand to $HOME, not lookup the
> > caller's passwd entry based on uid (which is broken on setups where
> > you have multiple login accounts with the same uid). The man page
> > doesn't document this at all. I looked just now in the glibc manual
> > and it documents the behavior in terms of getlogin+getpwnam, which is
> > utterly broken -- it doesn't work at all in the absence of a login
> > session associated with a terminal (except in musl where we wrongly
> > just return getenv("LOGNAME"), a known/open bug). and it's also
> > inconsistent with the standard shell behavior, which is to use $HOME:
> > 
> > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_01
> > 
> > So I feel like, if we implement GLOB_TILDE, we should do it the way
> > that's consistent with the standard meaning of ~, using $HOME, not the
> > glibc implementation. This is also much more user-friendly, in that it
> > honors manual override of $HOME.
> 
> I'm not entirely comfortable expanding HOME unconditionally, as it could
> easily be exploited into a vulnerability for any privileged (e.g.
> setuid) process.

I understand and appreciate your concern, but I don't think that can
affect any use that's not already horribly insecure for other reasons.
If glob is being used with GLOB_TILDE in a suid context, one of the
following holds:

- The input pattern is provided by the user, in which case the user
  already fully controls the output, regardless of whether it contains
  ~ or how ~ expands.

- The input pattern is fixed and does not start with plain ~, in which
  case how ~ is expanded is irrelevant.

- The input pattern is fixed and does start with plain ~, in which
  case the program is explicitly requesting output that's under the
  control of the invoking user by virtue of controlling the contents
  of their home directory.

Naturally the cases where the suid program is operating on data
controlled by the invoking user are usually a bad idea, and if done at
all need to be done under extreme care, but I don't see how honoring
the standard meaning of ~ makes that any moreso. The outputs of glob
are always literal pathname strings, and have to be treated as such,
not pasted into command lines or used as format strings or anything
like that.

Rich
Ismael Luceno July 24, 2019, 5:25 p.m.
On 24/Jul/2019 09:00, Rich Felker wrote:
<...>
> To clarify, the reason to avoid doing this unless there's a really
> strong motivation is to avoid cross-component coupling. If for example
> someone needed to change the signature of __getpw_a as part of changes
> in the pwd implementation, they'd have to figure out how glob is using
> it and how to adapt glob to the new change, rather than just focusing
> on the subsystem they're working on and familiar with. By using public
> interfaces, you avoid that.

Good point.