make realpath replace leading // with a single /

Submitted by Natanael Copa on Jan. 13, 2021, 1:28 p.m.

Details

Message ID 20210113132835.22685-1-ncopa@alpinelinux.org
State New
Series "make realpath replace leading // with a single /"
Headers show

Commit Message

Natanael Copa Jan. 13, 2021, 1:28 p.m.
On some systems a leading double slash may have special meaning, so
POSIX[1] says that "If a pathname begins with two successive <slash>
characters, the first component following the leading <slash> characters
may be interpreted in an implementation-defined manner"

While current musl implementation is technically correct, most other
systems' (at least GNU libc, freebsd, openbsd, netbsd macOS)
implementations will replace a leading // with a single /. Make musl
do the same to avoid surprises.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13
---
 src/misc/realpath.c | 3 ---
 1 file changed, 3 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/misc/realpath.c b/src/misc/realpath.c
index db8b74dc..414b4741 100644
--- a/src/misc/realpath.c
+++ b/src/misc/realpath.c
@@ -46,9 +46,6 @@  restart:
 			q=0;
 			output[q++] = '/';
 			p++;
-			/* Initial // is special. */
-			if (stack[p] == '/' && stack[p+1] != '/')
-				output[q++] = '/';
 			continue;
 		}
 

Comments

Natanael Copa Jan. 13, 2021, 1:38 p.m.
On Wed, 13 Jan 2021 14:28:35 +0100
Natanael Copa <ncopa@alpinelinux.org> wrote:

> On some systems a leading double slash may have special meaning, so
> POSIX[1] says that "If a pathname begins with two successive <slash>
> characters, the first component following the leading <slash> characters
> may be interpreted in an implementation-defined manner"
> 
> While current musl implementation is technically correct, most other
> systems' (at least GNU libc, freebsd, openbsd, netbsd macOS)
> implementations will replace a leading // with a single /. Make musl
> do the same to avoid surprises.
> 
> [1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13
> ---
>  src/misc/realpath.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/src/misc/realpath.c b/src/misc/realpath.c
> index db8b74dc..414b4741 100644
> --- a/src/misc/realpath.c
> +++ b/src/misc/realpath.c
> @@ -46,9 +46,6 @@ restart:
>  			q=0;
>  			output[q++] = '/';
>  			p++;
> -			/* Initial // is special. */
> -			if (stack[p] == '/' && stack[p+1] != '/')
> -				output[q++] = '/';
>  			continue;
>  		}
>  

This fixes gettext's (gnulib) testsuite, which tests if realpath("//",
NULL) return "/" and fails if it doesn't.

I ran this testcase on multiple systems:

#include <stdio.h>
#include <stdlib.h>

int main() {
	printf("%s\n", realpath("//", NULL));
	return 0;
}



musl 1.1.24:	/
ubuntu 20.03:	/
macOS Big Sur:	/
OpenBSD 6.8:	/
FreeBSD 12.2:	/
NetBSD 9.1:	/
musl (current)	//

I don't know why this behavior was introduced in musl, but I think
this only adds meaningless friction to downstream users.

-nc
Rich Felker Jan. 13, 2021, 6:01 p.m.
On Wed, Jan 13, 2021 at 02:38:09PM +0100, Natanael Copa wrote:
> On Wed, 13 Jan 2021 14:28:35 +0100
> Natanael Copa <ncopa@alpinelinux.org> wrote:
> 
> > On some systems a leading double slash may have special meaning, so
> > POSIX[1] says that "If a pathname begins with two successive <slash>
> > characters, the first component following the leading <slash> characters
> > may be interpreted in an implementation-defined manner"
> > 
> > While current musl implementation is technically correct, most other
> > systems' (at least GNU libc, freebsd, openbsd, netbsd macOS)
> > implementations will replace a leading // with a single /. Make musl
> > do the same to avoid surprises.
> > 
> > [1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13
> > ---
> >  src/misc/realpath.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/src/misc/realpath.c b/src/misc/realpath.c
> > index db8b74dc..414b4741 100644
> > --- a/src/misc/realpath.c
> > +++ b/src/misc/realpath.c
> > @@ -46,9 +46,6 @@ restart:
> >  			q=0;
> >  			output[q++] = '/';
> >  			p++;
> > -			/* Initial // is special. */
> > -			if (stack[p] == '/' && stack[p+1] != '/')
> > -				output[q++] = '/';
> >  			continue;
> >  		}
> >  
> 
> This fixes gettext's (gnulib) testsuite, which tests if realpath("//",
> NULL) return "/" and fails if it doesn't.
> 
> I ran this testcase on multiple systems:
> 
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main() {
> 	printf("%s\n", realpath("//", NULL));
> 	return 0;
> }
> 
> 
> 
> musl 1.1.24:	/
> ubuntu 20.03:	/
> macOS Big Sur:	/
> OpenBSD 6.8:	/
> FreeBSD 12.2:	/
> NetBSD 9.1:	/
> musl (current)	//
> 
> I don't know why this behavior was introduced in musl, but I think
> this only adds meaningless friction to downstream users.

It wasn't so much introduced new as written to avoid hard-coding a
non-portable interpretation of pathnames into binaries. Previously, we
relied on the kernel's path normalization as displayed in /proc/*/fd/*
symlink contents to do the normalization, and it was able to convert
leading // to / on the basis that it (the kernel) does not treat them
differently. However POSIX specifies / and // as potentially being
different (with /// and beyond all being equivalent to /, though), and
if musl's userspace realpath resolution made the normalization of //
to /, it would be encoding the present-day-Linux-specific semantic
assumption. I did not want to do that.

POSIX reserves // presumably to allow things like network shares,
abstract volume references, drive letters, etc. not explcitly mounted
into the filesystem but referenced via a user-provided pathname. At
one point midipix was planning to use this for Windows drive letters,
but I think they hit too much friction with applications doing their
own pathname normalization that broke it (converting the // to /) and
dropped it. Part of my motive in preserving compatibility with kernel
implementations that use // specially was aimed at midipix (I wasn't
yet aware they'd dropped it) but I was also thinking about the ability
to run musl binaries on non-Linux kernels where pathnames should be
interpreted according to the host filesystem policy, not a hard-coded
assumption that Linux and other currently popular systems don't use //
for anything.

It would perhaps be nice to detect at realpath time when //foo and
/foo are the same inode and drop the extra / in that case, but that
requires pulling in stat, which is large nowadays thanks to the need
for translation of kernel structures. It might be worth doing anyway
if this ends up being a problem, but from my understanding on IRC
earlier the breakage you hit was just a testsuite asserting something
invalid, and nothing was actually broken at build or run time.

Rich