[RFC,1/1] mount: universally disallow mounting over symlinks

Submitted by Aleksa Sarai on Dec. 30, 2019, 5:20 a.m.

Details

Message ID 20191230052036.8765-2-cyphar@cyphar.com
State New
Series "mount: universally disallow mounting over symlinks"
Headers show

Commit Message

Aleksa Sarai Dec. 30, 2019, 5:20 a.m.
An undocumented feature of the mount interface was that it was possible
to mount over a symlink (even with the old mount API) by mounting over
/proc/self/fd/$n -- where the corresponding file descrpitor was opened
with (O_PATH|O_NOFOLLOW). This didn't work with traditional "new" mounts
(for a variety of reasons), but MS_BIND worked without issue. With the
new mount API it was even easier.

From userspace's perspective, this capability is only really useful as
an attack vector. Until the introduction of openat2(RESOLVE_NO_XDEV),
there was no trivial way to detect if a bind-mount was present. In the
container runtime context (in a similar vein to CVE-2019-19921), this
could result in a privileged process being unable to detect that a
configuration resulted in magic-link usage operating on the wrong
magic-links. Additionally, the API to use this feature was incredibly
strange -- in order to umount, you would have go through
/proc/self/fd/$n again (umounting the path would result in the
*underlying* symlink being followed).

Which brings us to the issues on the kernel side. When umounting a mount
on top of a symlink, several oopses (both NULL and garbage kernel
address dereferences) and deadlocks could be triggered incredibly
trivially. Note that because this works in user namespaces, an
unprivileged user could trigger these oopses incredibly trivially. While
these bugs could be fixed separately, it seems much cleaner to disable a
"feature" which clearly was not intentional (and is not used --
otherwise we would've seen bug reports about it breaking on umount).

Note that because the linux-utils mount(1) helper will expand paths
containing symlinks in user-space, only users which used the mount(2)
syscall directly could possibly have seen this behaviour.

Cc: stable@vger.kernel.org # pre-git
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 fs/namespace.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Patch hide | download patch | download mbox

diff --git a/fs/namespace.c b/fs/namespace.c
index be601d3a8008..01a62bce105f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2172,8 +2172,12 @@  static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
 	if (mnt->mnt.mnt_sb->s_flags & SB_NOUSER)
 		return -EINVAL;
 
+	if (d_is_symlink(mp->m_dentry) ||
+	    d_is_symlink(mnt->mnt.mnt_root))
+		return -EINVAL;
+
 	if (d_is_dir(mp->m_dentry) !=
-	      d_is_dir(mnt->mnt.mnt_root))
+	    d_is_dir(mnt->mnt.mnt_root))
 		return -ENOTDIR;
 
 	return attach_recursive_mnt(mnt, p, mp, false);
@@ -2251,6 +2255,9 @@  static struct mount *__do_loopback(struct path *old_path, int recurse)
 	if (IS_MNT_UNBINDABLE(old))
 		return mnt;
 
+	if (d_is_symlink(old_path->dentry))
+		return mnt;
+
 	if (!check_mnt(old) && old_path->dentry->d_op != &ns_dentry_operations)
 		return mnt;
 
@@ -2635,6 +2642,10 @@  static int do_move_mount(struct path *old_path, struct path *new_path)
 	if (old_path->dentry != old_path->mnt->mnt_root)
 		goto out;
 
+	if (d_is_symlink(new_path->dentry) ||
+	    d_is_symlink(old_path->dentry))
+		goto out;
+
 	if (d_is_dir(new_path->dentry) !=
 	    d_is_dir(old_path->dentry))
 		goto out;
@@ -2726,10 +2737,6 @@  static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
 	    path->mnt->mnt_root == path->dentry)
 		goto unlock;
 
-	err = -EINVAL;
-	if (d_is_symlink(newmnt->mnt.mnt_root))
-		goto unlock;
-
 	newmnt->mnt.mnt_flags = mnt_flags;
 	err = graft_tree(newmnt, parent, mp);
 

Comments

Linus Torvalds Dec. 30, 2019, 7:34 a.m.
On Sun, Dec 29, 2019 at 9:21 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> +       if (d_is_symlink(mp->m_dentry) ||
> +           d_is_symlink(mnt->mnt.mnt_root))
> +               return -EINVAL;

So I don't hate this kind of check in general - overmounting a symlink
sounds odd, but at the same time I get the feeling that the real issue
is that something went wrong earlier.

Yeah, the mount target kind of _is_ a path, but at the same time, we
most definitely want to have the permission to really open the
directory in question, don't we, and I don't see that we should accept
a O_PATH file descriptor.

I feel like the only valid use of "O_PATH" files is to then use them
as the base for an openat() and friends (ie fchmodat/execveat() etc).

But maybe I'm completely wrong, and people really do want O_PATH
handling exactly for mounting too. It does sound a bit odd. By
definition, mounting wants permissions to the mount-point, so what's
the point of using O_PATH?

So instead of saying "don't overmount symlinks", I would feel like
it's the mount system call that should use a proper file descriptor
that isn't FMODE_PATH.

Is it really the symlink that is the issue? Because if it's the
symlink that is the issue then I feel like O_NOFOLLOW should have
triggered it, but your other email seems to say that you really need
O_PATH | O_SYMLINK.

So I'm not sayng that this patch is wrong, but it really smells a bit
like it's papering over the more fundamental issue.

For example, is the problem that when you do a proper

  fd = open("somepath", O_PATH);

in one process, and then another thread does

   fd = open("/proc/<pid>/fd/<opathfd>", O_RDWR);

then we get confused and do bad things on that *second* open? Because
now the second open doesn't have O_PATH, and doesn't ghet marked
FMODE_PATH, but the underlying file descriptor is one of those limited
"is really only useful for openat() and friends".

I dunno. I haven't thought through the whole thing. But the oopses you
quote seem like we're really doing something wrong, and it really does
feel like your patch in no way _fixes_ the wrong thing we're doing,
it's just hiding the symptoms.

               Linus
Aleksa Sarai Dec. 30, 2019, 8:28 a.m.
On 2019-12-29, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sun, Dec 29, 2019 at 9:21 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
> > +       if (d_is_symlink(mp->m_dentry) ||
> > +           d_is_symlink(mnt->mnt.mnt_root))
> > +               return -EINVAL;
> 
> So I don't hate this kind of check in general - overmounting a symlink
> sounds odd, but at the same time I get the feeling that the real issue
> is that something went wrong earlier.
> 
> Yeah, the mount target kind of _is_ a path, but at the same time, we
> most definitely want to have the permission to really open the
> directory in question, don't we, and I don't see that we should accept
> a O_PATH file descriptor.

The new mount API uses O_PATH under the hood (which is a good thing
since some files you'd like to avoid actually opening -- FIFOs are the
obvious example) so I'm not sure that's something we could really avoid.

But if we block O_PATH for mounts this will achieve the same thing,
because the only way to get a file descriptor that references a symlink
is through (O_PATH | O_NOFOLLOW).

> I feel like the only valid use of "O_PATH" files is to then use them
> as the base for an openat() and friends (ie fchmodat/execveat() etc).

See below, we use this for all sorts of dirty^Wclever tricks.

> But maybe I'm completely wrong, and people really do want O_PATH
> handling exactly for mounting too. It does sound a bit odd. By
> definition, mounting wants permissions to the mount-point, so what's
> the point of using O_PATH?

When you go through O_PATH, you still get a proper 'struct path' which
means that for operations such as mount (or open) you will operate on
the *real* underlying file.

This is part of what makes magic-links so useful (but also quite
terrifying).

> For example, is the problem that when you do a proper
> 
>   fd = open("somepath", O_PATH);
> 
> in one process, and then another thread does
> 
>    fd = open("/proc/<pid>/fd/<opathfd>", O_RDWR);
> 
> then we get confused and do bad things on that *second* open? Because
> now the second open doesn't have O_PATH, and doesn't ghet marked
> FMODE_PATH, but the underlying file descriptor is one of those limited
> "is really only useful for openat() and friends".

Actually, this isn't true (for the same reason as above) -- when you do
a re-open through /proc/$pid/fd/$n you get a real-as-a-heart-attack file
descriptor. We make lots of use of this in container runtimes in order
to do some dirty^Wfun tricks that help us harden the runtime against
malicious container processes.

You might recall that when I was posting the earlier revisions of
openat2(), I also included a patch for O_EMPTYPATH (which basically did
a re-open of /proc/self/fd/$dfd but without needing /proc). That had
precisely the same semantics so that you could do the same operation
without procfs. That patch was dropped before Al merged openat2(), but I
am probably going to revive it for the reasons I outlined below.

> I dunno. I haven't thought through the whole thing. But the oopses you
> quote seem like we're really doing something wrong, and it really does
> feel like your patch in no way _fixes_ the wrong thing we're doing,
> it's just hiding the symptoms.

That's fair enough.

I'll be honest, the real reason why I don't want mounts over symlinks to
be possible is for an entirely different reason. I'm working on a safe
path resolution library to accompany openat2()[1] -- and one of the
things I want to do is to harden all of our uses of procfs (such that if
we are running in a context where procfs has been messed with -- such as
having files bind-mounted -- we can detect it and abort). The issue with
symlinks is that we need to be able to operate on magic-links (such as
/proc/self/fd/$n and /proc/self/exe) -- and if it's possible bind-mount
over those magic-links then we can't detect it at all.

openat2(RESOLVE_NO_XDEV) would block it, but it also blocks going
through magic-links which change your mount (which would almost always
be true). You can't trust /proc/self/mountinfo by definition -- not just
because of the TOCTOU race but also because you can't depend on /proc to
harden against a "bad" /proc. All other options such as
umount2(MNT_EXPIRE) won't help with magic-links because we cannot take
an O_PATH to a magic-link and follow it -- O_PATHs of symlinks are
completely stunted in this respect.

If allowing bind-mounts over symlinks is allowed (which I don't have a
problem with really), it just means we'll need a few more kernel pieces
to get this hardening to work. But these features would be useful
outside of the problems I'm dealing with (O_EMPTYPATH and some kind of
pidfd-based interface to grab the equivalent of /proc/self/exe and a few
other such magic-link targets).

[1]: https://github.com/openSUSE/libpathrs
Andy Lutomirski Jan. 8, 2020, 4:39 a.m.
On Mon, Dec 30, 2019 at 12:29 AM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> On 2019-12-29, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Sun, Dec 29, 2019 at 9:21 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> If allowing bind-mounts over symlinks is allowed (which I don't have a
> problem with really), it just means we'll need a few more kernel pieces
> to get this hardening to work. But these features would be useful
> outside of the problems I'm dealing with (O_EMPTYPATH and some kind of
> pidfd-based interface to grab the equivalent of /proc/self/exe and a few
> other such magic-link targets).

As one data point, I would use this ability in virtme: this would
allow me to more reliably mount over /etc/resolve.conf even when it's
a symlink.

(Perhaps I should use overlayfs instead.  Hmm.)