[12/17] proc/fd: In fdinfo seq_show don't use get_files_struct

Submitted by Eric W. Biederman on Aug. 17, 2020, 10:04 p.m.

Details

Message ID 20200817220425.9389-12-ebiederm@xmission.com
State New
Series "Series without cover letter"

Commit Message

Eric W. Biederman Aug. 17, 2020, 10:04 p.m.
When discussing[1] exec and posix file locks it was realized that none
of the callers of get_files_struct fundamentally needed to call
get_files_struct, and that switching them to helper functions
instead would both simplify their code and remove unnecessary
increments of files_struct.count.  Those unnecessary increments can
result in exec unnecessarily unsharing files_struct, which breaks
posix locks, and they can result in fget_light having to fall back to
fget, reducing system performance.

Instead hold task_lock for the duration that task->files needs to be
stable in seq_show.  The task_lock was already taken in
get_files_struct, and so skipping get_files_struct performs less work
overall, and avoids the problems with the files_struct reference
count.

[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/proc/fd.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Patch

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index d9fee5390fd7..0b46eea154b7 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -28,9 +28,8 @@  static int seq_show(struct seq_file *m, void *v)
 	if (!task)
 		return -ENOENT;
 
-	files = get_files_struct(task);
-	put_task_struct(task);
-
+	task_lock(task);
+	files = task->files;
 	if (files) {
 		unsigned int fd = proc_fd(m->private);
 
@@ -47,8 +46,9 @@  static int seq_show(struct seq_file *m, void *v)
 			ret = 0;
 		}
 		spin_unlock(&files->file_lock);
-		put_files_struct(files);
 	}
+	task_unlock(task);
+	put_task_struct(task);
 
 	if (ret)
 		return ret;
@@ -57,6 +57,7 @@  static int seq_show(struct seq_file *m, void *v)
 		   (long long)file->f_pos, f_flags,
 		   real_mount(file->f_path.mnt)->mnt_id);
 
+	/* show_fd_locks() never dereferences files so a stale value is safe */
 	show_fd_locks(m, file, files);
 	if (seq_has_overflowed(m))
 		goto out;
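
The reference-count problem described in the commit message can be
sketched with a small userspace model.  This is not kernel code: the
model_* names are hypothetical stand-ins, the structs keep only the
fields needed for the illustration, and the "count above one means
shared" test is a simplified assumption about the check exec makes
before unsharing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for files_struct: only the reference count is modeled. */
struct model_files {
	int count;
};

/* Toy stand-in for task_struct: only the files pointer is modeled. */
struct model_task {
	struct model_files *files;
};

/* Models get_files_struct(): takes an extra reference on task->files. */
struct model_files *model_get_files(struct model_task *task)
{
	struct model_files *files = task->files;

	if (files)
		files->count++;
	return files;
}

/* Models put_files_struct(): drops one reference. */
void model_put_files(struct model_files *files)
{
	assert(files->count > 0);
	files->count--;
}

/* Assumed exec-side check: a count above one looks like sharing. */
bool model_exec_must_unshare(const struct model_task *task)
{
	return task->files && task->files->count > 1;
}

/*
 * Old reader: pins files with a reference while it looks.  Returns what
 * an exec racing with the reader would conclude at that moment.
 */
bool old_reader_forces_unshare(struct model_task *task)
{
	struct model_files *files = model_get_files(task);
	bool forced = model_exec_must_unshare(task);

	if (files)
		model_put_files(files);
	return forced;
}

/*
 * New reader: borrows task->files without touching the count.  In the
 * patch, task_lock() is what keeps the pointer stable at this point.
 */
bool new_reader_forces_unshare(struct model_task *task)
{
	struct model_files *files = task->files;

	(void)files;	/* would be inspected here while the lock is held */
	return model_exec_must_unshare(task);
}
```

With a single-owner table (count == 1), the old reader makes the table
look shared for as long as it holds its reference, while the new reader
leaves the count alone.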

Comments

Linus Torvalds Aug. 18, 2020, 12:08 a.m.
On Mon, Aug 17, 2020 at 3:11 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Instead hold task_lock for the duration that task->files needs to be
> stable in seq_show.  The task_lock was already taken in
> get_files_struct, and so skipping get_files_struct performs less work
> overall, and avoids the problems with the files_struct reference
> count.

Hmm. Do we even need that task_lock() at all? Couldn't we do this all
with just the RCU lock held for reading?

As far as I can tell, we don't really need the task lock. We don't
even need the get/pid_task_struct().

Can't we just do

        rcu_read_lock();
        task = pid_task(proc_pid(inode), PIDTYPE_PID);
        if (task) {
                unsigned int fd = proc_fd(m->private);
                struct file *file = fcheck_task(task, fd);
                if (file)
                        .. do the seq_printf ..

and do it all with no refcount games or task locking at all?

Anyway, I don't dislike your patch per se, I just get the feeling that
you could go a bit further in that cleanup..

And it's quite possible that I'm wrong, and you can't just use the RCU
lock, but as far as I can tell, both the task lookup and the file
lookup *already* really both depend on the RCU lock anyway, so the
other locking and reference counting games really do seem superfluous.

Please just send me a belittling email telling me what a nincompoop I
am if I have missed something.

             Linus

Eric W. Biederman Aug. 18, 2020, 1:09 a.m.
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Aug 17, 2020 at 3:11 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Instead hold task_lock for the duration that task->files needs to be
>> stable in seq_show.  The task_lock was already taken in
>> get_files_struct, and so skipping get_files_struct performs less work
>> overall, and avoids the problems with the files_struct reference
>> count.
>
> Hmm. Do we even need that task_lock() at all? Couldn't we do this all
> with just the RCU lock held for reading?

task_lock is needed to protect task->files.

files->fd_array is rcu protected.

We could change task->files to be rcu protected but today we do
an immediate free of task files.

> As far as I can tell, we don't really need the task lock. We don't
> even need the get/pid_task_struct().
>
> Can't we just do
>
>         rcu_read_lock();
>         task = pid_task(proc_pid(inode), PIDTYPE_PID);
>         if (task) {
>                 unsigned int fd = proc_fd(m->private);
>                 struct file *file = fcheck_task(task, fd);
>                 if (file)
>                         .. do the seq_printf ..
>
> and do it all with no refcount games or task locking at all?

Only if we want to change how task->files is freed in exit_files.
Well, freed in put_files_struct really.

Rereading it I am having a hard time convincing myself that the
__free_fdtable in put_files_struct is fine.  I will have a look
tomorrow after I have slept.

> Anyway, I don't dislike your patch per se, I just get the feeling that
> you could go a bit further in that cleanup..
>
> And it's quite possible that I'm wrong, and you can't just use the RCU
> lock, but as far as I can tell, both the task lookup and the file
> lookup *already* really both depend on the RCU lock anyway, so the
> other locking and reference counting games really do seem superfluous.
>
> Please just send me a belittling email telling me what a nincompoop I
> am if I have missed something.

I hope this email isn't belittling.  But yes you did miss a thing or
two, and now I am not certain I haven't missed anything.

Eric

Linus Torvalds Aug. 18, 2020, 1:21 a.m.
On Mon, Aug 17, 2020 at 6:13 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> task_lock is needed to protect task->files.

Hah. Right you are. I found a few cases where we didn't do that, but I
hadn't noticed that they were all of the pattern

        struct task_struct *tsk = current;

so "tsk->files" was safe for that reason..

So never mind.

           Linus
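
The point settled in this exchange is that task->files is freed
immediately rather than after an RCU grace period, so a reader looking
at another task needs task_lock.  That can be modeled in userspace as
below.  This is a sketch under assumptions: a pthread mutex stands in
for task_lock(), free() stands in for the immediate free Eric
describes, and the model_* names and the nfds field are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy stand-in for files_struct with one readable field. */
struct model_files {
	int nfds;
};

/* Toy stand-in for task_struct. */
struct model_task {
	pthread_mutex_t lock;		/* stands in for task_lock() */
	struct model_files *files;	/* stands in for task->files */
};

/*
 * Models the exit path: clear the pointer under the lock, then free
 * right away.  There is no grace period, so a reader that loaded the
 * pointer without the lock could be left with freed memory.
 */
void model_exit_files(struct model_task *task)
{
	struct model_files *files;

	pthread_mutex_lock(&task->lock);
	files = task->files;
	task->files = NULL;
	pthread_mutex_unlock(&task->lock);

	free(files);
}

/*
 * Models a reader inspecting another task: the pointer is only loaded
 * and dereferenced while the lock is held.  Returns -1 if the task has
 * already dropped its files.
 */
int model_read_nfds(struct model_task *task)
{
	int ret = -1;

	assert(task != NULL);

	pthread_mutex_lock(&task->lock);
	if (task->files)
		ret = task->files->nfds;
	pthread_mutex_unlock(&task->lock);

	return ret;
}
```

A reader that only ever looks at its own task (the tsk = current
pattern Linus mentions) does not race with its own exit path, which is
why those callers can skip the lock.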

Christian Brauner Aug. 18, 2020, 10:43 a.m.
On Mon, Aug 17, 2020 at 05:04:20PM -0500, Eric W. Biederman wrote:
> When discussing[1] exec and posix file locks it was realized that none
> of the callers of get_files_struct fundamentally needed to call
> get_files_struct, and that switching them to helper functions
> instead would both simplify their code and remove unnecessary
> increments of files_struct.count.  Those unnecessary increments can
> result in exec unnecessarily unsharing files_struct, which breaks
> posix locks, and they can result in fget_light having to fall back to
> fget, reducing system performance.
> 
> Instead hold task_lock for the duration that task->files needs to be
> stable in seq_show.  The task_lock was already taken in
> get_files_struct, and so skipping get_files_struct performs less work
> overall, and avoids the problems with the files_struct reference
> count.
> 
> [1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>