[10/17] proc/fd: In proc_readfd_common use fnext_task

Submitted by Eric W. Biederman on Aug. 17, 2020, 10:04 p.m.

Details

Message ID 20200817220425.9389-10-ebiederm@xmission.com
State New
Series "Series without cover letter"
Headers show

Commit Message

Eric W. Biederman Aug. 17, 2020, 10:04 p.m.
When discussing[1] exec and posix file locks it was realized that none
of the callers of get_files_struct fundamentally needed to call
get_files_struct, and that by switching them to helper functions
instead it will both simplify their code and remove unnecessary
increments of files_struct.count.  Those unnecessary increments can
result in exec unnecessarily unsharing files_struct which breaking
posix locks, and it can result in fget_light having to fallback to
fget reducing system performance.

Using fnext_task simplifies proc_readfd_common, by moving the checking
for the maximum file descritor into the generic code, and by
remvoing the need for capturing and releasing a reference on
files_struct.

[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/proc/fd.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

Patch hide | download patch | download mbox

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index abfdcb21cc79..d9fee5390fd7 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -218,7 +218,6 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 			      instantiate_t instantiate)
 {
 	struct task_struct *p = get_proc_task(file_inode(file));
-	struct files_struct *files;
 	unsigned int fd;
 
 	if (!p)
@@ -226,22 +225,18 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 
 	if (!dir_emit_dots(file, ctx))
 		goto out;
-	files = get_files_struct(p);
-	if (!files)
-		goto out;
 
 	rcu_read_lock();
-	for (fd = ctx->pos - 2;
-	     fd < files_fdtable(files)->max_fds;
-	     fd++, ctx->pos++) {
+	for (fd = ctx->pos - 2;; fd++, ctx->pos++) {
 		struct file *f;
 		struct fd_data data;
 		char name[10 + 1];
 		unsigned int len;
 
-		f = fcheck_files(files, fd);
+		f = fnext_task(p, &fd);
+		ctx->pos = fd;
 		if (!f)
-			continue;
+			break;
 		data.mode = f->f_mode;
 		rcu_read_unlock();
 		data.fd = fd;
@@ -250,13 +245,11 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 		if (!proc_fill_cache(file, ctx,
 				     name, len, instantiate, p,
 				     &data))
-			goto out_fd_loop;
+			goto out;
 		cond_resched();
 		rcu_read_lock();
 	}
 	rcu_read_unlock();
-out_fd_loop:
-	put_files_struct(files);
 out:
 	put_task_struct(p);
 	return 0;

Comments

Al Viro Aug. 18, 2020, 2:22 a.m.
On Mon, Aug 17, 2020 at 05:04:18PM -0500, Eric W. Biederman wrote:
> When discussing[1] exec and posix file locks it was realized that none
> of the callers of get_files_struct fundamentally needed to call
> get_files_struct, and that by switching them to helper functions
> instead it will both simplify their code and remove unnecessary
> increments of files_struct.count.  Those unnecessary increments can
> result in exec unnecessarily unsharing files_struct which breaking
> posix locks, and it can result in fget_light having to fallback to
> fget reducing system performance.
> 
> Using fnext_task simplifies proc_readfd_common, by moving the checking
> for the maximum file descritor into the generic code, and by
> remvoing the need for capturing and releasing a reference on
> files_struct.

Yecchhh...  So you keep locking/unlocking the damn thing for each
descriptor?
Eric W. Biederman Aug. 18, 2020, 3:42 a.m.
Al Viro <viro@zeniv.linux.org.uk> writes:

> On Mon, Aug 17, 2020 at 05:04:18PM -0500, Eric W. Biederman wrote:
>> When discussing[1] exec and posix file locks it was realized that none
>> of the callers of get_files_struct fundamentally needed to call
>> get_files_struct, and that by switching them to helper functions
>> instead it will both simplify their code and remove unnecessary
>> increments of files_struct.count.  Those unnecessary increments can
>> result in exec unnecessarily unsharing files_struct which breaking
>> posix locks, and it can result in fget_light having to fallback to
>> fget reducing system performance.
>> 
>> Using fnext_task simplifies proc_readfd_common, by moving the checking
>> for the maximum file descritor into the generic code, and by
>> remvoing the need for capturing and releasing a reference on
>> files_struct.
>
> Yecchhh...  So you keep locking/unlocking the damn thing for each
> descriptor?

For each descriptor found yes.

In practice there is a copy_to_user in between each fnext_task call which
limits what can be done.

The options are:
- Batching
- Adding a second count to files_struct for observers
- Or making exit_file/put_files_struct always put files_struct in
  an rcu safe way.

My gut says if we are going to care enough about the readers in proc
to do something making the entire thing rcu safe is probably the way
to go.

I don't know the real world cost of modifing exit_files/put_task_struct
to put the files_struct in an rcu safe way, but it looks like the only
case that matters is when a processes is exiting, and so it probably
is in the noise.


Any objection to putting an rcu_head into files_struct so that it can be
rcu freed?

There might even be a way to split exit_files so they are all closed
in the current location, and then the data structure is freed
in delayed put_task_struct.


I am definitely willing to look at it. Do we think there would be enough
traffic on task_lock from /proc/<pid>/fd access to make it work doing?

Eric
Alexei Starovoitov Aug. 18, 2020, 4:59 a.m.
On Mon, Aug 17, 2020 at 8:46 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> I am definitely willing to look at it. Do we think there would be enough
> traffic on task_lock from /proc/<pid>/fd access to make it work doing?

not from /proc, but bpf iterator in kernel/bpf/task_iter.c that is
being modified
in the other patch is used to process 100+k tasks. The faster it can go
through them the better. We don't see task spin_lock being a bottleneck though.
To test it just do 'bpftool prog show'. It will use task iterator undercover.