[v2,15/24] proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu

Submitted by Eric W. Biederman on Nov. 20, 2020, 11:14 p.m.

Details

Message ID 20201120231441.29911-15-ebiederm@xmission.com
State New
Series "exec: Move unshare_files and guarantee files_struct.count is correct"
Headers show

Commit Message

Eric W. Biederman Nov. 20, 2020, 11:14 p.m.
When discussing[1] exec and posix file locks it was realized that none
of the callers of get_files_struct fundamentally needed to call
get_files_struct, and that by switching them to helper functions
instead it will both simplify their code and remove unnecessary
increments of files_struct.count.  Those unnecessary increments can
result in exec unnecessarily unsharing files_struct which breaking
posix locks, and it can result in fget_light having to fallback to
fget reducing system performance.

Using task_lookup_next_fd_rcu simplifies proc_readfd_common, by moving
the checking for the maximum file descritor into the generic code, and
by remvoing the need for capturing and releasing a reference on
files_struct.

As task_lookup_fd_rcu may update the fd ctx->pos has been changed
to be the fd +2 after task_lookup_fd_rcu returns.

[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Andy Lavr <andy.lavr@gmail.com>
v1: https://lkml.kernel.org/r/20200817220425.9389-10-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/proc/fd.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

Patch hide | download patch | download mbox

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index c1a984f3c4df..72c1525b4b3e 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -217,7 +217,6 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 			      instantiate_t instantiate)
 {
 	struct task_struct *p = get_proc_task(file_inode(file));
-	struct files_struct *files;
 	unsigned int fd;
 
 	if (!p)
@@ -225,22 +224,18 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 
 	if (!dir_emit_dots(file, ctx))
 		goto out;
-	files = get_files_struct(p);
-	if (!files)
-		goto out;
 
 	rcu_read_lock();
-	for (fd = ctx->pos - 2;
-	     fd < files_fdtable(files)->max_fds;
-	     fd++, ctx->pos++) {
+	for (fd = ctx->pos - 2;; fd++) {
 		struct file *f;
 		struct fd_data data;
 		char name[10 + 1];
 		unsigned int len;
 
-		f = files_lookup_fd_rcu(files, fd);
+		f = task_lookup_next_fd_rcu(p, &fd);
+		ctx->pos = fd + 2LL;
 		if (!f)
-			continue;
+			break;
 		data.mode = f->f_mode;
 		rcu_read_unlock();
 		data.fd = fd;
@@ -249,13 +244,11 @@  static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 		if (!proc_fill_cache(file, ctx,
 				     name, len, instantiate, p,
 				     &data))
-			goto out_fd_loop;
+			goto out;
 		cond_resched();
 		rcu_read_lock();
 	}
 	rcu_read_unlock();
-out_fd_loop:
-	put_files_struct(files);
 out:
 	put_task_struct(p);
 	return 0;

Comments

Al Viro Dec. 7, 2020, 11:29 p.m.
On Fri, Nov 20, 2020 at 05:14:32PM -0600, Eric W. Biederman wrote:
> When discussing[1] exec and posix file locks it was realized that none
> of the callers of get_files_struct fundamentally needed to call
> get_files_struct, and that by switching them to helper functions
> instead it will both simplify their code and remove unnecessary
> increments of files_struct.count.  Those unnecessary increments can
> result in exec unnecessarily unsharing files_struct which breaking
> posix locks, and it can result in fget_light having to fallback to
> fget reducing system performance.
> 
> Using task_lookup_next_fd_rcu simplifies proc_readfd_common, by moving
> the checking for the maximum file descritor into the generic code, and
> by remvoing the need for capturing and releasing a reference on
> files_struct.
> 
> As task_lookup_fd_rcu may update the fd ctx->pos has been changed
> to be the fd +2 after task_lookup_fd_rcu returns.


> +	for (fd = ctx->pos - 2;; fd++) {
>  		struct file *f;
>  		struct fd_data data;
>  		char name[10 + 1];
>  		unsigned int len;
>  
> -		f = files_lookup_fd_rcu(files, fd);
> +		f = task_lookup_next_fd_rcu(p, &fd);

Ugh...  That makes for a massive cacheline pingpong on task_lock -
instead of grabbing/dropping task_lock() once in the beginning, we do
that for every damn descriptor.

I really don't like this one.  If anything, I would rather have
a helper that would collect a bunch of pairs (fd,mode) into an
array and have lookups batched into it.  With the loop in that
sucker grabbing a reasonable amount into a local array, then
doing proc_fill_cache() for each collected.