[v3] nfs: protect callback execution against per-net callback thread shutdown

Submitted by Stanislav Kinsburskiy on Nov. 7, 2017, 10:39 a.m.


Message ID: 20171107103919.21983.32275.stgit@localhost.localdomain
State: New
Series: "nfs: protect callback execution against per-net callback thread shutdown"

Commit Message

Stanislav Kinsburskiy Nov. 7, 2017, 10:39 a.m.
From: Stanislav Kinsburskiy <skinsbursky@parallels.com>

Here is the race:

CPU #0				CPU #1

cleanup_mnt			nfs41_callback_svc (get xprt from the list)
nfs_callback_down		...
...				...
svc_close_net			...
...				...
svc_xprt_free			...
svc_bc_sock_free		bc_svc_process
kfree(xprt)			svc_process_common
				rqstp->rq_xprt->xpt_ops (use after free)

The problem is that per-net SUNRPC transport shutdown proceeds regardless
of any callback currently executing, which is a race leading to a transport
use-after-free in the callback handler.
This patch fixes it in a straightforward way: it protects callback
execution with the same mutex used for per-net data creation and destruction.
Hopefully, it won't slow down the NFS client significantly.

https://jira.sw.ru/browse/PSBM-75751

v3: Fix a mutex deadlock where the shutdown path waits for the thread to
exit (with the mutex taken), while the thread waits to take the mutex.
The idea is to simply re-check whether the thread has to exit whenever
taking the mutex fails. This is a busy loop, but it shouldn't happen often
or last long.
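
To illustrate, the trylock relies on the thread's main loop re-checking its
exit condition; a minimal sketch of the assumed structure (the exact loop
condition in this kernel may differ):

	while (!kthread_should_stop()) {
		if (try_to_freeze())
			continue;

		/* The shutdown path takes nfs_callback_mutex before calling
		 * kthread_stop(), so if trylock fails we fall through to the
		 * loop condition instead of blocking on the mutex: the thread
		 * notices kthread_should_stop() and exits rather than
		 * deadlocking. */
		if (!mutex_trylock(&nfs_callback_mutex))
			continue;
		...
	}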

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@parallels.com>
---
 fs/nfs/callback.c |    6 ++++++
 1 file changed, 6 insertions(+)


diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 0beb275..bbb07e4 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -117,7 +117,11 @@  nfs41_callback_svc(void *vrqstp)
 		if (try_to_freeze())
 			continue;
 
+		if (!mutex_trylock(&nfs_callback_mutex))
+		       continue;
+
 		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
+
 		spin_lock_bh(&serv->sv_cb_lock);
 		if (!list_empty(&serv->sv_cb_list)) {
 			req = list_first_entry(&serv->sv_cb_list,
@@ -129,8 +133,10 @@  nfs41_callback_svc(void *vrqstp)
 			error = bc_svc_process(serv, req, rqstp);
 			dprintk("bc_svc_process() returned w/ error code= %d\n",
 				error);
+			mutex_unlock(&nfs_callback_mutex);
 		} else {
 			spin_unlock_bh(&serv->sv_cb_lock);
+			mutex_unlock(&nfs_callback_mutex);
 			schedule();
 			finish_wait(&serv->sv_cb_waitq, &wq);
 		}

Comments

Kirill Tkhai Nov. 7, 2017, 11:30 a.m.
On 07.11.2017 13:39, Stanislav Kinsburskiy wrote:
> From: Stanislav Kinsburskiy <skinsbursky@parallels.com>
> 
> Here is the race:
> 
> CPU #0				CPU #1
> 
> cleanup_mnt			nfs41_callback_svc (get xprt from the list)
> nfs_callback_down		...
> ...				...
> svc_close_net			...
> ...				...
> svc_xprt_free			...
> svc_bc_sock_free		bc_svc_process
> kfree(xprt)			svc_process_common
> 				rqstp->rq_xprt->xpt_ops (use after free)
> 
> The problem is that per-net SUNRPC transport shutdown proceeds regardless
> of any callback currently executing, which is a race leading to a transport
> use-after-free in the callback handler.
> This patch fixes it in a straightforward way: it protects callback
> execution with the same mutex used for per-net data creation and destruction.
> Hopefully, it won't slow down the NFS client significantly.
> 
> https://jira.sw.ru/browse/PSBM-75751
> 
> v3: Fix a mutex deadlock where the shutdown path waits for the thread to
> exit (with the mutex taken), while the thread waits to take the mutex.
> The idea is to simply re-check whether the thread has to exit whenever
> taking the mutex fails. This is a busy loop, but it shouldn't happen often
> or last long.
> 
> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@parallels.com>
> ---
>  fs/nfs/callback.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index 0beb275..bbb07e4 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -117,7 +117,11 @@ nfs41_callback_svc(void *vrqstp)
>  		if (try_to_freeze())
>  			continue;
>  
> +		if (!mutex_trylock(&nfs_callback_mutex))
> +		       continue;

This looks like a busy loop (which is especially bad, because the mutex owner
may sleep at the moment we are looping).

It seems the solution may be to change the nfs_callback_down() function.
Can't we flush pending requests in nfs_callback_down()? Or just delete
them from sv_cb_list without handling (does the NFS protocol allow that)?

Also, one more argument for flushing is the suspicion that there may be more
than one pending request in the list. Can that happen?
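
Roughly, something like this (a hypothetical helper, name invented; it
assumes xprt_free_bc_request() is the right way to drop a queued
back-channel request, and that losing the callback is acceptable at the
protocol level, which is exactly the question above; it drains however many
requests are queued):

	static void nfs41_flush_cb_requests(struct svc_serv *serv)
	{
		struct rpc_rqst *req, *tmp;
		LIST_HEAD(tofree);

		/* Detach all pending back-channel requests... */
		spin_lock_bh(&serv->sv_cb_lock);
		list_splice_init(&serv->sv_cb_list, &tofree);
		spin_unlock_bh(&serv->sv_cb_lock);

		/* ...and return them to the transport without processing. */
		list_for_each_entry_safe(req, tmp, &tofree, rq_bc_list) {
			list_del(&req->rq_bc_list);
			xprt_free_bc_request(req);
		}
	}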

> +
>  		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
> +
>  		spin_lock_bh(&serv->sv_cb_lock);
>  		if (!list_empty(&serv->sv_cb_list)) {
>  			req = list_first_entry(&serv->sv_cb_list,
> @@ -129,8 +133,10 @@ nfs41_callback_svc(void *vrqstp)
>  			error = bc_svc_process(serv, req, rqstp);
>  			dprintk("bc_svc_process() returned w/ error code= %d\n",
>  				error);
> +			mutex_unlock(&nfs_callback_mutex);
>  		} else {
>  			spin_unlock_bh(&serv->sv_cb_lock);
> +			mutex_unlock(&nfs_callback_mutex);
>  			schedule();
>  			finish_wait(&serv->sv_cb_waitq, &wq);
>  		}
>
Stanislav Kinsburskiy Nov. 7, 2017, 2:29 p.m.
Please, see my comments below.

On 07.11.2017 12:30, Kirill Tkhai wrote:
> On 07.11.2017 13:39, Stanislav Kinsburskiy wrote:
>> From: Stanislav Kinsburskiy <skinsbursky@parallels.com>
>>
>> Here is the race:
>>
>> CPU #0				CPU #1
>>
>> cleanup_mnt			nfs41_callback_svc (get xprt from the list)
>> nfs_callback_down		...
>> ...				...
>> svc_close_net			...
>> ...				...
>> svc_xprt_free			...
>> svc_bc_sock_free		bc_svc_process
>> kfree(xprt)			svc_process_common
>> 				rqstp->rq_xprt->xpt_ops (use after free)
>>
>> The problem is that per-net SUNRPC transport shutdown proceeds regardless
>> of any callback currently executing, which is a race leading to a transport
>> use-after-free in the callback handler.
>> This patch fixes it in a straightforward way: it protects callback
>> execution with the same mutex used for per-net data creation and destruction.
>> Hopefully, it won't slow down the NFS client significantly.
>>
>> https://jira.sw.ru/browse/PSBM-75751
>>
>> v3: Fix a mutex deadlock where the shutdown path waits for the thread to
>> exit (with the mutex taken), while the thread waits to take the mutex.
>> The idea is to simply re-check whether the thread has to exit whenever
>> taking the mutex fails. This is a busy loop, but it shouldn't happen often
>> or last long.
>>
>> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@parallels.com>
>> ---
>>  fs/nfs/callback.c |    6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
>> index 0beb275..bbb07e4 100644
>> --- a/fs/nfs/callback.c
>> +++ b/fs/nfs/callback.c
>> @@ -117,7 +117,11 @@ nfs41_callback_svc(void *vrqstp)
>>  		if (try_to_freeze())
>>  			continue;
>>  
>> +		if (!mutex_trylock(&nfs_callback_mutex))
>> +		       continue;
> 
> This looks like a busy loop (which is especially bad, because the mutex owner
> may sleep at the moment we are looping).
> 

Well, yes, it's a busy loop, and that's not good. But I have to mention that this busy loop can happen only on an umount request.

> It seems the solution may be to change the nfs_callback_down() function.
> Can't we flush pending requests in nfs_callback_down()?

Well, this already happens. The problem is the race between transport usage (unconditional) and its destruction (also unconditional).
IOW, there should be either reference counting or some critical section.
It looks like the former would need far more code and thus be more error-prone; the latter is uglier, but the code should be simpler at first sight.
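
For the reference-counting variant, the shape would be to pin the
back-channel transport around processing, e.g. (a sketch only; it assumes
serv->sv_bc_xprt is the transport in question and glosses over taking the
reference race-free, which is the hard part):

	struct svc_xprt *bc_xprt = serv->sv_bc_xprt;

	/* Hold a reference so svc_close_net() can't free the transport
	 * while bc_svc_process() is using it. */
	svc_xprt_get(bc_xprt);
	error = bc_svc_process(serv, req, rqstp);
	svc_xprt_put(bc_xprt);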


> Or just delete them from sv_cb_list without handling (does the NFS protocol allow that)?
> 

Well, this looks like a promising idea. But then again, there should be some protection against transport usage in the shutdown helper,
and it's not yet clear how to implement that without significant code changes...