fs: sync: prevent possible deadlock in sync_collect_filesystems

Submitted by Alexander Mikhalitsyn on March 4, 2021, 3:59 p.m.

Details

Message ID 20210304155939.28320-1-alexander.mikhalitsyn@virtuozzo.com
State New
Series "fs: sync: prevent possible deadlock in sync_collect_filesystems"

Commit Message

Alexander Mikhalitsyn March 4, 2021, 3:59 p.m.
There are two problems:
1. kmalloc() should be called with GFP_NOWAIT so that it cannot sleep
while a spinlock is held.

2. spin_lock(&sb_lock) is taken while the mnt_ns_list spinlock is
already held, which leaves a small chance of deadlock; to avoid it,
let's use spin_trylock(&sb_lock) here.
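
For reference, a condensed sketch of the nesting both points refer to (not
the literal code: the surrounding loop and the lock_ns_list()/namespace_sem
calls are inferred from the hunks below and from the review discussion):

	/* caller holds down_read(&namespace_sem) */
	lock_ns_list(mnt_ns);				/* ns-list spinlock taken */
	list_for_each_entry(mnt, &mnt_ns->list, mnt_list) {
		...
		ss = kmalloc(sizeof(*ss), GFP_KERNEL);	/* (1) may sleep under the spinlock */
		...
		spin_lock(&sb_lock);			/* (2) sb_lock nested inside the ns-list lock */
		ss->sb->s_count++;
		spin_unlock(&sb_lock);
		...
	}
	unlock_ns_list(mnt_ns);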

Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
---
 fs/sync.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Patch

diff --git a/fs/sync.c b/fs/sync.c
index 1cf0f0b38824..b2b72ac75615 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -150,7 +150,7 @@  static int sync_collect_filesystems(struct ve_struct *ve, struct list_head *sync
 		if (sync_filesystem_collected(sync_list, mnt->mnt.mnt_sb))
 			continue;
 
-		ss = kmalloc(sizeof(*ss), GFP_KERNEL);
+		ss = kmalloc(sizeof(*ss), GFP_NOWAIT);
 		if (ss == NULL) {
 			ret = -ENOMEM;
 			break;
@@ -161,7 +161,14 @@  static int sync_collect_filesystems(struct ve_struct *ve, struct list_head *sync
 		 * alive. And it means, that we can safely increase it's usage
 		 * counter.
 		 */
-		spin_lock(&sb_lock);
+		if (unlikely(!spin_trylock(&sb_lock))) {
+			/*
+			 * If we can't take spinlock then just skip that mount to prevent
+			 * possible deadlocks here.
+			 */
+			kfree(ss);
+			continue;
+		}
 		ss->sb->s_count++;
 		spin_unlock(&sb_lock);
 		list_add_tail(&ss->list, sync_list);

Comments

Pavel Tikhomirov March 5, 2021, 7:43 a.m.
On 3/4/21 6:59 PM, Alexander Mikhalitsyn wrote:
> There are two problems:
> 1. kmalloc() should be called with GFP_NOWAIT so that it cannot sleep
> while a spinlock is held.
> 
> 2. spin_lock(&sb_lock) is taken while the mnt_ns_list spinlock is
> already held, which leaves a small chance of deadlock; to avoid it,
> let's use spin_trylock(&sb_lock) here
Bug link is missing. Probably PSBM-126568?

> 
> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> Cc: Vasily Averin <vvs@virtuozzo.com>
> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
> ---
>   fs/sync.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/sync.c b/fs/sync.c
> index 1cf0f0b38824..b2b72ac75615 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -150,7 +150,7 @@ static int sync_collect_filesystems(struct ve_struct *ve, struct list_head *sync
>   		if (sync_filesystem_collected(sync_list, mnt->mnt.mnt_sb))
>   			continue;
>   
> -		ss = kmalloc(sizeof(*ss), GFP_KERNEL);
> +		ss = kmalloc(sizeof(*ss), GFP_NOWAIT);

I don't think it's a good idea to simply replace GFP_KERNEL with 
GFP_NOWAIT. I'm not an expert, but I would expect it to eat into memory 
reserves. We should only do that if there is nothing else we can do.

But we have an alternative here: use mnt_list_next().

We need to hold lock_ns_list(mnt_ns) only while mnt is a "cursor" entry: 
all other mounts can be removed from the list only under 
down_write(&namespace_sem), against which we are already protected here 
by down_read(&namespace_sem). So we only need to take lock_ns_list(mnt_ns) 
while stepping to the next mnt entry, possibly skipping several "cursor" 
entries, and you can do that safely with the mnt_list_next() helper.
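
Roughly, the collection loop would then take the following shape. This is
only an untested sketch: mnt_list_next() is a static helper in fs/namespace.c
today (it takes lock_ns_list() internally while skipping cursor entries), so
it would have to be made visible to fs/sync.c, and mnt_ns, ss, ret and
sync_list are meant to be the same locals/parameters the current function
already has:

	struct list_head *pos = &mnt_ns->list;
	struct mount *mnt;

	/* We already run under down_read(&namespace_sem), so real mounts
	 * cannot leave mnt_ns->list under us; only cursors move, and
	 * mnt_list_next() takes lock_ns_list() just to step over them. */
	while ((mnt = mnt_list_next(mnt_ns, pos)) != NULL) {
		pos = &mnt->mnt_list;

		if (sync_filesystem_collected(sync_list, mnt->mnt.mnt_sb))
			continue;

		/* No spinlock is held at this point, so GFP_KERNEL is
		 * fine again -- problem (1) goes away. */
		ss = kmalloc(sizeof(*ss), GFP_KERNEL);
		if (ss == NULL) {
			ret = -ENOMEM;
			break;
		}
		ss->sb = mnt->mnt.mnt_sb;

		/* sb_lock no longer nests inside the ns-list lock, so a
		 * plain spin_lock() is enough -- problem (2) goes away. */
		spin_lock(&sb_lock);
		ss->sb->s_count++;
		spin_unlock(&sb_lock);
		list_add_tail(&ss->list, sync_list);
	}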

>   		if (ss == NULL) {
>   			ret = -ENOMEM;
>   			break;
> @@ -161,7 +161,14 @@ static int sync_collect_filesystems(struct ve_struct *ve, struct list_head *sync
>   		 * alive. And it means, that we can safely increase it's usage
>   		 * counter.
>   		 */
> -		spin_lock(&sb_lock);
> +		if (unlikely(!spin_trylock(&sb_lock))) {
> +			/*
> +			 * If we can't take spinlock then just skip that mount to prevent
> +			 * possible deadlocks here.
> +			 */
> +			kfree(ss);
> +			continue;
> +		}
>   		ss->sb->s_count++;
>   		spin_unlock(&sb_lock);
>   		list_add_tail(&ss->list, sync_list);
>
Pavel Tikhomirov March 5, 2021, 7:44 a.m.
On 3/5/21 10:43 AM, Pavel Tikhomirov wrote:
> 
> 
> On 3/4/21 6:59 PM, Alexander Mikhalitsyn wrote:
>> There are two problems:
>> 1. kmalloc() should be called with GFP_NOWAIT so that it cannot sleep
>> while a spinlock is held.
>>
>> 2. spin_lock(&sb_lock) is taken while the mnt_ns_list spinlock is
>> already held, which leaves a small chance of deadlock; to avoid it,
>> let's use spin_trylock(&sb_lock) here
> Bug link is missing. Probably PSBM-126568?
> 
>>
>> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>> Cc: Vasily Averin <vvs@virtuozzo.com>
>> Signed-off-by: Alexander Mikhalitsyn 
>> <alexander.mikhalitsyn@virtuozzo.com>
>> ---
>>   fs/sync.c | 11 +++++++++--
>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/sync.c b/fs/sync.c
>> index 1cf0f0b38824..b2b72ac75615 100644
>> --- a/fs/sync.c
>> +++ b/fs/sync.c
>> @@ -150,7 +150,7 @@ static int sync_collect_filesystems(struct 
>> ve_struct *ve, struct list_head *sync
>>           if (sync_filesystem_collected(sync_list, mnt->mnt.mnt_sb))
>>               continue;
>> -        ss = kmalloc(sizeof(*ss), GFP_KERNEL);
>> +        ss = kmalloc(sizeof(*ss), GFP_NOWAIT);
> 
> I don't think it's a good idea to simply replace GFP_KERNEL with 
> GFP_NOWAIT. I'm not an expert, but I would expect it to eat into memory 
> reserves. We should only do that if there is nothing else we can do.
> 
> But we have an alternative here: use mnt_list_next().
> 
> We need to hold lock_ns_list(mnt_ns) only while mnt is a "cursor" entry: 
> all other mounts can be removed from the list only under 
> down_write(&namespace_sem), against which we are already protected here 
> by down_read(&namespace_sem). So we only need to take lock_ns_list(mnt_ns) 
> while stepping to the next mnt entry, possibly skipping several "cursor" 
> entries, and you can do that safely with the mnt_list_next() helper.

Note: this alternative approach fixes both (1) and (2).

> 
>>           if (ss == NULL) {
>>               ret = -ENOMEM;
>>               break;
>> @@ -161,7 +161,14 @@ static int sync_collect_filesystems(struct 
>> ve_struct *ve, struct list_head *sync
>>            * alive. And it means, that we can safely increase it's usage
>>            * counter.
>>            */
>> -        spin_lock(&sb_lock);
>> +        if (unlikely(!spin_trylock(&sb_lock))) {
>> +            /*
>> +             * If we can't take spinlock then just skip that mount to 
>> prevent
>> +             * possible deadlocks here.
>> +             */
>> +            kfree(ss);
>> +            continue;
>> +        }
>>           ss->sb->s_count++;
>>           spin_unlock(&sb_lock);
>>           list_add_tail(&ss->list, sync_list);
>>
>