[rh8] ploop: Update cached bat_entries before bio completion

Submitted by Kirill Tkhai on Feb. 19, 2020, 12:03 p.m.

Details

Message ID 158211376969.340009.2114251419549825821.stgit@localhost.localdomain
State New
Series "ploop: Update cached bat_entries before bio completion"
Headers show

Commit Message

Kirill Tkhai Feb. 19, 2020, 12:03 p.m.
Right after data_bio->end_io is called, it's expected
a subsequent read returns written data. But there is
a race between ploop_bat_write_complete() and ploop_map():

WRITE					READ
ploop_bat_write_complete()
  data_bio->end_io()
					write submitter sees write completion
					next submittion is READ
					ploop_map()
					  check BAT --- sees zero BAT entry
					  zero_fill_bio()
  update BAT

So, the order has to be vise versa: first is to update BAT cache,
second is to call data_bio->end_io().

https://jira.sw.ru/browse/PSBM-100788

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 drivers/md/dm-ploop-map.c |   21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index a67656b2b01e..4ca0f8e83400 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -623,6 +623,18 @@  static void ploop_bat_write_complete(struct bio *bio)
 
 	track_bio(ploop, bio);
 
+	if (!bio->bi_status) {
+		/*
+		 * Success: now update local BAT copy. We could do this
+		 * from our delayed work, but we want to publish new
+		 * mapping in the fastest way. This must be done before
+		 * data bios completion, since right after we complete
+		 * a bio, subsequent read wants to see written data
+		 * (ploop_map() wants to see not zero bat_entries[.]).
+		 */
+		ploop_advance_local_after_bat_wb(ploop, piwb, true);
+	}
+
 	spin_lock_irqsave(&piwb->lock, flags);
 	piwb->completed = true;
 	piwb->bi_status = bio->bi_status;
@@ -644,15 +656,6 @@  static void ploop_bat_write_complete(struct bio *bio)
 		complete_cow(cow, bio->bi_status);
 	}
 
-	if (!piwb->bi_status) {
-		/*
-		 * Success: now update local BAT copy. We could do this
-		 * from our delayed work, but we want to publish new
-		 * mapping in the fastest way.
-		 */
-		ploop_advance_local_after_bat_wb(ploop, piwb, true);
-	}
-
 	/*
 	 * In case of update BAT is failed, dst_clusters will be
 	 * set back to holes_bitmap on last put_piwb().

Comments

Kirill Tkhai Feb. 19, 2020, 12:05 p.m.
https://jira.sw.ru/browse/PSBM-101499

On 19.02.2020 15:03, Kirill Tkhai wrote:
> Right after data_bio->end_io is called, it's expected
> a subsequent read returns written data. But there is
> a race between ploop_bat_write_complete() and ploop_map():
> 
> WRITE					READ
> ploop_bat_write_complete()
>   data_bio->end_io()
> 					write submitter sees write completion
> 					next submittion is READ
> 					ploop_map()
> 					  check BAT --- sees zero BAT entry
> 					  zero_fill_bio()
>   update BAT
> 
> So, the order has to be vise versa: first is to update BAT cache,
> second is to call data_bio->end_io().
> 
> https://jira.sw.ru/browse/PSBM-100788
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  drivers/md/dm-ploop-map.c |   21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
> index a67656b2b01e..4ca0f8e83400 100644
> --- a/drivers/md/dm-ploop-map.c
> +++ b/drivers/md/dm-ploop-map.c
> @@ -623,6 +623,18 @@ static void ploop_bat_write_complete(struct bio *bio)
>  
>  	track_bio(ploop, bio);
>  
> +	if (!bio->bi_status) {
> +		/*
> +		 * Success: now update local BAT copy. We could do this
> +		 * from our delayed work, but we want to publish new
> +		 * mapping in the fastest way. This must be done before
> +		 * data bios completion, since right after we complete
> +		 * a bio, subsequent read wants to see written data
> +		 * (ploop_map() wants to see not zero bat_entries[.]).
> +		 */
> +		ploop_advance_local_after_bat_wb(ploop, piwb, true);
> +	}
> +
>  	spin_lock_irqsave(&piwb->lock, flags);
>  	piwb->completed = true;
>  	piwb->bi_status = bio->bi_status;
> @@ -644,15 +656,6 @@ static void ploop_bat_write_complete(struct bio *bio)
>  		complete_cow(cow, bio->bi_status);
>  	}
>  
> -	if (!piwb->bi_status) {
> -		/*
> -		 * Success: now update local BAT copy. We could do this
> -		 * from our delayed work, but we want to publish new
> -		 * mapping in the fastest way.
> -		 */
> -		ploop_advance_local_after_bat_wb(ploop, piwb, true);
> -	}
> -
>  	/*
>  	 * In case of update BAT is failed, dst_clusters will be
>  	 * set back to holes_bitmap on last put_piwb().
> 
>
Kirill Tkhai March 10, 2020, 10:59 a.m.
Konstantin, could you please commit this?

On 19.02.2020 15:05, Kirill Tkhai wrote:
> https://jira.sw.ru/browse/PSBM-101499
> 
> On 19.02.2020 15:03, Kirill Tkhai wrote:
>> Right after data_bio->end_io is called, it's expected
>> a subsequent read returns written data. But there is
>> a race between ploop_bat_write_complete() and ploop_map():
>>
>> WRITE					READ
>> ploop_bat_write_complete()
>>   data_bio->end_io()
>> 					write submitter sees write completion
>> 					next submittion is READ
>> 					ploop_map()
>> 					  check BAT --- sees zero BAT entry
>> 					  zero_fill_bio()
>>   update BAT
>>
>> So, the order has to be vise versa: first is to update BAT cache,
>> second is to call data_bio->end_io().
>>
>> https://jira.sw.ru/browse/PSBM-100788
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  drivers/md/dm-ploop-map.c |   21 ++++++++++++---------
>>  1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
>> index a67656b2b01e..4ca0f8e83400 100644
>> --- a/drivers/md/dm-ploop-map.c
>> +++ b/drivers/md/dm-ploop-map.c
>> @@ -623,6 +623,18 @@ static void ploop_bat_write_complete(struct bio *bio)
>>  
>>  	track_bio(ploop, bio);
>>  
>> +	if (!bio->bi_status) {
>> +		/*
>> +		 * Success: now update local BAT copy. We could do this
>> +		 * from our delayed work, but we want to publish new
>> +		 * mapping in the fastest way. This must be done before
>> +		 * data bios completion, since right after we complete
>> +		 * a bio, subsequent read wants to see written data
>> +		 * (ploop_map() wants to see not zero bat_entries[.]).
>> +		 */
>> +		ploop_advance_local_after_bat_wb(ploop, piwb, true);
>> +	}
>> +
>>  	spin_lock_irqsave(&piwb->lock, flags);
>>  	piwb->completed = true;
>>  	piwb->bi_status = bio->bi_status;
>> @@ -644,15 +656,6 @@ static void ploop_bat_write_complete(struct bio *bio)
>>  		complete_cow(cow, bio->bi_status);
>>  	}
>>  
>> -	if (!piwb->bi_status) {
>> -		/*
>> -		 * Success: now update local BAT copy. We could do this
>> -		 * from our delayed work, but we want to publish new
>> -		 * mapping in the fastest way.
>> -		 */
>> -		ploop_advance_local_after_bat_wb(ploop, piwb, true);
>> -	}
>> -
>>  	/*
>>  	 * In case of update BAT is failed, dst_clusters will be
>>  	 * set back to holes_bitmap on last put_piwb().
>>
>>
> 
> _______________________________________________
> Devel mailing list
> Devel@openvz.org
> https://lists.openvz.org/mailman/listinfo/devel
>