[Devel,vz7] ploop: fix race on map->levels[] update

Submitted by Maxim Patlasov on Aug. 8, 2017, 2:21 a.m.

Details

Message ID 150215887031.19063.3813806723921930053.stgit@maxim-thinkpad
State New
Series "ploop: fix race on map->levels[] update"
Headers show

Commit Message

Maxim Patlasov Aug. 8, 2017, 2:21 a.m.
Doing fast-path, ploop needs to know iblock and corresponding
delta for io. iblock is stored in page_address(m->page))[idx]
while delta lookup will use m->levels[idx]. Hence both reads and
writes must access these bits atomically. Otherwise, a reader
might use older page[idx] and newer levels[idx] or vice-versa.
Both would be mistake.

Fast-path is already acquires plo->lock, so the patch adds
lock/unlock only to writer: map_wb_complete.

Thanks to Evgenii Shatokhin for discovery and analysis:

Report:
[rh] Detected a data race on the the memory block at ffff880080f7a956, size 1:
[rh] ----- Thread #1, comm: ploop55307 -----
[rh] IP: [<ffffffffa052b259>] ploop_index_wb_complete+0x2d9/0x850 [ploop]
 [<ffffffffa0525140>] ploop_req_state_process+0x4c0/0xc30 [ploop]
 [<ffffffffa0525af0>] ploop_thread+0x240/0x560 [ploop]
 [<ffffffff810b27df>] kthread+0xcf/0xe0
 [<ffffffff81693218>] ret_from_fork+0x58/0x90
[rh] ----- Thread #2, comm: gcc -----
[rh] IP: right before [<ffffffffa0529cc5>] ploop_fastmap+0x125/0x160 [ploop]
 <handling of the hardware BP, skipped>
 [<ffffffffa051e738>] ploop_fast_lookup+0x48/0xf0 [ploop]
 [<ffffffffa0522bfd>] ploop_make_request+0x83d/0xc90 [ploop]
 [<ffffffff812e93e9>] generic_make_request+0x109/0x1e0
 [<ffffffff812e9537>] submit_bio+0x77/0x1c0
 [<ffffffffa034a605>] ext4_io_submit+0x25/0x50 [ext4]
 [<ffffffffa03467f3>] ext4_writepages+0x8a3/0xd60 [ext4]
 [<ffffffff8119edee>] do_writepages+0x1e/0x40
 [<ffffffff81192382>] __filemap_fdatawrite_range+0x62/0x80
 [<ffffffff8119246c>] filemap_flush+0x1c/0x20
 [<ffffffffa0343d0c>] ext4_alloc_da_blocks+0x2c/0x70 [ext4]
 [<ffffffffa03525aa>] ext4_rename+0x62a/0x870 [ext4]
 [<ffffffffa035280d>] ext4_rename2+0x1d/0x40 [ext4]
 [<ffffffff812250b0>] vfs_rename+0x210/0x850
 [<ffffffff81229933>] SYSC_renameat2+0x4d3/0x570
 [<ffffffff8122a78e>] SyS_renameat2+0xe/0x10
 [<ffffffff8122a7ce>] SyS_rename+0x1e/0x20
 [<ffffffff816932c9>] system_call_fastpath+0x16/0x1b

The first thread, "ploop55307", was executing map_wb_complete() function, drivers/block/ploop/map.c:1152, inlined into ploop_index_wb_complete():
1258    if (m->levels) {
1259        m->levels[idx] = top_delta->level;    // data are written here

The second thread, "gcc", was executing ploop_fastmap() function, drivers/block/ploop/map.c:239:
238    if (m->levels)
239        return m->levels[idx];    // data are read here

Race on m->levels[idx], I suppose.

Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
---
 drivers/block/ploop/map.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c
index b6f2243..78ae341 100644
--- a/drivers/block/ploop/map.c
+++ b/drivers/block/ploop/map.c
@@ -1142,6 +1142,7 @@  static void map_wb_complete(struct map_node * m, int err)
 			idx = (preq->req_cluster + PLOOP_MAP_OFFSET) & (INDEX_PER_PAGE - 1);
 			if (!err) {
 				struct ploop_request *pr = preq;
+				int do_levels_update = 0;
 
 				if (unlikely(test_bit(PLOOP_REQ_ZERO, &preq->state))) {
 					BUG_ON (list_empty(&preq->delay_list));
@@ -1150,6 +1151,11 @@  static void map_wb_complete(struct map_node * m, int err)
 							      list);
 				}
 
+				if (m->levels &&  m->levels[idx] != top_delta->level) {
+					spin_lock_irq(&plo->lock);
+					do_levels_update = 1;
+				}
+
 				if (unlikely(test_bit(PLOOP_REQ_RELOC_A, &preq->state) ||
 					     test_bit(PLOOP_REQ_ZERO, &preq->state)))
 					map_idx_swap(m, idx, &pr->iblock,
@@ -1159,7 +1165,10 @@  static void map_wb_complete(struct map_node * m, int err)
 						pr->iblock << ploop_map_log(plo);
 
 				if (m->levels) {
-					m->levels[idx] = top_delta->level;
+					if (do_levels_update) {
+						m->levels[idx] = top_delta->level;
+						spin_unlock_irq(&plo->lock);
+					}
 				} else {
 					BUG_ON(MAP_LEVEL(m) != top_delta->level);
 				}

Comments

Konstantin Khorenko Aug. 8, 2017, 9:24 a.m.
Please, consider to prepare a ReadyKernel patch for it.

https://readykernel.com/

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 08/08/2017 05:21 AM, Maxim Patlasov wrote:
> Doing fast-path, ploop needs to know iblock and corresponding
> delta for io. iblock is stored in page_address(m->page))[idx]
> while delta lookup will use m->levels[idx]. Hence both reads and
> writes must access these bits atomically. Otherwise, a reader
> might use older page[idx] and newer levels[idx] or vice-versa.
> Both would be mistake.
>
> Fast-path is already acquires plo->lock, so the patch adds
> lock/unlock only to writer: map_wb_complete.
>
> Thanks to Evgenii Shatokhin for discovery and analysis:
>
> Report:
> [rh] Detected a data race on the the memory block at ffff880080f7a956, size 1:
> [rh] ----- Thread #1, comm: ploop55307 -----
> [rh] IP: [<ffffffffa052b259>] ploop_index_wb_complete+0x2d9/0x850 [ploop]
>  [<ffffffffa0525140>] ploop_req_state_process+0x4c0/0xc30 [ploop]
>  [<ffffffffa0525af0>] ploop_thread+0x240/0x560 [ploop]
>  [<ffffffff810b27df>] kthread+0xcf/0xe0
>  [<ffffffff81693218>] ret_from_fork+0x58/0x90
> [rh] ----- Thread #2, comm: gcc -----
> [rh] IP: right before [<ffffffffa0529cc5>] ploop_fastmap+0x125/0x160 [ploop]
>  <handling of the hardware BP, skipped>
>  [<ffffffffa051e738>] ploop_fast_lookup+0x48/0xf0 [ploop]
>  [<ffffffffa0522bfd>] ploop_make_request+0x83d/0xc90 [ploop]
>  [<ffffffff812e93e9>] generic_make_request+0x109/0x1e0
>  [<ffffffff812e9537>] submit_bio+0x77/0x1c0
>  [<ffffffffa034a605>] ext4_io_submit+0x25/0x50 [ext4]
>  [<ffffffffa03467f3>] ext4_writepages+0x8a3/0xd60 [ext4]
>  [<ffffffff8119edee>] do_writepages+0x1e/0x40
>  [<ffffffff81192382>] __filemap_fdatawrite_range+0x62/0x80
>  [<ffffffff8119246c>] filemap_flush+0x1c/0x20
>  [<ffffffffa0343d0c>] ext4_alloc_da_blocks+0x2c/0x70 [ext4]
>  [<ffffffffa03525aa>] ext4_rename+0x62a/0x870 [ext4]
>  [<ffffffffa035280d>] ext4_rename2+0x1d/0x40 [ext4]
>  [<ffffffff812250b0>] vfs_rename+0x210/0x850
>  [<ffffffff81229933>] SYSC_renameat2+0x4d3/0x570
>  [<ffffffff8122a78e>] SyS_renameat2+0xe/0x10
>  [<ffffffff8122a7ce>] SyS_rename+0x1e/0x20
>  [<ffffffff816932c9>] system_call_fastpath+0x16/0x1b
>
> The first thread, "ploop55307", was executing map_wb_complete() function, drivers/block/ploop/map.c:1152, inlined into ploop_index_wb_complete():
> 1258    if (m->levels) {
> 1259        m->levels[idx] = top_delta->level;    // data are written here
>
> The second thread, "gcc", was executing ploop_fastmap() function, drivers/block/ploop/map.c:239:
> 238    if (m->levels)
> 239        return m->levels[idx];    // data are read here
>
> Race on m->levels[idx], I suppose.
>
> Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
> ---
>  drivers/block/ploop/map.c |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c
> index b6f2243..78ae341 100644
> --- a/drivers/block/ploop/map.c
> +++ b/drivers/block/ploop/map.c
> @@ -1142,6 +1142,7 @@ static void map_wb_complete(struct map_node * m, int err)
>  			idx = (preq->req_cluster + PLOOP_MAP_OFFSET) & (INDEX_PER_PAGE - 1);
>  			if (!err) {
>  				struct ploop_request *pr = preq;
> +				int do_levels_update = 0;
>
>  				if (unlikely(test_bit(PLOOP_REQ_ZERO, &preq->state))) {
>  					BUG_ON (list_empty(&preq->delay_list));
> @@ -1150,6 +1151,11 @@ static void map_wb_complete(struct map_node * m, int err)
>  							      list);
>  				}
>
> +				if (m->levels &&  m->levels[idx] != top_delta->level) {
> +					spin_lock_irq(&plo->lock);
> +					do_levels_update = 1;
> +				}
> +
>  				if (unlikely(test_bit(PLOOP_REQ_RELOC_A, &preq->state) ||
>  					     test_bit(PLOOP_REQ_ZERO, &preq->state)))
>  					map_idx_swap(m, idx, &pr->iblock,
> @@ -1159,7 +1165,10 @@ static void map_wb_complete(struct map_node * m, int err)
>  						pr->iblock << ploop_map_log(plo);
>
>  				if (m->levels) {
> -					m->levels[idx] = top_delta->level;
> +					if (do_levels_update) {
> +						m->levels[idx] = top_delta->level;
> +						spin_unlock_irq(&plo->lock);
> +					}
>  				} else {
>  					BUG_ON(MAP_LEVEL(m) != top_delta->level);
>  				}
>
> .
>