[RHEL7,COMMIT] Revert "blk-mq: punt failed direct issue to dispatch list"

Submitted by Konstantin Khorenko on June 6, 2019, 12:07 p.m.


Message ID 201906061207.x56C70uX015489@finist-ce7.sw.ru
State New
Series "Revert "blk-mq: issue directly if hw queue isn't busy in case of 'none'""

Commit Message

The commit is pushed to "branch-rh7-3.10.0-957.12.2.vz7.96.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-957.12.2.vz7.96.12
commit 5b3ec3e465958a5c36165e08be457b2ca19ddee9
Author: Konstantin Khorenko <khorenko@virtuozzo.com>
Date:   Mon Jun 3 13:10:57 2019 +0300

    Revert "blk-mq: punt failed direct issue to dispatch list"
    This reverts RedHat patch:
    * Thu Feb 07 2019 Jan Stancek <jstancek@redhat.com> [3.10.0-957.10.1.el7]
    - [block] blk-mq: punt failed direct issue to dispatch list (Ming Lei)
      [1670511 1656654]
    Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
    Original comment:
     From cbbd11112cbcb6cdb489f4e500234ec33035341c Mon Sep 17 00:00:00 2001
     From: Ming Lei <ming.lei@redhat.com>
     Date: Sat, 8 Dec 2018 13:07:12 +0100
     Subject: [PATCH 10494/10502] [block] blk-mq: punt failed direct issue to
      dispatch list
     Message-id: <20181208130712.10869-3-ming.lei@redhat.com>
     Patchwork-id: 234764
     O-Subject: [RHEL7.7 2/2] blk-mq: punt failed direct issue to dispatch list
     Bugzilla: 1656654
     Z-Bugzilla: 1670511
     RH-Acked-by: Jeff Moyer <jmoyer@redhat.com>
     RH-Acked-by: Xiao Ni <xni@redhat.com>
     RH-Acked-by: Gopal Tiwari <gtiwari@redhat.com>
     From: Jens Axboe <axboe@kernel.dk>
     BZ: 1656654
     commit c616cbee97aed4bc6178f148a7240206dcdb85a6
     Author: Jens Axboe <axboe@kernel.dk>
     Date:   Thu Dec 6 22:17:44 2018 -0700
         blk-mq: punt failed direct issue to dispatch list
         After the direct dispatch corruption fix, we permanently disallow direct
         dispatch of non read/write requests. This works fine off the normal IO
         path, as they will be retried like any other failed direct dispatch
         request. But for the blk_insert_cloned_request() that only DM uses to
         bypass the bottom level scheduler, we always first attempt direct
         dispatch. For some types of requests, that's now a permanent failure,
         and no amount of retrying will make that succeed. This results in a
         livelock.
         Instead of making special cases for what we can direct issue, and now
         having to deal with DM solving the livelock while still retaining a BUSY
         condition feedback loop, always just add a request that has been through
         ->queue_rq() to the hardware queue dispatch list. These are safe to use
         as no merging can take place there. Additionally, if requests do have
         prepped data from drivers, we aren't dependent on them not sharing space
         in the request structure to safely add them to the IO scheduler lists.
         This basically reverts ffe81d45322c and is based on a patch from Ming,
         but with the list insert case covered as well.
         Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
         Cc: stable@vger.kernel.org
         Suggested-by: Ming Lei <ming.lei@redhat.com>
         Reported-by: Bart Van Assche <bvanassche@acm.org>
         Tested-by: Ming Lei <ming.lei@redhat.com>
         Acked-by: Mike Snitzer <snitzer@redhat.com>
         Signed-off-by: Jens Axboe <axboe@kernel.dk>
     Signed-off-by: Ming Lei <ming.lei@redhat.com>
     Signed-off-by: Jan Stancek <jstancek@redhat.com>
 block/blk-mq.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 58363a7f68e4..0166c2a2b346 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1695,6 +1695,15 @@  static int __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, struct request *r
+		/*
+		 * If direct dispatch fails, we cannot allow any merging on
+		 * this IO. Drivers (like SCSI) may have set up permanent state
+		 * for this request, like SG tables and mappings, and if we
+		 * merge to it later on then we'll still only do IO to the
+		 * original part.
+		 */
+		rq->cmd_flags |= REQ_NOMERGE;
 		blk_mq_update_dispatch_busy(hctx, true);
@@ -1706,6 +1715,18 @@  static int __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, struct request *r
 	return ret;
 }
 
+/*
+ * Don't allow direct dispatch of anything but regular reads/writes,
+ * as some of the other commands can potentially share request space
+ * with data we need for the IO scheduler. If we attempt a direct dispatch
+ * on those and fail, we can't safely add it to the scheduler afterwards
+ * without potentially overwriting data that the driver has already written.
+ */
+static bool blk_rq_can_direct_dispatch(struct request *rq)
+{
+	return req_op(rq) == REQ_OP_READ || req_op(rq) == REQ_OP_WRITE;
+}
+
 static int __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 						struct request *rq,
 						bool bypass_insert)
@@ -1726,7 +1747,7 @@  static int __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
-	if (q->elevator && !bypass_insert)
+	if (!blk_rq_can_direct_dispatch(rq) || (q->elevator && !bypass_insert))
 		goto insert;
 	if (!blk_mq_get_dispatch_budget(hctx))
@@ -1742,7 +1763,7 @@  static int __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (bypass_insert)
-	blk_mq_request_bypass_insert(rq, run_queue);
+	blk_mq_sched_insert_request(rq, false, run_queue, false);
 	return BLK_MQ_RQ_QUEUE_OK;
@@ -1756,7 +1777,7 @@  static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	hctx_lock(hctx, &srcu_idx);
 	ret = __blk_mq_try_issue_directly(hctx, rq, false);
-		blk_mq_request_bypass_insert(rq, true);
+		blk_mq_sched_insert_request(rq, false, true, false);
 	else if (ret != BLK_MQ_RQ_QUEUE_OK)
 		blk_mq_end_request(rq, ret);
@@ -1785,13 +1806,15 @@  void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq = list_first_entry(list, struct request,
+		if (!blk_rq_can_direct_dispatch(rq))
+			break;
 		ret = blk_mq_request_issue_directly(rq);
 		if (ret != BLK_MQ_RQ_QUEUE_OK) {
 			if (ret == BLK_MQ_RQ_QUEUE_BUSY ||
 					ret == BLK_MQ_RQ_QUEUE_DEV_BUSY) {
-				blk_mq_request_bypass_insert(rq,
-							list_empty(list));
+				list_add(&rq->queuelist, list);
 			blk_mq_end_request(rq, ret);
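
For readers who want to reason about the restored behaviour outside the kernel, the following is a minimal userspace sketch of the three decisions visible in the diff above: the read/write-only gate, the choice between direct issue and insertion, and the early stop when issuing a list. Every `model_*` name is a hypothetical stand-in invented for this illustration; none of them are kernel symbols, and the sketch deliberately ignores budgets, locking and the real return codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
enum model_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_FLUSH, MODEL_OP_DISCARD };
enum model_path { MODEL_DIRECT, MODEL_INSERT };

struct model_request {
	enum model_op op;
	bool nomerge;	/* stands in for REQ_NOMERGE in rq->cmd_flags */
};

/* Mirrors blk_rq_can_direct_dispatch(): only plain reads and writes qualify. */
static bool model_can_direct_dispatch(const struct model_request *rq)
{
	return rq->op == MODEL_OP_READ || rq->op == MODEL_OP_WRITE;
}

/*
 * Mirrors the condition in __blk_mq_try_issue_directly(): a request is
 * inserted instead of issued directly if it is not a read/write, or if
 * an elevator is attached and the caller did not ask to bypass it.
 */
static enum model_path model_choose_path(const struct model_request *rq,
					 bool has_elevator, bool bypass_insert)
{
	if (!model_can_direct_dispatch(rq) || (has_elevator && !bypass_insert))
		return MODEL_INSERT;
	return MODEL_DIRECT;
}

/*
 * Mirrors the busy branch of __blk_mq_issue_directly(): once the driver
 * has seen the request, forbid later merging into it.
 */
static void model_direct_issue_failed(struct model_request *rq)
{
	assert(rq != NULL);
	rq->nomerge = true;
}

/*
 * Mirrors the early "break" in blk_mq_try_issue_list_directly(): walk the
 * requests in order and stop at the first one that may not be issued
 * directly. Returns the number of leading eligible requests.
 */
static size_t model_count_direct_prefix(const struct model_request *rqs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!model_can_direct_dispatch(&rqs[i]))
			break;
	return i;
}
```

In this model a flush or discard is always routed to insertion, even when the caller requests a bypass, which is the property the commit message above identifies as problematic for blk_insert_cloned_request() callers such as DM.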