[RHEL7,COMMIT] ms/jbd2: clear dirty flag when revoking a buffer from an older transaction

Submitted by Konstantin Khorenko on Aug. 22, 2019, 11:36 a.m.

Details

Message ID 201908221136.x7MBaUPt011349@finist-ce7.sw.ru
State New
Series "ext4/jbd2: port data corruption fixes from ms"
Headers show

Commit Message

Konstantin Khorenko Aug. 22, 2019, 11:36 a.m.
The commit is pushed to "branch-rh7-3.10.0-957.27.2.vz7.107.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-957.27.2.vz7.107.5
------>
commit e0e0d13f31f1f86380ae117e5773d82e47454461
Author: zhangyi (F) <yi.zhang@huawei.com>
Date:   Thu Aug 22 14:36:30 2019 +0300

    ms/jbd2: clear dirty flag when revoking a buffer from an older transaction
    
    Now, we capture a data corruption problem on ext4 while we're truncating
    an extent index block. Imaging that if we are revoking a buffer which
    has been journaled by the committing transaction, the buffer's jbddirty
    flag will not be cleared in jbd2_journal_forget(), so the commit code
    will set the buffer dirty flag again after refile the buffer.
    
    fsx                               kjournald2
                                      jbd2_journal_commit_transaction
    jbd2_journal_revoke                commit phase 1~5...
     jbd2_journal_forget
       belongs to older transaction    commit phase 6
       jbddirty not clear               __jbd2_journal_refile_buffer
                                         __jbd2_journal_unfile_buffer
                                          test_clear_buffer_jbddirty
                                           mark_buffer_dirty
    
    Finally, if the freed extent index block was allocated again as data
    block by some other files, it may corrupt the file data after writing
    cached pages later, such as during unmount time. (In general,
    clean_bdev_aliases() related helpers should be invoked after
    re-allocation to prevent the above corruption, but unfortunately we
    missed it when zeroout the head of extra extent blocks in
    ext4_ext_handle_unwritten_extents()).
    
    This patch mark buffer as freed and set j_next_transaction to the new
    transaction when it already belongs to the committing transaction in
    jbd2_journal_forget(), so that commit code knows it should clear dirty
    bits when it is done with the buffer.
    
    This problem can be reproduced by xfstests generic/455 easily with
    seeds (3246 3247 3248 3249).
    
    https://jira.sw.ru/browse/PSBM-96719
    
    Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: stable@vger.kernel.org
    
    (cherry picked from commit 904cdbd41d749a476863a0ca41f6f396774f26e4)
    Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
    
    =====================
    Patchset description:
    
    ext4/jbd2: port data corruption fixes from ms
    
    While investigating the data corruption on vzt-ploop-check test, when we
    detect one page in file which contains wrong data, we were lucky to have
    exact the same pattern in bad page each time. So we've added a small
    debug to fail on setting a dirty bit for a page if it contains the
    pattern, in the begining of __set_page_dirty and
    __set_page_dirty_nobuffers.
    
    We've got a crash, which looks related with the ported patches:
    
    crash> bt
    PID: 17855  TASK: ffff8cfb19144000  CPU: 3   COMMAND: "jbd2/ploop45613"
     #0 [ffff8cfcb6fdf8a0] machine_kexec at ffffffff9e2643c4
     #1 [ffff8cfcb6fdf900] __crash_kexec at ffffffff9e32d672
     #2 [ffff8cfcb6fdf9d0] crash_kexec at ffffffff9e32d760
     #3 [ffff8cfcb6fdf9e8] oops_end at ffffffff9e99f858
     #4 [ffff8cfcb6fdfa10] die at ffffffff9e22f88b
     #5 [ffff8cfcb6fdfa40] do_trap at ffffffff9e99eee0
     #6 [ffff8cfcb6fdfa90] do_invalid_op at ffffffff9e22c1d4
     #7 [ffff8cfcb6fdfb40] invalid_op at ffffffff9e9a928e
        [exception RIP: page_check_corruption_pattern+397]
        RIP: ffffffff9e3d719d  RSP: ffff8cfcb6fdfbf8  RFLAGS: 00010246
        RAX: ffff8cfcb6fdffd8  RBX: 00007303b0607000  RCX: 000000010025603b
        RDX: 0000000000000190  RSI: 0000000000000000  RDI: 0000000000000206
        RBP: ffff8cfcb6fdfc10   R8: ffff8cfbe6c19e00   R9: 0000000000000001
        R10: 0000000000000004  R11: 0000000000000005  R12: ffffe079873e7e40
        R13: ffff8cfca7bc3ab0  R14: ffff8cfc351d5a90  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #8 [ffff8cfcb6fdfbf0] page_check_corruption_pattern at ffffffff9e3d70d6
     #9 [ffff8cfcb6fdfc18] __set_page_dirty at ffffffff9e499cb5
     #10 [ffff8cfcb6fdfc50] mark_buffer_dirty at ffffffff9e499efa
     #11 [ffff8cfcb6fdfc70] __jbd2_journal_temp_unlink_buffer at ffffffffc048893a [jbd2]
     #12 [ffff8cfcb6fdfc80] __jbd2_journal_refile_buffer at ffffffffc048ac08 [jbd2]
     #13 [ffff8cfcb6fdfca8] jbd2_journal_commit_transaction at ffffffffc048c1e0 [jbd2]
     #14 [ffff8cfcb6fdfe48] kjournald2 at ffffffffc0491f79 [jbd2]
     #15 [ffff8cfcb6fdfec8] kthread at ffffffff9e2c4661
    
    Before ("jbd2: clear dirty flag when revoking a buffer from an older
    transaction") revoken buffer/page can be wrongly marked dirty, and later
    be wrongly written to disk. Other patches from the same series might be
    also helpful.
    
    https://jira.sw.ru/browse/PSBM-96719
    
    zhangyi (F) (3):
      jbd2: clear dirty flag when revoking a buffer from an older
            transaction
      jbd2: discard dirty data when forgetting an un-journalled buffer
      ext4: cleanup clean_bdev_aliases() calls
---
 fs/jbd2/transaction.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Patch hide | download patch | download mbox

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 9082eae76518..c182906a8bc4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1494,14 +1494,21 @@  int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 		/* However, if the buffer is still owned by a prior
 		 * (committing) transaction, we can't drop it yet... */
 		JBUFFER_TRACE(jh, "belongs to older transaction");
-		/* ... but we CAN drop it from the new transaction if we
-		 * have also modified it since the original commit. */
+		/* ... but we CAN drop it from the new transaction through
+		 * marking the buffer as freed and set j_next_transaction to
+		 * the new transaction, so that not only the commit code
+		 * knows it should clear dirty bits when it is done with the
+		 * buffer, but also the buffer can be checkpointed only
+		 * after the new transaction commits. */
 
-		if (jh->b_next_transaction) {
-			J_ASSERT(jh->b_next_transaction == transaction);
+		set_buffer_freed(bh);
+
+		if (!jh->b_next_transaction) {
 			spin_lock(&journal->j_list_lock);
-			jh->b_next_transaction = NULL;
+			jh->b_next_transaction = transaction;
 			spin_unlock(&journal->j_list_lock);
+		} else {
+			J_ASSERT(jh->b_next_transaction == transaction);
 
 			/*
 			 * only drop a reference if this transaction modified