[RHEL7,COMMIT] mm, page_alloc: move_freepages should not examine struct page of reserved memory

Submitted by Vasily Averin on Dec. 3, 2020, 8:17 a.m.


Message ID 202012030817.0B38Hpjw019910@vz7build.vvs.sw.ru
State New
Series "mm, page_alloc: move_freepages should not examine struct page of reserved memory"

Commit Message

Vasily Averin Dec. 3, 2020, 8:17 a.m.
The commit is pushed to "branch-rh7-3.10.0-1160.6.1.vz7.171.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.6.1.vz7.171.1
commit d9dc0aa1c50dc54d64c8882089b6940535796b70
Author: David Rientjes <rientjes@google.com>
Date:   Thu Dec 3 11:17:51 2020 +0300

    mm, page_alloc: move_freepages should not examine struct page of reserved memory
    After commit 907ec5fca3dc ("mm: zero remaining unavailable struct
    pages"), struct page of reserved memory is zeroed.  This causes
    page->flags to be 0 and fixes issues related to reading
    /proc/kpageflags, for example, of reserved memory.

    The VM_BUG_ON() in move_freepages_block(), however, assumes that
    page_zone() is meaningful even for reserved memory.  That assumption is
    no longer true after the aforementioned commit.

    There's no reason why move_freepages_block() should be testing the
    legitimacy of page_zone() for reserved memory; its scope is limited only
    to pages on the zone's freelist.

    Note that pfn_valid() can be true for reserved memory: there is a
    backing struct page.  The check for page_to_nid(page) is also buggy but
    reserved memory normally only appears on node 0 so the zeroing doesn't
    affect this.
    Move the debug checks to after verifying PageBuddy is true.  This
    isolates the scope of the checks to only be for buddy pages which are on
    the zone's freelist which move_freepages_block() is operating on.  In
    this case, an incorrect node or zone is a bug worthy of being warned
    about (and the examination of struct page is acceptable because this
    memory is not reserved).
    Why does move_freepages_block() get called on reserved memory? It's
    simply math after finding a valid free page from the per-zone free area
    to use as fallback.  We find the beginning and end of the pageblock of
    the valid page and that can bring us into memory that was reserved per
    the e820.  pfn_valid() is still true (it's backed by a struct page), but
    since it's zero'd we shouldn't make any inferences here about comparing
    its node or zone.  The current node check just happens to succeed most
    of the time by luck because reserved memory typically appears on node 0.
    The fix here is to validate that we actually have buddy pages before
    testing if there's any type of zone or node strangeness going on.

    We noticed it almost immediately after bringing 907ec5fca3dc in on
    CONFIG_DEBUG_VM builds.  It depends on finding specific free pages in
    the per-zone free area where the math in move_freepages() will bring the
    start or end pfn into reserved memory and wanting to claim that entire
    pageblock as a new migratetype.  So the path will be rare, require
    CONFIG_DEBUG_VM, and require fallback to a different migratetype.
    Some struct pages were already zeroed from reserve pages before
    907ec5fca3dc so it theoretically could trigger before this commit.  I
    think it's rare enough under a config option that most people don't run
    that others may not have noticed.  I wouldn't argue against a stable tag
    and the backport should be easy enough, but probably wouldn't single out
    a commit that this is fixing.
    Mel said:
    : The overhead of the debugging check is higher with this patch although
    : it'll only affect debug builds and the path is not particularly hot.
    : If this was a concern, I think it would be reasonable to simply remove
    : the debugging check as the zone boundaries are checked in
    : move_freepages_block and we never expect a zone/node to be smaller than
    : a pageblock and stuck in the middle of another zone.
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1908122036560.10779@chino.kir.corp.google.com
    Signed-off-by: David Rientjes <rientjes@google.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    (cherry-picked from commit cd961038381f392b364a7c4a040f4576ca415b1a)
    [Note: we don't have commit 907ec5fca3dc, but as the changelog says,
    this could trigger before it.  And we have all the other symptoms: a
    reserved page from NUMA node 1 with a zeroed struct page, so
    page_zone() gives us the wrong zone, hence the BUG_ON()].
    Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
 mm/page_alloc.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)


diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f67278b..1ae193b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1672,23 +1672,7 @@  int move_freepages(struct zone *zone,
 	unsigned long order;
 	int pages_moved = 0;
-	/*
-	 * page_zone is not safe to call in this context when
-	 * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
-	 * anyway as we check zone boundaries in move_freepages_block().
-	 * Remove at a later date when no bug reports exist related to
-	 * grouping pages by mobility
-	 */
-	BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
-	       pfn_valid(page_to_pfn(end_page)) &&
-	       page_zone(start_page) != page_zone(end_page));
 	for (page = start_page; page <= end_page;) {
-		/* Make sure we are not inadvertently changing nodes */
-		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
 		if (!pfn_valid_within(page_to_pfn(page))) {
@@ -1699,6 +1683,9 @@  int move_freepages(struct zone *zone,
+		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
+		VM_BUG_ON_PAGE(page_zone(page) != zone, page);
 		order = page_order(page);