[RHEL7,COMMIT] ms/vmscan: memcg: always use swappiness of the reclaimed memcg

Submitted by Konstantin Khorenko on Oct. 31, 2018, 1:19 p.m.


Message ID 201810311319.w9VDJfY3023922@finist-ce7.sw.ru
State New
Series "ms/vmscan: memcg: always use swappiness of the reclaimed memcg"

Commit Message

Konstantin Khorenko Oct. 31, 2018, 1:19 p.m.
The commit is pushed to "branch-rh7-3.10.0-862.14.4.vz7.72.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-862.14.4.vz7.72.16
commit 809a364c10a89f2d36aae68b93df29212778785b
Author: Michal Hocko <mhocko@suse.cz>
Date:   Wed Oct 31 16:19:41 2018 +0300

    ms/vmscan: memcg: always use swappiness of the reclaimed memcg

    Memory reclaim always uses swappiness of the reclaim target memcg
    (origin of the memory pressure) or vm_swappiness for global memory
    reclaim.  This behavior was consistent (except for difference between
    global and hard limit reclaim) because swappiness was enforced to be
    consistent within each memcg hierarchy.

    After "mm: memcontrol: remove hierarchy restrictions for swappiness and
    oom_control" each memcg can have its own swappiness independent of
    hierarchical parents, though, so the consistency guarantee is gone.
    This can lead to an unexpected behavior.  Say that a group is explicitly
    configured to not swapout by memory.swappiness=0 but its memory gets
    swapped out anyway when the memory pressure comes from its parent with a
    !0 swappiness value.

    It is also unexpected that the knob is meaningless without setting the
    hard limit which would trigger the reclaim and enforce the swappiness.
    There are setups where the hard limit is configured higher in the
    hierarchy by an administrator and children groups are under control of
    somebody else who is interested in the swapout behavior but not
    necessarily about the memory limit.

    From a semantic point of view swappiness is an attribute defining anon
    vs. file proportional scanning of LRU which is memcg specific (unlike
    charges which are propagated up the hierarchy) so it should be applied
    to the particular memcg's LRU regardless where the memory pressure comes
    from.

    This patch removes vmscan_swappiness() and stores the swappiness into
    the scan_control structure.  mem_cgroup_swappiness is then used to
    provide the correct value before shrink_lruvec is called.  The global
    vm_swappiness is used for the root memcg.

    [hughd@google.com: oopses immediately when booted with cgroup_disable=memory]
    Signed-off-by: Michal Hocko <mhocko@suse.cz>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    (cherry picked from commit 688eb988d15af55c1d1b70b1ca9f6ce58f277c20)
    Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
---
 Documentation/cgroups/memory.txt | 15 +++++++--------
 mm/memcontrol.c                  |  2 +-
 mm/vmscan.c                      | 18 ++++++++----------
 3 files changed, 16 insertions(+), 19 deletions(-)
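
The scenario from the commit message (a child group with
memory.swappiness=0 reclaimed because of pressure on its parent) can be
modeled in userspace. This is only a rough sketch, not kernel code: struct
memcg, old_swappiness() and new_swappiness() are hypothetical names, a NULL
target stands in for global reclaim, and the default of 60 is an assumed
value for vm_swappiness.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed global default, standing in for the kernel's vm_swappiness. */
static int vm_swappiness = 60;

/* Hypothetical, heavily simplified stand-in for struct mem_cgroup. */
struct memcg {
	const char *name;
	struct memcg *parent;	/* NULL for the root group */
	int swappiness;		/* per-group memory.swappiness */
};

/* Mirrors the root check in mem_cgroup_swappiness(). */
static int memcg_swappiness(const struct memcg *memcg)
{
	assert(memcg != NULL);
	if (memcg->parent == NULL)
		return vm_swappiness;
	return memcg->swappiness;
}

/*
 * Old rule (vmscan_swappiness): the reclaim target decides for every
 * group scanned below it. target == NULL models global reclaim.
 */
static int old_swappiness(const struct memcg *target,
			  const struct memcg *scanned)
{
	(void)scanned;
	if (target == NULL)
		return vm_swappiness;
	return memcg_swappiness(target);
}

/* New rule: the group whose LRU is being scanned decides. */
static int new_swappiness(const struct memcg *target,
			  const struct memcg *scanned)
{
	(void)target;
	return memcg_swappiness(scanned);
}
```

With a parent at swappiness 60 and a child at 0, old_swappiness(&parent,
&child) yields the parent's 60 while new_swappiness(&parent, &child)
yields the child's own 0, which is the behavior change the patch describes.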

Patch

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 6fd693f45ed2..13a6acaa5416 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -532,14 +532,13 @@  recent_scanned_file	- VM internal parameter. (see mm/vmscan.c)

 5.3 swappiness

-Similar to /proc/sys/vm/swappiness, but only affecting reclaim that is
-triggered by this cgroup's hard limit.  The tunable in the root cgroup
-corresponds to the global swappiness setting.
-
-Please note that unlike the global swappiness, memcg knob set to 0
-really prevents from any swapping even if there is a swap storage
-available. This might lead to memcg OOM killer if there are no file
-pages to reclaim.
+Overrides /proc/sys/vm/swappiness for the particular group. The tunable
+in the root cgroup corresponds to the global swappiness setting.
+
+Please note that unlike during the global reclaim, limit reclaim
+enforces that 0 swappiness really prevents from any swapping even if
+there is a swap storage available. This might lead to memcg OOM killer
+if there are no file pages to reclaim.

 5.4 failcnt

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 423bcf8adfd0..c889102fe955 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2036,7 +2036,7 @@  int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 	struct cgroup *cgrp = memcg->css.cgroup;

 	/* root ? */
-	if (cgrp->parent == NULL)
+	if (mem_cgroup_disabled() || cgrp->parent == NULL)
 		return vm_swappiness;

 	return memcg->swappiness;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 43f761c39eac..cca655e08af4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -103,6 +103,9 @@  struct scan_control {
 	/* Reclaim only slab */
 	bool slab_only;

+	/* anon vs. file LRUs scanning "ratio" */
+	int swappiness;
+
 	/*
 	 * The memory cgroup that hit its limit and as a result is the
 	 * primary target of this reclaim invocation.
@@ -2041,13 +2044,6 @@  static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 	return shrink_inactive_list(nr_to_scan, lruvec, sc, lru);
 }

-static int vmscan_swappiness(struct scan_control *sc)
-{
-	if (global_reclaim(sc))
-		return vm_swappiness;
-	return mem_cgroup_swappiness(sc->target_mem_cgroup);
-}
-
 int sysctl_force_scan_thresh = 100;
@@ -2153,7 +2149,7 @@  static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	 * using the memory controller's swap limit feature would be
 	 * too expensive.
 	 */
-	if (!global_reclaim(sc) && !vmscan_swappiness(sc)) {
+	if (!global_reclaim(sc) && !sc->swappiness) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -2163,7 +2159,7 @@  static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	 * system is close to OOM, scan both anon and file equally
 	 * (unless the swappiness setting disagrees with swapping).
 	 */
-	if (!sc->priority && vmscan_swappiness(sc)) {
+	if (!sc->priority && sc->swappiness) {
 		scan_balance = SCAN_EQUAL;
 		goto out;
 	}
@@ -2210,7 +2206,7 @@  static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	 * With swappiness at 100, anonymous and file have the same priority.
 	 * This scanning priority is essentially the inverse of IO cost.
 	 */
-	anon_prio = vmscan_swappiness(sc);
+	anon_prio = sc->swappiness;
 	file_prio = 200 - anon_prio;

 	/*
@@ -2544,6 +2540,7 @@  static void shrink_zone(struct zone *zone, struct scan_control *sc,
 			if (!slab_only) {
 				lruvec = mem_cgroup_zone_lruvec(zone, memcg);
+				sc->swappiness = mem_cgroup_swappiness(memcg);
 				shrink_lruvec(lruvec, sc, &lru_pages);
 				zone_lru_pages += lru_pages;
 			}
@@ -3117,6 +3114,7 @@  unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
 		.may_swap = !noswap,
 		.order = 0,
 		.priority = 0,
+		.swappiness = mem_cgroup_swappiness(memcg),
 		.target_mem_cgroup = memcg,
 		.stat = &stat,
 	};
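
The priority arithmetic visible in the get_scan_count() hunk above
(anon_prio taken from the swappiness, file_prio = 200 - anon_prio) can be
tried out in isolation. The following is a minimal userspace sketch, not
the kernel function: struct scan_prio and scan_prio_from_swappiness() are
hypothetical names, and the 0..100 range check is an assumption about the
valid swappiness values in this kernel series.

```c
#include <assert.h>

/* Hypothetical pair of scan priorities for the anon and file LRUs. */
struct scan_prio {
	int anon;
	int file;
};

/*
 * Derive both priorities from one swappiness value, following the
 * two assignments shown in the patch context.
 */
static struct scan_prio scan_prio_from_swappiness(int swappiness)
{
	struct scan_prio p;

	/* Assumed valid range; the patch itself does not state it. */
	assert(swappiness >= 0 && swappiness <= 100);

	p.anon = swappiness;
	p.file = 200 - p.anon;
	return p;
}
```

At swappiness 100 both priorities are 100, matching the comment in the
hunk that anonymous and file pages then have the same priority; at 0 the
anon priority is 0 and the file priority is 200. Since the patch sets
sc->swappiness per memcg before shrink_lruvec() is called, these inputs
now come from the group being scanned rather than from the reclaim target.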