[RHEL7,COMMIT] ms/memcg: remove tasks/children test from mem_cgroup_force_empty()

Submitted by Konstantin Khorenko on May 28, 2020, 1:19 p.m.

Details

Message ID 202005281319.04SDJlqD017718@finist-ce7.sw.ru
State New
Series "ms/memcg: remove tasks/children test from mem_cgroup_force_empty()"
Headers show

Commit Message

Konstantin Khorenko May 28, 2020, 1:19 p.m.
The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.151.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.8.2.vz7.151.7
------>
commit 8a91d6578f23692011553ae28e37e9254b587178
Author: Michal Hocko <mhocko@suse.cz>
Date:   Thu May 28 16:19:46 2020 +0300

    ms/memcg: remove tasks/children test from mem_cgroup_force_empty()
    
    Tejun has correctly pointed out that tasks/children test in
    mem_cgroup_force_empty is not correct because there is no other locking
    which preserves this state throughout the rest of the function so both
    new tasks can join the group or new children groups can be added while
    somebody is writing to memory.force_empty. A new task would break
    mem_cgroup_reparent_charges expectation that all failures as described
    by mem_cgroup_force_empty_list are temporal and there is no way out.
    
    The main use case for the knob as described by
    Documentation/cgroups/memory.txt is to:
    "
      The typical use case for this interface is before calling rmdir().
      Because rmdir() moves all pages to parent, some out-of-use page caches can be
      moved to the parent. If you want to avoid that, force_empty will be useful.
    "
    
    This means that reparenting is not really required as rmdir will
    reparent pages implicitly from the safe context. If we remove it from
    mem_cgroup_force_empty then we are safe even with existing tasks because
    the number of reclaim attempts is bounded. Moreover the knob still does
    what the documentation claims (modulo reparenting which doesn't make any
    difference) and users might expect. Longterm we want to deprecate the
    whole knob and put the reparented pages to the tail of parent LRU during
    cgroup removal.
    
    tj: Removed unused variable @cgrp from mem_cgroup_force_empty()
    
    Signed-off-by: Michal Hocko <mhocko@suse.cz>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Li Zefan <lizefan@huawei.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    
    https://jira.sw.ru/browse/PSBM-104451
    [This fixes hang when writing to memory.force_empty which is caused
    by fake ->kmem charge. Normaly reparenting happens during offline
    and fake charge already uncharged. In the case of writing to
    memory.force_empty cgroup is still online, so mem_cgroup_reparent_charges()
    hangs because it doesn't know about fake charge. Another solution would
    be teaching it about fake charge, but this on seems better.]
    
    (cherry picked from commit f61c42a7d9119a8b72b9607ba8e3a34111f81d8c)
    Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
---
 Documentation/cgroups/memory.txt | 6 +-----
 mm/memcontrol.c                  | 7 -------
 2 files changed, 1 insertion(+), 12 deletions(-)

Patch hide | download patch | download mbox

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 13a6acaa54163..b86054df18a02 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -452,15 +452,11 @@  About use_hierarchy, see Section 6.
 
 5.1 force_empty
   memory.force_empty interface is provided to make cgroup's memory usage empty.
-  You can use this interface only when the cgroup has no tasks.
   When writing anything to this
 
   # echo 0 > memory.force_empty
 
-  Almost all pages tracked by this memory cgroup will be unmapped and freed.
-  Some pages cannot be freed because they are locked or in-use. Such pages are
-  moved to parent (if use_hierarchy==1) or root (if use_hierarchy==0) and this
-  cgroup will be empty.
+  the cgroup will be reclaimed and as many pages reclaimed as possible.
 
   The typical use case for this interface is before calling rmdir().
   Because rmdir() moves all pages to parent, some out-of-use page caches can be
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2f50b46cc24fc..c2713cfd99753 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4500,11 +4500,6 @@  static inline bool memcg_has_children(struct mem_cgroup *memcg)
 static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 {
 	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
-	struct cgroup *cgrp = memcg->css.cgroup;
-
-	/* returns EBUSY if there is a task or if we come here twice. */
-	if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
-		return -EBUSY;
 
 	/* we call try-to-free pages for make this cgroup empty */
 	lru_add_drain_all();
@@ -4524,8 +4519,6 @@  static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 		}
 
 	}
-	lru_add_drain();
-	mem_cgroup_reparent_charges(memcg);
 
 	return 0;
 }