Make release_agent a per-cgroup property. Run release_agent in the proper ve.

Submitted by Valeriy Vdovin on July 28, 2020, 5:53 p.m.

Details

Reviewer None
Submitted July 28, 2020, 5:53 p.m.
Last Updated July 29, 2020, 3:54 a.m.
Revision 1

Cover Letter

Problems:
1. Currently release_agent is a mount-wide cgroup property, shared by the whole hierarchy. It is
not possible to override its value for a cgroup further down the hierarchy, such as the cgroup
that serves as the virtual root of a container.
2. The code that spawns release_agent notification processes does so from ve0, so inside a container
any logic that waits for notifications about empty cgroups fails;
see https://jira.sw.ru/browse/PSBM-83887 for an example of this problem with systemd.

Solution:
In this patchset release_agent is moved from 'struct cgroupfs_root' to 'struct cgroup', making it
possible to set release_agent per ve.
'struct cgroup' also receives a pointer to its owning ve, so that release_agent notifications
can be spawned in the right ve.
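To make the Solution paragraph concrete, here is a minimal userspace sketch in C under heavily simplified assumptions. `struct ve`, `struct cgroup`, and the helpers `ve_release_agent`, `cgroup_path_in_ve`, `cgroup_mark_released`, and `ve_run_release_agents` are hypothetical and do not mirror the patchset's real definitions. Each cgroup carries a `ve_owner` pointer, the agent string lives on the container's root cgroup, and a released cgroup is queued on its owner's queue (a plain linked list standing in for the per-ve workqueue) instead of a global one. Passing the agent a path relative to the container's root cgroup is also an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct cgroup;

/* Hypothetical, simplified stand-in for the kernel's struct ve. */
struct ve {
	const char *name;
	struct cgroup *root_cgrp;   /* container's virtual root cgroup */
	struct cgroup *pending;     /* stands in for the per-ve workqueue */
};

/* Hypothetical, simplified stand-in for the kernel's struct cgroup. */
struct cgroup {
	const char *name;
	struct cgroup *parent;
	struct ve *ve_owner;        /* ve this cgroup belongs to */
	const char *release_agent;  /* set only on a ve's root cgroup */
	struct cgroup *next_pending;
};

/* The agent comes from the owning ve's root cgroup, not from a single
 * mount-wide setting. */
static const char *ve_release_agent(const struct ve *ve)
{
	return ve->root_cgrp ? ve->root_cgrp->release_agent : NULL;
}

/* Write the path of cg relative to its ve's root cgroup into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int cgroup_path_in_ve(const struct cgroup *cg, char *buf, size_t len)
{
	const struct cgroup *chain[64];
	size_t depth = 0, used = 0;

	assert(cg->ve_owner != NULL);
	if (len < 2)
		return -1;
	/* Collect ancestors up to, but excluding, the ve root. */
	for (const struct cgroup *c = cg; c && c != cg->ve_owner->root_cgrp;
	     c = c->parent) {
		if (depth == sizeof(chain) / sizeof(chain[0]))
			return -1;
		chain[depth++] = c;
	}
	buf[0] = '\0';
	/* Emit path components from the outermost ancestor inward. */
	while (depth > 0) {
		int n = snprintf(buf + used, len - used, "/%s",
				 chain[--depth]->name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	if (used == 0)
		strcpy(buf, "/");   /* cg is the ve root itself */
	return 0;
}

/* On release, queue cg on its owner's queue (LIFO for brevity). */
static void cgroup_mark_released(struct cgroup *cg)
{
	cg->next_pending = cg->ve_owner->pending;
	cg->ve_owner->pending = cg;
}

/* Drain one ve's queue. printf() stands in for launching the user-mode
 * helper inside that ve. Returns the number of agents "spawned". */
static int ve_run_release_agents(struct ve *ve)
{
	const char *agent = ve_release_agent(ve);
	char path[256];
	int count = 0;

	while (ve->pending) {
		struct cgroup *cg = ve->pending;

		ve->pending = cg->next_pending;
		cg->next_pending = NULL;
		if (!agent || cgroup_path_in_ve(cg, path, sizeof(path)) != 0)
			continue;
		printf("[%s] %s %s\n", ve->name, agent, path);
		count++;
	}
	return count;
}
```

In this model, with a container root cgroup that ve0 sees as /machine/101, releasing /machine/101/foo.service queues it on the container's ve only, and the container's own agent receives /foo.service.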

v1: Removed complex locking scheme for ve_owner<->cgroup binding.
v2: release_agent_path protected by RCU
v3: cgroup_root_from_opts uses ..set_release_agent helper without lockdep
v4: fixed a possible race in cgroup_release_agent
v5: Use a per-ve workqueue to handle per-ve cgroup notifications
v6: rebased to the latest branch
v7: Fixed lockdep warnings, removed the dependency on the is_running param.
v8: cgroup_rcu_strdup uses strlcpy, ve_get_release_agent releases the list spinlock early,
    patchset was split into smaller changes.
v9: rearranged cgroup_mark_ve_root with ve_workqueue_start, added a missing kfree
v10: fixed indentation
v11: - patches 6 and 7 were rearranged into 3 patches.
     - task_cgroup_from_root was changed to css_cgroup_from_root for ve->init_task
     - cgroup_mount initializes cgroup->ve_owner to ve0
     - removed a rarely used optimization branch from css_cgroup_from_root
v12: Fixed compilation error in css_cgroup_from_root. Fixed usermodehelper error checking
v13: Moved struct ve * initialization to the proper patch. cgroup is removed from release_list
     under rcu_read_lock.
v14: Rearranged error checking code after usermodehelper call in cgroup_release_agent
v15: Moved release_agent file creation into a separate patch. Added logic to destroy the same
     file.
v16: Implemented release_agent file destruction at cgroup_unmark_ve_roots.
v17: Fixed missing RCU synchronization in mark_ve_root. Fixed missing release_agent assignment
     during cgroup_mount.
v18: Added RCU_INIT_POINTER in cgroup_mount. ve_get_release_agent now calls rcu_dereference before
     spin_unlock.
v19: Fixed build.
v20: Skip non-virtualized root-cgroups
v21: Fixed cgroup_show_options. Merged 2 per_cgroot functions into one. Added rollback logic to
     cgroup_mark_ve_root in case of failure.
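The RCU-protected release_agent string (v2) and its strlcpy-based copy helper (v8) follow a publish/read pattern. The sketch below is a userspace analogue that uses C11 atomics instead of real RCU; `struct str_node`, `str_node_dup`, `str_publish`, and `str_read` are hypothetical names, not the patchset's `cgroup_rcu_strdup` code. It deliberately omits the grace period: the node returned by `str_publish` must not be freed while readers may still hold it, which in the kernel is what `synchronize_rcu()` or `kfree_rcu()` guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of a string node managed under RCU. */
struct str_node {
	size_t len;
	char val[];  /* flexible array member holding the string */
};

/* Allocate a node holding at most max_len bytes of src, always
 * NUL-terminated (a bounded copy, in the spirit of strlcpy()). */
static struct str_node *str_node_dup(const char *src, size_t max_len)
{
	struct str_node *n;
	size_t len = 0;

	assert(src != NULL);
	while (len < max_len && src[len] != '\0')
		len++;
	n = malloc(sizeof(*n) + len + 1);
	if (!n)
		return NULL;
	memcpy(n->val, src, len);
	n->val[len] = '\0';
	n->len = len;
	return n;
}

/* Writer side: publish n with release ordering (like rcu_assign_pointer())
 * and return the previous node. The caller must not free the old node until
 * every reader that might hold it is done (an RCU grace period). */
static struct str_node *str_publish(_Atomic(struct str_node *) *slot,
				    struct str_node *n)
{
	return atomic_exchange_explicit(slot, n, memory_order_acq_rel);
}

/* Reader side: the acquire load pairs with the writer's release, roughly
 * what rcu_dereference() provides inside an rcu_read_lock() section. */
static const struct str_node *str_read(_Atomic(struct str_node *) *slot)
{
	return atomic_load_explicit(slot, memory_order_acquire);
}
```

Readers never see a partially written string because a node is fully initialized before it is published; the only remaining hazard is freeing the old node too early, which is exactly what RCU's deferred reclamation addresses.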

Valeriy Vdovin (14):
  ve/cgroup: implemented per-ve workqueue.
  cgroup: added rcu node string wrapper for in-cgroup usage.
  cgroup: declared cgroup_mark_ve_root in public header
  cgroup: exported __put_css_set and wrappers to cgroup.h
  ve/cgroup: saving root_css to ve
  ve/cgroup: unmark ve-root cgroups at container stop
  ve/cgroup: Added ve_owner field to cgroup
  ve/cgroup: moved release_agent from system_wq to per-ve workqueues
  ve/cgroup: Implemented logic that uses 'cgroup->ve_owner' to run
    release_agent notifications.
  ve/cgroup: private per-cgroup-root data container
  ve/cgroup: set release_agent_path for root cgroups separately for each
    ve.
  ve/cgroup: added release_agent to each container root cgroup.
  ve/cgroup: cleanup per_cgroot_data
  ve/cgroup: At cgroup_mark(unmark)_ve_roots skip non-virtualized roots

 include/linux/cgroup.h |  40 ++++-
 include/linux/ve.h     |  34 +++++
 kernel/cgroup.c        | 402 ++++++++++++++++++++++++++++++++++++++++---------
 kernel/ve/ve.c         | 240 ++++++++++++++++++++++++++++-
 4 files changed, 637 insertions(+), 79 deletions(-)
  
