From patchwork Wed May 27 18:34:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RHEL7, COMMIT] Revert net: limit a number of namespaces which can be cleaned up concurrently From: Konstantin Khorenko X-Patchwork-Id: 12863 Message-Id: <202005271834.04RIYpmW014835@finist-ce7.sw.ru> To: Kirill Tkhai Cc: OpenVZ devel Date: Wed, 27 May 2020 21:34:51 +0300 The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.161.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-1127.8.2.vz7.161.3 ------> commit c4553160eb0f3d866bb4f895c01e75ed4e8a0a04 Author: Kirill Tkhai Date: Wed May 27 21:34:50 2020 +0300 Revert net: limit a number of namespaces which can be cleaned up concurrently rwsem series will fix this problem properly. Signed-off-by: Kirill Tkhai ===================== Patchset description: Parallel per-net init/exit https://jira.sw.ru/browse/PSBM-104158 --- net/core/net_namespace.c | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index ee673ead5f04c..ba2c8077eb883 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -435,26 +435,12 @@ static void cleanup_net(struct work_struct *work) struct net *net, *tmp; struct list_head net_kill_list; LIST_HEAD(net_exit_list); - bool reload = false; - int i = 0; /* Atomically snapshot the list of namespaces to cleanup */ spin_lock_irq(&cleanup_list_lock); - list_for_each_entry_safe(net, tmp, &cleanup_list, cleanup_list) - if (++i == 16) - break; - - if (i == 16) { - list_cut_position(&net_kill_list, &cleanup_list, - &net->cleanup_list); - reload = true; - } else - list_replace_init(&cleanup_list, &net_kill_list); + list_replace_init(&cleanup_list, &net_kill_list); spin_unlock_irq(&cleanup_list_lock); - if (reload) - queue_work(netns_wq, work); - mutex_lock(&net_mutex); /* Don't let anyone else find us. */ From patchwork Wed May 27 18:34:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RHEL7,COMMIT] net: Fix possible race in peernet2id_alloc() From: Konstantin Khorenko X-Patchwork-Id: 12862 Message-Id: <202005271834.04RIYqh9014890@finist-ce7.sw.ru> To: Kirill Tkhai Cc: OpenVZ devel Date: Wed, 27 May 2020 21:34:52 +0300 The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.161.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-1127.8.2.vz7.161.3 ------> commit 38fda25fa453f7f0476ec88c217d087c2e09a76c Author: Kirill Tkhai Date: Wed May 27 21:34:52 2020 +0300 net: Fix possible race in peernet2id_alloc() ms commit 0c06bea919f3 peernet2id_alloc() is racy without rtnl_lock() as refcount_read(&peer->count) under net->nsid_lock does not guarantee, peer is alive: rcu_read_lock() peernet2id_alloc() .. spin_lock_bh(&net->nsid_lock) .. refcount_read(&peer->count) (!= 0) .. .. put_net() .. cleanup_net() .. for_each_net(tmp) .. spin_lock_bh(&tmp->nsid_lock) .. __peernet2id(tmp, net) == -1 .. .. .. .. __peernet2id_alloc(alloc == true) .. .. .. rcu_read_unlock() .. .. synchronize_rcu() .. kmem_cache_free(net) After the above situation, net::netns_id contains id pointing to freed memory, and any other dereferencing by the id will operate with this freed memory. Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except ovs_vport_cmd_fill_info(), and this race can't occur. 
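The window above boils down to check-then-use on a reference count. A minimal sketch of the difference, with made-up obj/obj_tryget names rather than the kernel's real ones: a plain read of the count can go stale immediately after it is made, while an inc-not-zero either pins the object or reports that it is already dying -- which is the idea behind the maybe_get_net() check in the fix below.

#include <linux/atomic.h>
#include <linux/types.h>

struct obj {
        atomic_t count;
};

/* Racy: the count may drop to zero right after we looked at it. */
static bool obj_alive_racy(struct obj *o)
{
        return atomic_read(&o->count) != 0;
}

/*
 * Safe: either we atomically take a reference, so the object cannot be
 * freed under us, or we learn that it is already dying and back off.
 */
static bool obj_tryget(struct obj *o)
{
        return atomic_inc_not_zero(&o->count) != 0;
}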
But peernet2id_alloc() is generic interface, and better we fix it before someone really starts use it in wrong context. v2: Don't place refcount_read(&net->count) under net->nsid_lock as suggested by Eric W. Biederman v3: Rebase on top of net-next Signed-off-by: Kirill Tkhai Signed-off-by: David S. Miller ===================== Patchset description: Parallel per-net init/exit https://jira.sw.ru/browse/PSBM-104158 --- net/core/net_namespace.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index ba2c8077eb883..b550d025e0d4a 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -224,16 +224,25 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id); */ int peernet2id_alloc(struct net *net, struct net *peer) { + bool alloc = false, alive = false; unsigned long flags; - bool alloc; int id; spin_lock_irqsave(&net->nsid_lock, flags); - alloc = atomic_read(&peer->count) == 0 ? false : true; + /* + * When peer is obtained from RCU lists, we may race with + * its cleanup. Check whether it's alive, and this guarantees + * we never hash a peer back to net->netns_ids, after it has + * just been idr_remove()'d from there in cleanup_net(). + */ + if (maybe_get_net(peer)) + alive = alloc = true; id = __peernet2id_alloc(net, peer, &alloc); spin_unlock_irqrestore(&net->nsid_lock, flags); if (alloc && id >= 0) rtnl_net_notifyid(net, RTM_NEWNSID, id); + if (alive) + put_net(peer); return id; } EXPORT_SYMBOL_GPL(peernet2id_alloc); From patchwork Wed May 27 18:34:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RHEL7, COMMIT] net: Move net:netns_ids destruction out of rtnl_lock() and document locking scheme From: Konstantin Khorenko X-Patchwork-Id: 12865 Message-Id: <202005271834.04RIYrZP014941@finist-ce7.sw.ru> To: Kirill Tkhai Cc: OpenVZ devel Date: Wed, 27 May 2020 21:34:53 +0300 The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.161.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-1127.8.2.vz7.161.3 ------> commit ec6ba20ee576ef78aafad6697d705787a76bc3a6 Author: Kirill Tkhai Date: Wed May 27 21:34:53 2020 +0300 net: Move net:netns_ids destruction out of rtnl_lock() and document locking scheme ms commit fb07a820fe3f Currently, we unhash a dying net from netns_ids lists under rtnl_lock(). It's a leftover from the time when net::netns_ids was introduced. There was no net::nsid_lock, and rtnl_lock() was mostly need to order modification of alive nets nsid idr, i.e. for: for_each_net(tmp) { ... id = __peernet2id(tmp, net); idr_remove(&tmp->netns_ids, id); ... } Since we have net::nsid_lock, the modifications are protected by this local lock, and now we may introduce better scheme of netns_ids destruction. Let's look at the functions peernet2id_alloc() and get_net_ns_by_id(). Previous commits taught these functions to work well with dying net acquired from rtnl unlocked lists. And they are the only functions which can hash a net to netns_ids or obtain from there. And as easy to check, other netns_ids operating functions works with id, not with net pointers. So, we do not need rtnl_lock to synchronize cleanup_net() with all them. The another property, which is used in the patch, is that net is unhashed from net_namespace_list in the only place and by the only process. So, we avoid excess rcu_read_lock() or rtnl_lock(), when we'are iterating over the list in unhash_nsid(). 
All the above makes possible to keep rtnl_lock() locked only for net->list deletion, and completely avoid it for netns_ids unhashing and destruction. As these two doings may take long time (e.g., memory allocation to send skb), the patch should positively act on the scalability and signify decrease the time, which rtnl_lock() is held in cleanup_net(). Signed-off-by: Kirill Tkhai Signed-off-by: David S. Miller ===================== Patchset description: Parallel per-net init/exit https://jira.sw.ru/browse/PSBM-104158 --- net/core/net_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 44 insertions(+), 18 deletions(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index b550d025e0d4a..a8b7a9ddbe7c6 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -435,13 +435,40 @@ struct net *copy_net_ns(unsigned long flags, return net; } +static void unhash_nsid(struct net *net, struct net *last) +{ + struct net *tmp; + /* This function is only called from cleanup_net() work, + * and this work is the only process, that may delete + * a net from net_namespace_list. So, when the below + * is executing, the list may only grow. Thus, we do not + * use for_each_net_rcu() or rtnl_lock(). + */ + for_each_net(tmp) { + int id; + + spin_lock_irq(&tmp->nsid_lock); + id = __peernet2id(tmp, net); + if (id >= 0) + idr_remove(&tmp->netns_ids, id); + spin_unlock_irq(&tmp->nsid_lock); + if (id >= 0) + rtnl_net_notifyid(tmp, RTM_DELNSID, id); + if (tmp == last) + break; + } + spin_lock_irq(&net->nsid_lock); + idr_destroy(&net->netns_ids); + spin_unlock_irq(&net->nsid_lock); +} + static DEFINE_SPINLOCK(cleanup_list_lock); static LIST_HEAD(cleanup_list); /* Must hold cleanup_list_lock to touch */ static void cleanup_net(struct work_struct *work) { const struct pernet_operations *ops; - struct net *net, *tmp; + struct net *net, *tmp, *last; struct list_head net_kill_list; LIST_HEAD(net_exit_list); @@ -454,26 +481,25 @@ static void cleanup_net(struct work_struct *work) /* Don't let anyone else find us. */ rtnl_lock(); - list_for_each_entry(net, &net_kill_list, cleanup_list) { + list_for_each_entry(net, &net_kill_list, cleanup_list) list_del_rcu(&net->list); - list_add_tail(&net->exit_list, &net_exit_list); - for_each_net(tmp) { - int id; - - spin_lock_irq(&tmp->nsid_lock); - id = __peernet2id(tmp, net); - if (id >= 0) - idr_remove(&tmp->netns_ids, id); - spin_unlock_irq(&tmp->nsid_lock); - if (id >= 0) - rtnl_net_notifyid(tmp, RTM_DELNSID, id); - } - spin_lock_irq(&net->nsid_lock); - idr_destroy(&net->netns_ids); - spin_unlock_irq(&net->nsid_lock); + /* Cache last net. After we unlock rtnl, no one new net + * added to net_namespace_list can assign nsid pointer + * to a net from net_kill_list (see peernet2id_alloc()). + * So, we skip them in unhash_nsid(). + * + * Note, that unhash_nsid() does not delete nsid links + * between net_kill_list's nets, as they've already + * deleted from net_namespace_list. But, this would be + * useless anyway, as netns_ids are destroyed there. + */ + last = list_last_entry(&net_namespace_list, struct net, list); + rtnl_unlock(); + list_for_each_entry(net, &net_kill_list, cleanup_list) { + unhash_nsid(net, last); + list_add_tail(&net->exit_list, &net_exit_list); } - rtnl_unlock(); /* * Another CPU might be rcu-iterating the list, wait for it. 
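The cleanup_net()/unhash_nsid() rework above leans on a small pattern worth spelling out: snapshot the current tail of a list that can only grow, drop the lock, and walk lock-free up to that snapshot. A simplified sketch with invented walk_snapshot()/visit() names (the real code iterates struct net entries and additionally takes each peer's nsid_lock):

#include <linux/list.h>
#include <linux/rtnetlink.h>

/*
 * Walk 'head' without holding rtnl_lock() for the whole traversal.
 * Assumes the caller is the only one who ever removes entries and that
 * concurrent adders only append at the tail -- exactly the properties
 * cleanup_net() has for net_namespace_list.  Entries appended after the
 * snapshot is taken are deliberately skipped.
 */
static void walk_snapshot(struct list_head *head,
                          void (*visit)(struct list_head *entry))
{
        struct list_head *pos, *last;

        rtnl_lock();
        last = head->prev;              /* remember the current tail */
        rtnl_unlock();

        if (last == head)               /* list was empty */
                return;

        list_for_each(pos, head) {
                visit(pos);
                if (pos == last)
                        break;
        }
}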
From patchwork Wed May 27 18:34:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RHEL7,COMMIT] net: Assign net to net_namespace_list in setup_net() From: Konstantin Khorenko X-Patchwork-Id: 12864 Message-Id: <202005271834.04RIYtQT014991@finist-ce7.sw.ru> To: Kirill Tkhai Cc: OpenVZ devel Date: Wed, 27 May 2020 21:34:55 +0300 The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.161.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-1127.8.2.vz7.161.3 ------> commit cd48086b0f1b101c798900ea97db46bf7d1e3762 Author: Kirill Tkhai Date: Wed May 27 21:34:54 2020 +0300 net: Assign net to net_namespace_list in setup_net() ms commit 98f6c533a3e9 This patch merges two repeating pieces of code in one, and they will live in setup_net() now. The only change is that assignment: init_net_initialized = true; becomes reordered with: list_add_tail_rcu(&net->list, &net_namespace_list); The order does not have visible effect, and it is a simple cleanup because of: init_net_initialized is used in !CONFIG_NET_NS case to order proc_net_ns_ops registration occuring at boot time: start_kernel()->proc_root_init()->proc_net_init(), with net_ns_init()->setup_net(&init_net, &init_user_ns) also occuring in boot time from the same init_task. When there are no another tasks to race with them, for the single task it does not matter, which order two sequential independent loads should be made. So we make them reordered. Signed-off-by: Kirill Tkhai Acked-by: Andrei Vagin Signed-off-by: David S. Miller ===================== Patchset description: Parallel per-net init/exit https://jira.sw.ru/browse/PSBM-104158 --- net/core/net_namespace.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index a8b7a9ddbe7c6..954f78c526d3d 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -324,6 +324,9 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns) if (error < 0) goto out_undo; } + rtnl_lock(); + list_add_tail_rcu(&net->list, &net_namespace_list); + rtnl_unlock(); out: return error; @@ -419,11 +422,6 @@ struct net *copy_net_ns(unsigned long flags, mutex_lock(&net_mutex); net->ucounts = ucounts; rv = setup_net(net, user_ns); - if (rv == 0) { - rtnl_lock(); - list_add_tail_rcu(&net->list, &net_namespace_list); - rtnl_unlock(); - } mutex_unlock(&net_mutex); if (rv < 0) { dec_net_namespaces(ucounts); @@ -837,10 +835,6 @@ static int __init net_ns_init(void) if (setup_net(&init_net, &init_user_ns)) panic("Could not setup the initial network namespace"); - rtnl_lock(); - list_add_tail_rcu(&init_net.list, &net_namespace_list); - rtnl_unlock(); - mutex_unlock(&net_mutex); register_pernet_subsys(&net_ns_ops); From patchwork Wed May 27 15:54:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RH7,05/11] net: Cleanup in copy_net_ns() From: Kirill Tkhai X-Patchwork-Id: 12850 Message-Id: <159059484281.408928.15678502357176748050.stgit@localhost.localdomain> To: devel@openvz.org, khorenko@virtuozzo.com, ktkhai@virtuozzo.com Date: Wed, 27 May 2020 18:54:02 +0300 ms commit 5ba049a5cc8e Line up destructors actions in the revers order to constructors. Next patches will add more actions, and this will be comfortable, if there is the such order. Signed-off-by: Kirill Tkhai Acked-by: Andrei Vagin Signed-off-by: David S. 
Miller --- net/core/net_namespace.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 6cc399656a09..69063828c7c0 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -408,28 +408,35 @@ struct net *copy_net_ns(unsigned long flags, return ERR_PTR(-ENOSPC); if (atomic_dec_if_positive(&ve->netns_avail_nr) < 0) { - dec_net_namespaces(ucounts); - return ERR_PTR(-ENOMEM); + rv = -ENOMEM; + goto dec_ucounts; + } net = net_alloc(); if (!net) { - dec_net_namespaces(ucounts); - atomic_inc(&ve->netns_avail_nr); - return ERR_PTR(-ENOMEM); + rv = -ENOMEM; + goto inc_avail_nr; } + atomic_set(&net->passive, 1); + net->ucounts = ucounts; get_user_ns(user_ns); - mutex_lock(&net_mutex); - net->ucounts = ucounts; + rv = mutex_lock_killable(&net_mutex); + if (rv < 0) + goto put_userns; + rv = setup_net(net, user_ns); mutex_unlock(&net_mutex); if (rv < 0) { - dec_net_namespaces(ucounts); +put_userns: put_user_ns(user_ns); net_drop_ns(net); +inc_avail_nr: atomic_inc(&ve->netns_avail_nr); +dec_ucounts: + dec_net_namespaces(ucounts); return ERR_PTR(rv); } return net; From patchwork Wed May 27 18:34:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RHEL7,COMMIT] net: Introduce net_sem for protection of pernet_list From: Konstantin Khorenko X-Patchwork-Id: 12866 Message-Id: <202005271834.04RIYvEb015091@finist-ce7.sw.ru> To: Kirill Tkhai Cc: OpenVZ devel Date: Wed, 27 May 2020 21:34:57 +0300 The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.161.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-1127.8.2.vz7.161.3 ------> commit 28872d426e6b5a9797349b49739411079d642dc6 Author: Kirill Tkhai Date: Wed May 27 21:34:57 2020 +0300 net: Introduce net_sem for protection of pernet_list ms commit 1a57feb847c5 Currently, the mutex is mostly used to protect pernet operations list. It orders setup_net() and cleanup_net() with parallel {un,}register_pernet_operations() calls, so ->exit{,batch} methods of the same pernet operations are executed for a dying net, as were used to call ->init methods, even after the net namespace is unlinked from net_namespace_list in cleanup_net(). But there are several problems with scalability. The first one is that more than one net can't be created or destroyed at the same moment on the node. For big machines with many cpus running many containers it's very sensitive. The second one is that it's need to synchronize_rcu() after net is removed from net_namespace_list(): Destroy net_ns: cleanup_net() mutex_lock(&net_mutex) list_del_rcu(&net->list) synchronize_rcu() <--- Sleep there for ages list_for_each_entry_reverse(ops, &pernet_list, list) ops_exit_list(ops, &net_exit_list) list_for_each_entry_reverse(ops, &pernet_list, list) ops_free_list(ops, &net_exit_list) mutex_unlock(&net_mutex) This primitive is not fast, especially on the systems with many processors and/or when preemptible RCU is enabled in config. So, all the time, while cleanup_net() is waiting for RCU grace period, creation of new net namespaces is not possible, the tasks, who makes it, are sleeping on the same mutex: Create net_ns: copy_net_ns() mutex_lock_killable(&net_mutex) <--- Sleep there for ages I observed 20-30 seconds hangs of "unshare -n" on ordinary 8-cpu laptop with preemptible RCU enabled after CRIU tests round is finished. 
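Condensed into code, the bottleneck complained about above looks roughly like this (the toy_* names are invented, not the kernel's functions): the destroyer spends an entire RCU grace period inside the one global mutex, and every creator queues up behind it in mutex_lock_killable().

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/rcupdate.h>

static DEFINE_MUTEX(toy_mutex);
static LIST_HEAD(toy_list);

/* Destroy path: the grace period is served entirely under the mutex. */
static void toy_destroy(struct list_head *entry)
{
        mutex_lock(&toy_mutex);
        list_del_rcu(entry);
        synchronize_rcu();              /* may sleep for a long time */
        /* ... run ->exit() methods ... */
        mutex_unlock(&toy_mutex);
}

/* Create path: blocked for the whole of the above. */
static int toy_create(struct list_head *entry)
{
        int err = mutex_lock_killable(&toy_mutex);

        if (err)
                return err;
        /* ... run ->init() methods ... */
        list_add_tail_rcu(entry, &toy_list);
        mutex_unlock(&toy_mutex);
        return 0;
}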
The solution is to convert net_mutex to the rw_semaphore and add fine grain locks to really small number of pernet_operations, what really need them. Then, pernet_operations::init/::exit methods, modifying the net-related data, will require down_read() locking only, while down_write() will be used for changing pernet_list (i.e., when modules are being loaded and unloaded). This gives signify performance increase, after all patch set is applied, like you may see here: %for i in {1..10000}; do unshare -n bash -c exit; done *before* real 1m40,377s user 0m9,672s sys 0m19,928s *after* real 0m17,007s user 0m5,311s sys 0m11,779 (5.8 times faster) This patch starts replacing net_mutex to net_sem. It adds rw_semaphore, describes the variables it protects, and makes to use, where appropriate. net_mutex is still present, and next patches will kick it out step-by-step. Signed-off-by: Kirill Tkhai Acked-by: Andrei Vagin Signed-off-by: David S. Miller ===================== Patchset description: Parallel per-net init/exit https://jira.sw.ru/browse/PSBM-104158 --- include/linux/rtnetlink.h | 1 + net/core/net_namespace.c | 41 +++++++++++++++++++++++++++-------------- net/core/rtnetlink.c | 4 ++-- 3 files changed, 30 insertions(+), 16 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 10c5fae799ee5..65dcc5859dd19 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -35,6 +35,7 @@ extern int rtnl_is_locked(void); extern wait_queue_head_t netdev_unregistering_wq; extern struct mutex net_mutex; +extern struct rw_semaphore net_sem; #ifdef CONFIG_PROVE_LOCKING extern bool lockdep_rtnl_is_held(void); diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index c461acce68bb5..f63e32495e493 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -32,6 +32,11 @@ static LIST_HEAD(pernet_list); static struct list_head *first_device = &pernet_list; DEFINE_MUTEX(net_mutex); +/* + * net_sem: protects: pernet_list, net_generic_ids, + * init_net_initialized and first_device pointer. + */ +DECLARE_RWSEM(net_sem); LIST_HEAD(net_namespace_list); EXPORT_SYMBOL_GPL(net_namespace_list); @@ -302,7 +307,7 @@ static void dec_net_namespaces(struct ucounts *ucounts) */ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns) { - /* Must be called with net_mutex held */ + /* Must be called with net_sem held */ const struct pernet_operations *ops, *saved_ops; int error = 0; LIST_HEAD(net_exit_list); @@ -421,12 +426,18 @@ struct net *copy_net_ns(unsigned long flags, net->ucounts = ucounts; get_user_ns(user_ns); - rv = mutex_lock_killable(&net_mutex); + rv = down_read_killable(&net_sem); if (rv < 0) goto put_userns; + rv = mutex_lock_killable(&net_mutex); + if (rv < 0) + goto up_read; + rv = setup_net(net, user_ns); mutex_unlock(&net_mutex); +up_read: + up_read(&net_sem); if (rv < 0) { put_userns: put_user_ns(user_ns); @@ -482,6 +493,7 @@ static void cleanup_net(struct work_struct *work) list_replace_init(&cleanup_list, &net_kill_list); spin_unlock_irq(&cleanup_list_lock); + down_read(&net_sem); mutex_lock(&net_mutex); /* Don't let anyone else find us. 
*/ @@ -521,6 +533,9 @@ static void cleanup_net(struct work_struct *work) list_for_each_entry_reverse(ops, &pernet_list, list) ops_free_list(ops, &net_exit_list); + mutex_unlock(&net_mutex); + up_read(&net_sem); + list_for_each_entry(net, &net_kill_list, cleanup_list) { struct ve_struct *ve = net->owner_ve; @@ -528,8 +543,6 @@ static void cleanup_net(struct work_struct *work) put_ve(ve); } - mutex_unlock(&net_mutex); - /* Ensure there are no outstanding rcu callbacks using this * network namespace. */ @@ -838,11 +851,11 @@ static int __init net_ns_init(void) rcu_assign_pointer(init_net.gen, ng); - mutex_lock(&net_mutex); + down_write(&net_sem); if (setup_net(&init_net, &init_user_ns)) panic("Could not setup the initial network namespace"); - mutex_unlock(&net_mutex); + up_write(&net_sem); register_pernet_subsys(&net_ns_ops); @@ -972,9 +985,9 @@ static void unregister_pernet_operations(struct pernet_operations *ops) int register_pernet_subsys(struct pernet_operations *ops) { int error; - mutex_lock(&net_mutex); + down_write(&net_sem); error = register_pernet_operations(first_device, ops); - mutex_unlock(&net_mutex); + up_write(&net_sem); return error; } EXPORT_SYMBOL_GPL(register_pernet_subsys); @@ -990,9 +1003,9 @@ EXPORT_SYMBOL_GPL(register_pernet_subsys); */ void unregister_pernet_subsys(struct pernet_operations *ops) { - mutex_lock(&net_mutex); + down_write(&net_sem); unregister_pernet_operations(ops); - mutex_unlock(&net_mutex); + up_write(&net_sem); } EXPORT_SYMBOL_GPL(unregister_pernet_subsys); @@ -1018,11 +1031,11 @@ EXPORT_SYMBOL_GPL(unregister_pernet_subsys); int register_pernet_device(struct pernet_operations *ops) { int error; - mutex_lock(&net_mutex); + down_write(&net_sem); error = register_pernet_operations(&pernet_list, ops); if (!error && (first_device == &pernet_list)) first_device = &ops->list; - mutex_unlock(&net_mutex); + up_write(&net_sem); return error; } EXPORT_SYMBOL_GPL(register_pernet_device); @@ -1038,11 +1051,11 @@ EXPORT_SYMBOL_GPL(register_pernet_device); */ void unregister_pernet_device(struct pernet_operations *ops) { - mutex_lock(&net_mutex); + down_write(&net_sem); if (&ops->list == first_device) first_device = first_device->next; unregister_pernet_operations(ops); - mutex_unlock(&net_mutex); + up_write(&net_sem); } EXPORT_SYMBOL_GPL(unregister_pernet_device); diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 93528356f87e1..4525cecdd94bc 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -504,11 +504,11 @@ static void rtnl_lock_unregistering_all(void) void rtnl_link_unregister(struct rtnl_link_ops *ops) { /* Close the race with cleanup_net() */ - mutex_lock(&net_mutex); + down_write(&net_sem); rtnl_lock_unregistering_all(); __rtnl_link_unregister(ops); rtnl_unlock(); - mutex_unlock(&net_mutex); + up_write(&net_sem); } EXPORT_SYMBOL_GPL(rtnl_link_unregister); From patchwork Wed May 27 18:34:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RHEL7,COMMIT] net: Move mutex_unlock() in cleanup_net() up From: Konstantin Khorenko X-Patchwork-Id: 12867 Message-Id: <202005271834.04RIYwZm015141@finist-ce7.sw.ru> To: Kirill Tkhai Cc: OpenVZ devel Date: Wed, 27 May 2020 21:34:58 +0300 The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.161.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-1127.8.2.vz7.161.3 ------> commit 56347e49285000aedd3c610eaefafb6cfaf2bf5c Author: Kirill Tkhai Date: Wed May 27 21:34:58 2020 +0300 net: Move mutex_unlock() 
in cleanup_net() up ms commit bcab1ddd9b2b net_sem protects from pernet_list changing, while ops_free_list() makes simple kfree(), and it can't race with other pernet_operations callbacks. So we may release net_mutex earlier then it was. Signed-off-by: Kirill Tkhai Acked-by: Andrei Vagin Signed-off-by: David S. Miller ===================== Patchset description: Parallel per-net init/exit https://jira.sw.ru/browse/PSBM-104158 --- net/core/net_namespace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index f63e32495e493..9827efc52a2e6 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -529,11 +529,12 @@ static void cleanup_net(struct work_struct *work) list_for_each_entry_reverse(ops, &pernet_list, list) ops_exit_list(ops, &net_exit_list); + mutex_unlock(&net_mutex); + /* Free the net generic variables */ list_for_each_entry_reverse(ops, &pernet_list, list) ops_free_list(ops, &net_exit_list); - mutex_unlock(&net_mutex); up_read(&net_sem); list_for_each_entry(net, &net_kill_list, cleanup_list) { From patchwork Wed May 27 18:35:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RHEL7,COMMIT] mac80211_hwsim: Make hwsim_netgroup IDA From: Konstantin Khorenko X-Patchwork-Id: 12868 Message-Id: <202005271835.04RIZ0wl015191@finist-ce7.sw.ru> To: Kirill Tkhai Cc: OpenVZ devel Date: Wed, 27 May 2020 21:35:00 +0300 The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.161.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-1127.8.2.vz7.161.3 ------> commit 3e4eadef02a756ad4d60c6006b504392dda83785 Author: Kirill Tkhai Date: Wed May 27 21:34:59 2020 +0300 mac80211_hwsim: Make hwsim_netgroup IDA ms commit 03695549aa76 hwsim_netgroup counter is declarated as int, and it is incremented every time a new net is created. After sizeof(int) net are created, it will overflow, and different net namespaces will have the same identifier. This patch fixes the problem by introducing IDA instead of int counter. IDA guarantees, all the net namespaces have the uniq identifier. Note, that after we do ida_simple_remove() in hwsim_exit_net(), and we destroy the ID, later there may be executed destroy_radio() from the workqueue. But destroy_radio() does not use the ID, so it's OK. Out of bounds of this patch, just as a report to wireless subsystem maintainer, destroy_radio() increaments hwsim_radios_generation without hwsim_radio_lock, so this may need one more patch to fix. 
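For reference, the int-to-IDA conversion described above reduces to the following pattern (the example_* wrappers are invented; the real code stores the returned ID in hwsim's per-net structure): ida_simple_get() hands out the smallest free non-negative ID and ida_simple_remove() makes it reusable, so identifiers can neither wrap around nor collide.

#include <linux/idr.h>
#include <linux/gfp.h>

static DEFINE_IDA(example_netgroup_ida);

/* Returns a unique ID >= 0, or a negative errno on allocation failure. */
static int example_alloc_netgroup(void)
{
        return ida_simple_get(&example_netgroup_ida, 0, 0, GFP_KERNEL);
}

static void example_free_netgroup(int id)
{
        ida_simple_remove(&example_netgroup_ida, id);
}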
Signed-off-by: Kirill Tkhai Signed-off-by: Johannes Berg ===================== Patchset description: Parallel per-net init/exit https://jira.sw.ru/browse/PSBM-104158 --- drivers/net/wireless/mac80211_hwsim.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 877261887d708..b7024f6fb4e5b 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -33,6 +33,7 @@ #include #include #include +#include #include "mac80211_hwsim.h" #define WARN_QUEUE 100 @@ -253,7 +254,7 @@ static inline void hwsim_clear_chanctx_magic(struct ieee80211_chanctx_conf *c) static unsigned int hwsim_net_id; -static int hwsim_netgroup; +static DEFINE_IDA(hwsim_netgroup_ida); struct hwsim_net { int netgroup; @@ -267,11 +268,13 @@ static inline int hwsim_net_get_netgroup(struct net *net) return hwsim_net->netgroup; } -static inline void hwsim_net_set_netgroup(struct net *net) +static inline int hwsim_net_set_netgroup(struct net *net) { struct hwsim_net *hwsim_net = net_generic(net, hwsim_net_id); - hwsim_net->netgroup = hwsim_netgroup++; + hwsim_net->netgroup = ida_simple_get(&hwsim_netgroup_ida, + 0, 0, GFP_KERNEL); + return hwsim_net->netgroup >= 0 ? 0 : -ENOMEM; } static inline u32 hwsim_net_get_wmediumd(struct net *net) @@ -3402,9 +3405,7 @@ static int __init hwsim_init_netlink(void) static __net_init int hwsim_init_net(struct net *net) { - hwsim_net_set_netgroup(net); - - return 0; + return hwsim_net_set_netgroup(net); } static void __net_exit hwsim_exit_net(struct net *net) @@ -3425,6 +3426,8 @@ static void __net_exit hwsim_exit_net(struct net *net) schedule_work(&data->destroy_work); } spin_unlock_bh(&hwsim_radio_lock); + + ida_simple_remove(&hwsim_netgroup_ida, hwsim_net_get_netgroup(net)); } static struct pernet_operations hwsim_net_ops = { From patchwork Wed May 27 15:54:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RH7,09/11] ve: Synchronize vznetstat pernet_operations From: Kirill Tkhai X-Patchwork-Id: 12852 Message-Id: <159059486552.408928.3648847763657835520.stgit@localhost.localdomain> To: devel@openvz.org, khorenko@virtuozzo.com, ktkhai@virtuozzo.com Date: Wed, 27 May 2020 18:54:25 +0300 Make impossible parallel init/exit Signed-off-by: Kirill Tkhai --- kernel/ve/vznetstat/vznetstat.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/kernel/ve/vznetstat/vznetstat.c b/kernel/ve/vznetstat/vznetstat.c index e0ca8d2fb027..f37c137fbc15 100644 --- a/kernel/ve/vznetstat/vznetstat.c +++ b/kernel/ve/vznetstat/vznetstat.c @@ -1083,29 +1083,39 @@ static struct file_operations proc_venetstat_v6_operations = { .release = seq_release, }; +/* TODO: Use tc_lock? 
*/ +static DEFINE_MUTEX(ve_stat_mutex); + static int __net_init net_init_acct(struct net *net) { struct ve_struct *ve = net->owner_ve; + int ret = 0; + mutex_lock(&ve_stat_mutex); if (!ve->stat) { ve->stat = venet_acct_find_create_stat(ve->veid); - if (!ve->stat) - return -ENOMEM; + if (!ve->stat) { + ret = -ENOMEM; + goto out; + } } else venet_acct_get_stat(ve->stat); - - return 0; + mutex_unlock(&ve_stat_mutex); +out: + return ret; } static void __net_exit net_exit_acct(struct net *net) { struct ve_struct *ve = net->owner_ve; + mutex_lock(&ve_stat_mutex); if (ve->stat) { venet_acct_put_stat(ve->stat); if (ve->ve_netns == net) ve->stat = NULL; } + mutex_unlock(&ve_stat_mutex); } static struct pernet_operations __net_initdata net_acct_ops = { From patchwork Wed May 27 15:54:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RH7,10/11] net: Kill net_mutex From: Kirill Tkhai X-Patchwork-Id: 12854 Message-Id: <159059487088.408928.15136863247459170660.stgit@localhost.localdomain> To: devel@openvz.org, khorenko@virtuozzo.com, ktkhai@virtuozzo.com Date: Wed, 27 May 2020 18:54:30 +0300 ms commit 19efbd93e6fb We take net_mutex, when there are !async pernet_operations registered, and read locking of net_sem is not enough. But we may get rid of taking the mutex, and just change the logic to write lock net_sem in such cases. This obviously reduces the number of lock operations, we do. Signed-off-by: Kirill Tkhai Signed-off-by: David S. Miller [vz7: many intermediate and following patches skipped] Signed-off-by: Kirill Tkhai --- net/core/net_namespace.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 15bfd1306141..7c80d00e4f75 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -31,7 +31,6 @@ static LIST_HEAD(pernet_list); static struct list_head *first_device = &pernet_list; -DEFINE_MUTEX(net_mutex); /* * net_sem: protects: pernet_list, net_generic_ids, * init_net_initialized and first_device pointer. @@ -73,11 +72,10 @@ int net_assign_generic(struct net *net, int id, void *data) { struct net_generic *ng, *old_ng; - BUG_ON(!mutex_is_locked(&net_mutex)); BUG_ON(id == 0); old_ng = rcu_dereference_protected(net->gen, - lockdep_is_held(&net_mutex)); + lockdep_is_held(&net_sem)); ng = old_ng; if (old_ng->len >= id) goto assign; @@ -432,13 +430,7 @@ struct net *copy_net_ns(unsigned long flags, if (rv < 0) goto put_userns; - rv = mutex_lock_killable(&net_mutex); - if (rv < 0) - goto up_read; - rv = setup_net(net, user_ns); - mutex_unlock(&net_mutex); -up_read: up_read(&net_sem); if (rv < 0) { put_userns: @@ -496,7 +488,6 @@ static void cleanup_net(struct work_struct *work) spin_unlock_irq(&cleanup_list_lock); down_read(&net_sem); - mutex_lock(&net_mutex); /* Don't let anyone else find us. */ rtnl_lock(); From patchwork Wed May 27 15:54:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [RH7,11/11] net: Rename net_sem to pernet_ops_rwsem From: Kirill Tkhai X-Patchwork-Id: 12856 Message-Id: <159059487614.408928.7686792143239886802.stgit@localhost.localdomain> To: devel@openvz.org, khorenko@virtuozzo.com, ktkhai@virtuozzo.com Date: Wed, 27 May 2020 18:54:36 +0300 ms commit 4420bf21fb6c net_sem is some undefined area name, so it will be better to make the area more defined. Rename it to pernet_ops_rwsem for better readability and better intelligibility. 
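The end state of the series can be summarized in a short sketch (the example_* names are invented; the real readers are setup_net()/cleanup_net() and the real writers are the register/unregister_pernet_* helpers): per-net init and exit only need the pernet list to stay stable, so they take the semaphore for reading and may run in parallel, while changing the list itself takes it for writing.

#include <linux/rwsem.h>

static DECLARE_RWSEM(example_ops_rwsem);

/* setup_net()/cleanup_net() analogue: readers may run concurrently. */
static int example_setup(void)
{
        int err = down_read_killable(&example_ops_rwsem);

        if (err)
                return err;
        /* walk the (now stable) pernet_operations list, call ->init() */
        up_read(&example_ops_rwsem);
        return 0;
}

/* register/unregister_pernet_*() analogue: writers are exclusive. */
static void example_register_ops(void)
{
        down_write(&example_ops_rwsem);
        /* modify the pernet_operations list */
        up_write(&example_ops_rwsem);
}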
Signed-off-by: Kirill Tkhai Signed-off-by: David S. Miller --- include/linux/rtnetlink.h | 2 +- net/core/net_namespace.c | 38 ++++++++++++++++++-------------------- net/core/rtnetlink.c | 4 ++-- 3 files changed, 21 insertions(+), 23 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 65dcc5859dd1..c592edd7f2b1 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -35,7 +35,7 @@ extern int rtnl_is_locked(void); extern wait_queue_head_t netdev_unregistering_wq; extern struct mutex net_mutex; -extern struct rw_semaphore net_sem; +extern struct rw_semaphore pernet_ops_rwsem; #ifdef CONFIG_PROVE_LOCKING extern bool lockdep_rtnl_is_held(void); diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 7c80d00e4f75..25bf135a323c 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -32,10 +32,10 @@ static LIST_HEAD(pernet_list); static struct list_head *first_device = &pernet_list; /* - * net_sem: protects: pernet_list, net_generic_ids, + * pernet_ops_rwsem: protects: pernet_list, net_generic_ids, * init_net_initialized and first_device pointer. */ -DECLARE_RWSEM(net_sem); +DECLARE_RWSEM(pernet_ops_rwsem); LIST_HEAD(net_namespace_list); EXPORT_SYMBOL_GPL(net_namespace_list); @@ -75,7 +75,7 @@ int net_assign_generic(struct net *net, int id, void *data) BUG_ON(id == 0); old_ng = rcu_dereference_protected(net->gen, - lockdep_is_held(&net_sem)); + lockdep_is_held(&pernet_ops_rwsem)); ng = old_ng; if (old_ng->len >= id) goto assign; @@ -306,7 +306,7 @@ static void dec_net_namespaces(struct ucounts *ucounts) */ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns) { - /* Must be called with net_sem held */ + /* Must be called with pernet_ops_rwsem held */ const struct pernet_operations *ops, *saved_ops; int error = 0; LIST_HEAD(net_exit_list); @@ -426,12 +426,12 @@ struct net *copy_net_ns(unsigned long flags, net->ucounts = ucounts; get_user_ns(user_ns); - rv = down_read_killable(&net_sem); + rv = down_read_killable(&pernet_ops_rwsem); if (rv < 0) goto put_userns; rv = setup_net(net, user_ns); - up_read(&net_sem); + up_read(&pernet_ops_rwsem); if (rv < 0) { put_userns: put_user_ns(user_ns); @@ -487,7 +487,7 @@ static void cleanup_net(struct work_struct *work) list_replace_init(&cleanup_list, &net_kill_list); spin_unlock_irq(&cleanup_list_lock); - down_read(&net_sem); + down_read(&pernet_ops_rwsem); /* Don't let anyone else find us. 
*/ rtnl_lock(); @@ -522,13 +522,11 @@ static void cleanup_net(struct work_struct *work) list_for_each_entry_reverse(ops, &pernet_list, list) ops_exit_list(ops, &net_exit_list); - mutex_unlock(&net_mutex); - /* Free the net generic variables */ list_for_each_entry_reverse(ops, &pernet_list, list) ops_free_list(ops, &net_exit_list); - up_read(&net_sem); + up_read(&pernet_ops_rwsem); list_for_each_entry(net, &net_kill_list, cleanup_list) { struct ve_struct *ve = net->owner_ve; @@ -845,11 +843,11 @@ static int __init net_ns_init(void) rcu_assign_pointer(init_net.gen, ng); - down_write(&net_sem); + down_write(&pernet_ops_rwsem); if (setup_net(&init_net, &init_user_ns)) panic("Could not setup the initial network namespace"); - up_write(&net_sem); + up_write(&pernet_ops_rwsem); register_pernet_subsys(&net_ns_ops); @@ -979,9 +977,9 @@ static void unregister_pernet_operations(struct pernet_operations *ops) int register_pernet_subsys(struct pernet_operations *ops) { int error; - down_write(&net_sem); + down_write(&pernet_ops_rwsem); error = register_pernet_operations(first_device, ops); - up_write(&net_sem); + up_write(&pernet_ops_rwsem); return error; } EXPORT_SYMBOL_GPL(register_pernet_subsys); @@ -997,9 +995,9 @@ EXPORT_SYMBOL_GPL(register_pernet_subsys); */ void unregister_pernet_subsys(struct pernet_operations *ops) { - down_write(&net_sem); + down_write(&pernet_ops_rwsem); unregister_pernet_operations(ops); - up_write(&net_sem); + up_write(&pernet_ops_rwsem); } EXPORT_SYMBOL_GPL(unregister_pernet_subsys); @@ -1025,11 +1023,11 @@ EXPORT_SYMBOL_GPL(unregister_pernet_subsys); int register_pernet_device(struct pernet_operations *ops) { int error; - down_write(&net_sem); + down_write(&pernet_ops_rwsem); error = register_pernet_operations(&pernet_list, ops); if (!error && (first_device == &pernet_list)) first_device = &ops->list; - up_write(&net_sem); + up_write(&pernet_ops_rwsem); return error; } EXPORT_SYMBOL_GPL(register_pernet_device); @@ -1045,11 +1043,11 @@ EXPORT_SYMBOL_GPL(register_pernet_device); */ void unregister_pernet_device(struct pernet_operations *ops) { - down_write(&net_sem); + down_write(&pernet_ops_rwsem); if (&ops->list == first_device) first_device = first_device->next; unregister_pernet_operations(ops); - up_write(&net_sem); + up_write(&pernet_ops_rwsem); } EXPORT_SYMBOL_GPL(unregister_pernet_device); diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 4525cecdd94b..2c0ec5c9ffdd 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -504,11 +504,11 @@ static void rtnl_lock_unregistering_all(void) void rtnl_link_unregister(struct rtnl_link_ops *ops) { /* Close the race with cleanup_net() */ - down_write(&net_sem); + down_write(&pernet_ops_rwsem); rtnl_lock_unregistering_all(); __rtnl_link_unregister(ops); rtnl_unlock(); - up_write(&net_sem); + up_write(&pernet_ops_rwsem); } EXPORT_SYMBOL_GPL(rtnl_link_unregister);
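To put the series in context, below is a minimal, hypothetical consumer of this machinery (the example_* names are invented). After the patches above, its ->init()/->exit() methods run with pernet_ops_rwsem held for reading, so instances for different namespaces may execute concurrently and must serialize any state shared between namespaces themselves, as the vznetstat patch does with its local mutex.

#include <linux/module.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>

struct example_net {
        int counter;                    /* per-namespace private state */
};

static unsigned int example_net_id;

static int __net_init example_init_net(struct net *net)
{
        struct example_net *en = net_generic(net, example_net_id);

        en->counter = 0;
        return 0;
}

static void __net_exit example_exit_net(struct net *net)
{
        /* the net_generic() area itself is freed by the core */
}

static struct pernet_operations example_net_ops = {
        .init = example_init_net,
        .exit = example_exit_net,
        .id   = &example_net_id,
        .size = sizeof(struct example_net),
};

static int __init example_module_init(void)
{
        return register_pernet_subsys(&example_net_ops);
}

static void __exit example_module_exit(void)
{
        unregister_pernet_subsys(&example_net_ops);
}

module_init(example_module_init);
module_exit(example_module_exit);
MODULE_LICENSE("GPL");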