[RHEL7,COMMIT] net: Move net:netns_ids destruction out of rtnl_lock() and document locking scheme

Submitted by Konstantin Khorenko on May 27, 2020, 6:34 p.m.

Details

Message ID 202005271834.04RIYrZP014941@finist-ce7.sw.ru
State New
Series "Parallel per-net init/exit"
Headers show

Commit Message

Konstantin Khorenko May 27, 2020, 6:34 p.m.
The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.161.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.8.2.vz7.161.3
------>
commit ec6ba20ee576ef78aafad6697d705787a76bc3a6
Author: Kirill Tkhai <ktkhai@virtuozzo.com>
Date:   Wed May 27 21:34:53 2020 +0300

    net: Move net:netns_ids destruction out of rtnl_lock() and document locking scheme
    
    ms commit fb07a820fe3f
    
    Currently, we unhash a dying net from netns_ids lists
    under rtnl_lock(). It's a leftover from the time when
    net::netns_ids was introduced. There was no net::nsid_lock,
    and rtnl_lock() was mostly need to order modification
    of alive nets nsid idr, i.e. for:
            for_each_net(tmp) {
                    ...
                    id = __peernet2id(tmp, net);
                    idr_remove(&tmp->netns_ids, id);
                    ...
            }
    
    Since we have net::nsid_lock, the modifications are
    protected by this local lock, and now we may introduce
    better scheme of netns_ids destruction.
    
    Let's look at the functions peernet2id_alloc() and
    get_net_ns_by_id(). Previous commits taught these
    functions to work well with dying net acquired from
    rtnl unlocked lists. And they are the only functions
    which can hash a net to netns_ids or obtain from there.
    And as easy to check, other netns_ids operating functions
    works with id, not with net pointers. So, we do not
    need rtnl_lock to synchronize cleanup_net() with all them.
    
    The another property, which is used in the patch,
    is that net is unhashed from net_namespace_list
    in the only place and by the only process. So,
    we avoid excess rcu_read_lock() or rtnl_lock(),
    when we'are iterating over the list in unhash_nsid().
    
    All the above makes possible to keep rtnl_lock() locked
    only for net->list deletion, and completely avoid it
    for netns_ids unhashing and destruction. As these two
    doings may take long time (e.g., memory allocation
    to send skb), the patch should positively act on
    the scalability and signify decrease the time, which
    rtnl_lock() is held in cleanup_net().
    
    Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    
    =====================
    Patchset description:
    
    Parallel per-net init/exit
    
    https://jira.sw.ru/browse/PSBM-104158
---
 net/core/net_namespace.c | 62 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 18 deletions(-)

Patch hide | download patch | download mbox

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index b550d025e0d4a..a8b7a9ddbe7c6 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -435,13 +435,40 @@  struct net *copy_net_ns(unsigned long flags,
 	return net;
 }
 
+static void unhash_nsid(struct net *net, struct net *last)
+{
+	struct net *tmp;
+	/* This function is only called from cleanup_net() work,
+	 * and this work is the only process, that may delete
+	 * a net from net_namespace_list. So, when the below
+	 * is executing, the list may only grow. Thus, we do not
+	 * use for_each_net_rcu() or rtnl_lock().
+	 */
+	for_each_net(tmp) {
+		int id;
+
+		spin_lock_irq(&tmp->nsid_lock);
+		id = __peernet2id(tmp, net);
+		if (id >= 0)
+			idr_remove(&tmp->netns_ids, id);
+		spin_unlock_irq(&tmp->nsid_lock);
+		if (id >= 0)
+			rtnl_net_notifyid(tmp, RTM_DELNSID, id);
+		if (tmp == last)
+			break;
+	}
+	spin_lock_irq(&net->nsid_lock);
+	idr_destroy(&net->netns_ids);
+	spin_unlock_irq(&net->nsid_lock);
+}
+
 static DEFINE_SPINLOCK(cleanup_list_lock);
 static LIST_HEAD(cleanup_list);  /* Must hold cleanup_list_lock to touch */
 
 static void cleanup_net(struct work_struct *work)
 {
 	const struct pernet_operations *ops;
-	struct net *net, *tmp;
+	struct net *net, *tmp, *last;
 	struct list_head net_kill_list;
 	LIST_HEAD(net_exit_list);
 
@@ -454,26 +481,25 @@  static void cleanup_net(struct work_struct *work)
 
 	/* Don't let anyone else find us. */
 	rtnl_lock();
-	list_for_each_entry(net, &net_kill_list, cleanup_list) {
+	list_for_each_entry(net, &net_kill_list, cleanup_list)
 		list_del_rcu(&net->list);
-		list_add_tail(&net->exit_list, &net_exit_list);
-		for_each_net(tmp) {
-			int id;
-
-			spin_lock_irq(&tmp->nsid_lock);
-			id = __peernet2id(tmp, net);
-			if (id >= 0)
-				idr_remove(&tmp->netns_ids, id);
-			spin_unlock_irq(&tmp->nsid_lock);
-			if (id >= 0)
-				rtnl_net_notifyid(tmp, RTM_DELNSID, id);
-		}
-		spin_lock_irq(&net->nsid_lock);
-		idr_destroy(&net->netns_ids);
-		spin_unlock_irq(&net->nsid_lock);
+	/* Cache last net. After we unlock rtnl, no one new net
+	 * added to net_namespace_list can assign nsid pointer
+	 * to a net from net_kill_list (see peernet2id_alloc()).
+	 * So, we skip them in unhash_nsid().
+	 *
+	 * Note, that unhash_nsid() does not delete nsid links
+	 * between net_kill_list's nets, as they've already
+	 * deleted from net_namespace_list. But, this would be
+	 * useless anyway, as netns_ids are destroyed there.
+	 */
+	last = list_last_entry(&net_namespace_list, struct net, list);
+	rtnl_unlock();
 
+	list_for_each_entry(net, &net_kill_list, cleanup_list) {
+		unhash_nsid(net, last);
+		list_add_tail(&net->exit_list, &net_exit_list);
 	}
-	rtnl_unlock();
 
 	/*
 	 * Another CPU might be rcu-iterating the list, wait for it.