[Devel,RHEL7,COMMIT] venet: VEIP "SHUTDOWN" hook introduced

Submitted by Konstantin Khorenko on April 22, 2017, 11:47 a.m.

Details

Message ID 201704221147.v3MBl7YR028383@finist_cl7.x64_64.work.ct
State New
Series "Fix VEIP object handling on container destroy"
Headers show

Commit Message

Konstantin Khorenko April 22, 2017, 11:47 a.m.
The commit is pushed to "branch-rh7-3.10.0-514.16.1.vz7.30.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.16.1.vz7.30.4
------>
commit b0c55d0b634138c49545807a78c7d334a739c8db
Author: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Date:   Sat Apr 22 15:47:07 2017 +0400

    venet: VEIP "SHUTDOWN" hook introduced
    
    Previously, there was SS hook to destroy the whole venet device (with veip
    object).
    Well, this is more or less ok, but SS hook was called before put_nsproxy(),
    leading to network disable, while there might be some files still opened on
    NFS, which is a deadlock.
    This was the reason, why the original SS hook was removed in commit
    a9002f5a714f130e624f42d8e3a57bb276213596 to fix this big issue.
    
    So, why another one is introduced again? Here is the rationale.
    
    First of all, I have to say, that the whole intention in ve_exit_ns() is to
    remove container ve_struct from the global list thus allowing to create a new
    one with the same VEID while the old one is under destruction.
    Second, destruction may take a while, because ve_struct is being hold by it's
    net, which is in turn destroyed asynchroniously.
    Third, this destruction means not only VE struct + net, but also "veip"
    object, which holds IP address. This means that it's unpredictable, when
    container IP address is released, while container is considered as destroyed
    already (when it's removed from the list, another one with the same veid can
    be created). This can lead to EEXIST error on IP assigning attempt to another
    interface.
    Thus, there is a problem.
    Look like the right solution should be to remove ve struct from the list on
    final put (when network is destroyed), but it changes current logic, which
    assumes, that container is "destroyed" when final task has exited, making
    container stop unpredictably long especially when node is overloaded.
    
    So, this patch bring back a piece of logic, dropped in commit
    a9002f5a714f130e624f42d8e3a57bb276213596, but in different circumstances.
    First of all, "SHUTDOWN" hook chain is used instead of "SS".
    Second, only VEIP object is destroyed in this hook to allow to assign the
    address to the new interface (if any). Venet destruction kept as it was: it
    will be destroyed asynchroniously with the network.
    
    Note: venet_dellink will skip VEIP destruction, iff ve->ve_netns is NULL (and
    it is set to NULL in ve_exit_ns() _before_ final net put).
    
    https://jira.sw.ru/browse/PSBM-64869
    
    Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
---
 drivers/net/venetdev.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/net/venetdev.c b/drivers/net/venetdev.c
index b7d7596..ddaa6e8 100644
--- a/drivers/net/venetdev.c
+++ b/drivers/net/venetdev.c
@@ -761,7 +761,11 @@  static void venet_dellink(struct net_device *dev, struct list_head *head)
 {
 	struct ve_struct *env = dev->nd_net->owner_ve;
 
-	veip_shutdown(env);
+	/* We check ve_netns to avoid races with veip SHUTDOWN hook, called from
+	 * ve_exit_ns()
+	 */
+	if (env->ve_netns)
+		veip_shutdown(env);
 
 	env->_venet_dev = NULL;
 	unregister_netdevice_queue(dev, head);
@@ -1176,6 +1180,12 @@  static struct rtnl_link_ops venet_link_ops = {
 	.maxtype	= VENET_INFO_MAX,
 };
 
+static struct ve_hook veip_shutdown_hook = {
+	.fini		= veip_shutdown,
+	.priority	= HOOK_PRIO_FINISHING,
+	.owner		= THIS_MODULE,
+};
+
 __init int venet_init(void)
 {
 	struct proc_dir_entry *de;
@@ -1198,6 +1208,7 @@  __init int venet_init(void)
 
 	vzioctl_register(&venetcalls);
 	vzmon_register_veaddr_print_cb(veaddr_seq_print);
+	ve_hook_register(VE_SHUTDOWN_CHAIN, &veip_shutdown_hook);
 
 	return rtnl_link_register(&venet_link_ops);