[Devel,RHEL7,COMMIT] ms/net: prevent of emerging cross-namespace symlinks

Submitted by Konstantin Khorenko on Jan. 11, 2017, 4:19 p.m.

Details

Message ID 201701111619.v0BGJa1Y012133@finist_cl7.x64_64.work.ct
State New
Series "macvlan: fix crash on list_del_rcu in macvlan_dellink"

Commit Message

Konstantin Khorenko Jan. 11, 2017, 4:19 p.m.
The commit is pushed to "branch-rh7-3.10.0-514.vz7.27.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.vz7.27.9
------>
commit 53259712b96f1c8cfdb17096e79f26e007bef3a7
Author: Alexander Y. Fomichev <git.user@gmail.com>
Date:   Wed Jan 11 20:19:35 2017 +0400

    ms/net: prevent of emerging cross-namespace symlinks
    
    Patchset description:
    macvlan: fix crash on list_del_rcu in macvlan_dellink
    
    Fixes a problem where running the CRIU ZDTM macvlan test on the host
    crashes VZ7.
    Note: macvlans are prohibited inside VZ7 Containers, so a Container
    owner/user cannot crash the node this way.
    
    1 - Remove cross-namespace upper_/lower_ symlinks in sysfs when the
    upper device is moved to another netns. The stale symlinks prevented
    creation of another upper dev with the same name on the same lower
    dev in the initial netns (with a warning).
    
    2 - Fix for 1
    
    3 - Fix partial macvlan device creation when netdev_upper_dev_link()
    fails; removing such a device caused the crash.
    
    Crash:
    
    [43183.592029] ------------[ cut here ]------------
    [43183.592057] WARNING: at fs/sysfs/dir.c:560 sysfs_add_one+0xa5/0xd0()
    [43183.592059] sysfs: cannot create duplicate filename '/devices/virtual/net/zdtmbr0/upper_zdtmmvlan0'
    ...
    [43183.657255] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [43183.657285] IP: [<ffffffff8132e7d9>] __list_del_entry+0x29/0xd0
    [43183.657313] PGD 147afb067 PUD 1466d8067 PMD 0
    [43183.657330] Oops: 0000 [#1] SMP
    [43183.657344] Modules linked in: xt_mark macvlan nf_conntrack_netlink nfnetlink udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag binfmt_misc ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 tun 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_powerclamp iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg sbs ppdev sbshc pcspkr virtio_balloon parport_pc parport shpchp lpc_ich veth overlay ip6_vzprivnet ip6_vznetstat pio_kaio pio_nfs nfsd auth_rpcgss nfs_acl lockd grace sunrpc pio_direct pfmt_raw pfmt_ploop1 ploop ip_vznetstat ip_vzprivnet vziolimit
    [43183.657631]  vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge stp llc ip_tables ext4 mbcache jbd2 sd_mod sr_mod crc_t10dif cdrom crct10dif_generic ata_generic pata_acpi crct10dif_pclmul crct10dif_common ahci ata_piix crc32c_intel libahci libata serio_raw virtio_pci virtio_ring e1000 virtio fjes floppy dm_mirror dm_region_hash dm_log dm_mod
    [43183.657807] CPU: 0 PID: 332 Comm: kworker/u64:7 ve: 0 Tainted: G        W      ------------   3.10.0-514.vz7.27.5 #1 27.5
    [43183.657823] Hardware name: Parallels Software International Inc. Parallels Virtual Platform/Parallels Virtual Platform, BIOS 6.10.24198.1226784 12/09/2015
    [43183.657842] Workqueue: netns cleanup_net
    [43183.657858] task: ffff880144c8d050 ti: ffff880144d00000 task.ti: ffff880144d00000
    [43183.657871] RIP: 0010:[<ffffffff8132e7d9>]  [<ffffffff8132e7d9>] __list_del_entry+0x29/0xd0
    [43183.657889] RSP: 0018:ffff880144d03ce0  EFLAGS: 00010207
    [43183.657898] RAX: 0000000000000000 RBX: ffff880012598000 RCX: dead000000000200
    [43183.657906] RDX: 0000000000000000 RSI: ffff880144d03d10 RDI: ffff8800125988c8
    [43183.657915] RBP: ffff880144d03ce0 R08: ffff880144d03d38 R09: 0000000000000000
    [43183.657926] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880144d03d10
    [43183.657935] R13: ffff88009fe22848 R14: ffff880144d03d10 R15: ffff88009fe22780
    [43183.657946] FS:  0000000000000000(0000) GS:ffff88014ae00000(0000) knlGS:0000000000000000
    [43183.657959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [43183.657967] CR2: 0000000000000000 CR3: 00000001459f5000 CR4: 00000000000406f0
    [43183.657975] DR0: 0000000000010140 DR1: 0000000000000000 DR2: 0000000000000000
    [43183.657983] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    [43183.657990] Stack:
    [43183.657997]  ffff880144d03d00 ffffffffa05368ce ffff880012598000 ffff880144d03dd8
    [43183.658023]  ffff880144d03d78 ffffffff8156fb02 ffff880144d03d10 ffff880144d03d10
    [43183.658046]  0000000000000000 ffff880144c8d050 ffffffff810b3080 ffff880144d03d38
    [43183.658069] Call Trace:
    [43183.658081]  [<ffffffffa05368ce>] macvlan_dellink+0x1e/0x50 [macvlan]
    [43183.658093]  [<ffffffff8156fb02>] default_device_exit_batch+0x102/0x190
    [43183.658108]  [<ffffffff810b3080>] ? wake_up_atomic_t+0x30/0x30
    [43183.658118]  [<ffffffff81569953>] ops_exit_list.isra.5+0x53/0x60
    [43183.658127]  [<ffffffff8156aac0>] cleanup_net+0x260/0x480
    [43183.658142]  [<ffffffff810a94cb>] process_one_work+0x17b/0x470
    [43183.658151]  [<ffffffff810aa306>] worker_thread+0x126/0x410
    [43183.658160]  [<ffffffff810aa1e0>] ? rescuer_thread+0x460/0x460
    [43183.658171]  [<ffffffff810b1e0f>] kthread+0xcf/0xe0
    [43183.658181]  [<ffffffff810b1d40>] ? create_kthread+0x60/0x60
    [43183.658193]  [<ffffffff81690b18>] ret_from_fork+0x58/0x90
    [43183.658203]  [<ffffffff810b1d40>] ? create_kthread+0x60/0x60
    [43183.658213] Code: 00 00 55 48 8b 17 48 b9 00 01 00 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 00 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
    [43183.658436] RIP  [<ffffffff8132e7d9>] __list_del_entry+0x29/0xd0
    [43183.658450]  RSP <ffff880144d03ce0>
    [43183.658457] CR2: 0000000000000000
    
    Short reproducer:
    unshare -n
    ip link add zdtmbr0 type bridge
    ip link add zdtmmvlan0 link zdtmbr0 type macvlan mode bridge
    ip netns add test
    ip link set zdtmmvlan0 netns test
    ip link add zdtmmvlan0 link zdtmbr0 type macvlan mode bridge
    ip link del zdtmmvlan0
    
    https://jira.sw.ru/browse/PSBM-58300
    
    Alexander Y. Fomichev (2):
      net: prevent of emerging cross-namespace symlinks
      net: fix creation adjacent device symlinks
    
    Cong Wang (1):
      macvlan: unregister net device when netdev_upper_dev_link() fails
    
    ===
    This patch description:
    
    Code manipulating sysfs symlinks on adjacent net_device(s)
    currently doesn't take into account that the devices may
    belong to different namespaces.
    
    This patch tries to fix the issue as follows:
    - Check net_ns before creating / deleting a symlink.
      For now only netdev_adjacent_rename_links and
      __netdev_adjacent_dev_remove are affected; afaics
      __netdev_adjacent_dev_insert implies both net_devs
      belong to the same namespace.
    - Drop all existing symlinks to / from all adj_devs before
      switching namespace and recreate them just after.
    
    Signed-off-by: Alexander Y. Fomichev <git.user@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    
    ms commit: 4c75431 ("net: prevent of emerging cross-namespace symlinks")
    https://jira.sw.ru/browse/PSBM-58300
    
    Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
    Acked-by: Andrew Vagin <avagin@virtuozzo.com>
---
 net/core/dev.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 60 insertions(+), 1 deletion(-)


diff --git a/net/core/dev.c b/net/core/dev.c
index e7c21ea..b1a183d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5267,7 +5267,8 @@  void __netdev_adjacent_dev_remove(struct net_device *dev,
 	if (adj->master)
 		sysfs_remove_link(&(dev->dev.kobj), "master");
 
-	if (netdev_adjacent_is_neigh_list(dev, dev_list))
+	if (netdev_adjacent_is_neigh_list(dev, dev_list) &&
+	    net_eq(dev_net(dev),dev_net(adj_dev)))
 		netdev_adjacent_sysfs_del(dev, adj_dev->name, dev_list);
 
 	list_del_rcu(&adj->list);
@@ -5573,11 +5574,65 @@  void netdev_bonding_info_change(struct net_device *dev,
 }
 EXPORT_SYMBOL(netdev_bonding_info_change);
 
+void netdev_adjacent_add_links(struct net_device *dev)
+{
+	struct netdev_adjacent *iter;
+
+	struct net *net = dev_net(dev);
+
+	list_for_each_entry(iter, &dev->adj_list.upper, list) {
+		if (!net_eq(net,dev_net(iter->dev)))
+			continue;
+		netdev_adjacent_sysfs_add(iter->dev, dev,
+					  &iter->dev->adj_list.lower);
+		netdev_adjacent_sysfs_add(dev, iter->dev,
+					  &dev->adj_list.upper);
+	}
+
+	list_for_each_entry(iter, &dev->adj_list.lower, list) {
+		if (!net_eq(net,dev_net(iter->dev)))
+			continue;
+		netdev_adjacent_sysfs_add(iter->dev, dev,
+					  &iter->dev->adj_list.upper);
+		netdev_adjacent_sysfs_add(dev, iter->dev,
+					  &dev->adj_list.lower);
+	}
+}
+
+void netdev_adjacent_del_links(struct net_device *dev)
+{
+	struct netdev_adjacent *iter;
+
+	struct net *net = dev_net(dev);
+
+	list_for_each_entry(iter, &dev->adj_list.upper, list) {
+		if (!net_eq(net,dev_net(iter->dev)))
+			continue;
+		netdev_adjacent_sysfs_del(iter->dev, dev->name,
+					  &iter->dev->adj_list.lower);
+		netdev_adjacent_sysfs_del(dev, iter->dev->name,
+					  &dev->adj_list.upper);
+	}
+
+	list_for_each_entry(iter, &dev->adj_list.lower, list) {
+		if (!net_eq(net,dev_net(iter->dev)))
+			continue;
+		netdev_adjacent_sysfs_del(iter->dev, dev->name,
+					  &iter->dev->adj_list.upper);
+		netdev_adjacent_sysfs_del(dev, iter->dev->name,
+					  &dev->adj_list.lower);
+	}
+}
+
 void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
 {
 	struct netdev_adjacent *iter;
 
+	struct net *net = dev_net(dev);
+
 	list_for_each_entry(iter, &dev->adj_list.upper, list) {
+		if (!net_eq(net,dev_net(iter->dev)))
+			continue;
 		netdev_adjacent_sysfs_del(iter->dev, oldname,
 					  &iter->dev->adj_list.lower);
 		netdev_adjacent_sysfs_add(iter->dev, dev,
@@ -5585,6 +5640,8 @@  void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
 	}
 
 	list_for_each_entry(iter, &dev->adj_list.lower, list) {
+		if (!net_eq(net,dev_net(iter->dev)))
+			continue;
 		netdev_adjacent_sysfs_del(iter->dev, oldname,
 					  &iter->dev->adj_list.upper);
 		netdev_adjacent_sysfs_add(iter->dev, dev,
@@ -7388,6 +7445,7 @@  int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 
 	/* Send a netdev-removed uevent to the old namespace */
 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
+	netdev_adjacent_del_links(dev);
 
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
@@ -7402,6 +7460,7 @@  int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 
 	/* Send a netdev-add uevent to the new namespace */
 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
+	netdev_adjacent_add_links(dev);
 
 	/* Fixup kobjects */
 	err = device_rename(&dev->dev, dev->name);