[Devel,RH7,1/2] net/vxlan: enable support and autoload in a container

Submitted by Pavel Tikhomirov on Oct. 26, 2016, 3:14 p.m.

Details

Message ID 1477494877-19093-1-git-send-email-ptikhomirov@virtuozzo.com
State New
Series "Series without cover letter"

Commit Message

Pavel Tikhomirov Oct. 26, 2016, 3:14 p.m.
vxlan is safe to use in a CT because:

1) The UDP multicast socket that connects to the outer world sits in
the creation net namespace, so this socket can only receive packets
that are forwarded/routed in the creation ns.

2) The vxlan device is owned by a second netns (which may be the same
as the first), like any other network device, so all packets that
reach it come from that same ns.

3) The vxlan logic works through vxlan_net, which is placed in the
creation netns, while vxlan_fdb and vxlan_rdst are per vxlan device.
Thus the entries cannot intersect with entries from the host or
other CTs.

* One problem I can see now is adding an fdb entry with an ifindex
(the index of the device through which packets from the UDP socket
are routed) after the vxlan device has been moved to the second
namespace. In vxlan_fdb_parse we use the second namespace to check
the ifindex by device lookup, but in
vxlan_xmit_one->ip_route_output_key->...->__ip_route_output_key
we use the first (creation) namespace to look up the device, and
the lookup will probably fail. So all fdb configuration should be
done before moving the device to the ns. The same holds in
mainstream, AFAICS.
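
The ifindex mismatch described in the last point can be pictured with a
small userspace model. This is only a sketch under stated assumptions, not
kernel code: every `toy_*` name is hypothetical, a namespace is reduced to
the list of ifindexes visible in it, and "route lookup" is reduced to
"does this ifindex exist here".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a net namespace: just the ifindexes visible in it. */
struct toy_netns {
	const int *ifindexes;
	size_t count;
};

/* Toy stand-in for a vxlan device created in one ns and moved to another. */
struct toy_vxlan {
	const struct toy_netns *creation_ns;	/* UDP socket and routing live here */
	const struct toy_netns *dev_ns;		/* the device itself lives here */
};

static bool toy_dev_exists(const struct toy_netns *ns, int ifindex)
{
	size_t i;

	assert(ns != NULL);
	for (i = 0; i < ns->count; i++)
		if (ns->ifindexes[i] == ifindex)
			return true;
	return false;
}

/* Models the fdb-add check: the ifindex is validated in the device's ns. */
static bool toy_fdb_add_ok(const struct toy_vxlan *vx, int ifindex)
{
	assert(vx != NULL);
	return toy_dev_exists(vx->dev_ns, ifindex);
}

/* Models transmit: the output device is looked up in the creation ns. */
static bool toy_xmit_ok(const struct toy_vxlan *vx, int ifindex)
{
	assert(vx != NULL);
	return toy_dev_exists(vx->creation_ns, ifindex);
}
```

With a creation ns that holds ifindexes {1, 2} and a device ns that holds
{1, 7}, toy_fdb_add_ok() accepts ifindex 7 while toy_xmit_ok() rejects it:
the entry is accepted at configuration time but cannot be used at transmit
time. An ifindex present in both tables passes both checks, although in a
real system it may name a different device in each namespace.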

https://jira.sw.ru/browse/PSBM-53629

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

---
 drivers/net/vxlan.c | 1 +
 kernel/kmod.c       | 1 +
 2 files changed, 2 insertions(+)

Patch

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fd2516d..8e89665 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2367,6 +2367,7 @@  static void vxlan_setup(struct net_device *dev)
 
 	dev->vlan_features = dev->features;
 	dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
+	dev->features |= NETIF_F_VIRTUAL;
 	dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
 	dev->hw_features |= NETIF_F_GSO_SOFTWARE;
 	dev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
diff --git a/kernel/kmod.c b/kernel/kmod.c
index e0ef148..63748d4 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -421,6 +421,7 @@  static const char * const ve0_allowed_mod[] = {
 	"ip_set_list:set",
 
 	"rtnl-link-dummy",
+	"rtnl-link-vxlan",
 };
 
 /*
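
For readers unfamiliar with the allowlist touched in kernel/kmod.c, the
idea can be sketched in userspace as below. The patch does not show the
function that consults ve0_allowed_mod, so `toy_module_allowed` and the
exact-string matching rule are assumptions made for illustration only; the
table copies just the entries visible in the diff context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy of the entries visible in the diff, not the full list. */
static const char * const toy_allowed_mod[] = {
	"ip_set_list:set",
	"rtnl-link-dummy",
	"rtnl-link-vxlan",	/* the alias added by this patch */
};

/*
 * Returns true if a module autoload request for this alias would be
 * permitted. Assumes exact string comparison against the table.
 */
static bool toy_module_allowed(const char *alias)
{
	size_t i;

	assert(alias != NULL);
	for (i = 0; i < sizeof(toy_allowed_mod) / sizeof(toy_allowed_mod[0]); i++)
		if (strcmp(toy_allowed_mod[i], alias) == 0)
			return true;
	return false;
}
```

In this model a request for "rtnl-link-vxlan" is permitted only because
the alias is listed; an unlisted alias such as "rtnl-link-gre" is refused.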

Comments

Konstantin Khorenko Oct. 26, 2016, 3:22 p.m.
reviewer?

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

Pavel Tikhomirov Oct. 26, 2016, 3:31 p.m.
Andrey, please review.

Pavel Tikhomirov Oct. 27, 2016, 3:47 p.m.
I managed to create a reproducer for the ifindex problem mentioned in the
commit message. It fails as expected on 4.7.7-200.fc24.x86_64, so the
ifindex problem is indeed present in mainstream too.

bridge_gateway_cidr='10.0.0.1/24'
container1_ip_cidr='10.0.0.3/24'
container1_mac_addr='02:42:0a:00:00:03'
container2_ip='10.0.0.2'
container2_mac_addr='02:42:0a:00:00:02'
# Some actual address from your host's local net
container2_host_ip='10.94.72.162'
vxlan_id=42

set -x

ip netns add ct-net
ip netns add vx-net
ip netns exec vx-net ip link add dev br1 type bridge

# vxlan1 is created in the host netns with port 4789 and moved to vx-net
ip link add dev vxlan-tmp-1 type vxlan id $vxlan_id l2miss l3miss proxy \
	learning dstport 4789
ip link set vxlan-tmp-1 netns vx-net
ip netns exec vx-net ip link set dev vxlan-tmp-1 name vxlan1

ip netns exec vx-net brctl addif br1 vxlan1

# The veth1:eth1 pair connects vx-net and ct-net
ip link add dev vetha1 mtu 1450 type veth peer name vetha2 mtu 1450
ip link set dev vetha1 netns vx-net
ip netns exec vx-net ip link set dev vetha1 name veth1
ip netns exec vx-net brctl addif br1 veth1
ip netns exec vx-net ip addr add dev br1 $bridge_gateway_cidr
ip netns exec vx-net ip link set vxlan1 up
ip netns exec vx-net ip link set veth1 up
ip netns exec vx-net ip link set br1 up

ip link set dev vetha2 netns ct-net
ip netns exec ct-net ip link set dev vetha2 name eth1 \
	address $container1_mac_addr
ip netns exec ct-net ip addr add dev eth1 $container1_ip_cidr
ip netns exec ct-net ip link set dev eth1 up

ip netns exec vx-net ip neighbor add $container2_ip \
	lladdr $container2_mac_addr dev vxlan1 nud permanent
# With "via vxlan1" no packets are seen; after removing "via vxlan1",
# VXLAN-encapsulated ICMP echo requests are seen.
ip netns exec vx-net bridge fdb add $container2_mac_addr dev vxlan1 self \
	dst $container2_host_ip vni $vxlan_id port 4789 via vxlan1

ip netns exec ct-net ping $container2_ip &
# enp0s31f6 is the outgoing interface on my host; replace it with yours
tcpdump -i enp0s31f6 dst $container2_host_ip

ip netns del vx-net
ip netns del ct-net

Pavel Tikhomirov Nov. 16, 2016, 6:45 a.m.
ping

Andrey Vagin Nov. 22, 2016, 11:35 p.m.
On Wed, Oct 26, 2016 at 06:14:36PM +0300, Pavel Tikhomirov wrote:
> vxlan is safe to use in a CT because:
>
> 1) The UDP multicast socket that connects to the outer world sits in
> the creation net namespace, so this socket can only receive packets
> that are forwarded/routed in the creation ns.
>
> 2) The vxlan device is owned by a second netns (which may be the same
> as the first), like any other network device, so all packets that
> reach it come from that same ns.
>
> 3) The vxlan logic works through vxlan_net, which is placed in the
> creation netns, while vxlan_fdb and vxlan_rdst are per vxlan device.
> Thus the entries cannot intersect with entries from the host or
> other CTs.
>
> * One problem I can see now is adding an fdb entry with an ifindex
> (the index of the device through which packets from the UDP socket
> are routed) after the vxlan device has been moved to the second
> namespace. In vxlan_fdb_parse we use the second namespace to check
> the ifindex by device lookup, but in
> vxlan_xmit_one->ip_route_output_key->...->__ip_route_output_key
> we use the first (creation) namespace to look up the device, and
> the lookup will probably fail. So all fdb configuration should be
> done before moving the device to the ns. The same holds in
> mainstream, AFAICS.
>
> https://jira.sw.ru/browse/PSBM-53629
>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>