Message ID | 1471941991-21200-1-git-send-email-vdavydov@virtuozzo.com |
---|---|
State | New |
Series | "vzprivnet: vzprivnet_hook: fix crash if skb->dev == NULL" |
Headers | show |
diff --git a/net/ipv4/netfilter/ip_vzprivnet.c b/net/ipv4/netfilter/ip_vzprivnet.c index 293b2802b4c2..6e2bbe2d42ef 100644 --- a/net/ipv4/netfilter/ip_vzprivnet.c +++ b/net/ipv4/netfilter/ip_vzprivnet.c @@ -250,8 +250,13 @@ static unsigned int vzprivnet_hook(struct sk_buff *skb, int can_be_bridge) { struct dst_entry *dst; unsigned int pmark = VZPRIV_MARK_UNKNOWN; + struct net *src_net; - if (!ve_is_super(skb->dev->nd_net->owner_ve)) + if (WARN_ON_ONCE(!skb->dev && !skb->sk)) + return NF_ACCEPT; + + src_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); + if (!ve_is_super(src_net->owner_ve)) return NF_ACCEPT; dst = skb_dst(skb); diff --git a/net/ipv6/netfilter/ip6_vzprivnet.c b/net/ipv6/netfilter/ip6_vzprivnet.c index ee9c3c972637..0a491afd81b6 100644 --- a/net/ipv6/netfilter/ip6_vzprivnet.c +++ b/net/ipv6/netfilter/ip6_vzprivnet.c @@ -484,8 +484,13 @@ static unsigned int vzprivnet6_hook(struct sk_buff *skb, int can_be_bridge) int verdict = NF_DROP; struct vzprivnet *dst, *src; struct ipv6hdr *hdr; + struct net *src_net; - if (!ve_is_super(skb->dev->nd_net->owner_ve)) + if (WARN_ON_ONCE(!skb->dev && !skb->sk)) + return NF_ACCEPT; + + src_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); + if (!ve_is_super(src_net->owner_ve)) return NF_ACCEPT; hdr = ipv6_hdr(skb);
Acked-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> On 08/23/2016 11:46 AM, Vladimir Davydov wrote: > For the sake of Docker, we only call vzprivnet rules if skb comes from > the host [1]. To check that, we look at skb->dev->nd_net->owner_ve. This > works fine when skb is retransmitted by a device (as it is the case in > case of a bridged network), but this results in KP when skb is sent > directly to a veth or venet device (via sendto) provided sysctl > net.vzpriv_filter_host is enabled: > > BUG: unable to handle kernel NULL pointer dereference at 00000000000003e8 > IP: [<ffffffffa03058f9>] vzprivnet_hook+0x9/0xb0 [ip_vzprivnet] > Oops: 0000 [#1] SMP > CPU: 0 PID: 3669 Comm: sendmail ve: 2ee1e66b-d1d4-4cb9-b65e-56af4cdd60b7 Not tainted 3.10.0-327.28.2.vz7.17.1 #1 17.1 > task: ffff880039e94c20 ti: ffff880036428000 task.ti: ffff880036428000 > RIP: 0010:[<ffffffffa03058f9>] [<ffffffffa03058f9>] vzprivnet_hook+0x9/0xb0 [ip_vzprivnet] > RSP: 0018:ffff88003642ba78 EFLAGS: 00010202 > RAX: 0000000000000000 RBX: ffff88003642bb08 RCX: ffff88003a451000 > RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88000509d000 > RBP: ffff88003642ba80 R08: ffff88003642bb08 R09: 0000000000000024 > R10: ffff880035f78360 R11: 0000000000000006 R12: ffff88003642bad0 > R13: ffff88000509d000 R14: ffffffff81a6c1f0 R15: 0000000000000000 > FS: 00007fcc702ca840(0000) GS:ffff88003de00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000000003e8 CR3: 0000000039c1d000 CR4: 00000000000006f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Stack: > ffffffffa03059f2 ffff88003642bac0 ffffffff81562950 ffffffffa0308680 > 0000000043838e8f ffff88000509d000 ffff88003642bb08 ffff88000509d000 > 0000000000000024 ffff88003642baf8 ffffffff81562a38 ffffffffa0308680 > Call Trace: > [<ffffffffa03059f2>] ? vzprivnet_out_hook+0x32/0x40 [ip_vzprivnet] > [<ffffffff81562950>] nf_iterate+0x70/0xb0 > [<ffffffff81562a38>] nf_hook_slow+0xa8/0x110 > [<ffffffff8156ff7e>] __ip_local_out_sk+0xee/0x100 > [<ffffffff81572e02>] ? ip_make_skb+0x22/0x120 > [<ffffffff8156f660>] ? ip_forward_options+0x1c0/0x1c0 > [<ffffffff8156ffab>] ip_local_out_sk+0x1b/0x40 > [<ffffffff81572d46>] ip_send_skb+0x16/0x50 > [<ffffffff81599df0>] udp_send_skb+0x170/0x380 > [<ffffffff8156fd00>] ? ip_copy_metadata+0x170/0x170 > [<ffffffff8159ae87>] udp_sendmsg+0x2f7/0x9d0 > [<ffffffff812088a1>] ? link_path_walk+0x81/0x860 > [<ffffffff815a77e4>] inet_sendmsg+0x64/0xb0 > [<ffffffff812f6442>] ? radix_tree_lookup_slot+0x22/0x50 > [<ffffffff81514e87>] sock_sendmsg+0x87/0xc0 > [<ffffffff8117a80b>] ? unlock_page+0x2b/0x30 > [<ffffffff81516f41>] SYSC_sendto+0x121/0x1c0 > [<ffffffff8163eef4>] ? __do_page_fault+0x164/0x450 > [<ffffffff8163f203>] ? do_page_fault+0x23/0x80 > [<ffffffff815177ee>] SyS_sendto+0xe/0x10 > [<ffffffff81643a09>] system_call_fastpath+0x16/0x1b > > This happens, because in this case skb->dev is NULL (there's no device > this skb is arrived to). To avoid crash there, let's take owner_ve from > the net namespace which the socket is assigned to. > > https://jira.sw.ru/browse/PSBM-51041 > > Fixes: 32efdd408fad ("vzprivnet: Do not execute vzprivnet_hook inside CT") [1] > Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> > Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> > --- > Changes in v2: > - do not crash if both skb->dev and skb->sk turn out to be NULL > for some reason - just print a warning and accept the packet > > net/ipv4/netfilter/ip_vzprivnet.c | 7 ++++++- > net/ipv6/netfilter/ip6_vzprivnet.c | 7 ++++++- > 2 files changed, 12 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/netfilter/ip_vzprivnet.c b/net/ipv4/netfilter/ip_vzprivnet.c > index 293b2802b4c2..6e2bbe2d42ef 100644 > --- a/net/ipv4/netfilter/ip_vzprivnet.c > +++ b/net/ipv4/netfilter/ip_vzprivnet.c > @@ -250,8 +250,13 @@ static unsigned int vzprivnet_hook(struct sk_buff *skb, int can_be_bridge) > { > struct dst_entry *dst; > unsigned int pmark = VZPRIV_MARK_UNKNOWN; > + struct net *src_net; > > - if (!ve_is_super(skb->dev->nd_net->owner_ve)) > + if (WARN_ON_ONCE(!skb->dev && !skb->sk)) > + return NF_ACCEPT; > + > + src_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); > + if (!ve_is_super(src_net->owner_ve)) > return NF_ACCEPT; > > dst = skb_dst(skb); > diff --git a/net/ipv6/netfilter/ip6_vzprivnet.c b/net/ipv6/netfilter/ip6_vzprivnet.c > index ee9c3c972637..0a491afd81b6 100644 > --- a/net/ipv6/netfilter/ip6_vzprivnet.c > +++ b/net/ipv6/netfilter/ip6_vzprivnet.c > @@ -484,8 +484,13 @@ static unsigned int vzprivnet6_hook(struct sk_buff *skb, int can_be_bridge) > int verdict = NF_DROP; > struct vzprivnet *dst, *src; > struct ipv6hdr *hdr; > + struct net *src_net; > > - if (!ve_is_super(skb->dev->nd_net->owner_ve)) > + if (WARN_ON_ONCE(!skb->dev && !skb->sk)) > + return NF_ACCEPT; > + > + src_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); > + if (!ve_is_super(src_net->owner_ve)) > return NF_ACCEPT; > > hdr = ipv6_hdr(skb); >
For the sake of Docker, we only call vzprivnet rules if skb comes from the host [1]. To check that, we look at skb->dev->nd_net->owner_ve. This works fine when skb is retransmitted by a device (as it is the case in case of a bridged network), but this results in KP when skb is sent directly to a veth or venet device (via sendto) provided sysctl net.vzpriv_filter_host is enabled: BUG: unable to handle kernel NULL pointer dereference at 00000000000003e8 IP: [<ffffffffa03058f9>] vzprivnet_hook+0x9/0xb0 [ip_vzprivnet] Oops: 0000 [#1] SMP CPU: 0 PID: 3669 Comm: sendmail ve: 2ee1e66b-d1d4-4cb9-b65e-56af4cdd60b7 Not tainted 3.10.0-327.28.2.vz7.17.1 #1 17.1 task: ffff880039e94c20 ti: ffff880036428000 task.ti: ffff880036428000 RIP: 0010:[<ffffffffa03058f9>] [<ffffffffa03058f9>] vzprivnet_hook+0x9/0xb0 [ip_vzprivnet] RSP: 0018:ffff88003642ba78 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88003642bb08 RCX: ffff88003a451000 RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88000509d000 RBP: ffff88003642ba80 R08: ffff88003642bb08 R09: 0000000000000024 R10: ffff880035f78360 R11: 0000000000000006 R12: ffff88003642bad0 R13: ffff88000509d000 R14: ffffffff81a6c1f0 R15: 0000000000000000 FS: 00007fcc702ca840(0000) GS:ffff88003de00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000003e8 CR3: 0000000039c1d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffffa03059f2 ffff88003642bac0 ffffffff81562950 ffffffffa0308680 0000000043838e8f ffff88000509d000 ffff88003642bb08 ffff88000509d000 0000000000000024 ffff88003642baf8 ffffffff81562a38 ffffffffa0308680 Call Trace: [<ffffffffa03059f2>] ? vzprivnet_out_hook+0x32/0x40 [ip_vzprivnet] [<ffffffff81562950>] nf_iterate+0x70/0xb0 [<ffffffff81562a38>] nf_hook_slow+0xa8/0x110 [<ffffffff8156ff7e>] __ip_local_out_sk+0xee/0x100 [<ffffffff81572e02>] ? ip_make_skb+0x22/0x120 [<ffffffff8156f660>] ? ip_forward_options+0x1c0/0x1c0 [<ffffffff8156ffab>] ip_local_out_sk+0x1b/0x40 [<ffffffff81572d46>] ip_send_skb+0x16/0x50 [<ffffffff81599df0>] udp_send_skb+0x170/0x380 [<ffffffff8156fd00>] ? ip_copy_metadata+0x170/0x170 [<ffffffff8159ae87>] udp_sendmsg+0x2f7/0x9d0 [<ffffffff812088a1>] ? link_path_walk+0x81/0x860 [<ffffffff815a77e4>] inet_sendmsg+0x64/0xb0 [<ffffffff812f6442>] ? radix_tree_lookup_slot+0x22/0x50 [<ffffffff81514e87>] sock_sendmsg+0x87/0xc0 [<ffffffff8117a80b>] ? unlock_page+0x2b/0x30 [<ffffffff81516f41>] SYSC_sendto+0x121/0x1c0 [<ffffffff8163eef4>] ? __do_page_fault+0x164/0x450 [<ffffffff8163f203>] ? do_page_fault+0x23/0x80 [<ffffffff815177ee>] SyS_sendto+0xe/0x10 [<ffffffff81643a09>] system_call_fastpath+0x16/0x1b This happens, because in this case skb->dev is NULL (there's no device this skb is arrived to). To avoid crash there, let's take owner_ve from the net namespace which the socket is assigned to. https://jira.sw.ru/browse/PSBM-51041 Fixes: 32efdd408fad ("vzprivnet: Do not execute vzprivnet_hook inside CT") [1] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> --- Changes in v2: - do not crash if both skb->dev and skb->sk turn out to be NULL for some reason - just print a warning and accept the packet net/ipv4/netfilter/ip_vzprivnet.c | 7 ++++++- net/ipv6/netfilter/ip6_vzprivnet.c | 7 ++++++- 2 files changed, 12 insertions(+), 2 deletions(-)