1. 28 Nov, 2018 1 commit
    • netfilter: nf_tables: deactivate expressions in rule replacement routine · ca089878
      Taehee Yoo authored
      The rule replacement path never calls the expression deactivation
      routine, so the chain use counter is not decremented. Steps to
      reproduce the problem:
      
         %nft add table ip filter
         %nft add chain ip filter c1
         %nft add chain ip filter c2
         %nft add rule ip filter c1 jump c2
         %nft replace rule ip filter c1 handle 3 accept
         %nft flush ruleset
      
      The <jump c2> expression is an immediate NFT_JUMP to chain c2.
      The reference count of chain c2 is increased when the rule is added.
      
      When a rule is deleted or replaced, the reference counter of c2 should be
      decreased via nft_rule_expr_deactivate(), which calls
      nft_immediate_deactivate().
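
      The reference counting described above can be sketched in userspace C.
      This is only a model under stated assumptions: struct chain, struct rule,
      rule_activate(), rule_deactivate(), rule_replace() and
      chain_can_destroy() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
struct chain {
	int use;			/* number of rules that jump to this chain */
};

struct rule {
	struct chain *jump_target;	/* NULL when the rule has no jump */
	bool active;
};

/* Adding a rule with a jump takes a reference on the target chain. */
static void rule_activate(struct rule *r, struct chain *target)
{
	r->jump_target = target;
	r->active = true;
	if (target)
		target->use++;
}

/* Deleting or replacing a rule must drop that reference again. */
static void rule_deactivate(struct rule *r)
{
	assert(r->active);
	if (r->jump_target)
		r->jump_target->use--;
	r->active = false;
}

/* Replace old with repl; the deactivate call models the step that was missing. */
static void rule_replace(struct rule *old, struct rule *repl,
			 struct chain *repl_target)
{
	rule_deactivate(old);
	rule_activate(repl, repl_target);
}

/* A chain may only be destroyed once nothing refers to it. */
static bool chain_can_destroy(const struct chain *c)
{
	return c->use == 0;
}
```

      In this model, leaving out the rule_deactivate() call in rule_replace()
      would keep c2's counter at 1 after "jump c2" is replaced by "accept",
      so chain_can_destroy() would never become true for c2.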
      
      Splat looks like:
      [  214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
      [  214.398983] Modules linked in: nf_tables nfnetlink
      [  214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44
      [  214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [  214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
      [  214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8
      [  214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202
      [  214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0
      [  214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78
      [  214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000
      [  214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba
      [  214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6
      [  214.398983] FS:  0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000
      [  214.398983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0
      [  214.398983] Call Trace:
      [  214.398983]  ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables]
      [  214.398983]  ? __kasan_slab_free+0x145/0x180
      [  214.398983]  ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables]
      [  214.398983]  ? kfree+0xdb/0x280
      [  214.398983]  nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables]
      [ ... ]
      
      Fixes: bb7b40ae ("netfilter: nf_tables: bogus EBUSY in chain deletions")
      Reported-by: Christoph Anton Mitterer <calestyo@scientia.net>
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 13 Nov, 2018 1 commit
  3. 12 Nov, 2018 2 commits
    • netfilter: nf_tables: don't use position attribute on rule replacement · 447750f2
      Florian Westphal authored
      It's possible to set both HANDLE and POSITION when replacing a rule.
      In this case, the rule at POSITION gets replaced using the
      userspace-provided handle.  Rule handles are supposed to be generated
      by the kernel only.
      
      Duplicate handles should be harmless, but it is better to disable this
      "feature" by only checking for the POSITION attribute on insert operations.
      
      Fixes: 5e948466 ("netfilter: nf_tables: add insert operation")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: don't skip inactive chains during update · 0fb39bbe
      Florian Westphal authored
      There is no synchronization between the packet path and the configuration plane.
      
      The packet path uses two arrays with rules, one contains the current (active)
      generation.  The other either contains the last (obsolete) generation or
      the future one.
      
      Consider:
      cpu1               cpu2
                         nft_do_chain(c);
      delete c
      net->gen++;
                         genbit = !!net->gen;
                         rules = c->rg[genbit];
      
      cpu1 ignores c when updating if c is no longer active in the new
      generation.
      
      On cpu2, we now use rules from the wrong generation: c->rg[old]
      contains the rules matching 'c', whereas c->rg[new] was not updated and
      can even point to rules that have already been freed, causing a crash.
      
      To fix this, make sure that the 'current' and 'next' generations are
      identical for chains that are going away, so that c->rg[new] will just
      use the matching rules even if genbit was incremented already.
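
      The two-generation scheme can be sketched in userspace C. This is a
      simplified model, assuming a single global generation bit; struct
      rule_blob, rules_gen[], gencursor, chain_prepare_next(), commit_flip()
      and chain_rules() are hypothetical names, and real locking and RCU are
      left out.

```c
#include <assert.h>
#include <stddef.h>

struct rule_blob { int id; };	/* hypothetical stand-in for a rule array */

struct chain {
	const struct rule_blob *rules_gen[2];	/* indexed by generation bit */
	int going_away;				/* chain is deleted in the next generation */
};

static unsigned int gencursor;	/* 0 or 1, flipped on commit */

static unsigned int gen_next(void)
{
	return gencursor ^ 1u;
}

/* Configuration plane: fill the 'next' slot before the generation flips. */
static void chain_prepare_next(struct chain *c, const struct rule_blob *new_rules)
{
	assert(gencursor <= 1u);
	if (c->going_away)
		/* Keep both slots identical so a late reader still sees valid rules. */
		c->rules_gen[gen_next()] = c->rules_gen[gencursor];
	else
		c->rules_gen[gen_next()] = new_rules;
}

static void commit_flip(void)
{
	gencursor = gen_next();
}

/* Packet path: sample the generation bit, then pick the matching slot. */
static const struct rule_blob *chain_rules(const struct chain *c)
{
	return c->rules_gen[gencursor];
}
```

      With the going_away branch in place, a reader that samples the new
      generation bit for a chain that is being deleted still gets the rules
      that were valid when it entered the chain, rather than a stale slot.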
      
      Fixes: 0cbc06b3 ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  4. 19 Oct, 2018 1 commit
    • netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine · b7f1a16d
      Taehee Yoo authored
      When a device is unregistered, the flowtable flush routine is called
      by the notifier_call (nf_tables_flowtable_event). The exit callback of
      the nftables pernet_operations (nf_tables_exit_net) also has a flowtable
      flush routine. When a network namespace is destroyed, both the
      notifier_call and the pernet_operations exit callback are called, hence
      the flowtable flush routine in pernet_operations is unnecessary.
      
      test commands:
         %ip netns add vm1
         %ip netns exec vm1 nft add table ip filter
         %ip netns exec vm1 nft add flowtable ip filter w \
      	{ hook ingress priority 0\; devices = { lo }\; }
         %ip netns del vm1
      
      splat looks like:
      [  265.187019] WARNING: CPU: 0 PID: 87 at net/netfilter/core.c:309 nf_hook_entry_head+0xc7/0xf0
      [  265.187112] Modules linked in: nf_flow_table_ipv4 nf_flow_table nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip_tables x_tables
      [  265.187390] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc3+ #5
      [  265.187453] Workqueue: netns cleanup_net
      [  265.187514] RIP: 0010:nf_hook_entry_head+0xc7/0xf0
      [  265.187546] Code: 8d 81 68 03 00 00 5b c3 89 d0 83 fa 04 48 8d 84 c7 e8 11 00 00 76 81 0f 0b 31 c0 e9 78 ff ff ff 0f 0b 48 83 c4 08 31 c0 5b c3 <0f> 0b 31 c0 e9 65 ff ff ff 0f 0b 31 c0 e9 5c ff ff ff 48 89 0c 24
      [  265.187573] RSP: 0018:ffff88011546f098 EFLAGS: 00010246
      [  265.187624] RAX: ffffffff8d90e135 RBX: 1ffff10022a8de1c RCX: 0000000000000000
      [  265.187645] RDX: 0000000000000000 RSI: 0000000000000005 RDI: ffff880116298040
      [  265.187645] RBP: ffff88010ea4c1a8 R08: 0000000000000000 R09: 0000000000000000
      [  265.187645] R10: ffff88011546f1d8 R11: ffffed0022c532c1 R12: ffff88010ea4c1d0
      [  265.187645] R13: 0000000000000005 R14: dffffc0000000000 R15: ffff88010ea4c1c4
      [  265.187645] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
      [  265.187645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  265.187645] CR2: 00007fdfb8d00000 CR3: 0000000057a16000 CR4: 00000000001006f0
      [  265.187645] Call Trace:
      [  265.187645]  __nf_unregister_net_hook+0xca/0x5d0
      [  265.187645]  ? nf_hook_entries_free.part.3+0x80/0x80
      [  265.187645]  ? save_trace+0x300/0x300
      [  265.187645]  nf_unregister_net_hooks+0x2e/0x40
      [  265.187645]  nf_tables_exit_net+0x479/0x1340 [nf_tables]
      [  265.187645]  ? find_held_lock+0x39/0x1c0
      [  265.187645]  ? nf_tables_abort+0x30/0x30 [nf_tables]
      [  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
      [  265.187645]  ? trace_hardirqs_on+0x93/0x210
      [  265.187645]  ? __bpf_trace_preemptirq_template+0x10/0x10
      [  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
      [  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
      [  265.187645]  ? __mutex_unlock_slowpath+0x17f/0x740
      [  265.187645]  ? wait_for_completion+0x710/0x710
      [  265.187645]  ? bucket_table_free+0xb2/0x1f0
      [  265.187645]  ? nested_table_free+0x130/0x130
      [  265.187645]  ? __lock_is_held+0xb4/0x140
      [  265.187645]  ops_exit_list.isra.10+0x94/0x140
      [  265.187645]  cleanup_net+0x45b/0x900
      [ ... ]
      
      This WARNING means that hook unregistration failed because
      all flowtable hooks were already unregistered by the notifier_call.
      
      The network namespace exit routine guarantees that all devices are
      unregistered first; only then are the other exit callbacks of
      pernet_operations called. Removing the flowtable flush routine from the
      exit callback of pernet_operations (nf_tables_exit_net) therefore does
      not leak flowtables.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  5. 17 Sep, 2018 4 commits
  6. 31 Aug, 2018 1 commit
    • netfilter: nf_tables: release chain in flushing set · 7acfda53
      Taehee Yoo authored
      When an element of a verdict map is deleted, the delete routine should
      release the chain. However, the routine that flushes the elements of a
      verdict map doesn't release the chain.
      
      test commands:
         %nft add table ip filter
         %nft add chain ip filter c1
         %nft add map ip filter map1 { type ipv4_addr : verdict \; }
         %nft add element ip filter map1 { 1 : jump c1 }
         %nft flush map ip filter map1
         %nft flush ruleset
      
      splat looks like:
      [ 4895.170899] kernel BUG at net/netfilter/nf_tables_api.c:1415!
      [ 4895.178114] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 4895.178880] CPU: 0 PID: 1670 Comm: nft Not tainted 4.18.0+ #55
      [ 4895.178880] RIP: 0010:nf_tables_chain_destroy.isra.28+0x39/0x220 [nf_tables]
      [ 4895.178880] Code: fc ff df 53 48 89 fb 48 83 c7 50 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 09 3c 03 7f 05 e8 3e 4c 25 e1 8b 43 50 85 c0 74 02 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02
      [ 4895.228342] RSP: 0018:ffff88010b98f4c0 EFLAGS: 00010202
      [ 4895.234841] RAX: 0000000000000001 RBX: ffff8801131c6968 RCX: ffff8801146585b0
      [ 4895.234841] RDX: 1ffff10022638d37 RSI: ffff8801191a9348 RDI: ffff8801131c69b8
      [ 4895.234841] RBP: ffff8801146585a8 R08: 1ffff1002323526a R09: 0000000000000000
      [ 4895.234841] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200
      [ 4895.234841] R13: dead000000000100 R14: ffffffffa3638af8 R15: dffffc0000000000
      [ 4895.234841] FS:  00007f6d188e6700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [ 4895.234841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4895.234841] CR2: 00007ffe72b8df88 CR3: 000000010e2d4000 CR4: 00000000001006f0
      [ 4895.234841] Call Trace:
      [ 4895.234841]  nf_tables_commit+0x2704/0x2c70 [nf_tables]
      [ 4895.234841]  ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
      [ 4895.234841]  ? nf_tables_setelem_notify.constprop.48+0x1a0/0x1a0 [nf_tables]
      [ 4895.323824]  ? __lock_is_held+0x9d/0x130
      [ 4895.323824]  ? kasan_unpoison_shadow+0x30/0x40
      [ 4895.333299]  ? kasan_kmalloc+0xa9/0xc0
      [ 4895.333299]  ? kmem_cache_alloc_trace+0x2c0/0x310
      [ 4895.333299]  ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
      [ 4895.333299]  nfnetlink_rcv_batch+0xdb9/0x11b0 [nfnetlink]
      [ 4895.333299]  ? debug_show_all_locks+0x290/0x290
      [ 4895.333299]  ? nfnetlink_net_init+0x150/0x150 [nfnetlink]
      [ 4895.333299]  ? sched_clock_cpu+0xe5/0x170
      [ 4895.333299]  ? sched_clock_local+0xff/0x130
      [ 4895.333299]  ? sched_clock_cpu+0xe5/0x170
      [ 4895.333299]  ? find_held_lock+0x39/0x1b0
      [ 4895.333299]  ? sched_clock_local+0xff/0x130
      [ 4895.333299]  ? memset+0x1f/0x40
      [ 4895.333299]  ? nla_parse+0x33/0x260
      [ 4895.333299]  ? ns_capable_common+0x6e/0x110
      [ 4895.333299]  nfnetlink_rcv+0x2c0/0x310 [nfnetlink]
      [ ... ]
      
      Fixes: 59105446 ("netfilter: nf_tables: revisit chain/object refcounting from elements")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  7. 16 Aug, 2018 3 commits
    • netfilter: nf_tables: don't prevent event handler from device cleanup on netns exit · 6a48de01
      Florian Westphal authored
      When a net namespace exits, the nf_tables pernet_ops will remove all rules.
      However, there is one caveat:
      
      Base chains that register ingress hooks will cause a use-after-free:
      the device is already gone at that point.
      
      The device event handlers prevent this from happening:
      netns exit synthesizes unregister events for all devices.
      
      However, an improper fix for a race condition made the notifiers a no-op
      in case they get called from netns exit path, so revert that part.
      
      This is safe now as the previous patch fixed nf_tables pernet ops
      and device notifier initialisation ordering.
      
      Fixes: 0a2cf5ee ("netfilter: nf_tables: close race between netns exit and rmmod")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: fix register ordering · d209df3e
      Florian Westphal authored
      We must register nfnetlink ops last, as that exposes nf_tables to
      userspace.  Without this, we could theoretically get an nfnetlink request
      before the net->nft state has been initialized.
      
      Fixes: 99633ab2 ("netfilter: nf_tables: complete net namespace support")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set: fix allocation size overflow in privsize callback. · 4ef360dd
      Taehee Yoo authored
      To determine the allocation size of a set, ->privsize is invoked.
      At this point, both desc->size and the size of each data structure of the
      set are used. desc->size is the number of elements given by the user.
      desc->size is of type u32, so the upper limit of set elements is 4294967295.
      But the return type of ->privsize is also u32, hence an overflow can occur.
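
      The truncation can be illustrated with a small userspace C sketch.
      ELEM_SIZE, HDR_SIZE, privsize_u32() and privsize_u64() are hypothetical;
      the real per-element and header sizes depend on the set backend, and
      widening the result is shown here only as one plausible remedy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes; the real values depend on the set backend. */
#define ELEM_SIZE 16u
#define HDR_SIZE  64u

/* Problematic shape: the product wraps and the result is only 32 bits wide. */
static uint32_t privsize_u32(uint32_t nelems)
{
	return (uint32_t)(HDR_SIZE + nelems * ELEM_SIZE);
}

/* Widened shape: the same computation in 64-bit arithmetic does not wrap. */
static uint64_t privsize_u64(uint32_t nelems)
{
	return (uint64_t)HDR_SIZE + (uint64_t)nelems * ELEM_SIZE;
}
```

      With these assumed sizes, privsize_u32(4294967295) wraps around to 48,
      while privsize_u64(4294967295) yields 68719476784. An allocation sized
      from the wrapped value is far too small for the requested element count.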
      
      test commands:
         %nft add table ip filter
         %nft add set ip filter hash1 { type ipv4_addr \; size 4294967295 \; }
         %nft list ruleset
      
      splat looks like:
      [ 1239.202910] kasan: CONFIG_KASAN_INLINE enabled
      [ 1239.208788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [ 1239.217625] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 1239.219329] CPU: 0 PID: 1603 Comm: nft Not tainted 4.18.0-rc5+ #7
      [ 1239.229091] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
      [ 1239.229091] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
      [ 1239.229091] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
      [ 1239.229091] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
      [ 1239.229091] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
      [ 1239.229091] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
      [ 1239.229091] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
      [ 1239.229091] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
      [ 1239.229091] FS:  00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [ 1239.229091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1239.229091] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
      [ 1239.229091] Call Trace:
      [ 1239.229091]  ? nft_hash_remove+0xf0/0xf0 [nf_tables_set]
      [ 1239.229091]  ? memset+0x1f/0x40
      [ 1239.229091]  ? __nla_reserve+0x9f/0xb0
      [ 1239.229091]  ? memcpy+0x34/0x50
      [ 1239.229091]  nf_tables_dump_set+0x9a1/0xda0 [nf_tables]
      [ 1239.229091]  ? __kmalloc_reserve.isra.29+0x2e/0xa0
      [ 1239.229091]  ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
      [ 1239.229091]  ? nf_tables_commit+0x2c60/0x2c60 [nf_tables]
      [ 1239.229091]  netlink_dump+0x470/0xa20
      [ 1239.229091]  __netlink_dump_start+0x5ae/0x690
      [ 1239.229091]  nft_netlink_dump_start_rcu+0xd1/0x160 [nf_tables]
      [ 1239.229091]  nf_tables_getsetelem+0x2e5/0x4b0 [nf_tables]
      [ 1239.229091]  ? nft_get_set_elem+0x440/0x440 [nf_tables]
      [ 1239.229091]  ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
      [ 1239.229091]  ? nf_tables_dump_obj_done+0x70/0x70 [nf_tables]
      [ 1239.229091]  ? nla_parse+0xab/0x230
      [ 1239.229091]  ? nft_get_set_elem+0x440/0x440 [nf_tables]
      [ 1239.229091]  nfnetlink_rcv_msg+0x7f0/0xab0 [nfnetlink]
      [ 1239.229091]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
      [ 1239.229091]  ? debug_show_all_locks+0x290/0x290
      [ 1239.229091]  ? sched_clock_cpu+0x132/0x170
      [ 1239.229091]  ? find_held_lock+0x39/0x1b0
      [ 1239.229091]  ? sched_clock_local+0x10d/0x130
      [ 1239.229091]  netlink_rcv_skb+0x211/0x320
      [ 1239.229091]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
      [ 1239.229091]  ? netlink_ack+0x7b0/0x7b0
      [ 1239.229091]  ? ns_capable_common+0x6e/0x110
      [ 1239.229091]  nfnetlink_rcv+0x2d1/0x310 [nfnetlink]
      [ 1239.229091]  ? nfnetlink_rcv_batch+0x10f0/0x10f0 [nfnetlink]
      [ 1239.229091]  ? netlink_deliver_tap+0x829/0x930
      [ 1239.229091]  ? lock_acquire+0x265/0x2e0
      [ 1239.229091]  netlink_unicast+0x406/0x520
      [ 1239.509725]  ? netlink_attachskb+0x5b0/0x5b0
      [ 1239.509725]  ? find_held_lock+0x39/0x1b0
      [ 1239.509725]  netlink_sendmsg+0x987/0xa20
      [ 1239.509725]  ? netlink_unicast+0x520/0x520
      [ 1239.509725]  ? _copy_from_user+0xa9/0xc0
      [ 1239.509725]  __sys_sendto+0x21a/0x2c0
      [ 1239.509725]  ? __ia32_sys_getpeername+0xa0/0xa0
      [ 1239.509725]  ? retint_kernel+0x10/0x10
      [ 1239.509725]  ? sched_clock_cpu+0x132/0x170
      [ 1239.509725]  ? find_held_lock+0x39/0x1b0
      [ 1239.509725]  ? lock_downgrade+0x540/0x540
      [ 1239.509725]  ? up_read+0x1c/0x100
      [ 1239.509725]  ? __do_page_fault+0x763/0x970
      [ 1239.509725]  ? retint_user+0x18/0x18
      [ 1239.509725]  __x64_sys_sendto+0x177/0x180
      [ 1239.509725]  do_syscall_64+0xaa/0x360
      [ 1239.509725]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1239.509725] RIP: 0033:0x7f5a8f468e03
      [ 1239.509725] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb d0 0f 1f 84 00 00 00 00 00 83 3d 49 c9 2b 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8
      [ 1239.509725] RSP: 002b:00007ffd78d0b778 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [ 1239.509725] RAX: ffffffffffffffda RBX: 00007ffd78d0c890 RCX: 00007f5a8f468e03
      [ 1239.509725] RDX: 0000000000000034 RSI: 00007ffd78d0b7e0 RDI: 0000000000000003
      [ 1239.509725] RBP: 00007ffd78d0b7d0 R08: 00007f5a8f15c160 R09: 000000000000000c
      [ 1239.509725] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd78d0b7e0
      [ 1239.509725] R13: 0000000000000034 R14: 00007f5a8f9aff60 R15: 00005648040094b0
      [ 1239.509725] Modules linked in: nf_tables_set nf_tables nfnetlink ip_tables x_tables
      [ 1239.670713] ---[ end trace 39375adcda140f11 ]---
      [ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
      [ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
      [ 1239.705108] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
      [ 1239.711115] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
      [ 1239.719269] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
      [ 1239.727401] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
      [ 1239.735530] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
      [ 1239.743658] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
      [ 1239.751785] FS:  00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [ 1239.760993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1239.767560] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
      [ 1239.775679] Kernel panic - not syncing: Fatal exception
      [ 1239.776630] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 1239.776630] Rebooting in 5 seconds..
      
      Fixes: 20a69341 ("netfilter: nf_tables: add netlink set API")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  8. 03 Aug, 2018 3 commits
  9. 23 Jul, 2018 1 commit
  10. 20 Jul, 2018 4 commits
  11. 18 Jul, 2018 5 commits
  12. 17 Jul, 2018 1 commit
    • netfilter: nf_tables: fix jumpstack depth validation · 26b2f552
      Taehee Yoo authored
      The level of struct nft_ctx is updated by nf_tables_check_loops() and
      is used to validate jumpstack depth. But the jumpstack validation routine
      doesn't update and validate recursively, so in some cases the chain depth
      can exceed NFT_JUMP_STACK_SIZE.
      
      After this patch, the jumpstack validation routine is located in
      nft_chain_validate(). When new rules or new set elements are added,
      nft_table_validate() is called by nf_tables_newrule and
      nf_tables_newsetelem. nft_table_validate() calls
      nft_chain_validate(), which visits all child chains recursively,
      so the chain depth is reliably updated.
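
      The recursive walk can be sketched in userspace C as follows. This is
      an illustrative model only: struct chain, chain_validate(), MAX_JUMPS
      and the limit of 16 used for JUMP_STACK_SIZE are assumptions of the
      sketch, and loop detection is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#define JUMP_STACK_SIZE 16	/* assumed limit for this sketch */
#define MAX_JUMPS 4		/* hypothetical cap on jumps per chain */

struct chain {
	struct chain *jumps[MAX_JUMPS];	/* chains this chain jumps to */
	size_t njumps;
};

/*
 * Walk every chain reachable from c, tracking the nesting level.
 * Returns 0 if all paths stay below the limit, -1 otherwise.
 * Assumes the jump graph has no loops.
 */
static int chain_validate(const struct chain *c, unsigned int level)
{
	size_t i;

	assert(c != NULL);
	if (level >= JUMP_STACK_SIZE)
		return -1;
	for (i = 0; i < c->njumps; i++)
		if (chain_validate(c->jumps[i], level + 1) < 0)
			return -1;
	return 0;
}
```

      Calling chain_validate() from the base chain at level 0 after every
      added jump covers the case in the reproducer below, where two
      individually short segments are joined by a final rule into one path
      that is too deep.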
      
      Reproducer:
         %cat ./test.sh
         #!/bin/bash
         nft add table ip filter
         nft add chain ip filter input { type filter hook input priority 0\; }
         for ((i=0;i<20;i++)); do
      	nft add chain ip filter a$i
         done
      
         nft add rule ip filter input jump a1
      
         for ((i=0;i<10;i++)); do
      	nft add rule ip filter a$i jump a$((i+1))
         done
      
         for ((i=11;i<19;i++)); do
      	nft add rule ip filter a$i jump a$((i+1))
         done
      
         nft add rule ip filter a10 jump a11
      
      Result:
      [  253.931782] WARNING: CPU: 1 PID: 0 at net/netfilter/nf_tables_core.c:186 nft_do_chain+0xacc/0xdf0 [nf_tables]
      [  253.931915] Modules linked in: nf_tables nfnetlink ip_tables x_tables
      [  253.932153] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #48
      [  253.932153] RIP: 0010:nft_do_chain+0xacc/0xdf0 [nf_tables]
      [  253.932153] Code: 83 f8 fb 0f 84 c7 00 00 00 e9 d0 00 00 00 83 f8 fd 74 0e 83 f8 ff 0f 84 b4 00 00 00 e9 bd 00 00 00 83 bd 64 fd ff ff 0f 76 09 <0f> 0b 31 c0 e9 bc 02 00 00 44 8b ad 64 fd
      [  253.933807] RSP: 0018:ffff88011b807570 EFLAGS: 00010212
      [  253.933807] RAX: 00000000fffffffd RBX: ffff88011b807660 RCX: 0000000000000000
      [  253.933807] RDX: 0000000000000010 RSI: ffff880112b39d78 RDI: ffff88011b807670
      [  253.933807] RBP: ffff88011b807850 R08: ffffed0023700ece R09: ffffed0023700ecd
      [  253.933807] R10: ffff88011b80766f R11: ffffed0023700ece R12: ffff88011b807898
      [  253.933807] R13: ffff880112b39d80 R14: ffff880112b39d60 R15: dffffc0000000000
      [  253.933807] FS:  0000000000000000(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
      [  253.933807] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  253.933807] CR2: 00000000014f1008 CR3: 000000006b216000 CR4: 00000000001006e0
      [  253.933807] Call Trace:
      [  253.933807]  <IRQ>
      [  253.933807]  ? sched_clock_cpu+0x132/0x170
      [  253.933807]  ? __nft_trace_packet+0x180/0x180 [nf_tables]
      [  253.933807]  ? sched_clock_cpu+0x132/0x170
      [  253.933807]  ? debug_show_all_locks+0x290/0x290
      [  253.933807]  ? __lock_acquire+0x4835/0x4af0
      [  253.933807]  ? inet_ehash_locks_alloc+0x1a0/0x1a0
      [  253.933807]  ? unwind_next_frame+0x159e/0x1840
      [  253.933807]  ? __read_once_size_nocheck.constprop.4+0x5/0x10
      [  253.933807]  ? nft_do_chain_ipv4+0x197/0x1e0 [nf_tables]
      [  253.933807]  ? nft_do_chain+0x5/0xdf0 [nf_tables]
      [  253.933807]  nft_do_chain_ipv4+0x197/0x1e0 [nf_tables]
      [  253.933807]  ? nft_do_chain_arp+0xb0/0xb0 [nf_tables]
      [  253.933807]  ? __lock_is_held+0x9d/0x130
      [  253.933807]  nf_hook_slow+0xc4/0x150
      [  253.933807]  ip_local_deliver+0x28b/0x380
      [  253.933807]  ? ip_call_ra_chain+0x3e0/0x3e0
      [  253.933807]  ? ip_rcv_finish+0x1610/0x1610
      [  253.933807]  ip_rcv+0xbcc/0xcc0
      [  253.933807]  ? debug_show_all_locks+0x290/0x290
      [  253.933807]  ? ip_local_deliver+0x380/0x380
      [  253.933807]  ? __lock_is_held+0x9d/0x130
      [  253.933807]  ? ip_local_deliver+0x380/0x380
      [  253.933807]  __netif_receive_skb_core+0x1c9c/0x2240
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  13. 22 Jun, 2018 1 commit
    • rhashtable: split rhashtable.h · 0eb71a9d
      NeilBrown authored
      Due to the use of rhashtables in net namespaces,
      rhashtable.h is included in lots of the kernel,
      so a small change can require a large recompilation.
      This makes development painful.
      
      This patch splits out rhashtable-types.h which just includes
      the major type declarations, and does not include (non-trivial)
      inline code.  rhashtable.h is no longer included by anything
      in the include/ directory.
      Common include files only include rhashtable-types.h so a large
      recompilation is only triggered when that changes.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 12 Jun, 2018 4 commits
    • treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook authored
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a, b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kcalloc(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
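
      The benefit of the 2-factor form is that the multiplication can be
      checked before it is performed. A userspace sketch of that idea follows;
      zalloc_array() is a hypothetical helper built on malloc() and memset(),
      not the kernel's kcalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of a 2-factor zeroing allocator.
 * Returns NULL instead of allocating a too-small buffer when
 * n * size does not fit in size_t.
 */
static void *zalloc_array(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;		/* n * size would overflow */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);	/* zero the whole array */
	return p;
}
```

      With an open-coded a * b, a wrapped product silently yields a short
      buffer; here the caller sees an allocation failure instead.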
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
      ...
    • treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook authored
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
              kmalloc_array(a, b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
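
      For the 3-factor case, the size helper saturates rather than wraps, so
      that the following allocation fails. A userspace sketch of a saturating
      3-factor product, assuming SIZE_MAX as the saturation value;
      size3_saturating() is a hypothetical name, not the kernel's
      array3_size():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of a saturating 3-factor size helper.
 * Returns a * b * c, or SIZE_MAX if any intermediate product overflows.
 */
static size_t size3_saturating(size_t a, size_t b, size_t c)
{
	size_t ab;

	if (b != 0 && a > SIZE_MAX / b)
		return SIZE_MAX;	/* a * b overflows */
	ab = a * b;

	if (c != 0 && ab > SIZE_MAX / c)
		return SIZE_MAX;	/* (a * b) * c overflows */
	return ab * c;
}
```

      An allocator handed SIZE_MAX cannot satisfy the request, so an
      overflowing size calculation turns into an ordinary allocation failure.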
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
      6da2ec56
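The rewrite rules above replace open-coded multiplications with helpers that detect overflow. As a rough userspace illustration of that idea (not the kernel implementation), the sketch below uses hypothetical names, checked_alloc_array() and checked_size3(), to stand in for kmalloc_array() and array3_size(). It assumes plain malloc() as the allocator and saturates to SIZE_MAX on overflow so that the allocation fails instead of returning a short buffer.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for kmalloc_array(): refuse the request when
 * count * size would wrap around, instead of allocating too little.
 */
void *checked_alloc_array(size_t count, size_t size)
{
	if (size != 0 && count > SIZE_MAX / size)
		return NULL;	/* product would overflow size_t */
	return malloc(count * size);
}

/*
 * Hypothetical stand-in for array3_size(): compute a * b * c, saturating
 * to SIZE_MAX on overflow so that a later allocation attempt fails.
 */
size_t checked_size3(size_t a, size_t b, size_t c)
{
	size_t ab;

	if (b != 0 && a > SIZE_MAX / b)
		return SIZE_MAX;
	ab = a * b;
	if (c != 0 && ab > SIZE_MAX / c)
		return SIZE_MAX;
	return ab * c;
}
```

A caller would write checked_alloc_array(n, sizeof(*p)) where it previously wrote malloc(n * sizeof(*p)), which mirrors the kmalloc() to kmalloc_array() conversion performed by the script.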
    • Florian Westphal's avatar
      netfilter: nf_tables: close race between netns exit and rmmod · 0a2cf5ee
      Florian Westphal authored
      If a net namespace is exiting while the nf_tables module is being
      removed, we can oops:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
       IP: nf_tables_flowtable_event+0x43/0xf0 [nf_tables]
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP PTI
       Modules linked in: nf_tables(-) nfnetlink [..]
        unregister_netdevice_notifier+0xdd/0x130
        nf_tables_module_exit+0x24/0x3a [nf_tables]
        SyS_delete_module+0x1c5/0x240
        do_syscall_64+0x74/0x190
      
      Avoid this by attempting to take a reference on the net namespace from
      the notifiers.  If that fails, the namespace is already exiting and the
      nft core is taking care of the cleanup work.
      
      We also need to make sure the netdev hook type gets removed
      before netns ops removal, else a notifier might be invoked with a device
      event for a netns where net->nft was never initialised (because
      the pernet ops were removed beforehand).
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0a2cf5ee
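The "take a reference or bail out" pattern described in this commit can be sketched in userspace C as follows. This is only an illustration under stated assumptions: struct ns, ns_try_get(), ns_put() and on_device_event() are hypothetical names, C11 atomics stand in for the kernel's reference counting, and a count of zero is taken to mean that teardown has already started.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object; refs == 0 means teardown has already begun. */
struct ns {
	atomic_int refs;
};

/* Take a reference only while the count is still non-zero. */
bool ns_try_get(struct ns *ns)
{
	int old = atomic_load(&ns->refs);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ns->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken with ns_try_get(). */
void ns_put(struct ns *ns)
{
	atomic_fetch_sub(&ns->refs, 1);
}

/* Hypothetical notifier: returns 1 if the event was handled, 0 if skipped. */
int on_device_event(struct ns *ns)
{
	if (!ns_try_get(ns))
		return 0;	/* namespace is exiting; its owner cleans up */

	/* ... per-namespace event handling would go here ... */

	ns_put(ns);
	return 1;
}
```

The point of the compare-and-swap loop is that a notifier can never resurrect an object whose count has already dropped to zero; it simply skips the event.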
    • Florian Westphal's avatar
      netfilter: nf_tables: fix module unload race · 71ad00c5
      Florian Westphal authored
      We must first remove the nfnetlink protocol handler when nf_tables module
      is unloaded -- we don't want userspace to submit new change requests once
      we've started to tear down nft state.
      
      Furthermore, nfnetlink must not call any subsystem function after
      call_batch returned -EAGAIN.
      
      EAGAIN means the subsys mutex was dropped, so it's unlikely but possible
      that the nf_tables subsystem was removed due to 'rmmod nf_tables' on
      another CPU.
      
      Therefore, we must abort the batch completely and not move on to the
      next part of the batch.
      
      Last, we can't invoke ->abort unless we've checked that the subsystem is
      still registered.
      
      Change the netns exit path of nf_tables to make sure any incomplete
      transaction gets removed on exit.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      71ad00c5
  15. 02 Jun, 2018 4 commits
  16. 01 Jun, 2018 2 commits
    • Alexey Kodanev's avatar
      netfilter: nf_tables: check msg_type before nft_trans_set(trans) · 9c7f96fd
      Alexey Kodanev authored
      The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before
      using nft_trans_set(trans). Otherwise we can get an out-of-bounds read.
      
      For example, KASAN reported one when running the 0001_cache_handling_0
      nft test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE:
      
      [75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables]
      [75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356
      ...
      [75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G  E   4.17.0-rc7.1.x86_64 #1
      [75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2
      [75517.618129] Call Trace:
      [75517.648821]  dump_stack+0xd1/0x13b
      [75517.691040]  ? show_regs_print_info+0x5/0x5
      [75517.742519]  ? kmsg_dump_rewind_nolock+0xf5/0xf5
      [75517.799300]  ? lock_acquire+0x143/0x310
      [75517.846738]  print_address_description+0x85/0x3a0
      [75517.904547]  kasan_report+0x18d/0x4b0
      [75517.949892]  ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
      [75518.019153]  ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
      [75518.088420]  ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
      [75518.157689]  nft_set_lookup_global+0x22f/0x270 [nf_tables]
      [75518.224869]  nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables]
      [75518.291024]  ? nft_add_set_elem+0x2280/0x2280 [nf_tables]
      [75518.357154]  ? nla_parse+0x1a5/0x300
      [75518.401455]  ? kasan_kmalloc+0xa6/0xd0
      [75518.447842]  nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
      [75518.507743]  ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink]
      [75518.569745]  ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink]
      [75518.631711]  ? lock_acquire+0x143/0x310
      [75518.679133]  ? netlink_deliver_tap+0x9b/0x1070
      [75518.733840]  ? kasan_unpoison_shadow+0x31/0x40
      [75518.788542]  netlink_unicast+0x45d/0x680
      [75518.837111]  ? __isolate_free_page+0x890/0x890
      [75518.891913]  ? netlink_attachskb+0x6b0/0x6b0
      [75518.944542]  netlink_sendmsg+0x6fa/0xd30
      [75518.993107]  ? netlink_unicast+0x680/0x680
      [75519.043758]  ? netlink_unicast+0x680/0x680
      [75519.094402]  sock_sendmsg+0xd9/0x160
      [75519.138810]  ___sys_sendmsg+0x64d/0x980
      [75519.186234]  ? copy_msghdr_from_user+0x350/0x350
      [75519.243118]  ? lock_downgrade+0x650/0x650
      [75519.292738]  ? do_raw_spin_unlock+0x5d/0x250
      [75519.345456]  ? _raw_spin_unlock+0x24/0x30
      [75519.395065]  ? __handle_mm_fault+0xbde/0x3410
      [75519.448830]  ? sock_setsockopt+0x3d2/0x1940
      [75519.500516]  ? __lock_acquire.isra.25+0xdc/0x19d0
      [75519.558448]  ? lock_downgrade+0x650/0x650
      [75519.608057]  ? __audit_syscall_entry+0x317/0x720
      [75519.664960]  ? __fget_light+0x58/0x250
      [75519.711325]  ? __sys_sendmsg+0xde/0x170
      [75519.758850]  __sys_sendmsg+0xde/0x170
      [75519.804193]  ? __ia32_sys_shutdown+0x90/0x90
      [75519.856725]  ? syscall_trace_enter+0x897/0x10e0
      [75519.912354]  ? trace_event_raw_event_sys_enter+0x920/0x920
      [75519.979432]  ? __audit_syscall_entry+0x720/0x720
      [75520.036118]  do_syscall_64+0xa3/0x3d0
      [75520.081248]  ? prepare_exit_to_usermode+0x47/0x1d0
      [75520.139904]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [75520.201680] RIP: 0033:0x7fc153320ba0
      [75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0
      [75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003
      [75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090
      [75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660
      [75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001
      
      [75520.790946] Allocated by task 7356:
      [75520.833994]  kasan_kmalloc+0xa6/0xd0
      [75520.878088]  __kmalloc+0x189/0x450
      [75520.920107]  nft_trans_alloc_gfp+0x20/0x190 [nf_tables]
      [75520.983961]  nf_tables_newtable+0xcd0/0x1bd0 [nf_tables]
      [75521.048857]  nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
      [75521.108655]  netlink_unicast+0x45d/0x680
      [75521.157013]  netlink_sendmsg+0x6fa/0xd30
      [75521.205271]  sock_sendmsg+0xd9/0x160
      [75521.249365]  ___sys_sendmsg+0x64d/0x980
      [75521.296686]  __sys_sendmsg+0xde/0x170
      [75521.341822]  do_syscall_64+0xa3/0x3d0
      [75521.386957]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [75521.467867] Freed by task 23454:
      [75521.507804]  __kasan_slab_free+0x132/0x180
      [75521.558137]  kfree+0x14d/0x4d0
      [75521.596005]  free_rt_sched_group+0x153/0x280
      [75521.648410]  sched_autogroup_create_attach+0x19a/0x520
      [75521.711330]  ksys_setsid+0x2ba/0x400
      [75521.755529]  __ia32_sys_setsid+0xa/0x10
      [75521.802850]  do_syscall_64+0xa3/0x3d0
      [75521.848090]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [75521.929000] The buggy address belongs to the object at ffff881bdb643f80
       which belongs to the cache kmalloc-96 of size 96
      [75522.079797] The buggy address is located 72 bytes inside of
       96-byte region [ffff881bdb643f80, ffff881bdb643fe0)
      [75522.221234] The buggy address belongs to the page:
      [75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0
      [75522.377443] flags: 0x2fffff80000100(slab)
      [75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020
      [75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000
      [75522.615601] page dumped because: kasan: bad access detected
      
      Fixes: 37a9cc52 ("netfilter: nf_tables: add generation mask to sets")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      9c7f96fd
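The ordering issue fixed here (check the message type first, then touch the type-specific payload) can be illustrated with a small tagged structure. This is a simplified sketch, not the nf_tables code: struct trans, enum msg_type and pending_set_name() are hypothetical, and a union is used for brevity where the kernel uses a trailing payload area whose size depends on the message type.

```c
#include <stddef.h>

/* Hypothetical message types, loosely modelled on NFT_MSG_NEW*. */
enum msg_type { MSG_NEWTABLE, MSG_NEWSET };

/* Hypothetical transaction record with a type-dependent payload. */
struct trans {
	enum msg_type msg_type;
	union {
		struct { int update; } table;
		struct { const char *name; } set;
	} data;
};

/*
 * Return the name of the pending set, or NULL if this transaction does
 * not describe a set. The tag is tested before the set payload is read;
 * reading the payload first is the kind of access that KASAN flagged.
 */
const char *pending_set_name(const struct trans *t)
{
	if (t->msg_type != MSG_NEWSET)
		return NULL;
	return t->data.set.name;
}
```

With a real variable-sized payload, reading the set member of a table transaction runs past the end of the allocation, which is why the order of the two checks matters.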
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: fix chain dependency validation · a654de8f
      Pablo Neira Ayuso authored
      The following ruleset:
      
       add table ip filter
       add chain ip filter input { type filter hook input priority 4; }
       add chain ip filter ap
       add rule ip filter input jump ap
       add rule ip filter ap masquerade
      
      results in a panic, because the masquerade extension should be rejected
      from the filter chain. The existing validation is missing a chain
      dependency check when the rule is added to the non-base chain.
      
      This patch fixes the problem by walking down the rules from the
      basechains, searching for either immediate or lookup expressions, then
      jumping to non-base chains and again walking down the rules to perform
      the expression validation, so we make sure the full ruleset graph is
      validated. This is done only once, from the commit phase. If a problem
      is found, we abort the transaction and perform fine-grained validation
      for error reporting. This patch requires 00308791 ("netfilter:
      nfnetlink: allow commit to fail") to achieve this behaviour.
      
      This patch also adds a cleanup callback to nfnl batch interface to reset
      the validate state from the exit path.
      
      As a result of this patch, nf_tables_check_loops() no longer uses
      ->validate to check for loops; instead it just checks for immediate
      expressions.
      Reported-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a654de8f
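The graph walk described in this commit, starting at a base chain and following jumps so that every reachable rule is validated against the base chain's type, might look roughly like the sketch below. All names (struct chain, struct rule, validate_from(), MAX_DEPTH) are hypothetical. A single needs_nat flag stands in for an expression such as masquerade, and the depth limit of 16 is an assumption used only to bound the recursion.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RULES 4
#define MAX_DEPTH 16	/* hypothetical bound; also stops jump loops */

enum chain_type { CHAIN_FILTER, CHAIN_NAT };

struct chain;

/* Hypothetical rule: it may jump to another chain and may need NAT. */
struct rule {
	const struct chain *jump_to;	/* non-NULL if the rule jumps */
	bool needs_nat;			/* e.g. a masquerade-like expression */
};

struct chain {
	struct rule rules[MAX_RULES];
	size_t nrules;
};

/*
 * Validate every rule reachable from chain c against the type of the
 * base chain the walk started from. Returns false on the first rule
 * that is not allowed for that base chain type.
 */
bool validate_from(const struct chain *c, enum chain_type base, int depth)
{
	size_t i;

	if (depth > MAX_DEPTH)
		return false;

	for (i = 0; i < c->nrules; i++) {
		const struct rule *r = &c->rules[i];

		if (r->needs_nat && base != CHAIN_NAT)
			return false;
		if (r->jump_to && !validate_from(r->jump_to, base, depth + 1))
			return false;
	}
	return true;
}
```

Applied to the ruleset from the commit message, a filter base chain that jumps to a chain containing a NAT-only rule fails validation, while the same non-base chain reached from a NAT base chain passes.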
  17. 29 May, 2018 2 commits
    • Florian Westphal's avatar
      netfilter: nf_tables: use call_rcu in netlink dumps · d9adf22a
      Florian Westphal authored
      We can make all dumps and lookups lockless.
      
      Dumps currently only hold the nfnl mutex on the dump request itself.
      Dumps can span multiple syscalls, and dump continuation doesn't acquire
      the nfnl mutex anywhere, i.e. the dump callbacks in nf_tables already
      use rcu and never rely on the nfnl mutex being held.
      
      So, just switch all dumpers to rcu.
      
      This requires taking a module reference before dropping the rcu lock
      so rmmod is blocked. We also need to hold the module reference over
      the entire dump operation sequence. netlink already supports this
      via the .module member in the netlink_dump_control struct.
      
      For the non-dump case (i.e. lookup of specific tables, chains, etc.),
      we need to switch to the _rcu list iteration primitive and make sure we
      use GFP_ATOMIC.
      
      This patch also adds the new nft_netlink_dump_start_rcu() helper that
      takes care of the get-ref, drop-rcu-lock, start-dump, get-rcu-lock,
      put-ref sequence.
      
      The helper will be reused for all dumps.
      
      Rationale in all dump requests is:
      
       - use the nft_netlink_dump_start_rcu helper added in first patch
       - use GFP_ATOMIC and rcu list iteration
       - switch to .call_rcu
      
      ... thus making all dumps in nf_tables not depend on the
      nfnl mutex anymore.
      
      In nf_tables_getgen, the callback just fetches the current base
      sequence, so there is no need to serialize it with the nfnl nft mutex.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      d9adf22a
    • Florian Westphal's avatar
      netfilter: nf_tables: fix endian mismatch in return type · d6501de8
      Florian Westphal authored
      Harmless, but this avoids sparse warnings:
      
      nf_tables_api.c:2813:16: warning: incorrect type in return expression (different base types)
      nf_tables_api.c:2863:47: warning: incorrect type in argument 3 (different base types)
      nf_tables_api.c:3524:47: warning: incorrect type in argument 3 (different base types)
      nf_tables_api.c:3538:55: warning: incorrect type in argument 3 (different base types)
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      d6501de8