1. 28 Nov, 2018 1 commit
    • Taehee Yoo's avatar
      netfilter: nf_tables: deactivate expressions in rule replecement routine · ca089878
      Taehee Yoo authored
      There is no expression deactivation call from the rule replacement path,
      hence, chain counter is not decremented. A few steps to reproduce the
         %nft add table ip filter
         %nft add chain ip filter c1
         %nft add chain ip filter c1
         %nft add rule ip filter c1 jump c2
         %nft replace rule ip filter c1 handle 3 accept
         %nft flush ruleset
      <jump c2> expression means immediate NFT_JUMP to chain c2.
      Reference count of chain c2 is increased when the rule is added.
      When rule is deleted or replaced, the reference counter of c2 should be
      decreased via nft_rule_expr_deactivate() which calls
      Splat looks like:
      [  214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
      [  214.398983] Modules linked in: nf_tables nfnetlink
      [  214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44
      [  214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [  214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
      [  214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8
      [  214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202
      [  214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0
      [  214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78
      [  214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000
      [  214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba
      [  214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6
      [  214.398983] FS:  0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000
      [  214.398983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0
      [  214.398983] Call Trace:
      [  214.398983]  ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables]
      [  214.398983]  ? __kasan_slab_free+0x145/0x180
      [  214.398983]  ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables]
      [  214.398983]  ? kfree+0xdb/0x280
      [  214.398983]  nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables]
      [ ... ]
      Fixes: bb7b40ae ("netfilter: nf_tables: bogus EBUSY in chain deletions")
      Reported by: Christoph Anton Mitterer <calestyo@scientia.net>
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
  2. 26 Nov, 2018 6 commits
    • Taehee Yoo's avatar
      netfilter: nf_conncount: remove wrong condition check routine · 53ca0f2f
      Taehee Yoo authored
      All lists that reach the tree_nodes_free() function have both zero
      counter and true dead flag. The reason for this is that lists to be
      release are selected by nf_conncount_gc_list() which already decrements
      the list counter and sets on the dead flag. Therefore, this if statement
      in tree_nodes_free() is unnecessary and wrong.
      Fixes: 31568ec0 ("netfilter: nf_conncount: fix list_del corruption in conn_free")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Taehee Yoo's avatar
      netfilter: nat: fix double register in masquerade modules · 095faf45
      Taehee Yoo authored
      There is a reference counter to ensure that masquerade modules register
      notifiers only once. However, the existing reference counter approach is
      not safe, test commands are:
         while :
         	   modprobe ip6t_MASQUERADE &
      	   modprobe nft_masq_ipv6 &
      	   modprobe -rv ip6t_MASQUERADE &
      	   modprobe -rv nft_masq_ipv6 &
      numbers below represent the reference counter.
      CPU0        CPU1        CPU2        CPU3        CPU4
      [insmod]    [insmod]    [rmmod]     [rmmod]     [insmod]
      register    1->2
                  returns     2->1
      			returns     1->0
                                                      register <--
      The unregistation of CPU3 should be processed before the
      registration of CPU4.
      In order to fix this, use a mutex instead of reference counter.
      splat looks like:
      [  323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
      [  323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
      [  323.869574] irq event stamp: 194074
      [  323.898930] hardirqs last  enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c
      [  323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  323.898930] softirqs last  enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b
      [  323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0
      [  323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
      [  323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
      [  323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
      [  323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      [  323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000
      [  323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0
      [  323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001
      [  323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000
      [  323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0
      [  323.898930] FS:  00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      [  323.898930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0
      [  323.898930] Call Trace:
      [  323.898930]  ? atomic_notifier_chain_register+0x2d0/0x2d0
      [  323.898930]  ? down_read+0x150/0x150
      [  323.898930]  ? sched_clock_cpu+0x126/0x170
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  register_netdevice_notifier+0xbb/0x790
      [  323.898930]  ? __dev_close_many+0x2d0/0x2d0
      [  323.898930]  ? __mutex_unlock_slowpath+0x17f/0x740
      [  323.898930]  ? wait_for_completion+0x710/0x710
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? up_write+0x6c/0x210
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  nft_chain_filter_init+0x1e/0xe8a [nf_tables]
      [  324.127073]  nf_tables_module_init+0x37/0x92 [nf_tables]
      [ ... ]
      Fixes: 8dd33cc9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
      Fixes: be6b635c ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Taehee Yoo's avatar
      netfilter: add missing error handling code for register functions · 584eab29
      Taehee Yoo authored
      register_{netdevice/inetaddr/inet6addr}_notifier may return an error
      value, this patch adds the code to handle these error paths.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Alin Nastac's avatar
      netfilter: ipv6: Preserve link scope traffic original oif · 508b0904
      Alin Nastac authored
      When ip6_route_me_harder is invoked, it resets outgoing interface of:
        - link-local scoped packets sent by neighbor discovery
        - multicast packets sent by MLD host
        - multicast packets send by MLD proxy daemon that sets outgoing
          interface through IPV6_PKTINFO ipi6_ifindex
      Link-local and multicast packets must keep their original oif after
      ip6_route_me_harder is called.
      Signed-off-by: default avatarAlin Nastac <alin.nastac@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Florian Westphal's avatar
      netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, too · 89259088
      Florian Westphal authored
      syzbot was able to trigger the WARN in cttimeout_default_get() by
      passing UDPLITE as l4protocol.  Alias UDPLITE to UDP, both use
      same timeout values.
      Furthermore, also fetch GRE timeouts.  GRE is a bit more complicated,
      as it still can be a module and its netns_proto_gre struct layout isn't
      visible outside of the gre module. Can't move timeouts around, it
      appears conntrack sysctl unregister assumes net_generic() returns
      nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.
      A followup nf-next patch could make gre tracker be built-in as well
      if needed, its not that large.
      Last, make the WARN() mention the missing protocol value in case
      anything else is missing.
      Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
      Fixes: 8866df92 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Xin Long's avatar
      ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notf · 2a31e4bd
      Xin Long authored
      ip_vs_dst_event is supposed to clean up all dst used in ipvs'
      destinations when a net dev is going down. But it works only
      when the dst's dev is the same as the dev from the event.
      Now with the same priority but late registration,
      ip_vs_dst_notifier is always called later than ipv6_dev_notf
      where the dst's dev is set to lo for NETDEV_DOWN event.
      As the dst's dev lo is not the same as the dev from the event
      in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work.
      Also as these dst have to wait for dest_trash_timer to clean
      them up. It would cause some non-permanent kernel warnings:
        unregister_netdevice: waiting for br0 to become free. Usage count = 3
      To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf
      by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5.
      Note that for ipv4 route fib_netdev_notifier doesn't set dst's
      dev to lo in NETDEV_DOWN event, so this fix is only needed when
      IP_VS_IPV6 is defined.
      Fixes: 7a4f0761 ("IPVS: init and cleanup restructuring")
      Reported-by: default avatarLi Shuang <shuali@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
  3. 17 Nov, 2018 1 commit
  4. 13 Nov, 2018 2 commits
    • Florian Westphal's avatar
      netfilter: nf_tables: fix use-after-free when deleting compat expressions · 29e38801
      Florian Westphal authored
      nft_compat ops do not have static storage duration, unlike all other
      When nf_tables_expr_destroy() returns, expr->ops might have been
      free'd already, so we need to store next address before calling
      expression destructor.
      For same reason, we can't deref match pointer after nft_xt_put().
      This can be easily reproduced by adding msleep() before
      nft_match_destroy() returns.
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Reported-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Taehee Yoo's avatar
      netfilter: xt_RATEEST: remove netns exit routine · 0fbcc5b5
      Taehee Yoo authored
      xt_rateest_net_exit() was added to check whether rules are flushed
      successfully. but ->net_exit() callback is called earlier than
      ->destroy() callback.
      So that ->net_exit() callback can't check that.
      test commands:
         %ip netns add vm1
         %ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \
      	   --dport 1111 -j RATEEST --rateest-name ap \
      	   --rateest-interval 250ms --rateest-ewma 0.5s
         %ip netns del vm1
      splat looks like:
      [  668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
      [  668.813518] Modules linked in: xt_RATEEST xt_tcpudp iptable_mangle bpfilter ip_tables x_tables
      [  668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21
      [  668.813518] Workqueue: netns cleanup_net
      [  668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
      [  668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 00 00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 <0f> 0b 48 83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49
      [  668.813518] RSP: 0018:ffff8801156c73f8 EFLAGS: 00010282
      [  668.813518] RAX: ffffed0022ad8e85 RBX: ffff880118928e98 RCX: 5db8012a00000000
      [  668.813518] RDX: ffff8801156c7428 RSI: 00000000cb1d185f RDI: ffff880115663b74
      [  668.813518] RBP: ffff8801156c74d0 R08: ffff8801156633c0 R09: 1ffff100236440be
      [  668.813518] R10: 0000000000000001 R11: ffffed002367d852 R12: ffff880115142b08
      [  668.813518] R13: 1ffff10022ad8e81 R14: ffff880118928ea8 R15: dffffc0000000000
      [  668.813518] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
      [  668.813518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  668.813518] CR2: 0000563aa69f4f28 CR3: 0000000105a16000 CR4: 00000000001006f0
      [  668.813518] Call Trace:
      [  668.813518]  ? unregister_netdevice_many+0xe0/0xe0
      [  668.813518]  ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST]
      [  668.813518]  ? default_device_exit+0x1ca/0x270
      [  668.813518]  ? remove_proc_entry+0x1cd/0x390
      [  668.813518]  ? dev_change_net_namespace+0xd00/0xd00
      [  668.813518]  ? __init_waitqueue_head+0x130/0x130
      [  668.813518]  ops_exit_list.isra.10+0x94/0x140
      [  668.813518]  cleanup_net+0x45b/0x900
      [  668.813518]  ? net_drop_ns+0x110/0x110
      [  668.813518]  ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80
      [  668.813518]  ? save_trace+0x300/0x300
      [  668.813518]  ? lock_acquire+0x196/0x470
      [  668.813518]  ? lock_acquire+0x196/0x470
      [  668.813518]  ? process_one_work+0xb60/0x1de0
      [  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
      [  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
      [  668.813518]  ? __lock_acquire+0x4500/0x4500
      [  668.813518]  ? __lock_is_held+0xb4/0x140
      [  668.813518]  process_one_work+0xc13/0x1de0
      [  668.813518]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
      [  668.813518]  ? set_load_weight+0x270/0x270
      [ ... ]
      Fixes: 3427b2ab ("netfilter: make xt_rateest hash table per net")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
  5. 12 Nov, 2018 6 commits
    • Florian Westphal's avatar
      netfilter: nf_tables: don't use position attribute on rule replacement · 447750f2
      Florian Westphal authored
      Its possible to set both HANDLE and POSITION when replacing a rule.
      In this case, the rule at POSITION gets replaced using the
      userspace-provided handle.  Rule handles are supposed to be generated
      by the kernel only.
      Duplicate handles should be harmless, however better disable this "feature"
      by only checking for the POSITION attribute on insert operations.
      Fixes: 5e948466 ("netfilter: nf_tables: add insert operation")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Florian Westphal's avatar
      selftests: add script to stress-test nft packet path vs. control plane · 25d8bced
      Florian Westphal authored
      Start flood ping for each cpu while loading/flushing rulesets to make
      sure we do not access already-free'd rules from nf_tables evaluation loop.
      Also add this to TARGETS so 'make run_tests' in selftest dir runs it
      This would have caught the bug fixed in previous change
      ("netfilter: nf_tables: do not skip inactive chains during generation update")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Florian Westphal's avatar
      netfilter: nf_tables: don't skip inactive chains during update · 0fb39bbe
      Florian Westphal authored
      There is no synchronization between packet path and the configuration plane.
      The packet path uses two arrays with rules, one contains the current (active)
      generation.  The other either contains the last (obsolete) generation or
      the future one.
      cpu1               cpu2
      delete c
                         genbit = !!net->gen;
                         rules = c->rg[genbit];
      cpu1 ignores c when updating if c is not active anymore in the new
      On cpu2, we now use rules from wrong generation, as c->rg[old]
      contains the rules matching 'c' whereas c->rg[new] was not updated and
      can even point to rules that have been free'd already, causing a crash.
      To fix this, make sure that 'current' to the 'next' generation are
      identical for chains that are going away so that c->rg[new] will just
      use the matching rules even if genbit was incremented already.
      Fixes: 0cbc06b3 ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Taehee Yoo's avatar
      netfilter: nf_conncount: fix unexpected permanent node of list. · 3c5cdb17
      Taehee Yoo authored
      When list->count is 0, the list is deleted by GC. But list->count is
      never reached 0 because initial count value is 1 and it is increased
      when node is inserted. So that initial value of list->count should be 0.
      Originally GC always finds zero count list through deleting node and
      decreasing count. However, list may be left empty since node insertion
      may fail eg.  allocaton problem. In order to solve this problem, GC
      routine also finds zero count list without deleting node.
      Fixes: cb2b36f5 ("netfilter: nf_conncount: Switch to plain list")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Taehee Yoo's avatar
      netfilter: nf_conncount: fix list_del corruption in conn_free · 31568ec0
      Taehee Yoo authored
      nf_conncount_tuple is an element of nft_connlimit and that is deleted by
      conn_free(). Elements can be deleted by both GC routine and data path
      functions (nf_conncount_lookup, nf_conncount_add) and they call
      conn_free() to free elements. But conn_free() only protects lists, not
      each element. So that list_del corruption could occurred.
      The conn_free() doesn't check whether element is already deleted. In
      order to protect elements, dead flag is added. If an element is deleted,
      dead flag is set. The only conn_free() can delete elements so that both
      list lock and dead flag are enough to protect it.
      test commands:
         %nft add table ip filter
         %nft add chain ip filter input { type filter hook input priority 0\; }
         %nft add rule filter input meter test { ip id ct count over 2 } counter
      splat looks like:
      [ 1779.495778] list_del corruption, ffff8800b6e12008->prev is LIST_POISON2 (dead000000000200)
      [ 1779.505453] ------------[ cut here ]------------
      [ 1779.506260] kernel BUG at lib/list_debug.c:50!
      [ 1779.515831] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 1779.516772] CPU: 0 PID: 33 Comm: kworker/0:2 Not tainted 4.19.0-rc6+ #22
      [ 1779.516772] Workqueue: events_power_efficient nft_rhash_gc [nf_tables_set]
      [ 1779.516772] RIP: 0010:__list_del_entry_valid+0xd8/0x150
      [ 1779.516772] Code: 39 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 ea 48 c7 c7 00 c3 5b 98 e8 0f dc 40 ff 0f 0b 48 c7 c7 60 c3 5b 98 e8 01 dc 40 ff <0f> 0b 48 c7 c7 c0 c3 5b 98 e8 f3 db 40 ff 0f 0b 48 c7 c7 20 c4 5b
      [ 1779.516772] RSP: 0018:ffff880119127420 EFLAGS: 00010286
      [ 1779.516772] RAX: 000000000000004e RBX: dead000000000200 RCX: 0000000000000000
      [ 1779.516772] RDX: 000000000000004e RSI: 0000000000000008 RDI: ffffed0023224e7a
      [ 1779.516772] RBP: ffff88011934bc10 R08: ffffed002367cea9 R09: ffffed002367cea9
      [ 1779.516772] R10: 0000000000000001 R11: ffffed002367cea8 R12: ffff8800b6e12008
      [ 1779.516772] R13: ffff8800b6e12010 R14: ffff88011934bc20 R15: ffff8800b6e12008
      [ 1779.516772] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
      [ 1779.516772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1779.516772] CR2: 00007fc876534010 CR3: 000000010da16000 CR4: 00000000001006f0
      [ 1779.516772] Call Trace:
      [ 1779.516772]  conn_free+0x9f/0x2b0 [nf_conncount]
      [ 1779.516772]  ? nf_ct_tmpl_alloc+0x2a0/0x2a0 [nf_conntrack]
      [ 1779.516772]  ? nf_conncount_add+0x520/0x520 [nf_conncount]
      [ 1779.516772]  ? do_raw_spin_trylock+0x1a0/0x1a0
      [ 1779.516772]  ? do_raw_spin_trylock+0x10/0x1a0
      [ 1779.516772]  find_or_evict+0xe5/0x150 [nf_conncount]
      [ 1779.516772]  nf_conncount_gc_list+0x162/0x360 [nf_conncount]
      [ 1779.516772]  ? nf_conncount_lookup+0xee0/0xee0 [nf_conncount]
      [ 1779.516772]  ? _raw_spin_unlock_irqrestore+0x45/0x50
      [ 1779.516772]  ? trace_hardirqs_off+0x6b/0x220
      [ 1779.516772]  ? trace_hardirqs_on_caller+0x220/0x220
      [ 1779.516772]  nft_rhash_gc+0x16b/0x540 [nf_tables_set]
      [ ... ]
      Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
    • Taehee Yoo's avatar
      netfilter: nf_conncount: use spin_lock_bh instead of spin_lock · fd3e71a9
      Taehee Yoo authored
      conn_free() holds lock with spin_lock() and it is called by both
      nf_conncount_lookup() and nf_conncount_gc_list(). nf_conncount_lookup()
      is called from bottom-half context and nf_conncount_gc_list() from
      process context. So that spin_lock() call is not safe. Hence
      conn_free() should use spin_lock_bh() instead of spin_lock().
      test commands:
         %nft add table ip filter
         %nft add chain ip filter input { type filter hook input priority 0\; }
         %nft add rule filter input meter test { ip saddr ct count over 2 } \
      splat looks like:
      [  461.996507] ================================
      [  461.998999] WARNING: inconsistent lock state
      [  461.998999] 4.19.0-rc6+ #22 Not tainted
      [  461.998999] --------------------------------
      [  461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      [  461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [  461.998999] 00000000a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount]
      [  461.998999] {IN-SOFTIRQ-W} state was registered at:
      [  461.998999]   _raw_spin_lock+0x30/0x70
      [  461.998999]   nf_conncount_add+0x28a/0x520 [nf_conncount]
      [  461.998999]   nft_connlimit_eval+0x401/0x580 [nft_connlimit]
      [  461.998999]   nft_dynset_eval+0x32b/0x590 [nf_tables]
      [  461.998999]   nft_do_chain+0x497/0x1430 [nf_tables]
      [  461.998999]   nft_do_chain_ipv4+0x255/0x330 [nf_tables]
      [  461.998999]   nf_hook_slow+0xb1/0x160
      [ ... ]
      [  461.998999] other info that might help us debug this:
      [  461.998999]  Possible unsafe locking scenario:
      [  461.998999]
      [  461.998999]        CPU0
      [  461.998999]        ----
      [  461.998999]   lock(&(&list->list_lock)->rlock);
      [  461.998999]   <Interrupt>
      [  461.998999]     lock(&(&list->list_lock)->rlock);
      [  461.998999]
      [  461.998999]  *** DEADLOCK ***
      [  461.998999]
      [ ... ]
      Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
  6. 11 Nov, 2018 18 commits
    • Linus Torvalds's avatar
      Linux 4.20-rc2 · ccda4af0
      Linus Torvalds authored
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7a3765ed
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "One last pull request before heading to Vancouver for LPC, here we have:
         1) Don't forget to free VSI contexts during ice driver unload, from
            Victor Raj.
         2) Don't forget napi delete calls during device remove in ice driver,
            from Dave Ertman.
         3) Don't request VLAN tag insertion of ibmvnic device when SKB
            doesn't have VLAN tags at all.
         4) IPV4 frag handling code has to accomodate the situation where two
            threads try to insert the same fragment into the hash table at the
            same time. From Eric Dumazet.
         5) Relatedly, don't flow separate on protocol ports for fragmented
            frames, also from Eric Dumazet.
         6) Memory leaks in qed driver, from Denis Bolotin.
         7) Correct valid MTU range in smsc95xx driver, from Stefan Wahren.
         8) Validate cls_flower nested policies properly, from Jakub Kicinski.
         9) Clearing of stats counters in mc88e6xxx driver doesn't retain
            important bits in the G1_STATS_OP register causing the chip to
            hang. Fix from Andrew Lunn"
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
        act_mirred: clear skb->tstamp on redirect
        net: dsa: mv88e6xxx: Fix clearing of stats counters
        tipc: fix link re-establish failure
        net: sched: cls_flower: validate nested enc_opts_policy to avoid warning
        net: mvneta: correct typo
        flow_dissector: do not dissect l4 ports for fragments
        net: qualcomm: rmnet: Fix incorrect assignment of real_dev
        net: aquantia: allow rx checksum offload configuration
        net: aquantia: invalid checksumm offload implementation
        net: aquantia: fixed enable unicast on 32 macvlan
        net: aquantia: fix potential IOMMU fault after driver unbind
        net: aquantia: synchronized flow control between mac/phy
        net: smsc95xx: Fix MTU range
        net: stmmac: Fix RX packet size > 8191
        qed: Fix potential memory corruption
        qed: Fix SPQ entries not returned to pool in error flows
        qed: Fix blocking/unlimited SPQ entries leak
        qed: Fix memory/entry leak in qed_init_sp_request()
        inet: frags: better deal with smp races
        net: hns3: bugfix for not checking return value
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v4.20' of... · e12e00e3
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      Pull Kbuild fixes from Masahiro Yamada:
       - fix build errors in binrpm-pkg and bindeb-pkg targets
       - fix false positive matches in merge_config.sh
       - fix build version mismatch in deb-pkg target
       - fix dtbs_install handling in (bin)deb-pkg target
       - revert a commit that allows setlocalversion to write to source tree
      * tag 'kbuild-fixes-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        builddeb: Fix inclusion of dtbs in debian package
        Revert "scripts/setlocalversion: git: Make -dirty check more robust"
        kbuild: deb-pkg: fix too low build version number
        kconfig: merge_config: avoid false positive matches from comment lines
        kbuild: deb-pkg: fix bindeb-pkg breakage when O= is used
        kbuild: rpm-pkg: fix binrpm-pkg breakage when O= is used
    • Linus Torvalds's avatar
      Merge tag 'for-4.20-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 63a42e1a
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Several fixes to recent release (4.19, fixes tagged for stable) and
        other fixes"
      * tag 'for-4.20-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix missing delayed iputs on unmount
        Btrfs: fix data corruption due to cloning of eof block
        Btrfs: fix infinite loop on inode eviction after deduplication of eof block
        Btrfs: fix deadlock on tree root leaf when finding free extent
        btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
        btrfs: tree-checker: Fix misleading group system information
        Btrfs: fix missing data checksums after a ranged fsync (msync)
        btrfs: fix pinned underflow after transaction aborted
        Btrfs: fix cur_offset in the error case for nocow
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · c140f8b0
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "A large number of ext4 bug fixes, mostly buffer and memory leaks on
        error return cleanup paths"
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: missing !bh check in ext4_xattr_inode_write()
        ext4: fix buffer leak in __ext4_read_dirblock() on error path
        ext4: fix buffer leak in ext4_expand_extra_isize_ea() on error path
        ext4: fix buffer leak in ext4_xattr_move_to_block() on error path
        ext4: release bs.bh before re-using in ext4_xattr_block_find()
        ext4: fix buffer leak in ext4_xattr_get_block() on error path
        ext4: fix possible leak of s_journal_flag_rwsem in error path
        ext4: fix possible leak of sbi->s_group_desc_leak in error path
        ext4: remove unneeded brelse call in ext4_xattr_inode_update_ref()
        ext4: avoid possible double brelse() in add_new_gdb() on error path
        ext4: avoid buffer leak in ext4_orphan_add() after prior errors
        ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty()
        ext4: fix possible inode leak in the retry loop of ext4_resize_fs()
        ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing
        ext4: add missing brelse() update_backups()'s error path
        ext4: add missing brelse() add_new_gdb_meta_bg()'s error path
        ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path
        ext4: avoid potential extra brelse in setup_new_flex_group_blocks()
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b6df7b6d
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A set of x86 fixes:
         - Cure the LDT remapping to user space on 5 level paging which ended
           up in the KASLR space
         - Remove LDT mapping before freeing the LDT pages
         - Make NFIT MCE handling more robust
         - Unbreak the VSMP build by removing the dependency on paravirt ops
         - Support broken PIT emulation on Microsoft hyperV
         - Don't trace vmware_sched_clock() to avoid tracer recursion
         - Remove -pipe from KBUILD CFLAGS which breaks clang and is also
           slower on GCC
         - Trivial coding style and typo fixes"
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu/vmware: Do not trace vmware_sched_clock()
        x86/vsmp: Remove dependency on pv_irq_ops
        x86/ldt: Remove unused variable in map_ldt_struct()
        x86/ldt: Unmap PTEs for the slot before freeing LDT pages
        x86/mm: Move LDT remap out of KASLR region on 5-level paging
        acpi/nfit, x86/mce: Validate a MCE's address before using it
        acpi/nfit, x86/mce: Handle only uncorrectable machine checks
        x86/build: Remove -pipe from KBUILD_CFLAGS
        x86/hyper-v: Fix indentation in hv_do_fast_hypercall16()
        Documentation/x86: Fix typo in zero-page.txt
        x86/hyper-v: Enable PIT shutdown quirk
        clockevents/drivers/i8253: Add support for PIT shutdown quirk
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 655c6b97
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "A bunch of perf tooling fixes:
         - Make the Intel PT SQL viewer more robust
         - Make the Intel PT debug log more useful
         - Support weak groups in perf record so it's behaving the same way as
           perf stat
         - Display the LBR stats in callchain entries properly in perf top
         - Handle different PMu names with common prefix properlin in pert
         - Start syscall augmenting in perf trace. Preparation for
           architecture independent eBPF instrumentation of syscalls.
         - Fix build breakage in JVMTI perf lib
         - Fix arm64 tools build failure wrt smp_load_{acquire,release}"
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf tools: Do not zero sample_id_all for group members
        perf tools: Fix undefined symbol scnprintf in libperf-jvmti.so
        perf beauty: Use SRCARCH, ARCH=x86_64 must map to "x86" to find the headers
        perf intel-pt: Add MTC and CYC timestamps to debug log
        perf intel-pt: Add more event information to debug log
        perf scripts python: exported-sql-viewer.py: Fix table find when table re-ordered
        perf scripts python: exported-sql-viewer.py: Add help window
        perf scripts python: exported-sql-viewer.py: Add Selected branches report
        perf scripts python: exported-sql-viewer.py: Fall back to /usr/local/lib/libxed.so
        perf top: Display the LBR stats in callchain entry
        perf stat: Handle different PMU names with common prefix
        perf record: Support weak groups
        perf evlist: Move perf_evsel__reset_weak_group into evlist
        perf augmented_syscalls: Start collecting pathnames in the BPF program
        perf trace: Fix setting of augmented payload when using eBPF + raw_syscalls
        perf trace: When augmenting raw_syscalls plug raw_syscalls:sys_exit too
        perf examples bpf: Start augmenting raw_syscalls:sys_{start,exit}
        tools headers barrier: Fix arm64 tools build failure wrt smp_load_{acquire,release}
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 08b52786
      Linus Torvalds authored
      Pull timer fix from Thomas Gleixner:
       "Just the removal of a redundant call into the sched deadline overrun
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        posix-cpu-timers: Remove useless call to check_dl_overrun()
    • Linus Torvalds's avatar
      Merge branch 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 024d4d4c
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "Two small scheduler fixes:
         - Take hotplug lock in sched_init_smp(). Technically not really
           required, but lockdep will complain other.
         - Trivial comment fix in sched/fair"
      * 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Fix a comment in task_numa_fault()
        sched/core: Take the hotplug lock in sched_init_smp()
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1acf93ca
      Linus Torvalds authored
      Pull locking build fix from Thomas Gleixner:
       "A single fix for a build fail with CONFIG_PROFILE_ALL_BRANCHES=y in
        the qspinlock code"
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/qspinlock: Fix compile error
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0b002cdd
      Linus Torvalds authored
      Pull core fixes from Thomas Gleixner:
       "A couple of fixlets for the core:
         - Kernel doc function documentation fixes
         - Missing prototypes for weak watchdog functions"
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        resource/docs: Complete kernel-doc style function documentation
        watchdog/core: Add missing prototypes for weak functions
        resource/docs: Fix new kernel-doc warnings
    • Eric Dumazet's avatar
      act_mirred: clear skb->tstamp on redirect · 7236ead1
      Eric Dumazet authored
      If sch_fq is used at ingress, skbs that might have been
      timestamped by net_timestamp_set() if a packet capture
      is requesting timestamps could be delayed by arbitrary
      amount of time, since sch_fq time base is MONOTONIC.
      Fix this problem by moving code from sch_netem.c to act_mirred.c.
      Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Andrew Lunn's avatar
      net: dsa: mv88e6xxx: Fix clearing of stats counters · a9049ff9
      Andrew Lunn authored
      The mv88e6161 would sometime fail to probe with a timeout waiting for
      the switch to complete an operation. This operation is supposed to
      clear the statistics counters. However, due to a read/modify/write,
      without the needed mask, the operation actually carried out was more
      random, with invalid parameters, resulting in the switch not
      responding. We need to preserve the histogram mode bits, so apply a
      mask to keep them.
      Reported-by: default avatarChris Healy <Chris.Healy@zii.aero>
      Fixes: 40cff8fc ("net: dsa: mv88e6xxx: Fix stats histogram mode")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Jon Maloy's avatar
      tipc: fix link re-establish failure · 7ab412d3
      Jon Maloy authored
      When a link failure is detected locally, the link is reset, the flag
      link->in_session is set to false, and a RESET_MSG with the 'stopping'
      bit set is sent to the peer.
      The purpose of this bit is to inform the peer that this endpoint just
      is going down, and that the peer should handle the reception of this
      particular RESET message as a local failure. This forces the peer to
      accept another RESET or ACTIVATE message from this endpoint before it
      can re-establish the link. This again is necessary to ensure that
      link session numbers are properly exchanged before the link comes up
      If a failure is detected locally at the same time at the peer endpoint
      this will do the same, which is also a correct behavior.
      However, when receiving such messages, the endpoints will not
      distinguish between 'stopping' RESETs and ordinary ones when it comes
      to updating session numbers. Both endpoints will copy the received
      session number and set their 'in_session' flags to true at the
      reception, while they are still expecting another RESET from the
      peer before they can go ahead and re-establish. This is contradictory,
      since, after applying the validation check referred to below, the
      'in_session' flag will cause rejection of all such messages, and the
      link will never come up again.
      We now fix this by not only handling received RESET/STOPPING messages
      as a local failure, but also by omitting to set a new session number
      and the 'in_session' flag in such cases.
      Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages")
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Rob Herring's avatar
      builddeb: Fix inclusion of dtbs in debian package · d5615e47
      Rob Herring authored
      Commit 37c8a5fa ("kbuild: consolidate Devicetree dtb build rules")
      moved the location of 'dtbs_install' target which caused dtbs to not be
      installed when building debian package with 'bindeb-pkg' target. Update
      the builddeb script to use the same logic that determines if there's a
      'dtbs_install' target which is presence of the arch dts directory. Also,
      use CONFIG_OF_EARLY_FLATTREE instead of CONFIG_OF as that's a better
      indication of whether we are building dtbs.
      This commit will also have the side effect of installing dtbs on any
      arch that has dts files. Previously, it was dependent on whether the
      arch defined 'dtbs_install'.
      Fixes: 37c8a5fa ("kbuild: consolidate Devicetree dtb build rules")
      Reported-by: default avatarNuno Gonçalves <nunojpg@gmail.com>
      Signed-off-by: Rob Herring's avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
    • Guenter Roeck's avatar
      Revert "scripts/setlocalversion: git: Make -dirty check more robust" · 8ef14c2c
      Guenter Roeck authored
      This reverts commit 6147b1cf.
      The reverted patch results in attempted write access to the source
      repository, even if that repository is mounted read-only.
      Output from "strace git status -uno --porcelain":
      getcwd("/tmp/linux-test", 129)          = 16
      open("/tmp/linux-test/.git/index.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) =
      	-1 EROFS (Read-only file system)
      While git appears to be able to handle this situation, a monitored
      build environment (such as the one used for Chrome OS kernel builds)
      may detect it and bail out with an access violation error. On top of
      that, the attempted write access suggests that git _will_ write to the
      file even if a build output directory is specified. Users may have the
      reasonable expectation that the source repository remains untouched in
      that situation.
      Fixes: 6147b1cf ("scripts/setlocalversion: git: Make -dirty check more robust"
      Cc: Genki Sky <sky@genki.is>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
    • Masahiro Yamada's avatar
      kbuild: deb-pkg: fix too low build version number · bbcde0a7
      Masahiro Yamada authored
      Since commit b41d920a ("kbuild: deb-pkg: split generating packaging
      and build"), the build version of the kernel contained in a deb package
      is too low by 1.
      Prior to the bad commit, the kernel was built first, then the number
      in .version file was read out, and written into the debian control file.
      Now, the debian control file is created before the kernel is actually
      compiled, which is causing the version number mismatch.
      Let the mkdebian script pass KBUILD_BUILD_VERSION=${revision} to require
      the build system to use the specified version number.
      Fixes: b41d920a ("kbuild: deb-pkg: split generating packaging and build")
      Reported-by: default avatarDoug Smythies <dsmythies@telus.net>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Tested-by: default avatarDoug Smythies <dsmythies@telus.net>
    • Masahiro Yamada's avatar
      kconfig: merge_config: avoid false positive matches from comment lines · 6bbe4385
      Masahiro Yamada authored
      The current SED_CONFIG_EXP could match to comment lines in config
      fragment files, especially when CONFIG_PREFIX_ is empty. For example,
      Buildroot uses empty prefixing; starting symbols with BR2_ is just
      Make the sed expression more robust against false positives from
      comment lines. The new sed expression matches to only valid patterns.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Petr Vorel's avatarPetr Vorel <petr.vorel@gmail.com>
      Reviewed-by: arnout's avatarArnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
  7. 10 Nov, 2018 6 commits