  1. Jan 16, 2023
  2. Dec 30, 2022
  3. Dec 28, 2022
  4. Dec 27, 2022
  5. Dec 22, 2022
  6. Dec 20, 2022
    • prandom: remove prandom_u32_max() · 3c202d14
      Jason A. Donenfeld authored
      
      Convert the final two users of prandom_u32_max() that slipped in during
      6.2-rc1 to use get_random_u32_below().
      
      Then, with no more users left, we can finally remove the deprecated
      function.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: do not include <asm/archrandom.h> from random.h · 6bb20c15
      Jason A. Donenfeld authored
      
      The <asm/archrandom.h> header is a random.c private detail, not
      something to be called by other code. As such, don't make it
      automatically available by way of random.h.
      
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • net: simplify sk_page_frag · 08f65892
      Benjamin Coddington authored
      
      Now that in-kernel socket users that may recurse during reclaim have been
      converted to sk_use_task_frag = false, we can have sk_page_frag() simply
      check that value.
      
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: Introduce sk_use_task_frag in struct sock. · fb87bd47
      Guillaume Nault authored
      Sockets that can be used while recursing into memory reclaim, like
      those used by network block devices and file systems, mustn't use
      current->task_frag: if the current process is already using it, then
      the inner memory reclaim call would corrupt the task_frag structure.
      
      To avoid this, sk_page_frag() uses ->sk_allocation to detect sockets
      that mustn't use current->task_frag, assuming that those used during
      memory reclaim had their allocation constraints reflected in
      ->sk_allocation.
      
      This unfortunately doesn't cover all cases: in an attempt to remove all
      usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
      ->sk_allocation, and used memalloc_nofs critical sections instead.
      This breaks the sk_page_frag() heuristic since the allocation
      constraints are now stored in current->flags, which sk_page_frag()
      can't read without risking triggering a cache miss and slowing down
      TCP's fast path.
      
      This patch creates a new field in struct sock, named sk_use_task_frag,
      which sockets with memory reclaim constraints can set to false if they
      can't safely use current->task_frag. In such cases, sk_page_frag() now
      always returns the socket's page_frag (->sk_frag). The first user is
      sunrpc, which needs to avoid using current->task_frag but can keep
      ->sk_allocation set to GFP_KERNEL otherwise.
      
      Eventually, it might be possible to simplify sk_page_frag() by only
      testing ->sk_use_task_frag and avoid relying on the ->sk_allocation
      heuristic entirely (assuming other sockets will set ->sk_use_task_frag
      according to their constraints in the future).
      
      The new ->sk_use_task_frag field is placed in a hole in struct sock and
      belongs to a cache line shared with ->sk_shutdown. Therefore it should
      be hot and shouldn't have negative performance impacts on TCP's fast
      path (sk_shutdown is tested just before the while() loop in
      tcp_sendmsg_locked()).
      
      Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
      
      
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
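
The fragment selection that the sk_use_task_frag commit describes can be modelled outside the kernel. What follows is a minimal user-space sketch, not the kernel code: `struct page_frag`, `struct task` and `struct sock` are cut-down stand-ins, and `pick_page_frag()` is a hypothetical name for what sk_page_frag() would do once it tests only ->sk_use_task_frag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct page_frag {
	size_t offset;
	size_t size;
};

struct task {
	struct page_frag task_frag;	/* models current->task_frag */
};

struct sock {
	struct page_frag sk_frag;	/* models ->sk_frag */
	bool sk_use_task_frag;		/* false for sockets used during reclaim */
};

/*
 * Hypothetical model of the flag-only check: sockets that may be used
 * while recursing into memory reclaim clear sk_use_task_frag and always
 * get their own per-socket fragment, so the task's fragment is never
 * touched by the inner call.
 */
static struct page_frag *pick_page_frag(struct task *current_task,
					struct sock *sk)
{
	assert(current_task && sk);

	if (sk->sk_use_task_frag)
		return &current_task->task_frag;

	return &sk->sk_frag;
}
```

A socket owned by something like sunrpc would set `sk_use_task_frag = false` once at creation time, so the check stays a single load on the socket itself rather than a read of the task's flags.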
  7. Dec 19, 2022
  8. Dec 16, 2022
  9. Dec 15, 2022
  10. Dec 14, 2022
  11. Dec 13, 2022
  12. Dec 12, 2022
    • IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver · 89300468
      Coco Li authored
      
      IPv6/TCP and GRO stacks can build big TCP packets with an added
      temporary Hop By Hop header.
      
      If GSO is not involved, then the temporary header needs to be removed in
      the driver. This patch provides a generic helper for drivers that need
      to modify their headers in place.
      
      Tested:
      Compiled and ran with ethtool -K eth1 tso off
      Could send Big TCP packets
      
      Signed-off-by: Coco Li <lixiaoyan@google.com>
      Link: https://lore.kernel.org/r/20221210041646.3587757-1-lixiaoyan@google.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
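
Removing the temporary header in place amounts to sliding the headers that precede it forward by its length. The sketch below models that on a flat byte buffer in user space, assuming a frame laid out as [Ethernet][IPv6][HBH][L4] with an 8-byte Hop-by-Hop header. `strip_hbh_jumbo()` and the length macros are hypothetical names, and the model neither verifies that the HBH header really carries a jumbo option nor adjusts the IPv6 payload length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN	14	/* Ethernet header length */
#define IPV6_HLEN	40	/* fixed IPv6 header length */
#define HBH_JUMBO_LEN	8	/* Hop-by-Hop header holding one jumbo option */
#define NEXTHDR_HOP	0	/* IPv6 next-header value for Hop-by-Hop */
#define IPV6_NEXTHDR_OFF 6	/* offset of the next-header byte in IPv6 */

/*
 * Strip the Hop-by-Hop header from a frame laid out as
 * [Ethernet][IPv6][HBH][L4 ...]. Returns the new start of the frame and
 * shrinks *len, or returns the frame unchanged if there is no HBH header.
 */
static uint8_t *strip_hbh_jumbo(uint8_t *frame, size_t *len)
{
	assert(frame && len);

	if (*len < ETH_HLEN + IPV6_HLEN + HBH_JUMBO_LEN)
		return frame;
	if (frame[ETH_HLEN + IPV6_NEXTHDR_OFF] != NEXTHDR_HOP)
		return frame;	/* nothing to remove */

	/* The first byte of the HBH header names the L4 protocol. */
	uint8_t l4_proto = frame[ETH_HLEN + IPV6_HLEN];

	/* Slide the Ethernet and IPv6 headers over the HBH header. */
	memmove(frame + HBH_JUMBO_LEN, frame, ETH_HLEN + IPV6_HLEN);
	frame += HBH_JUMBO_LEN;
	*len -= HBH_JUMBO_LEN;

	/* The IPv6 header now points directly at the L4 header. */
	frame[ETH_HLEN + IPV6_NEXTHDR_OFF] = l4_proto;
	return frame;
}
```

Moving the short headers rather than the payload keeps the copy small: 54 bytes move regardless of how large the Big TCP packet is.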
    • bridge: mcast: Allow user space to specify MDB entry routing protocol · 1d7b66a7
      Ido Schimmel authored
      
      Add the 'MDBE_ATTR_RTPROT' attribute to allow user space to specify the
      routing protocol of the MDB port group entry. Enforce a minimum value of
      'RTPROT_STATIC' to prevent user space from using protocol values that
      should only be set by the kernel (e.g., 'RTPROT_KERNEL'). Maintain
      backward compatibility by defaulting to 'RTPROT_STATIC'.
      
      The protocol is already visible to user space in RTM_NEWMDB responses
      and notifications via the 'MDBA_MDB_EATTR_RTPROT' attribute.
      
      The routing protocol allows a routing daemon to distinguish between
      entries configured by it and those configured by the administrator. Once
      MDB flush is supported, the protocol can be used as a criterion
      according to which the flush is performed.
      
      Examples:
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto kernel
       Error: integer out of range.
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto static
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent proto zebra
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.2 permanent source_list 198.51.100.1,198.51.100.2 filter_mode include proto 250
      
       # bridge -d mdb show
       dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.2 permanent filter_mode include proto 250
       dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.1 permanent filter_mode include proto 250
       dev br0 port dummy10 grp 239.1.1.2 permanent filter_mode include source_list 198.51.100.2/0.00,198.51.100.1/0.00 proto 250
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude proto static
      
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
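
The validation rule in the MDB routing protocol commit (reject anything below 'RTPROT_STATIC', default to 'RTPROT_STATIC' when the attribute is absent) can be sketched as a small stand-alone function. `mdb_resolve_rtprot()` is a hypothetical helper written for illustration; the kernel may well express the lower bound in its netlink attribute policy rather than in open-coded checks like these. The RTPROT_* values are those from <linux/rtnetlink.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in <linux/rtnetlink.h>. */
#define RTPROT_KERNEL	2
#define RTPROT_STATIC	4
#define RTPROT_ZEBRA	11

/*
 * Hypothetical helper: decide which routing protocol a new MDB entry gets.
 * has_attr is false when the request carries no protocol attribute.
 * Returns false when the requested value is reserved for the kernel.
 */
static bool mdb_resolve_rtprot(bool has_attr, uint8_t requested, uint8_t *out)
{
	assert(out);

	if (!has_attr) {
		*out = RTPROT_STATIC;	/* backward-compatible default */
		return true;
	}

	if (requested < RTPROT_STATIC)
		return false;		/* e.g. RTPROT_KERNEL: out of range */

	*out = requested;
	return true;
}
```

With this rule, "proto kernel" is rejected, while "proto static", "proto zebra" and "proto 250" are accepted, matching the examples in the commit message.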
    • bridge: mcast: Allow user space to add (*, G) with a source list and filter mode · 6afaae6d
      Ido Schimmel authored
      
      Add new netlink attributes to the RTM_NEWMDB request that allow user
      space to add (*, G) with a source list and filter mode.
      
      The RTM_NEWMDB message can already dump such entries (created by the
      kernel) so there is no need to add dump support. However, the message
      contains a different set of attributes depending on whether it is a request or a
      response. The naming and structure of the new attributes try to follow
      the existing ones used in the response.
      
      Request:
      
      [ struct nlmsghdr ]
      [ struct br_port_msg ]
      [ MDBA_SET_ENTRY ]
      	struct br_mdb_entry
      [ MDBA_SET_ENTRY_ATTRS ]
      	[ MDBE_ATTR_SOURCE ]
      		struct in_addr / struct in6_addr
      	[ MDBE_ATTR_SRC_LIST ]		// new
      		[ MDBE_SRC_LIST_ENTRY ]
      			[ MDBE_SRCATTR_ADDRESS ]
      				struct in_addr / struct in6_addr
      		[ ...]
      	[ MDBE_ATTR_GROUP_MODE ]	// new
      		u8
      
      Response:
      
      [ struct nlmsghdr ]
      [ struct br_port_msg ]
      [ MDBA_MDB ]
      	[ MDBA_MDB_ENTRY ]
      		[ MDBA_MDB_ENTRY_INFO ]
      			struct br_mdb_entry
      		[ MDBA_MDB_EATTR_TIMER ]
      			u32
      		[ MDBA_MDB_EATTR_SOURCE ]
      			struct in_addr / struct in6_addr
      		[ MDBA_MDB_EATTR_RTPROT ]
      			u8
      		[ MDBA_MDB_EATTR_SRC_LIST ]
      			[ MDBA_MDB_SRCLIST_ENTRY ]
      				[ MDBA_MDB_SRCATTR_ADDRESS ]
      					struct in_addr / struct in6_addr
      				[ MDBA_MDB_SRCATTR_TIMER ]
      					u8
      			[...]
      		[ MDBA_MDB_EATTR_GROUP_MODE ]
      			u8
      
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: add IFF_NO_ADDRCONF and use it in bonding to prevent ipv6 addrconf · 8a321cf7
      Xin Long authored
      
      Currently, bonding reuses the IFF_SLAVE flag, and ipv6 addrconf checks
      it, to prevent ipv6 addrconf.
      
      However, IFF_SLAVE is not a proper flag for disabling ipv6 addrconf:
      bonding has to move the IFF_SLAVE flag setting ahead of dev_open()
      in bond_enslave() for it to work. Also, IFF_MASTER/SLAVE are
      historical flags used in bonding and eql; as Jiri mentioned, newer
      devices like Team and Failover do not use them.
      
      So, as Jiri suggested, this patch adds IFF_NO_ADDRCONF to the priv_flags
      of the device to indicate no ipv6 addrconf, uses it in bonding,
      and moves the IFF_SLAVE flag setting back to its original place.
      
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: tso: inline tso_count_descs() · d7b061b8
      Yunsheng Lin authored
      
      tso_count_descs() is a small function that does a simple calculation
      and is used in the fast path, so inline it to reduce the overhead of
      calls.
      
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Link: https://lore.kernel.org/r/20221212032426.16050-1-linyunsheng@huawei.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
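
To illustrate why such a helper is a natural candidate for inlining, here is a user-space sketch of a descriptor estimate of this kind. It assumes the calculation is a worst-case count of two descriptors per segment (header plus data) and one more per page fragment; `struct shinfo_model` and `tso_count_descs_model()` are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>

/* Stand-in for the two skb_shared_info fields the estimate needs. */
struct shinfo_model {
	unsigned short gso_segs;	/* number of segments to emit */
	unsigned char nr_frags;		/* number of page fragments */
};

/*
 * Assumed worst-case estimate: one header and one data descriptor per
 * segment, plus one extra descriptor for each fragment boundary.
 * Marked static inline so the compiler can fold it into the caller.
 */
static inline int tso_count_descs_model(const struct shinfo_model *sh)
{
	assert(sh);
	return sh->gso_segs * 2 + sh->nr_frags;
}
```

A body this small costs less than the call and return it replaces, which is the overhead the commit aims to remove from the transmit path.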
    • f2fs: add block_age-based extent cache · 71644dff
      Jaegeuk Kim authored
      
      This patch introduces a runtime hot/cold data separation method
      for f2fs, in order to improve the accuracy of data temperature
      classification and reduce the garbage collection overhead after
      long-term data updates.
      
      Enhanced hot/cold data separation can record data block update
      frequency as the "age" of the extent per inode, and make use of the age
      info to indicate a better temperature type for data block allocation:
       - It records the total data blocks allocated since mount;
       - When a file extent has been updated, it calculates the count of data
      blocks allocated since the last update as the age of the extent;
       - Before a data block is allocated, it searches for the age info and
      chooses the suitable segment for allocation.
      
      Test and result:
       - Prepare: create about 30000 files
        * 3% for cold files (with cold file extension like .apk, from 3M to 10M)
        * 50% for warm files (with random file extension like .FcDxq, from 1K
      to 4M)
        * 47% for hot files (with hot file extension like .db, from 1K to 256K)
       - create(5%)/random update(90%)/delete(5%) the files
        * total write amount is about 70G
        * fsync will be called for .db files, and buffered write will be used
      for other files
      
      The storage of the test device is large enough (128G) that it will not
      switch to SSR mode during the test.
      
      Benefit: the dirty segment count increment is reduced by about 14%
       - before: Dirty +21110
       - after:  Dirty +18286
      
      Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
      Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
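
The three steps of the block-age scheme (count allocations since mount, derive an extent's age on update, pick a temperature before allocating) can be sketched in user space as follows. All names and both thresholds are hypothetical and chosen only for illustration; the sketch assumes that a small age, meaning few blocks allocated between two updates of the same extent, indicates frequently updated and therefore hot data.

```c
#include <assert.h>
#include <stdint.h>

enum data_temp { TEMP_HOT, TEMP_WARM, TEMP_COLD };

/* Hypothetical thresholds, in allocated blocks. */
#define HOT_AGE_MAX	1000ULL
#define WARM_AGE_MAX	100000ULL

struct extent_age {
	uint64_t last_blocks;	/* allocation counter at the last update */
};

/*
 * On an extent update: the age is the number of blocks allocated
 * file-system-wide since this extent was last updated. The counter
 * snapshot is then refreshed for the next update.
 */
static uint64_t extent_update_age(struct extent_age *ea,
				  uint64_t total_allocated)
{
	assert(ea && total_allocated >= ea->last_blocks);

	uint64_t age = total_allocated - ea->last_blocks;

	ea->last_blocks = total_allocated;
	return age;
}

/* Before allocation: map the recorded age to a temperature class. */
static enum data_temp age_to_temp(uint64_t age)
{
	if (age <= HOT_AGE_MAX)
		return TEMP_HOT;
	if (age <= WARM_AGE_MAX)
		return TEMP_WARM;
	return TEMP_COLD;
}
```

Measuring age in allocated blocks rather than wall-clock time ties the classification to write activity, so an idle file system does not make every extent look cold.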
    • f2fs: refactor extent_cache to support for read and more · e7547dac
      Jaegeuk Kim authored
      
      This patch prepares extent_cache to be ready for addition.
      
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • Bluetooth: Add quirk to disable MWS Transport Configuration · ffcb0a44
      Sven Peter authored
      
      Broadcom 4378/4387 controllers found in Apple Silicon Macs claim to
      support getting MWS Transport Layer Configuration,
      
      < HCI Command: Read Local Supported... (0x04|0x0002) plen 0
      > HCI Event: Command Complete (0x0e) plen 68
            Read Local Supported Commands (0x04|0x0002) ncmd 1
              Status: Success (0x00)
      [...]
                Get MWS Transport Layer Configuration (Octet 30 - Bit 3)
      [...]
      
      , but then don't actually allow the required command:
      
      > HCI Event: Command Complete (0x0e) plen 15
            Get MWS Transport Layer Configuration (0x05|0x000c) ncmd 1
              Status: Command Disallowed (0x0c)
              Number of transports: 0
              Baud rate list: 0 entries
              00 00 00 00 00 00 00 00 00 00
      
      Signed-off-by: Sven Peter <sven@svenpeter.dev>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: Add quirk to disable extended scanning · 392fca35
      Sven Peter authored
      
      Broadcom 4377 controllers found in Apple x86 Macs with the T2 chip
      claim to support extended scanning when querying supported states,
      
      < HCI Command: LE Read Supported St.. (0x08|0x001c) plen 0
      > HCI Event: Command Complete (0x0e) plen 12
            LE Read Supported States (0x08|0x001c) ncmd 1
              Status: Success (0x00)
              States: 0x000003ffffffffff
      [...]
                LE Set Extended Scan Parameters (Octet 37 - Bit 5)
                LE Set Extended Scan Enable (Octet 37 - Bit 6)
      [...]
      
      , but then fail to actually implement the extended scanning:
      
      < HCI Command: LE Set Extended Sca.. (0x08|0x0041) plen 8
              Own address type: Random (0x01)
              Filter policy: Accept all advertisement (0x00)
              PHYs: 0x01
              Entry 0: LE 1M
                Type: Active (0x01)
                Interval: 11.250 msec (0x0012)
                Window: 11.250 msec (0x0012)
      > HCI Event: Command Complete (0x0e) plen 4
            LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
              Status: Unknown HCI Command (0x01)
      
      Signed-off-by: Sven Peter <sven@svenpeter.dev>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: hci_event: Ignore reserved bits in LE Extended Adv Report · ad38e55e
      Sven Peter authored
      
      Broadcom controllers present on Apple Silicon devices use the upper
      8 bits of the event type in the LE Extended Advertising Report for
      the channel on which the frame has been received.
      These bits are reserved according to the Bluetooth spec anyway, so
      we can just drop them to ensure that the advertising results are parsed
      correctly.
      
      The following excerpt from a btmon trace shows a report received on
      channel 37 by these controllers:
      
      > HCI Event: LE Meta Event (0x3e) plen 55
            LE Extended Advertising Report (0x0d)
              Num reports: 1
              Entry 0
                Event type: 0x2513
                  Props: 0x0013
                    Connectable
                    Scannable
                    Use legacy advertising PDUs
                  Data status: Complete
                  Reserved (0x2500)
                Legacy PDU Type: Reserved (0x2513)
                Address type: Public (0x00)
                Address: XX:XX:XX:XX:XX:XX (Shenzhen Jingxun Software [...])
                Primary PHY: LE 1M
                Secondary PHY: No packets
                SID: no ADI field (0xff)
                TX power: 127 dBm
                RSSI: -76 dBm (0xb4)
                Periodic advertising interval: 0.00 msec (0x0000)
                Direct address type: Public (0x00)
                Direct address: 00:00:00:00:00:00 (OUI 00-00-00)
                Data length: 0x1d
                [...]
              Flags: 0x18
                Simultaneous LE and BR/EDR (Controller)
                Simultaneous LE and BR/EDR (Host)
              Company: Harman International Industries, Inc. (87)
                Data: [...]
              Service Data (UUID 0xfddf):
              Name (complete): JBL Flip 5
      
      Signed-off-by: Sven Peter <sven@svenpeter.dev>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
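
Dropping the reserved bits comes down to masking the 16-bit event type before it is interpreted. The sketch below is a user-space illustration under stated assumptions: `EXT_ADV_EVT_TYPE_MASK` and `ext_adv_evt_type()` are hypothetical names, and the mask value 0x007f assumes that only bits 0 to 6 (five property bits and two data-status bits) are defined.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask: keep bits 0-6, clear everything the spec reserves. */
#define EXT_ADV_EVT_TYPE_MASK	0x007f

/*
 * Decode the little-endian event type from the report and discard the
 * reserved bits, so that a controller which stores extra data there
 * (such as the receive channel) does not confuse the parser.
 */
static uint16_t ext_adv_evt_type(const uint8_t le[2])
{
	assert(le);

	uint16_t raw = (uint16_t)(le[0] | (le[1] << 8));

	return raw & EXT_ADV_EVT_TYPE_MASK;
}
```

Applied to the trace in the commit message, the raw value 0x2513 becomes 0x0013, which is the property set (connectable, scannable, legacy PDU) that btmon reports under "Props".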