1. 24 Sep, 2009 1 commit
    • IPoIB: Don't turn on carrier for a non-active port · 5ee95120
      Moni Shoua authored
      Multicast joins can succeed even if the IB port is down.  This happens
      when the SM runs on the same port as the requesting port.  However,
      IPoIB calls netif_carrier_on() when the join of the broadcast group
      succeeds, without caring about the state of the IB port.  The result
      is an IPoIB interface in RUNNING state but without an active IB port
      to support it.
      
      If a bonding interface uses this IPoIB interface as a slave it might
      not detect that this slave is almost useless and failover
      functionality will be damaged.  The fix checks the state of the IB
      port in the carrier_task before calling netif_carrier_on().
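
The check described above can be modelled in user space. This is only a sketch of the idea, not the driver code: the types `fake_netdev` and `fake_port` and the `port_state` enum are hypothetical stand-ins for the kernel's net device and IB port attributes, and the `carrier_on` field stands in for the effect of netif_carrier_on().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the IB port states a port query would report. */
enum port_state { PORT_DOWN, PORT_INIT, PORT_ARMED, PORT_ACTIVE };

struct fake_netdev {
    bool carrier_on;            /* models the effect of netif_carrier_on() */
};

struct fake_port {
    enum port_state state;
};

/* Models the fixed carrier task: raise carrier only on an active port. */
void carrier_task(struct fake_netdev *dev, const struct fake_port *port)
{
    if (port->state != PORT_ACTIVE)
        return;                 /* the join succeeded, but the link is unusable */
    dev->carrier_on = true;
}
```

With this gate, a broadcast join that succeeds on a down port leaves the interface without carrier, so a bonding master would see the slave as unusable.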
      
      Addresses: https://bugs.openfabrics.org/show_bug.cgi?id=1726
      
      Signed-off-by: Moni Shoua <monis@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  2. 06 Sep, 2009 2 commits
  3. 03 Jun, 2009 1 commit
  4. 16 Jan, 2009 1 commit
  5. 13 Jan, 2009 1 commit
  6. 29 Oct, 2008 2 commits
  7. 30 Sep, 2008 1 commit
    • IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX · 943c246e
      Roland Dreier authored
      Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
      tx_lock.  Not only do we want to get rid of LLTX, this actually causes
      problems because of the skb_orphan() done with this tx_lock held: some
      skb destructors expect to be run with interrupts enabled.
      
      The simplest fix for this is to get rid of the driver-private tx_lock
      and stop using LLTX.  We kill off priv->tx_lock and use
      netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
      tricky because we need to update places that take priv->lock inside
      the tx_lock to disable IRQs, rather than relying on tx_lock having
      already disabled IRQs.
      
      Also, there are a couple of places where we need to disable BHs to
      make sure we have a consistent context to call netif_tx_lock() (since
      we no longer can use _irqsave() variants), and we also have to change
      ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
      than directly, because ipoib_send_comp_handler() runs in interrupt
      context and drain_tx_cq() must run in BH context so it can call
      netif_tx_lock().
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  8. 16 Sep, 2008 1 commit
    • IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() · e8224e4b
      Yossi Etigin authored
      Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
      ipoib_stop().  We avoid it by scheduling the piece of code that takes
      the lock on ipoib_workqueue instead of executing it directly.  This
      works because we only flush the ipoib_workqueue with the RTNL not held.
      
      The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
      which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
      which calls ipoib_mcast_leave(). The latter calls
      ib_sa_free_multicast(), and this waits until the multicast completion
      handler finishes.  This handler is ipoib_mcast_join_complete(), which
      waits for the rtnl_lock(), which was already taken by ipoib_stop().
      
      This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
      RTNL in ipoib_stop()").
      Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  9. 19 Aug, 2008 1 commit
    • IPoIB: Fix deadlock on RTNL in ipoib_stop() · a77a57a1
      Roland Dreier authored
      Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
      flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
      is run from the ipoib_workqueue.  However, ipoib_stop() (which is run
      inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
      if the join task is pending.
      
      Fix this by simply not flushing the workqueue from ipoib_stop().  It
      turns out that we really don't care about workqueue tasks running
      during or after ipoib_stop(), as long as we make sure to flush the
      workqueue before unregistering a netdev.
      
      This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  10. 15 Jul, 2008 8 commits
  11. 20 May, 2008 1 commit
  12. 23 Apr, 2008 1 commit
  13. 11 Mar, 2008 1 commit
    • IPoIB: Don't drop multicast sends when they can be queued · b3e2749b
      Or Gerlitz authored
      When set_multicast_list() is called the multicast task is restarted
      and the IPOIB_MCAST_STARTED bit is cleared.  As a result, for some
      window of time, multicast packets are neither transmitted nor queued but
      rather dropped by ipoib_mcast_send().  These dropped packets are
      painful in two cases:
      
       - bonding fail-over which both calls set_multicast_list() on the new
         active slave and sends Gratuitous ARP through that slave.
      
       - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
         device and issues IGMP leave.
      
      In both these cases, depending on the scheduling of the IPoIB
      multicast task, the packets would be dropped.  As a result, in the
      bonding case, the failover would not be detected by the peers until
      their neighbour entry is renewed (which takes a few tens of
      seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
      and would only learn of it from further probes on the group (also a
      delay of at least a few tens of seconds).
      
      Fix this by allowing transmission (or queuing) depending on the
      IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.
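
The change of gating flag can be illustrated with a small user-space model. This is a sketch only: the bit positions and the function names `mcast_send_allowed_old` and `mcast_send_allowed_new` are hypothetical and do not reflect the driver's actual flag layout or code.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the driver's real flag layout may differ. */
enum {
    FLAG_OPER_UP       = 1u << 0,
    FLAG_MCAST_STARTED = 1u << 1,
};

/* Old gate: packets are dropped while the multicast task is restarting. */
bool mcast_send_allowed_old(unsigned int flags)
{
    return (flags & FLAG_MCAST_STARTED) != 0;
}

/* New gate: packets may be sent or queued whenever the interface is up. */
bool mcast_send_allowed_new(unsigned int flags)
{
    return (flags & FLAG_OPER_UP) != 0;
}
```

In the window after set_multicast_list() restarts the task, only the "oper up" bit is set, so the old gate drops the packet while the new gate lets it through to be sent or queued.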
      Signed-off-by: Olga Shern <olgas@voltaire.com>
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  14. 25 Jan, 2008 2 commits
  15. 15 Oct, 2007 1 commit
    • IB/ipoib: Bound the net device to the ipoib_neigh structure · 732a2170
      Moni Shoua authored
      IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
      whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
      created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
      call.
      
      When using the bonding driver, neighbours are created by the net stack on behalf
      of the bonding (master) device. On the tx flow the bonding code gets an skb whose
      skb->dev points to the master device; it changes this skb to point to the
      slave device and calls the slave's hard_start_xmit function.
      
      Under this scheme, the assumption in ipoib_neigh_destructor that for each struct
      neighbour it gets, n->dev is an ipoib device, and hence that netdev_priv(n->dev)
      can be cast to struct ipoib_dev_priv, is wrong.
      
      To fix it, this patch adds a dev field to struct ipoib_neigh which is used
      instead of the struct neighbour dev one, when n->dev->flags has the
      IFF_MASTER bit set.
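
The device selection described above might look roughly like the following user-space sketch. All types here (`fake_netdev`, `fake_neighbour`, `fake_ipoib_neigh`), the helper `ipoib_owner_dev`, and the `FAKE_IFF_MASTER` value are hypothetical stand-ins, not the kernel's definitions.

```c
#define FAKE_IFF_MASTER 0x400u  /* stand-in for the kernel's IFF_MASTER bit */

struct fake_netdev {
    unsigned int flags;
};

struct fake_neighbour {
    struct fake_netdev *dev;    /* may be the bonding master */
};

struct fake_ipoib_neigh {
    struct fake_neighbour *neighbour;
    struct fake_netdev *dev;    /* the added field: the ipoib device itself */
};

/* Pick the device whose private data really is ipoib's. */
struct fake_netdev *ipoib_owner_dev(const struct fake_ipoib_neigh *in)
{
    struct fake_netdev *ndev = in->neighbour->dev;

    if (ndev->flags & FAKE_IFF_MASTER)
        return in->dev;         /* neighbour belongs to the bond, not to ipoib */
    return ndev;
}
```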
      
      Signed-off-by: Moni Shoua <monis at voltaire.com>
      Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
      Acked-by: Roland Dreier <rdreier@cisco.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  16. 10 Oct, 2007 3 commits
  17. 21 May, 2007 1 commit
  18. 22 Mar, 2007 1 commit
  19. 08 Mar, 2007 1 commit
  20. 22 Feb, 2007 1 commit
  21. 16 Feb, 2007 1 commit
    • IB/sa: Track multicast join/leave requests · faec2f7b
      Sean Hefty authored
      The IB SA tracks multicast join/leave requests on a per port basis and
      does not do any reference counting: if two users of the same port join
      the same group, and one leaves that group, then the SA will remove the
      port from the group even though one remaining user still wants to stay
      a member.  Therefore, in order to support multiple users of the
      same multicast group from the same port, we need to perform reference
      counting locally.
      
      To do this, add a multicast submodule to ib_sa to perform reference
      counting of multicast join/leave operations.  Modify ib_ipoib (the
      only in-kernel user of multicast) to use the new interface.
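
The local reference counting can be sketched as follows. This is a minimal user-space model, not the ib_sa interface: `mcast_group`, `group_join`, and `group_leave` are hypothetical names, and the counters merely record how many requests would actually be sent to the SA.

```c
struct mcast_group {
    int refcount;   /* local users of this group on this port */
    int sa_joins;   /* join requests that would reach the SA */
    int sa_leaves;  /* leave requests that would reach the SA */
};

/* Only the first local user triggers a real join. */
void group_join(struct mcast_group *g)
{
    if (g->refcount++ == 0)
        g->sa_joins++;
}

/* Only the last local user triggers a real leave; returns -1 if not joined. */
int group_leave(struct mcast_group *g)
{
    if (g->refcount == 0)
        return -1;
    if (--g->refcount == 0)
        g->sa_leaves++;
    return 0;
}
```

With two local users, the first leave only drops the count, so the port stays in the group until the second user leaves as well.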
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  22. 10 Feb, 2007 1 commit
    • IPoIB: Connected mode experimental support · 839fcaba
      Michael S. Tsirkin authored
      The following patch adds experimental support for IPoIB connected
      mode, as defined by the draft from the IETF ipoib working group.  The
      idea is to increase performance by increasing the MTU from the maximum
      of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
      code, I'm able to get 800MByte/sec or more with netperf without
      options on a Mellanox 4x back-to-back DDR system.
      
      Some notes on code:
      1. SRQ is used for scalability to large cluster sizes
      2. Only RC connections are used (UC does not support SRQ now)
      3. Retry count is set to 0 since spec draft warns against retries
      4. Each connection is used for data transfers in only 1 direction, so
         each connection is either active (TX) or passive (RX).  2 sides that
         want to communicate create 2 connections.
      5. Each active (TX) connection has a separate CQ for send completions -
         this keeps the code simple without CQ resize and other tricks
      6. To detect stale passive side connections (where the remote side is
         down), we keep an LRU list of passive connections (updated once per
         second per connection) and destroy a connection after it has been
         unused for several seconds. The LRU rule makes it possible to avoid
         scanning connections that have recently been active.
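
The LRU rule in note 6 can be sketched in user space as below. This is an illustration under assumptions, not the driver code: `struct conn`, the `lru_*` helpers, the seconds-based timestamps, and the timeout parameter are all hypothetical. The once-per-second update limit mentioned in the note is left out for brevity.

```c
struct conn {
    struct conn *prev, *next;   /* list links; head->next is most recently used */
    unsigned long last_used;    /* timestamp in seconds */
    int alive;
};

/* An empty circular list points at itself. */
void lru_init(struct conn *head)
{
    head->prev = head->next = head;
}

static void lru_unlink(struct conn *c)
{
    c->prev->next = c->next;
    c->next->prev = c->prev;
}

/* Insert a connection at the front (most recently used end). */
void lru_add(struct conn *head, struct conn *c, unsigned long now)
{
    c->last_used = now;
    c->alive = 1;
    c->next = head->next;
    c->prev = head;
    head->next->prev = c;
    head->next = c;
}

/* Traffic arrived on c: move it back to the front. */
void lru_touch(struct conn *head, struct conn *c, unsigned long now)
{
    lru_unlink(c);
    lru_add(head, c, now);
}

/* Destroy stale connections, scanning from the tail and stopping
 * at the first one that is still fresh. Returns the number reaped. */
int lru_reap(struct conn *head, unsigned long now, unsigned long timeout)
{
    int reaped = 0;

    while (head->prev != head && now - head->prev->last_used > timeout) {
        struct conn *stale = head->prev;

        lru_unlink(stale);
        stale->alive = 0;
        reaped++;
    }
    return reaped;
}
```

Because the list is ordered by last use, the reaper never visits connections newer than the first fresh one it finds at the tail.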
      Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  23. 29 Nov, 2006 1 commit
  24. 22 Nov, 2006 1 commit
  25. 22 Sep, 2006 3 commits
    • IPoIB: Create MCGs with all attributes required by RFC · d0df6d6d
      Roland Dreier authored
      RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says:
      
        If the IB multicast group does not already exist, one must be
        created first with the IPoIB link MTU.  The MGID MUST use the same
        P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
        broadcast-GID.  The rest of attributes SHOULD follow the values used
        in the broadcast-GID as well.
      
      However, the current IPoIB driver is only setting the attributes
      required by the InfiniBand spec to create a multicast group, so in
      particular the MTU and HopLimit are not being set.  Add these
      attributes when creating MCGs, and also set the Rate attribute, since
      IPoIB pays attention to that attribute as well.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/sa: Require SA registration · c1a0b23b
      Michael S. Tsirkin authored
      Require users to register with the SA module, to prevent the sa_query
      module text from going away while an SA query callback is still
      running.  Update all in-tree users for the new interface.
      Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB: Whitespace fixes · 3cd96564
      Roland Dreier authored
      Remove some trailing whitespace that has snuck in despite the best
      efforts of whitespace=error-all.  Also fix a few other whitespace
      bogosities.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  26. 14 Sep, 2006 1 commit