1. 24 Sep, 2009 1 commit
  2. 06 Sep, 2009 1 commit
    • IB/mthca: Don't allow userspace open while recovering from catastrophic error · d8410647
      Jack Morgenstein authored
      Userspace apps are supposed to release all ib device resources if they
      receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
      app has no way of knowing when the device has come back up, except to
      repeatedly attempt ibv_open_device() until it succeeds.
      However, currently there is no protection against the open succeeding
      while the device is being removed following the fatal event.  In
      this case, the open will succeed, but as a result the device waits in
      the middle of its removal until the new app releases its resources --
      and the new app will not do so, since the open succeeded at a point
      following the fatal event generation.
      This patch adds an "active" flag to the device. The active flag is set
      to false (in the fatal event flow) before the "fatal" event is
      generated, so any subsequent ibv_open_device() call to the device will
      fail until the device comes back up, thus preventing the above problem.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
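The "active" flag described in this commit can be modeled in plain userspace C. The following is only a minimal sketch, not the driver's code: `struct demo_dev`, `demo_open()`, `demo_fatal_event()` and `demo_recovered()` are hypothetical names, and the `-EAGAIN` return value is an assumption made for illustration. The point it shows is the ordering: the flag is cleared before the fatal event is delivered, so an open attempted at any time after the event fails.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct demo_dev {
    bool active;     /* false from the fatal event until recovery completes */
    int  open_count; /* userspace contexts currently holding the device */
};

/* Fatal-event flow: clear the flag first, then deliver the event.
 * The notify callback stands in for the IBV_EVENT_DEVICE_FATAL dispatch. */
void demo_fatal_event(struct demo_dev *dev, void (*notify)(struct demo_dev *))
{
    dev->active = false;
    if (notify != NULL)
        notify(dev);
}

/* Open path: refuse new users while the device is not active.
 * The error code is an assumption chosen for this sketch. */
int demo_open(struct demo_dev *dev)
{
    if (!dev->active)
        return -EAGAIN;
    dev->open_count++;
    return 0;
}

/* Called once the device has come back up after removal and re-add. */
void demo_recovered(struct demo_dev *dev)
{
    dev->active = true;
}
```

With this ordering, an application that retries the open in a loop keeps getting an error until `demo_recovered()` runs, instead of succeeding mid-removal and stalling the teardown.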
  3. 30 Sep, 2008 1 commit
    • IB/mthca: Use pci_request_regions() · 208dde28
      Roland Dreier authored
      Back in prehistoric (pre-git!) days, the kernel's MSI-X support did
      request_mem_region() on a device's MSI-X tables, which meant that a
      driver that enabled MSI-X couldn't use pci_request_regions() (since
      that would clash with the PCI layer's MSI-X request).
      However, that was removed (by me!) years ago, so mthca can just use
      pci_request_regions() and pci_release_regions() instead of its own
      much more complicated code that avoids requesting the MSI-X tables.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  4. 15 Jul, 2008 3 commits
  5. 22 Nov, 2006 1 commit
  6. 22 Sep, 2006 1 commit
  7. 10 Nov, 2005 1 commit
  8. 07 Nov, 2005 1 commit
  9. 27 Oct, 2005 1 commit
    • [IB] mthca: first pass at catastrophic error reporting · 3d155f8c
      Roland Dreier authored
      Add some initial support for detecting and reporting catastrophic
      errors reported by Mellanox HCAs.  We start a periodic timer which
      polls the catastrophic error reporting buffer in device memory.  If an
      error is detected, we dump the contents of the buffer for post-mortem
      debugging, and report a fatal asynchronous error to higher levels.
      In the future we can try to recover from these errors by resetting the
      device, but this will require some work in higher-level code as well.
      Let's get this in now, so that we at least get catastrophic errors
      reported in logs.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
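The poll, dump and report sequence in this commit message can be sketched as a userspace model. This is an illustration under stated assumptions rather than the driver's implementation: `struct demo_catas` and `demo_poll_catas()` are hypothetical names, the buffer is an ordinary array instead of mapped device memory, and treating any nonzero word as an error is an assumption. In the driver this function's body would run from a periodic timer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the catastrophic error reporting state. */
struct demo_catas {
    const volatile uint32_t *buf; /* stands in for the mapped error buffer */
    size_t words;                 /* buffer length in 32-bit words */
    bool fatal_reported;          /* set once a fatal event has been raised */
};

/* One timer tick: scan the buffer, and on error dump it and report.
 * Returns true if an error was detected on this tick. */
bool demo_poll_catas(struct demo_catas *c)
{
    bool found = false;

    /* Assumption: an all-zero buffer means no error is pending. */
    for (size_t i = 0; i < c->words; i++) {
        if (c->buf[i] != 0) {
            found = true;
            break;
        }
    }
    if (!found)
        return false;

    /* Dump the whole buffer for post-mortem debugging. */
    fprintf(stderr, "catastrophic error detected\n");
    for (size_t i = 0; i < c->words; i++)
        fprintf(stderr, "  buf[%02zx]: %08x\n", i, (unsigned)c->buf[i]);

    /* Stand-in for reporting a fatal asynchronous event to higher levels. */
    c->fatal_reported = true;
    return true;
}
```

Polling is used here because, per the commit message, the error is exposed through a buffer in device memory that the driver has to read; the model keeps that shape by scanning on each call rather than reacting to a notification.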