1. 09 Oct, 2008 2 commits
    • Tejun Heo's avatar
      block: move stats from disk to part0 · 074a7aca
      Tejun Heo authored
      Move stats related fields - stamp, in_flight, dkstats - from disk to
      part0 and unify stat handling such that...
      
      * part_stat_*() now updates part0 together if the specified partition
        is not part0.  ie. part_stat_*() are now essentially all_stat_*().
      
      * {disk|all}_stat_*() are gone.
      
      * part_round_stats() is updated similary.  It handles part0 stats
        automatically and disk_round_stats() is killed.
      
      * part_{inc|dec}_in_fligh() is implemented which automatically updates
        part0 stats for parts other than part0.
      
      * disk_map_sector_rcu() is updated to return part0 if no part matches.
        Combined with the above changes, this makes NULL special case
        handling in callers unnecessary.
      
      * Separate stats show code paths for disk are collapsed into part
        stats show code paths.
      
      * Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
      
      While at it, reposition stat handling macros a bit and add missing
      parentheses around macro parameters.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      074a7aca
    • Tejun Heo's avatar
      block: fix diskstats access · c9959059
      Tejun Heo authored
      There are two variants of stat functions - ones prefixed with double
      underbars which don't care about preemption and ones without which
      disable preemption before manipulating per-cpu counters.  It's unclear
      whether the underbarred ones assume that preemtion is disabled on
      entry as some callers don't do that.
      
      This patch unifies diskstats access by implementing disk_stat_lock()
      and disk_stat_unlock() which take care of both RCU (for partition
      access) and preemption (for per-cpu counter access).  diskstats access
      should always be enclosed between the two functions.  As such, there's
      no need for the versions which disables preemption.  They're removed
      and double underbars ones are renamed to drop the underbars.  As an
      extra argument is added, there's no danger of using the old version
      unconverted.
      
      disk_stat_lock() uses get_cpu() and returns the cpu index and all
      diskstat functions which access per-cpu counters now has @cpu
      argument to help RT.
      
      This change adds RCU or preemption operations at some places but also
      collapses several preemption ops into one at others.  Overall, the
      performance difference should be negligible as all involved ops are
      very lightweight per-cpu ones.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      c9959059
  2. 21 Jul, 2008 1 commit
  3. 27 Jun, 2008 2 commits
  4. 24 May, 2008 1 commit
    • NeilBrown's avatar
      md: restart recovery cleanly after device failure. · dfc70645
      NeilBrown authored
      When we get any IO error during a recovery (rebuilding a spare), we abort
      the recovery and restart it.
      
      For RAID6 (and multi-drive RAID1) it may not be best to restart at the
      beginning: when multiple failures can be tolerated, the recovery may be
      able to continue and re-doing all that has already been done doesn't make
      sense.
      
      We already have the infrastructure to record where a recovery is up to
      and restart from there, but it is not being used properly.
      This is because:
        - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
          which causes the recovery not be be checkpointed.
        - We remove spares and then re-added them which loses important state
          information.
      
      The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
      needed.  If there is an error, the relevant drive will be marked as
      Faulty, and that is enough to ensure correct handling of the error.  So we
      first remove MD_RECOVERY_ERR, changing some of the uses of it to
      MD_RECOVERY_INTR.
      
      Then we cause the attempt to remove a non-faulty device from an array to
      fail (unless recovery is impossible as the array is too degraded).  Then
      when remove_and_add_spares attempts to remove the devices on which
      recovery can continue, it will fail, they will remain in place, and
      recovery will continue on them as desired.
      
      Issue:  If we are halfway through rebuilding a spare and another drive
      fails, and a new spare is immediately available,  do we want to:
       1/ complete the current rebuild, then go back and rebuild the new spare or
       2/ restart the rebuild from the start and rebuild both devices in
          parallel.
      
      Both options can be argued for.  The code currently takes option 2 as
        a/ this requires least code change
        b/ this results in a minimally-degraded array in minimal time.
      
      Cc: "Eivind Sarto" <ivan@kasenna.com>
      Signed-off-by: default avatarNeil Brown <neilb@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dfc70645
  5. 15 May, 2008 1 commit
    • Neil Brown's avatar
      Remove blkdev warning triggered by using md · e7e72bf6
      Neil Brown authored
      As setting and clearing queue flags now requires that we hold a spinlock
      on the queue, and as blk_queue_stack_limits is called without that lock,
      get the lock inside blk_queue_stack_limits.
      
      For blk_queue_stack_limits to be able to find the right lock, each md
      personality needs to set q->queue_lock to point to the appropriate lock.
      Those personalities which didn't previously use a spin_lock, us
      q->__queue_lock.  So always initialise that lock when allocated.
      
      With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit will no
      longer cause warnings as it will be clear that the proper lock is held.
      
      Thanks to Dan Williams for review and fixing the silly bugs.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Alistair John Strachan <alistair@devzero.co.uk>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Jacek Luczak <difrost.kernel@gmail.com>
      Cc: Prakash Punnoor <prakash@punnoor.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e7e72bf6
  6. 28 Apr, 2008 1 commit
  7. 06 Feb, 2008 1 commit
  8. 09 Nov, 2007 1 commit
  9. 16 Oct, 2007 1 commit
  10. 10 Oct, 2007 1 commit
  11. 24 Jul, 2007 1 commit
  12. 28 Oct, 2006 1 commit
  13. 21 Oct, 2006 1 commit
  14. 03 Oct, 2006 2 commits
    • NeilBrown's avatar
      [PATCH] md: define ->congested_fn for raid1, raid10, and multipath · 0d129228
      NeilBrown authored
      raid1, raid10 and multipath don't report their 'congested' status through
      bdi_*_congested, but should.
      
      This patch adds the appropriate functions which just check the 'congested'
      status of all active members (with appropriate locking).
      
      raid1 read_balance should be modified to prefer devices where
      bdi_read_congested returns false.  Then we could use the '&' branch rather
      than the '|' branch.  However that should would need some benchmarking first
      to make sure it is actually a good idea.
      Signed-off-by: default avatarNeil Brown <neilb@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      0d129228
    • NeilBrown's avatar
      [PATCH] md: replace magic numbers in sb_dirty with well defined bit flags · 850b2b42
      NeilBrown authored
      Instead of magic numbers (0,1,2,3) in sb_dirty, we have
      some flags instead:
      MD_CHANGE_DEVS
         Some device state has changed requiring superblock update
         on all devices.
      MD_CHANGE_CLEAN
         The array has transitions from 'clean' to 'dirty' or back,
         requiring a superblock update on active devices, but possibly
         not on spares
      MD_CHANGE_PENDING
         A superblock update is underway.
      
      We wait for an update to complete by waiting for all flags to be clear.  A
      flag can be set at any time, even during an update, without risk that the
      change will be lost.
      
      Stop exporting md_update_sb - isn't needed.
      Signed-off-by: default avatarNeil Brown <neilb@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      850b2b42
  15. 26 Mar, 2006 1 commit
  16. 10 Jan, 2006 1 commit
  17. 06 Jan, 2006 3 commits
  18. 09 Nov, 2005 2 commits
  19. 01 Nov, 2005 1 commit
    • Jens Axboe's avatar
      [BLOCK] Unify the seperate read/write io stat fields into arrays · a362357b
      Jens Axboe authored
      Instead of having ->read_sectors and ->write_sectors, combine the two
      into ->sectors[2] and similar for the other fields. This saves a branch
      several places in the io path, since we don't have to care for what the
      actual io direction is. On my x86-64 box, that's 200 bytes less text in
      just the core (not counting the various drivers).
      Signed-off-by: default avatarJens Axboe <axboe@suse.de>
      a362357b
  20. 08 Oct, 2005 1 commit
  21. 09 Sep, 2005 1 commit
  22. 22 Jun, 2005 1 commit
  23. 17 May, 2005 1 commit
  24. 05 May, 2005 1 commit
  25. 01 May, 2005 1 commit
  26. 16 Apr, 2005 1 commit
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4