  1. 09 Oct, 2008 4 commits
    • block: move stats from disk to part0 · 074a7aca
      Tejun Heo authored
      Move stats related fields - stamp, in_flight, dkstats - from disk to
      part0 and unify stat handling such that...
      
      * part_stat_*() now also updates part0 if the specified partition
        is not part0, i.e. part_stat_*() are now essentially all_stat_*().
      
      * {disk|all}_stat_*() are gone.
      
      * part_round_stats() is updated similarly.  It handles part0 stats
        automatically and disk_round_stats() is killed.
      
      * part_{inc|dec}_in_flight() is implemented, which automatically updates
        part0 stats for parts other than part0.
      
      * disk_map_sector_rcu() is updated to return part0 if no part matches.
        Combined with the above changes, this makes NULL special case
        handling in callers unnecessary.
      
      * Separate stats show code paths for disk are collapsed into part
        stats show code paths.
      
      * Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
      
      While at it, reposition stat handling macros a bit and add missing
      parentheses around macro parameters.
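The unified stat handling described above can be sketched in userspace C. This is only an illustrative model, not the kernel code: `struct part_model`, `model_part_stat_add()` and `model_map_sector()` are hypothetical names, and a single counter stands in for the real dkstats fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a partition; partno 0 is part0 (the whole disk). */
struct part_model {
    int partno;
    unsigned long start_sect, nr_sects;
    unsigned long ios;           /* stand-in for the dkstats counters */
    struct part_model *part0;    /* whole-disk entry that mirrors all stats */
};

/* Add to the partition's counter and, unless it is part0, to part0 as well. */
static void model_part_stat_add(struct part_model *part, unsigned long n)
{
    assert(part->part0 != NULL);
    part->ios += n;
    if (part->partno != 0)
        part->part0->ios += n;
}

/* Map a sector to a partition; return part0 (parts[0]) rather than NULL. */
static struct part_model *model_map_sector(struct part_model *parts,
                                           size_t nparts, unsigned long sector)
{
    for (size_t i = 1; i < nparts; i++) {
        if (sector >= parts[i].start_sect &&
            sector < parts[i].start_sect + parts[i].nr_sects)
            return &parts[i];
    }
    return &parts[0];
}
```

Because the lookup never returns NULL in this model, a caller can pass its result straight to the stat helper without a special case.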
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: implement and use {disk|part}_to_dev() · ed9e1982
      Tejun Heo authored
      Implement {disk|part}_to_dev() and use them to access the generic device
      instead of directly dereferencing {disk|part}->dev.  To make sure no
      user is left behind, rename the generic device fields to __dev.
      
      This is in preparation for unifying partition 0 handling with other
      partitions.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: fix diskstats access · c9959059
      Tejun Heo authored
      There are two variants of stat functions - ones prefixed with double
      underbars which don't care about preemption and ones without which
      disable preemption before manipulating per-cpu counters.  It's unclear
      whether the underbarred ones assume that preemption is disabled on
      entry as some callers don't do that.
      
      This patch unifies diskstats access by implementing disk_stat_lock()
      and disk_stat_unlock() which take care of both RCU (for partition
      access) and preemption (for per-cpu counter access).  diskstats access
      should always be enclosed between the two functions.  As such, there's
      no need for the versions which disable preemption.  They're removed
      and the double-underbar ones are renamed to drop the underbars.  As an
      extra argument is added, there's no danger of using the old version
      unconverted.
      
      disk_stat_lock() uses get_cpu() and returns the cpu index, and all
      diskstat functions which access per-cpu counters now take a @cpu
      argument to help RT.
      
      This change adds RCU or preemption operations at some places but also
      collapses several preemption ops into one at others.  Overall, the
      performance difference should be negligible as all involved ops are
      very lightweight per-cpu ones.
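The lock/unlock bracket described above can be modelled roughly as follows. This is a userspace sketch under stated assumptions: `model_stat_lock()`, `model_stat_unlock()`, `model_stat_inc()` and the two global variables are hypothetical stand-ins for get_cpu()/put_cpu() and the RCU read section, not the kernel implementation.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

struct model_stats {
    unsigned long ios[MODEL_NR_CPUS];   /* one slot per CPU */
};

static int model_preempt_depth;   /* stand-in for preemption/RCU state */
static int model_current_cpu;     /* stand-in for the CPU we are running on */

/* Enter the protected section and report which per-cpu slot to use. */
static int model_stat_lock(void)
{
    model_preempt_depth++;
    return model_current_cpu;
}

/* Leave the protected section. */
static void model_stat_unlock(void)
{
    model_preempt_depth--;
}

/* Accessors take the cpu index, so old-style calls would not compile. */
static void model_stat_inc(struct model_stats *s, int cpu)
{
    assert(model_preempt_depth > 0);          /* must be inside lock/unlock */
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    s->ios[cpu]++;
}

/* Readers sum the per-cpu slots. */
static unsigned long model_stat_read(const struct model_stats *s)
{
    unsigned long sum = 0;
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        sum += s->ios[i];
    return sum;
}
```

Several updates can share one lock/unlock pair, which is how the change collapses multiple preemption operations into one.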
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: don't depend on consecutive minor space · f331c029
      Tejun Heo authored
      * Implement disk_devt() and part_devt() and use them to directly
        access devt instead of computing it from ->major and ->first_minor.
      
        Note that all references to ->major and ->first_minor outside of
        the block layer are used to determine the devt of the disk (part0),
        and as ->major and ->first_minor will continue to represent the devt
        for the disk, converting these users isn't strictly necessary.
        However, convert them for consistency.
      
      * Implement disk_max_parts() to avoid directly dereferencing
        genhd->minors.
      
      * Update bdget_disk() such that it doesn't assume consecutive minor
        space.
      
      * Move devt computation from register_disk() to add_disk() and make it
        the only one (all other usages use the initially determined value).
      
      These changes clean up the code and will help disk->part dereference
      fix and extended block device numbers.
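The idea of deciding each devt once and reading it through accessors can be sketched as below. This is a simplified userspace model: the `model_*` names, the fixed-size `part[]` array and the `MODEL_MKDEV` macro (which assumes a 20-bit minor field) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MINORBITS 20
#define MODEL_MKDEV(ma, mi) \
    (((uint32_t)(ma) << MODEL_MINORBITS) | (uint32_t)(mi))
#define MODEL_MAX_PARTS 16

struct model_part {
    uint32_t devt;               /* stored once, never recomputed */
};

struct model_disk {
    int major, first_minor, minors;
    struct model_part part[MODEL_MAX_PARTS];  /* part[0] is the whole disk */
};

/* Decide every devt at add time; later users only read the stored value. */
static void model_add_disk(struct model_disk *d)
{
    assert(d->minors <= MODEL_MAX_PARTS);
    for (int i = 0; i < d->minors; i++)
        d->part[i].devt = MODEL_MKDEV(d->major, d->first_minor + i);
}

static uint32_t model_disk_devt(const struct model_disk *d)
{
    return d->part[0].devt;
}

static uint32_t model_part_devt(const struct model_disk *d, int partno)
{
    assert(partno >= 0 && partno < d->minors);
    return d->part[partno].devt;
}

/* Accessor instead of reading ->minors directly. */
static int model_disk_max_parts(const struct model_disk *d)
{
    return d->minors;
}
```

Since callers never recompute a devt from major and first_minor, a partition could later be given a number outside the consecutive range without touching them.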
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  2. 01 Oct, 2008 2 commits
  3. 21 Jul, 2008 2 commits
  4. 25 Apr, 2008 2 commits
  5. 08 Feb, 2008 9 commits
  6. 25 Jan, 2008 1 commit
    • Driver core: convert block from raw kobjects to core devices · edfaa7c3
      Kay Sievers authored
      This moves the block devices to /sys/class/block. It will create a
      flat list of all block devices, with the disks and partitions in one
      directory. For compatibility /sys/block is created and contains symlinks
      to the disks.
      
        /sys/class/block
        |-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
        |-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1
        |-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10
        |-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5
        |-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6
        |-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7
        |-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8
        |-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9
        `-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0
      
        /sys/block/
        |-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
        `-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  7. 20 Dec, 2007 2 commits
    • dm: trigger change uevent on rename · 69267a30
      Alasdair G Kergon authored
      Insert a missing KOBJ_CHANGE notification when a device is renamed.
      
      Cc: Scott James Remnant <scott@ubuntu.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: table detect io beyond device · 512875bd
      Jun'ichi Nomura authored
      This patch fixes a panic on shrinking a DM device if there is
      outstanding I/O to the part of the device that is being removed.
      (Normally this doesn't happen - a filesystem would be resized first,
      for example.)
      
      The bug is that __clone_and_map() assumes dm_table_find_target()
      always returns a valid pointer.  It may fail if a bio arrives from the
      block layer but its target sector is no longer included in the DM
      btree.
      
      This patch appends an empty entry to table->targets[] which will
      be returned by a lookup beyond the end of the device.
      
      After calling dm_table_find_target(), __clone_and_map() and
      target_message() check for this condition using dm_target_is_valid().
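The sentinel approach described above can be illustrated with a small userspace model. The names `model_find_target()`, `model_target_is_valid()` and `model_map()` are hypothetical, a linear scan replaces the DM btree, and using a zero length to mark the empty entry is an assumption of this sketch rather than the kernel's actual validity test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_TARGETS 4

struct model_target {
    unsigned long begin, len;    /* len == 0 marks the empty sentinel */
};

struct model_table {
    size_t num_targets;
    /* Real entries followed by one zeroed sentinel entry. */
    struct model_target targets[MODEL_MAX_TARGETS];
};

/* Never returns NULL: lookups past the end yield the sentinel. */
static struct model_target *model_find_target(struct model_table *t,
                                              unsigned long sector)
{
    assert(t->num_targets < MODEL_MAX_TARGETS);  /* room for the sentinel */
    for (size_t i = 0; i < t->num_targets; i++) {
        struct model_target *ti = &t->targets[i];
        if (sector >= ti->begin && sector < ti->begin + ti->len)
            return ti;
    }
    return &t->targets[t->num_targets];
}

static int model_target_is_valid(const struct model_target *ti)
{
    return ti->len != 0;
}

/* Caller pattern: look up, then check validity before using the target. */
static int model_map(struct model_table *t, unsigned long sector)
{
    struct model_target *ti = model_find_target(t, sector);

    if (!model_target_is_valid(ti))
        return -EIO;             /* I/O beyond the (shrunken) device */
    return 0;
}
```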
      
      Sample test script to trigger oops:
  8. 20 Oct, 2007 4 commits
    • dm: uevent generate events · 7a8c3d3b
      Mike Anderson authored
      This patch adds support for the dm_path_event and dm_send_event
      functions, which create and send udev events.
      Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: add uevent to core · 51e5b2bd
      Mike Anderson authored
      This patch adds a uevent skeleton to device-mapper.
      Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: tidy bio_io_error usage · 9e4e5f87
      Milan Broz authored
      Use bio_io_error() in only two places and tidy the code,
      preparing for later patches.
      
      There is no functional change in this patch.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: fix thaw_bdev · ae9da83f
      Jun'ichi Nomura authored
      This patch fixes a bd_mount_sem counter corruption bug in device-mapper.
      
      thaw_bdev() should be called only when freeze_bdev() was called for the
      device.  Otherwise, thaw_bdev() will up bd_mount_sem and corrupt the
      semaphore counter.  A struct block_device with the corrupted semaphore
      may remain in the slab cache and be reused later.
      
      This patch fixes it by calling unlock_fs() instead.  unlock_fs()
      determines whether it should call thaw_bdev() by checking whether the
      device is frozen.
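The guard described above can be modelled in a few lines. This is a hedged userspace sketch: `struct model_md`, `model_lock_fs()` and `model_unlock_fs()` are hypothetical, and an integer stands in for the bd_mount_sem count (1 when idle).

```c
/* Hypothetical model of the per-device freeze state. */
struct model_md {
    int frozen;      /* set only after a freeze actually happened */
    int mount_sem;   /* stand-in for the bd_mount_sem count; 1 when idle */
};

/* Freeze: take the semaphore once and remember that we did. */
static void model_lock_fs(struct model_md *md)
{
    if (md->frozen)
        return;
    md->mount_sem--;
    md->frozen = 1;
}

/* Thaw only if a freeze happened; otherwise leave the count alone. */
static void model_unlock_fs(struct model_md *md)
{
    if (!md->frozen)
        return;
    md->mount_sem++;
    md->frozen = 0;
}
```

In this model, calling the unlock helper on a device that was suspended without a freeze leaves the count at 1, whereas an unconditional thaw would have pushed it to 2.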
      
      Easy reproducer is:
        #!/bin/sh
        while [ 1 ]; do
           dmsetup --notable create a
           dmsetup --nolockfs suspend a
           dmsetup remove a
        done
      
      The effect of the corrupted semaphore is not easy to see, so I tested
      by adding the printk below to bdev_alloc_inode():
              if (atomic_read(&ei->bdev.bd_mount_sem.count) != 1)
                      printk(KERN_DEBUG "Incorrect semaphore count = %d (%p)\n",
                              atomic_read(&ei->bdev.bd_mount_sem.count),
                              &ei->bdev);
      
      Without the patch, I saw something like:
       Incorrect semaphore count = 17 (f2ab91c0)
      
      With the patch, the message didn't appear.
      
      The bug was introduced in 2.6.16 with this bug fix:
      
      commit d9dde59b
      Date:   Fri Feb 24 13:04:24 2006 -0800
      
          [PATCH] dm: missing bdput/thaw_bdev at removal
      
          Need to unfreeze and release bdev otherwise the bdev inode with
          inconsistent state is reused later and cause problem.
      
      and backported to 2.6.15.5.
      
      It occurs only in free_dev(), which is called only when the dm device is
      removed.  The buggy code is executed only if md->suspended_bdev is
      non-NULL and that can happen only when the device was suspended without
      noflush.
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: stable@kernel.org
  9. 16 Oct, 2007 1 commit
  10. 10 Oct, 2007 1 commit
  11. 11 Aug, 2007 1 commit
  12. 24 Jul, 2007 1 commit
  13. 17 Jul, 2007 1 commit
  14. 12 Jul, 2007 2 commits
  15. 09 May, 2007 1 commit
  16. 30 Apr, 2007 1 commit
    • [BLOCK] Don't pin lots of memory in mempools · 5972511b
      Jens Axboe authored
      Currently we scale the mempool sizes depending on memory installed
      in the machine, except for the bio pool itself which sits at a fixed
      256 entry pre-allocation.
      
      There's really no point in "optimizing" this OOM path; we just need
      enough preallocated to make progress. A single unit is enough, but let's
      scale it down to 2 just to be on the safe side.
      
      This patch saves ~150kb of pinned kernel memory on a 32-bit box.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  17. 26 Jan, 2007 1 commit
    • [PATCH] dm-multipath: fix stall on noflush suspend/resume · bfa152fa
      Jun'ichi Nomura authored
      Allow noflush suspend/resume of device-mapper device only for the case
      where the device size is unchanged.
      
      Otherwise, dm-multipath devices can stall when resumed if noflush was used
      when suspending them, all paths have failed and queue_if_no_path is set.
      
      Explanation:
       1. Something is doing fsync() on the block dev,
          holding inode->i_sem
       2. The fsync write is blocked by all-paths-down and queue_if_no_path
       3. Someone requests to suspend the dm device with noflush.
          Pending writes are left in queue.
       4. In the middle of dm_resume(), __bind() tries to get
          inode->i_sem to do __set_size() and waits forever.
      
      'noflush suspend' is a new device-mapper feature introduced early in
      the 2.6.20 cycle, so I hope this fix is included before 2.6.20 is
      released.
      
      Example of reproducer:
       1. Create a multipath device by dmsetup
       2. Fail all paths during mkfs
       3. Do dmsetup suspend --noflush and load new map with healthy paths
       4. Do dmsetup resume
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 08 Dec, 2006 4 commits
    • [PATCH] dm: suspend: add noflush pushback · 2e93ccc1
      Kiyoshi Ueda authored
      In device-mapper, I/O is sometimes queued within targets for later processing.
      For example, the multipath target can be configured to store I/O when no paths
      are available instead of returning it with -EIO.
      
      This patch allows the device-mapper core to instruct a target to transfer the
      contents of any such in-target queue back into the core.  This frees up the
      resources used by the target so the core can replace that target with an
      alternative one and then resend the I/O to it.  Without this patch the only
      way to change the target in such circumstances involves returning the I/O with
      an error back to the filesystem/application.  In the multipath case, this
      patch will let us add new paths for existing I/O to try after all the existing
      paths have failed.
      
          DMF_NOFLUSH_SUSPENDING
          ----------------------
      
      If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
      DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend().  It
      is always cleared before dm_suspend() returns.
      
      The flag must be visible while the target is flushing pending I/Os so it
      is set before presuspend where the flush starts and unset after the wait
      for md->pending where the flush ends.
      
      Target drivers can check this flag by calling dm_noflush_suspending().
      
          DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
          -----------------------------------
      
      A target's map() function can now return DM_MAPIO_REQUEUE to request that
      the device-mapper core queue the bio.
      
      Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
      the same.  This has been labelled 'pushback'.
      
      The __map_bio() and clone_endio() functions in the core treat these return
      values as errors and call dec_pending() to end the I/O.
      
          dec_pending
          -----------
      
      dec_pending() saves the pushback request in struct dm_io->error.  Once all
      the split clones have ended, dec_pending() will put the original bio on
      the md->pushback list.  Note that this supersedes any I/O errors.
      
      It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
      in progress (e.g. by user interrupt).  dec_pending() checks for this and
      returns -EIO if it happened.
      
          pushback list and pushback_lock
          -------------------------------
      
      The bio is queued on md->pushback temporarily in dec_pending(), and after
      all pending I/Os return, md->pushback is merged into md->deferred in
      dm_suspend() for re-issuing at resume time.
      
      md->pushback_lock protects md->pushback.
      The lock should be held with irq disabled because dec_pending() can be
      called from interrupt context.
      
      Queueing bios to md->pushback in dec_pending() must be done atomically
      with the check for DMF_NOFLUSH_SUSPENDING flag.  So md->pushback_lock is
      held when checking the flag.  Otherwise dec_pending() may queue a bio to
      md->pushback after the interrupted dm_suspend() flushes md->pushback.
      Then the bio would be left in md->pushback.
      
      Flag setting in dm_suspend() can be done without md->pushback_lock because
      the flag is checked only after presuspend and the set value is already
      made visible via the target's presuspend function.
      
      The flag can be checked without md->pushback_lock (e.g. in the first part
      of dec_pending() or in target drivers), because the flag is checked again
      with md->pushback_lock held when the bio is really queued to md->pushback
      as described above.  So even if the flag is cleared after the lockless
      checks, the bio isn't left in md->pushback but is returned to applications
      with -EIO.
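The locking rule described in this section, checking the flag and queueing the bio under the same lock, can be sketched as follows. This is a single-threaded illustrative model only: `struct model_md` and the `model_*` helpers are hypothetical, `lock_held` merely marks where md->pushback_lock would be held, and a counter stands in for the bio list.

```c
#include <assert.h>
#include <errno.h>

struct model_md {
    int noflush_suspending;   /* stand-in for DMF_NOFLUSH_SUSPENDING */
    int pushback_len;         /* number of bios on the pushback list */
    int lock_held;            /* marks where md->pushback_lock is held */
};

static void model_lock(struct model_md *md)   { md->lock_held = 1; }
static void model_unlock(struct model_md *md) { md->lock_held = 0; }

/* Queueing is only legal while the lock is held. */
static void model_queue_pushback(struct model_md *md)
{
    assert(md->lock_held);
    md->pushback_len++;
}

/*
 * End of a pushed-back I/O: re-check the flag under the lock.
 * Returns 0 if the bio was queued for re-issue, -EIO otherwise.
 */
static int model_end_pushback(struct model_md *md)
{
    int ret = -EIO;

    model_lock(md);
    if (md->noflush_suspending) {
        model_queue_pushback(md);
        ret = 0;
    }
    model_unlock(md);
    return ret;
}

/* Suspend side: clear the flag and drain the list under the same lock. */
static int model_merge_pushback(struct model_md *md)
{
    int moved;

    model_lock(md);
    md->noflush_suspending = 0;
    moved = md->pushback_len;
    md->pushback_len = 0;
    model_unlock(md);
    return moved;
}
```

Because both sides take the lock, an I/O completing after the drain sees the cleared flag and fails with -EIO instead of being stranded on the list.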
      
          Other notes on the current patch
          --------------------------------
      
      - md->pushback is added to struct mapped_device instead of using
        md->deferred directly because md->io_lock, which protects md->deferred,
        is an rw_semaphore and can't be used in interrupt context, where
        dec_pending() may run; md->io_lock also protects the DMF_BLOCK_IO flag
        of md->flags.
      
      - Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
        ioctl option is specified, because I/Os generated by lock_fs() would be
        pushed back and never return if there were no valid devices.
      
      - If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
        flag is set, md->pushback must be flushed because I/Os may be queued to
        the list already.  (flush_and_out label in dm_suspend())
      
          Test results
          ------------
      
      I have tested using multipath target with the next patch.
      
      The following tests are for regression/compatibility:
        - I/Os succeed when valid paths exist;
        - I/Os fail when there are no valid paths and queue_if_no_path is not
          set;
        - I/Os are queued in the multipath target when there are no valid paths and
          queue_if_no_path is set;
        - The queued I/Os above fail when suspend is issued without the
          DM_NOFLUSH_FLAG ioctl option.  I/Os spanning 2 multipath targets also
          fail.
      
      The following tests are for the normal code path of new pushback feature:
        - Queued I/Os in the multipath target are flushed from the target
          but don't return when suspend is issued with the DM_NOFLUSH_FLAG
          ioctl option;
        - The I/Os above are queued in the multipath target again when
          resume is issued without path recovery;
        - The I/Os above succeed when resume is issued after path recovery
          or table load;
        - Queued I/Os in the multipath target succeed when resume is issued
          with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
          spanning 2 multipath targets also succeed.
      
      The following tests are for the error paths of the new pushback feature:
        - When bdget_disk() fails in dm_suspend(), the
          DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
          pushback list are flushed properly.
        - When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
            o I/Os which had already been queued to the pushback list
              at the time don't return, and are re-issued at resume time;
            o I/Os which hadn't been returned at the time return with EIO.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dm: map and endio return code clarification · 45cbcd79
      Kiyoshi Ueda authored
      Tighten the use of return values from the target map and end_io functions.
      Values of 2 and above are now explicitly reserved for future use.  There are no
      existing targets using such values.
      
      The patch has no effect on existing behaviour.
      
      o Reserve return values of 2 and above from target map functions.
        Any positive value currently indicates "mapping complete", but all
        existing drivers use the value 1.  We now make that a requirement
        so we can assign new meaning to higher values in future.
      
        The new definition of return values from target map functions is:
            < 0 : error
            = 0 : The target will handle the io (DM_MAPIO_SUBMITTED).
            = 1 : Mapping completed (DM_MAPIO_REMAPPED).
            > 1 : Reserved (undefined).  Previously this was the same as '= 1'.
      
      o Reserve return values of 2 and above from target end_io functions
        for similar reasons.
        DM_ENDIO_INCOMPLETE is introduced for a return value of 1.
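The return-value convention for map functions listed above could be handled by a dispatcher along these lines. This is a hedged sketch: the `MODEL_*` names and `model_handle_map_result()` are hypothetical, and treating reserved values as a failure is an assumption of this example, since the commit message only says they are undefined.

```c
#include <errno.h>

/* Values mirror the definition in the commit message. */
enum {
    MODEL_MAPIO_SUBMITTED = 0,   /* the target will handle the io */
    MODEL_MAPIO_REMAPPED  = 1,   /* mapping completed */
};

enum model_action {
    MODEL_ACT_NONE,       /* nothing to do: the target owns the io */
    MODEL_ACT_DISPATCH,   /* core submits the remapped io */
    MODEL_ACT_FAIL,       /* end the io with an error */
};

/* Decide what the core does with a target's map() return value. */
static enum model_action model_handle_map_result(int r)
{
    if (r < 0)
        return MODEL_ACT_FAIL;
    if (r == MODEL_MAPIO_SUBMITTED)
        return MODEL_ACT_NONE;
    if (r == MODEL_MAPIO_REMAPPED)
        return MODEL_ACT_DISPATCH;
    /* r > 1 is reserved; this sketch rejects it rather than guessing. */
    return MODEL_ACT_FAIL;
}
```

Matching exactly 1 instead of any positive value is what leaves room to give 2 and above a new meaning later.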
      
      Test results:
      
        I have tested by using the multipath target.
      
        I/Os succeed when valid paths exist.
      
        I/Os are queued in the multipath target when there are no valid paths and
        queue_if_no_path is set.
      
        I/Os fail when there are no valid paths and queue_if_no_path is not set.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dm: suspend: parameter change · a3d77d35
      Kiyoshi Ueda authored
      Change the interface of dm_suspend() so that we can pass several options
      without increasing the number of parameters.  The existing 'do_lockfs' integer
      parameter is replaced by a flag DM_SUSPEND_LOCKFS_FLAG.
      
      There is no functional change to the code.
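Replacing an integer parameter with a flags word can be sketched as below. The bit positions, the `MODEL_*` names and the second flag are assumptions chosen for illustration; only the existence of a lockfs flag replacing 'do_lockfs' comes from the commit message.

```c
/* Hypothetical bit assignments for the suspend flags word. */
#define MODEL_SUSPEND_LOCKFS_FLAG   (1u << 0)
#define MODEL_SUSPEND_NOFLUSH_FLAG  (1u << 1)  /* example of a later option */

struct model_suspend_opts {
    int do_lockfs;
    int noflush;
};

/* Decode the flags word into the individual options. */
static struct model_suspend_opts model_parse_suspend_flags(unsigned int flags)
{
    struct model_suspend_opts o;

    o.do_lockfs = (flags & MODEL_SUSPEND_LOCKFS_FLAG) ? 1 : 0;
    o.noflush   = (flags & MODEL_SUSPEND_NOFLUSH_FLAG) ? 1 : 0;
    return o;
}
```

A new option then costs one more bit in the word rather than another function parameter.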
      
      Test results:
      I have tested 'dmsetup suspend' command with/without the '--nolockfs'
      option and confirmed the do_lockfs value is correctly set.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dm: tidy core formatting · 74859364
      Kiyoshi Ueda authored
      Remove unnecessary spaces in dm.c.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>