  1. Oct 18, 2021
  2. Jun 30, 2021
  3. Jan 29, 2021
  4. Jan 25, 2021
  5. May 27, 2020
  6. Oct 10, 2019
  7. Apr 30, 2019
  8. Nov 07, 2018
    • block: remove dead elevator code · a1ce35fa
      Jens Axboe authored
      
      This removes a bunch of core and elevator related code. On the core
      front, we remove anything related to queue running, draining,
      initialization, plugging, and congestions. We also kill anything
      related to request allocation, merging, retrieval, and completion.
      
      Remove any checking for single queue IO schedulers, as they no
      longer exist. This means we can also delete a bunch of code related
      to request issue, adding, completion, etc - and all the SQ related
      ops and helpers.
      
      Also kill the load_default_modules(), as all that did was provide
      for a way to load the default single queue elevator.
      
      Tested-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a1ce35fa
  9. Jan 17, 2018
  10. Jun 09, 2017
    • block: introduce new block status code type · 2a842aca
      Christoph Hellwig authored
      
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      2a842aca
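
      To make the __bitwise approach above concrete, here is a minimal, hedged
      C sketch of a sparse-checked status type with an errno conversion helper.
      The demo_* names are purely illustrative; the real type is blk_status_t
      and the real helpers are blk_status_to_errno() and errno_to_blk_status()
      in the kernel tree.

      #include <errno.h>

      /* sparse sees the bitwise/force attributes; plain compilers ignore them */
      #ifdef __CHECKER__
      #define __demo_bitwise __attribute__((bitwise))
      #define __demo_force   __attribute__((force))
      #else
      #define __demo_bitwise
      #define __demo_force
      #endif

      typedef unsigned char __demo_bitwise demo_status_t;

      #define DEMO_STS_OK     ((__demo_force demo_status_t)0)
      #define DEMO_STS_NOSPC  ((__demo_force demo_status_t)1)
      #define DEMO_STS_IOERR  ((__demo_force demo_status_t)2)

      /* Translate the opaque status back to a conventional negative errno. */
      static int demo_status_to_errno(demo_status_t status)
      {
              switch (status) {
              case DEMO_STS_OK:    return 0;
              case DEMO_STS_NOSPC: return -ENOSPC;
              default:             return -EIO;
              }
      }

      Passing a bare errno where demo_status_t is expected (or vice versa) then
      triggers a sparse warning, which is exactly the typechecking the commit
      message describes.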
  11. Apr 20, 2017
  12. Jan 31, 2017
  13. Jan 27, 2017
  14. Jan 17, 2017
  15. Oct 28, 2016
  16. Jul 21, 2016
  17. May 05, 2015
  18. Sep 22, 2014
  19. Jun 09, 2014
  20. Mar 21, 2014
  21. Feb 21, 2014
  22. Feb 07, 2014
  23. Dec 31, 2013
  24. Oct 25, 2013
    • blk-mq: new multi-queue block IO queueing mechanism · 320ae51f
      Jens Axboe authored
      
      Linux currently has two models for block devices:
      
      - The classic request_fn based approach, where drivers use struct
        request units for IO. The block layer provides various helper
        functionalities to let drivers share code, things like tag
        management, timeout handling, queueing, etc.
      
      - The "stacked" approach, where a driver squeezes in between the
        block layer and IO submitter. Since this bypasses the IO stack,
        driver generally have to manage everything themselves.
      
      With drivers being written for new high IOPS devices, the classic
      request_fn based driver doesn't work well enough. The design dates
      back to when both SMP and high IOPS were rare. It has problems with
      scaling to bigger machines, and runs into scaling issues even on
      smaller machines when you have IOPS in the hundreds of thousands
      per device.
      
      The stacked approach is then most often selected as the model
      for the driver. But this means that everybody has to re-invent
      everything, and along with that we get back all the problems
      that the shared approach solved.
      
      This commit introduces blk-mq, block multi queue support. The
      design is centered around per-cpu queues for queueing IO, which
      then funnel down into x number of hardware submission queues.
      We might have a 1:1 mapping between the two, or it might be
      an N:M mapping. That all depends on what the hardware supports.
      
      blk-mq provides various helper functions, which include:
      
      - Scalable support for request tagging. Most devices need to
        be able to uniquely identify a request both in the driver and
        to the hardware. The tagging uses per-cpu caches for freed
        tags, to enable cache hot reuse.
      
      - Timeout handling without tracking requests on a per-device
        basis. Basically the driver should be able to get a notification
        if a request happens to fail.
      
      - Optional support for non 1:1 mappings between issue and
        submission queues. blk-mq can redirect IO completions to the
        desired location.
      
      - Support for per-request payloads. Drivers almost always need
        to associate a request structure with some driver private
        command structure. Drivers can tell blk-mq this at init time,
        and then any request handed to the driver will have the
        required size of memory associated with it.
      
      - Support for merging of IO, and plugging. The stacked model
        gets neither of these. Even for high IOPS devices, merging
        sequential IO reduces per-command overhead and thus
        increases bandwidth.
      
      For now, this is provided as a potential 3rd queueing model, with
      the hope being that, as it matures, it can replace both the classic
      and stacked model. That would get us back to having just 1 real
      model for block devices, leaving the stacked approach to dm/md
      devices (as it was originally intended).
      
      Contributions in this patch from the following people:
      
      Shaohua Li <shli@fusionio.com>
      Alexander Gordeev <agordeev@redhat.com>
      Christoph Hellwig <hch@infradead.org>
      Mike Christie <michaelc@cs.wisc.edu>
      Matias Bjorling <m@bjorling.me>
      Jeff Moyer <jmoyer@redhat.com>
      
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      320ae51f
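
      The tagging and per-request payload facilities described above are
      configured by the driver at init time through a tag set. Below is a
      hedged C sketch using the blk-mq API as it looks in current kernels
      (the interface has evolved since this commit); the demo_* names and the
      chosen queue sizes are hypothetical.

      #include <linux/blk-mq.h>
      #include <linux/string.h>
      #include <linux/numa.h>

      /* Driver-private payload allocated alongside every request. */
      struct demo_cmd {
              int status;
      };

      static const struct blk_mq_ops demo_mq_ops;   /* queue_rq etc. defined elsewhere */

      static int demo_setup_tag_set(struct blk_mq_tag_set *set)
      {
              memset(set, 0, sizeof(*set));
              set->ops          = &demo_mq_ops;
              set->nr_hw_queues = 4;                        /* hardware submission queues */
              set->queue_depth  = 128;                      /* tags per hardware queue */
              set->cmd_size     = sizeof(struct demo_cmd);  /* per-request payload size */
              set->numa_node    = NUMA_NO_NODE;
              return blk_mq_alloc_tag_set(set);
      }

      In the driver's queue_rq handler the payload is then reached with
      blk_mq_rq_to_pdu(rq), so no separate per-command allocation is needed.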
    • block: remove request ref_count · 71fe07d0
      Christoph Hellwig authored
      
      This reference count has been around since before git history, but the only
      place where it's used is in blk_execute_rq, and there it is entirely useless
      as it is incremented before submitting the request and decremented in the
      end_io handler before waking up the submitter thread.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      71fe07d0
  25. Sep 18, 2013
  26. Feb 15, 2013
    • block: account iowait time when waiting for completion of IO request · 5577022f
      Vladimir Davydov authored
      
      Using wait_for_completion() for waiting for an IO request to be executed
      results in wrong iowait time accounting. For example, a system whose only
      task is doing write() and fdatasync() on a block device can be reported
      as idle instead of iowaiting, as it should be, because blkdev_issue_flush()
      calls wait_for_completion(), which in turn calls schedule(), which does not
      increment the iowait proc counter and thus does not turn on iowait time
      accounting.
      
      The patch makes block layer use wait_for_completion_io() instead of
      wait_for_completion() where appropriate to account iowait time
      correctly.
      
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5577022f
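
      The change boils down to picking the wait primitive that accounts the
      sleep as iowait. Here is a hedged sketch of the pattern, loosely modelled
      on blkdev_issue_flush() and using the bio API of current kernels; the
      demo_* helpers are hypothetical.

      #include <linux/bio.h>
      #include <linux/blkdev.h>
      #include <linux/completion.h>

      static void demo_end_io(struct bio *bio)
      {
              complete(bio->bi_private);      /* wake up the submitter */
      }

      static int demo_submit_and_wait(struct bio *bio)
      {
              DECLARE_COMPLETION_ONSTACK(done);

              bio->bi_private = &done;
              bio->bi_end_io  = demo_end_io;
              submit_bio(bio);

              /*
               * wait_for_completion() would sleep without flagging the task as
               * waiting on IO; wait_for_completion_io() accounts the sleep time
               * as iowait, which is the whole point of the patch.
               */
              wait_for_completion_io(&done);

              return blk_status_to_errno(bio->bi_status);
      }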
  27. Feb 07, 2013
  28. Dec 06, 2012
    • block: Avoid that request_fn is invoked on a dead queue · c246e80d
      Bart Van Assche authored
      
      A block driver may start cleaning up resources needed by its
      request_fn as soon as blk_cleanup_queue() has finished, so request_fn
      must not be invoked after draining has finished. This is important
      when blk_run_queue() is invoked without any requests in progress.
      As an example, if blk_drain_queue() and scsi_run_queue() run in
      parallel, blk_drain_queue() may have finished all requests after
      scsi_run_queue() has taken a SCSI device off the starved list but
      before that last function has had a chance to run the queue.
      
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Chanho Min <chanho.min@lge.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c246e80d
    • block: Rename queue dead flag · 3f3299d5
      Bart Van Assche authored
      
      QUEUE_FLAG_DEAD is used to indicate that queuing new requests must
      stop. After this flag has been set, queue draining starts. However,
      during the queue draining phase it is still safe to invoke the
      queue's request_fn, so QUEUE_FLAG_DYING is a better name for this
      flag.
      
      This patch has been generated by running the following command
      over the kernel source tree:
      
      git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' |
          xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g'      \
              -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g';                \
      sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \
          include/linux/blkdev.h;                                       \
      sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \
          -e 's/Dead queue/A dying queue/' block/blk-core.c
      
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Chanho Min <chanho.min@lge.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f3299d5
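
      Together with the previous entry (c246e80d), this gives the queue two
      distinct shutdown states. A hedged C sketch of how a driver-facing path
      might honour them, using the legacy helpers of that era
      (blk_queue_dying()/blk_queue_dead(), __blk_run_queue()); the demo_*
      functions are illustrative only.

      #include <linux/blkdev.h>
      #include <linux/errno.h>
      #include <linux/spinlock.h>

      /* Dying: no new requests are accepted, but draining may still run
       * request_fn. Dead: cleanup has finished and request_fn must never
       * be invoked again. */
      static int demo_try_queue_request(struct request_queue *q)
      {
              if (blk_queue_dying(q))
                      return -ENODEV;         /* refuse new work once dying */
              return 0;
      }

      static void demo_run_queue(struct request_queue *q)
      {
              spin_lock_irq(q->queue_lock);
              if (!blk_queue_dead(q))
                      __blk_run_queue(q);     /* safe while merely dying */
              spin_unlock_irq(q->queue_lock);
      }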
  29. Nov 23, 2012
    • block: Don't access request after it might be freed · 893d290f
      Roland Dreier authored
      
      After we've done __elv_add_request() and __blk_run_queue() in
      blk_execute_rq_nowait(), the request might finish and be freed
      immediately.  Therefore checking if the type is REQ_TYPE_PM_RESUME
      isn't safe afterwards, because if it isn't, rq might be gone.
      Instead, check beforehand and stash the result in a temporary.
      
      This fixes crashes in blk_execute_rq_nowait() I get occasionally when
      running with lots of memory debugging options enabled -- I think this
      race is usually harmless because the window for rq to be reallocated
      is so small.
      
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      893d290f
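
      The fix is the general "read before handing off" pattern: capture
      anything needed from the request before it is queued, because it may be
      completed and freed the moment the queue runs. A hedged sketch in the
      legacy request API of that era; demo_execute_nowait() is illustrative,
      not the literal blk_execute_rq_nowait() diff.

      #include <linux/blkdev.h>
      #include <linux/elevator.h>
      #include <linux/types.h>

      static void demo_execute_nowait(struct request_queue *q, struct request *rq)
      {
              /* Stash the check while rq is still guaranteed to be alive. */
              bool is_pm_resume = rq->cmd_type == REQ_TYPE_PM_RESUME;

              spin_lock_irq(q->queue_lock);
              __elv_add_request(q, rq, ELEVATOR_INSERT_BACK);
              __blk_run_queue(q);
              /* rq may already be completed and freed here; only the stashed
               * flag may be used, never rq itself. */
              if (is_pm_resume)
                      __blk_run_queue(q);     /* stand-in for the PM-resume special case */
              spin_unlock_irq(q->queue_lock);
      }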
  30. Jul 20, 2012
  31. Dec 13, 2011