  1. Aug 23, 2021
    • block: provide bio_clear_hipri() helper · 270a1c91
      Jens Axboe authored
      
      Any case that turns off REQ_HIPRI must also clear BIO_PERCPU_CACHE,
      as non-polled IO may complete through hard/soft IRQ and hence isn't
      safe for our polled bio alloc cache.
      
      Provide a helper that does just that, and use it in the merging code as
      well if we split a bio and turn off polling.
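      
      A minimal sketch of such a helper, consistent with the description
      above (the flag and field names are those named in this message; the
      body is inferred, not the verbatim diff):
      
       static inline void bio_clear_hipri(struct bio *bio)
       {
               /* can't support the alloc cache if we turn off polling */
               bio_clear_flag(bio, BIO_PERCPU_CACHE);
               bio->bi_opf &= ~REQ_HIPRI;
       }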
      
      Fixes: be863b9e ("block: clear BIO_PERCPU_CACHE flag if polling isn't supported")
      Reported-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Jun 25, 2021
    • blk: Fix lock inversion between ioc lock and bfqd lock · fd2ef39c
      Jan Kara authored
      
      Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
      
      bfqd -> ioc:
       put_io_context+0x33/0x90 -> ioc->lock grabbed
       blk_mq_free_request+0x51/0x140
       blk_put_request+0xe/0x10
       blk_attempt_req_merge+0x1d/0x30
       elv_attempt_insert_merge+0x56/0xa0
       blk_mq_sched_try_insert_merge+0x4b/0x60
       bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
       blk_mq_sched_insert_requests+0xd6/0x2b0
       blk_mq_flush_plug_list+0x154/0x280
       blk_finish_plug+0x40/0x60
       ext4_writepages+0x696/0x1320
       do_writepages+0x1c/0x80
       __filemap_fdatawrite_range+0xd7/0x120
       sync_file_range+0xac/0xf0
      
      ioc -> bfqd:
       bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
       put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
       exit_io_context+0x48/0x50
       do_exit+0x7e9/0xdd0
       do_group_exit+0x54/0xc0
      
      To avoid this inversion we change blk_mq_sched_try_insert_merge() to not
      free the merged request but rather leave that up to the caller, similarly
      to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
      to free all the merged requests after dropping bfqd->lock.
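      
      A minimal sketch of the resulting pattern, assuming a
      blk_mq_free_requests() helper introduced alongside this change
      (abbreviated; not the verbatim diff):
      
       static void bfq_insert_request(struct blk_mq_hw_ctx *hctx,
                                      struct request *rq, bool at_head)
       {
               struct request_queue *q = hctx->queue;
               struct bfq_data *bfqd = q->elevator->elevator_data;
               LIST_HEAD(free);        /* requests freed by merging */
      
               spin_lock_irq(&bfqd->lock);
               if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
                       spin_unlock_irq(&bfqd->lock);
                       /* may take ioc->lock, now safely outside bfqd->lock */
                       blk_mq_free_requests(&free);
                       return;
               }
               /* ... normal insertion path ... */
               spin_unlock_irq(&bfqd->lock);
       }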
      
      Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210623093634.27879-3-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. Mar 23, 2021
    • block: recalculate segment count for multi-segment discards correctly · a958937f
      David Jeffery authored
      
      When a stacked block device inserts a request into another block device
      using blk_insert_cloned_request, the request's nr_phys_segments field gets
      recalculated by a call to blk_recalc_rq_segments in
      blk_cloned_rq_check_limits. But blk_recalc_rq_segments does not know how to
      handle multi-segment discards. For disk types which can handle
      multi-segment discards, like nvme, this results in discard requests that
      claim a single segment when they should report several, triggering a
      warning in nvme and causing nvme to fail the discard because of the
      invalid state.
      
       WARNING: CPU: 5 PID: 191 at drivers/nvme/host/core.c:700 nvme_setup_discard+0x170/0x1e0 [nvme_core]
       ...
       nvme_setup_cmd+0x217/0x270 [nvme_core]
       nvme_loop_queue_rq+0x51/0x1b0 [nvme_loop]
       __blk_mq_try_issue_directly+0xe7/0x1b0
       blk_mq_request_issue_directly+0x41/0x70
       ? blk_account_io_start+0x40/0x50
       dm_mq_queue_rq+0x200/0x3e0
       blk_mq_dispatch_rq_list+0x10a/0x7d0
       ? __sbitmap_queue_get+0x25/0x90
       ? elv_rb_del+0x1f/0x30
       ? deadline_remove_request+0x55/0xb0
       ? dd_dispatch_request+0x181/0x210
       __blk_mq_do_dispatch_sched+0x144/0x290
       ? bio_attempt_discard_merge+0x134/0x1f0
       __blk_mq_sched_dispatch_requests+0x129/0x180
       blk_mq_sched_dispatch_requests+0x30/0x60
       __blk_mq_run_hw_queue+0x47/0xe0
       __blk_mq_delay_run_hw_queue+0x15b/0x170
       blk_mq_sched_insert_requests+0x68/0xe0
       blk_mq_flush_plug_list+0xf0/0x170
       blk_finish_plug+0x36/0x50
       xlog_cil_committed+0x19f/0x290 [xfs]
       xlog_cil_process_committed+0x57/0x80 [xfs]
       xlog_state_do_callback+0x1e0/0x2a0 [xfs]
       xlog_ioend_work+0x2f/0x80 [xfs]
       process_one_work+0x1b6/0x350
       worker_thread+0x53/0x3e0
       ? process_one_work+0x350/0x350
       kthread+0x11b/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      
      This patch fixes blk_recalc_rq_segments to be aware of devices that can
      have multi-segment discards. It calculates the correct discard segment
      count by counting the number of bios, since each discard bio is
      considered its own segment.
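      
      A sketch of the corrected accounting in blk_recalc_rq_segments() (other
      request types omitted; not the verbatim diff):
      
       switch (bio_op(rq->bio)) {
       case REQ_OP_DISCARD:
       case REQ_OP_SECURE_ERASE:
               if (queue_max_discard_segments(rq->q) > 1) {
                       struct bio *bio = rq->bio;
      
                       /* each discard bio counts as its own segment */
                       for_each_bio(bio)
                               nr_phys_segs++;
                       return nr_phys_segs;
               }
               return 1;
       /* ... remaining cases unchanged ... */
       }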
      
      Fixes: 1e739730 ("block: optionally merge discontiguous discard bios into a single request")
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Laurence Oberman <loberman@redhat.com>
      Link: https://lore.kernel.org/r/20210211143807.GA115624@redhat
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. Dec 08, 2020
    • block: disable iopoll for split bio · cc29e1bf
      Jeffle Xu authored
      
      iopoll is intended for small, latency-sensitive IO. It doesn't work well
      for big IO, especially when the IO needs to be split into multiple bios.
      In that case, the cookie returned by __submit_bio_noacct_mq() is actually
      the cookie of the last split bio, and the completion of *this* last split
      bio by iopoll doesn't mean the whole original bio has completed. Callers
      of iopoll still need to wait for the completion of the other split bios.
      
      Besides, bio splitting may cause more trouble for iopoll, which isn't
      supposed to be used for big IO in the first place.
      
      iopoll for split bios may cause a potential race if CPU migration happens
      during bio submission. Since the returned cookie is that of the last
      split bio, polling on the corresponding hardware queue doesn't help
      complete the other split bios if they were enqueued into different
      hardware queues. Since interrupts are disabled for polling queues, the
      completion of these other split bios then depends on the timeout
      mechanism, thus causing a potential hang.
      
      iopoll for split bios may also cause hangs for sync polling. Currently
      both blkdev and the iomap-based filesystems (ext4/xfs, etc.) support sync
      polling in their direct IO routines. These routines submit bios without
      the REQ_NOWAIT flag set and then start sync polling in the current
      process context. The process may hang in blk_mq_get_tag() if the
      submitted bio has to be split into multiple bios that rapidly exhaust the
      queue depth: the process is waiting for the completion of the previously
      allocated requests, which should be reaped by the subsequent polling that
      the process has not yet reached, thus causing a deadlock.
      
      To avoid the subtle problems described above, just disable iopoll for
      split bios and return BLK_QC_T_NONE in this case. The side effect is that
      non-HIPRI IO now also returns BLK_QC_T_NONE, which should be acceptable
      since the returned cookie is never used for non-HIPRI IO.
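      
      A sketch of the split-path change, under the assumption that the
      clearing happens where blk_bio_segment_split() decides a split is
      needed (not the verbatim diff):
      
       split:
               *segs = nsegs;
      
               /*
                * Bio splitting may cause subtle trouble such as hangs when
                * doing sync iopoll in the direct IO path; since the
                * performance gain of iopoll for big IO is trivial anyway,
                * disable iopoll whenever a split is needed.
                */
               bio->bi_opf &= ~REQ_HIPRI;
               return bio_split(bio, sectors, GFP_NOIO, bs);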
      
      Suggested-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. Aug 21, 2020
    • block: fix get_max_io_size() · e4b469c6
      Keith Busch authored
      
      A previous commit aligning splits to physical block sizes inadvertently
      modified one return case such that it now returns 0-length splits when
      the number of sectors doesn't exceed the physical offset. This later
      hits a BUG in bio_split(). Restore the previous working behavior.
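      
      A sketch of the restored logic in get_max_io_size(), assuming the
      helpers of this era (blk_max_size_offset() taking two arguments); the
      point is that the physical-block alignment path only applies when it
      leaves a non-zero length, otherwise we fall back to logical-block
      alignment:
      
       static inline unsigned get_max_io_size(struct request_queue *q,
                                              struct bio *bio)
       {
               unsigned sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
               unsigned max_sectors = sectors;
               unsigned pbs = queue_physical_block_size(q) >> SECTOR_SHIFT;
               unsigned lbs = queue_logical_block_size(q) >> SECTOR_SHIFT;
               unsigned start = bio->bi_iter.bi_sector & (pbs - 1);
      
               /* align the end of the IO to the physical block size */
               max_sectors += start;
               max_sectors &= ~(pbs - 1);
               if (max_sectors > start)
                       return max_sectors - start;
      
               /* fall back to logical block alignment, never 0 length */
               return sectors & ~(lbs - 1);
       }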
      
      Fixes: 9cc5169c ("block: Improve physical block alignment of split bios")
      Reported-by: Eric Deal <eric.deal@wdc.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. Jun 26, 2020
    • blktrace: Provide event for request merging · f3bdc62f
      Jan Kara authored
      
      Currently blk-mq does not report any event when two requests get merged
      in the elevator. This results in a difficult-to-understand sequence of
      events like:
      
      ...
        8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
        8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
        8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
        8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
        8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
        8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]
      
      and you can only guess (after quite some head-scratching, since the above
      excerpt is intermixed with a lot of other IO) that request 215023544+56
      got merged into request 215023504+40. Provide a proper event for request
      merging, as the legacy block layer used to do.
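      
      A sketch of where such an event would fire, assuming the two-argument
      block tracepoint convention of this era (the merged-away request is
      traced before it is freed):
      
       static struct request *attempt_merge(struct request_queue *q,
                                            struct request *req,
                                            struct request *next)
       {
               /* ... mergeability checks ... */
      
               /*
                * Report the merge before @next is folded into @req and
                * freed, so blktrace can attribute the vanished request.
                */
               trace_block_rq_merge(q, next);
      
               /* ... perform the merge, free @next ... */
       }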
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. May 14, 2020
    • block: Inline encryption support for blk-mq · a892c8d5
      Satya Tangirala authored
      
      We must have some way of letting a storage device driver know what
      encryption context it should use for en/decrypting a request. However,
      it's the upper layers (like the filesystem/fscrypt) that know about and
      manage encryption contexts. As such, when the upper layer submits a bio
      to the block layer, and this bio eventually reaches a device driver with
      support for inline encryption, the device driver will need to have been
      told the encryption context for that bio.
      
      We want to communicate the encryption context from the upper layer to the
      storage device along with the bio, when the bio is submitted to the block
      layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
      represent an encryption context (note that we can't use the bi_private
      field in struct bio for this because that field is not meant to pass
      information across layers in the storage stack). We also introduce various
      functions to manipulate the bio_crypt_ctx and make the bio/request merging
      logic aware of the bio_crypt_ctx.
      
      We also make changes to blk-mq to make it handle bios with encryption
      contexts. blk-mq can merge many bios into the same request. These bios need
      to have contiguous data unit numbers (the necessary changes to blk-merge
      are also made to ensure this); as such, it suffices to keep the data unit
      number of just the first bio, since that's all a storage driver needs to
      infer the data unit number to use for each data block in each bio in a
      request. blk-mq keeps track of the encryption context to be used for all
      the bios in a request with the request's rq_crypt_ctx. When the first bio
      is added to an empty request, blk-mq will program the encryption context
      of that bio into the request_queue's keyslot manager, and store the
      returned keyslot in the request's rq_crypt_ctx. All the functions to
      operate on encryption contexts are in blk-crypto.c.
      
      Upper layers only need to call bio_crypt_set_ctx with the encryption key,
      algorithm and data_unit_num; they don't have to worry about getting a
      keyslot for each encryption context, as blk-mq/blk-crypto handles that.
      Blk-crypto also makes it possible for request-based layered devices like
      dm-rq to make use of inline encryption hardware by cloning the
      rq_crypt_ctx and programming a keyslot in the new request_queue when
      necessary.
      
      Note that any user of the block layer can submit bios with an
      encryption context, such as filesystems, device-mapper targets, etc.
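      
      A hypothetical usage sketch for an upper layer: 'key' is a previously
      initialized struct blk_crypto_key and 'first_dun' is the data unit
      number of the bio's first data block (both names are illustrative):
      
       u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { first_dun };
      
       /* attach the encryption context before submission; blk-mq and
        * blk-crypto handle keyslot programming from here on */
       bio_crypt_set_ctx(bio, key, dun, GFP_NOIO);
       submit_bio(bio);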
      
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>