  1. Oct 27, 2021
  2. Aug 23, 2021
  3. Aug 09, 2021
  4. Aug 02, 2021
  5. May 09, 2021
  6. May 07, 2021
  7. May 03, 2021
    • bio: limit bio max size · cd2c7545
      Changheun Lee authored
      
      A bio's size can grow up to 4GB when multi-page bvecs are enabled, but
      this can lead to inefficient behavior. For large-chunk direct I/O - for
      example a 32MB chunk read from user space - all pages of the 32MB are
      merged into a single bio structure if their physical addresses are
      contiguous, which delays submission until the merge is complete. The
      bio max size should therefore be limited to a sensible value.
      
      When a 32MB chunk read with the direct I/O option comes from userspace,
      the current kernel behavior in the do_direct_IO() loop follows this
      timeline:
      
       | bio merge for 32MB. total 8,192 pages are merged.
       | total elapsed time is over 2ms.
       |------------------ ... ----------------------->|
                                                       | 8,192 pages merged a bio.
                                                       | at this time, first bio submit is done.
                                                       | 1 bio is split to 32 read request and issue.
                                                       |--------------->
                                                        |--------------->
                                                         |--------------->
                                                                    ......
                                                                         |--------------->
                                                                          |--------------->|
                                total 19ms elapsed to complete 32MB read done from device. |
      
      If the bio max size is limited to 1MB, the behavior changes as follows:
      
       | bio merge for 1MB. 256 pages are merged for each bio.
       | total 32 bio will be made.
       | total elapsed time is over 2ms. it's same.
       | but, first bio submit timing is fast. about 100us.
       |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
            | 256 pages merged a bio.
            | at this time, first bio submit is done.
            | and 1 read request is issued for 1 bio.
            |--------------->
                 |--------------->
                      |--------------->
                                            ......
                                                       |--------------->
                                                        |--------------->|
              total 17ms elapsed to complete 32MB read done from device. |
      
      As a result, read requests are issued sooner when the bio max size is
      limited. With the current multi-page bvec behavior a very large bio can
      be created, which delays issuing the first I/O request. The sketch
      below walks through the arithmetic.
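
      A small user-space C sketch (plain C, not kernel code) reproduces the
      arithmetic behind the timelines; the 1MB cap stands in for the proposed
      bio size limit:

        #include <stdio.h>

        int main(void)
        {
                const unsigned long page_size = 4096;        /* 4KB pages */
                const unsigned long io_size   = 32UL << 20;  /* 32MB direct I/O chunk */
                const unsigned long bio_cap   = 1UL << 20;   /* hypothetical 1MB bio limit */

                /* Uncapped: every page of the 32MB chunk lands in one bio. */
                printf("uncapped: 1 bio of %lu pages\n", io_size / page_size);

                /* Capped: the chunk is carried by many smaller bios, and the
                 * first one can be submitted as soon as it is full. */
                printf("capped:   %lu bios of %lu pages each\n",
                       io_size / bio_cap, bio_cap / page_size);
                return 0;
        }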
      
      Signed-off-by: Changheun Lee <nanich.lee@samsung.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20210503095203.29076-1-nanich.lee@samsung.com
      
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. Apr 06, 2021
  9. Feb 24, 2021
    • blk-settings: align max_sectors on "logical_block_size" boundary · 97f433c3
      Mikulas Patocka authored
      
      We get I/O errors when we run md-raid1 on top of dm-integrity on top of
      a ramdisk:
      device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
      device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
      device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
      device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
      device-mapper: integrity: Bio not aligned on 8 sectors: 0x8048, 0xff
      device-mapper: integrity: Bio not aligned on 8 sectors: 0x8147, 0xff
      device-mapper: integrity: Bio not aligned on 8 sectors: 0x8246, 0xff
      device-mapper: integrity: Bio not aligned on 8 sectors: 0x8345, 0xbb
      
      The ramdisk device has logical_block_size 512 and max_sectors 255. The
      dm-integrity device uses logical_block_size 4096 and does not change
      the max_sectors value, so it inherits 255 from the ramdisk. The result
      is a device whose max_sectors is not aligned to its logical_block_size.
      
      The md-raid device sees that the underlying leg has max_sectors 255 and
      splits bios on a 255-sector boundary, leaving them unaligned to the
      logical_block_size.
      
      To fix the bug, round max_sectors down to a multiple of the
      logical_block_size, as sketched below.
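
      A minimal user-space sketch of that rounding, using the numbers from
      this report (a 4096-byte logical block is 8 sectors, so 255 rounds down
      to 248):

        #include <stdio.h>

        int main(void)
        {
                unsigned int max_sectors = 255;         /* inherited from the ramdisk */
                unsigned int lbs_sectors = 4096 / 512;  /* dm-integrity logical block, in sectors */

                /* Round max_sectors down to a logical_block_size boundary. */
                unsigned int aligned = max_sectors - (max_sectors % lbs_sectors);

                printf("max_sectors %u -> %u (multiple of %u)\n",
                       max_sectors, aligned, lbs_sectors);  /* prints 255 -> 248 */
                return 0;
        }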
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. Feb 10, 2021
  11. Jan 25, 2021
  12. Dec 08, 2020
    • block: Align max_hw_sectors to logical blocksize · 817046ec
      Damien Le Moal authored
      
      Block device drivers do not have to call blk_queue_max_hw_sectors() to
      set a limit on request size if the default limit BLK_SAFE_MAX_SECTORS
      is acceptable. However, this default (255 sectors) may not be aligned
      to the device logical block size, in which case it cannot be used as-is
      as the maximum request size. This is the case for the null_blk device
      driver.
      
      Modify blk_queue_max_hw_sectors() to make sure that the request size
      limits specified by the max_hw_sectors and max_sectors queue limits are
      always aligned to the device logical block size. Additionally, to avoid
      depending on the order in which this function and
      blk_queue_logical_block_size() are called, modify
      blk_queue_logical_block_size() to perform the same alignment when the
      logical block size is set after max_hw_sectors (see the sketch below).
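
      A hedged, kernel-style sketch of the alignment (not the exact upstream
      code; it only illustrates the rounding both setters would apply):

        /* Round both request size limits down to the logical block size. */
        static void blk_align_limits_sketch(struct queue_limits *lim)
        {
                unsigned int lbs_sectors = lim->logical_block_size >> SECTOR_SHIFT;

                lim->max_hw_sectors = round_down(lim->max_hw_sectors, lbs_sectors);
                lim->max_sectors    = round_down(lim->max_sectors, lbs_sectors);
        }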
      
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. Dec 01, 2020
  14. Sep 24, 2020
  15. Sep 23, 2020
  16. Sep 16, 2020
  17. Jul 20, 2020
  18. May 13, 2020
    • block: Introduce REQ_OP_ZONE_APPEND · 0512a75b
      Keith Busch authored
      
      Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
      block device. This is a no-merge write operation.
      
      A zone append write BIO must:
      * Target a zoned block device
      * Have a sector position indicating the start sector of the target zone
      * Target a sequential write zone
      * Not cross a zone boundary
      * Not be split, so that a single range of LBAs is written with a single
        command
      
      Implement these checks in generic_make_request_checks() using the
      helper function blk_check_zone_append(). To avoid write append BIO
      splitting, introduce the new max_zone_append_sectors queue limit
      attribute and ensure that a BIO size is always lower than this limit.
      Export this new limit through sysfs and check these limits in bio_full().
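
      A simplified, hedged sketch of these checks (not the exact
      blk_check_zone_append(); zone sizes are a power of two, so the zone
      start test is a mask, and the sequential-zone check is omitted):

        static blk_status_t zone_append_checks_sketch(struct request_queue *q,
                                                      struct bio *bio)
        {
                sector_t zone_sectors = blk_queue_zone_sectors(q);

                /* Zone append only makes sense on a zoned block device. */
                if (!blk_queue_is_zoned(q))
                        return BLK_STS_NOTSUPP;

                /* The bio must start at the start sector of the target zone. */
                if (bio->bi_iter.bi_sector & (zone_sectors - 1))
                        return BLK_STS_IOERR;

                /* Never larger than the append limit, so it is never split. */
                if (bio_sectors(bio) > queue_max_zone_append_sectors(q))
                        return BLK_STS_IOERR;

                return BLK_STS_OK;
        }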
      
      Also, when an LLDD cannot dispatch a request to a specific zone, it
      returns BLK_STS_ZONE_RESOURCE to indicate that the request needs to be
      delayed, e.g. because the zone it would be dispatched to is still
      write-locked. When this happens, set the request aside on a local list
      and continue dispatching other requests, such as READs or
      WRITE/ZONE_APPEND requests targeting other zones. This way we can keep
      a high queue depth without starving other requests when one request
      cannot be served due to zone write-locking.
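
      A schematic sketch of that handling (dispatch_one() is a hypothetical
      stand-in for the driver dispatch path, not the real blk-mq code):

        static void dispatch_sketch(struct list_head *todo)
        {
                LIST_HEAD(zone_blocked);
                struct request *rq, *next;

                list_for_each_entry_safe(rq, next, todo, queuelist) {
                        if (dispatch_one(rq) == BLK_STS_ZONE_RESOURCE) {
                                /* Zone still write-locked: park it, move on. */
                                list_move_tail(&rq->queuelist, &zone_blocked);
                                continue;
                        }
                        /* Issued to the driver: drop it from the work list. */
                        list_del_init(&rq->queuelist);
                }

                /* Put parked requests back for a later dispatch attempt. */
                list_splice_tail_init(&zone_blocked, todo);
        }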
      
      Finally, make sure that the bio sector position indicates the actual
      write position as indicated by the device on completion.
      
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      [ jth: added zone-append specific add_page and merge_page helpers ]
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  19. Apr 22, 2020
  20. Mar 27, 2020
    • block: simplify queue allocation · 3d745ea5
      Christoph Hellwig authored
      
      Current make_request based drivers use either blk_alloc_queue_node or
      blk_alloc_queue to allocate a queue, and then set up the make_request_fn
      function pointer and a few parameters using the blk_queue_make_request
      helper.  Simplify this by passing the make_request pointer to
      blk_alloc_queue, and while at it merge the _node variant into the main
      helper by always passing a node_id, and remove the superfluous gfp_mask
      parameter.  A lower-level __blk_alloc_queue is kept for the blk-mq case.
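
      A hedged sketch of the driver-facing change (my_make_request and
      struct my_dev are hypothetical):

        /* Before this patch a bio-based driver did roughly:
         *
         *      q = blk_alloc_queue(GFP_KERNEL);
         *      blk_queue_make_request(q, my_make_request);
         */
        static int my_dev_init_queue(struct my_dev *dev)
        {
                /* After: pass the make_request_fn and NUMA node directly. */
                dev->queue = blk_alloc_queue(my_make_request, NUMA_NO_NODE);
                if (!dev->queue)
                        return -ENOMEM;
                return 0;
        }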
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  21. Mar 17, 2020
  22. Jan 16, 2020
  23. Sep 06, 2019
    • block: Introduce elevator features · 68c43f13
      Damien Le Moal authored
      
      Introduce the definition of elevator features through the
      elevator_features flags in the elevator_type structure. Each flag can
      represent a feature supported by an elevator. The first feature defined
      by this patch is support for zoned block device sequential write
      constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
      by the mq-deadline elevator using zone write locking.
      
      Other possible features are IO priorities, write hints, latency targets
      or single-LUN dual-actuator disks (for which the elevator could maintain
      one LBA ordered list per actuator).
      
      The required_elevator_features field is also added to the request_queue
      structure to allow a device driver to specify elevator feature flags
      that an elevator must support for the correct operation of the device
      (e.g. device drivers for zoned block devices can have the
      ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
      The helper function blk_queue_required_elevator_features() is
      defined for setting this new field.
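
      A hedged sketch of how the two sides meet (identifiers as named in this
      message; the elevator_type initializer is abbreviated):

        /* Elevator side: mq-deadline advertises zone write locking support. */
        static struct elevator_type mq_deadline_sketch = {
                .elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
                /* other ops and fields omitted */
        };

        /* Driver side: a zoned block device driver requires that feature. */
        static void my_zoned_dev_set_features(struct request_queue *q)
        {
                blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
        }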
      
      With these two new fields in place, the elevator functions
      elevator_match() and elevator_find() are modified so that a user can
      only set an elevator whose feature set satisfies the device's required
      features. Elevators that do not match the device requirements are not
      shown in the device sysfs queue/scheduler file, to prevent their use.
      
      The "none" elevator can always be selected as before.
      
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  24. Sep 03, 2019
  25. Aug 29, 2019
  26. Jul 26, 2019
  27. May 23, 2019
  28. Apr 30, 2019
  29. Feb 09, 2019
  30. Dec 19, 2018
  31. Nov 15, 2018
  32. Nov 07, 2018