- Oct 27, 2021
Shin'ichiro Kawasaki authored
Commit a33df75c ("block: use an xarray for disk->part_tbl") changed the partition existence check for host-aware zoned block devices from a disk_has_partitions() helper call to an emptiness check on the xarray disk->part_tbl. However, disk->part_tbl always holds a single entry for disk->part0 and never becomes empty. As a result, host-aware zoned devices were always judged to have partitions, and the sysfs queue/zoned attribute reported "none" instead of "host-aware" regardless of whether the device actually had partitions. This also caused a DEBUG_LOCKS_WARN_ON(lock->magic != lock) splat for sdkp->rev_mutex in the SCSI layer when the kernel detected a host-aware zoned device: since the block layer handled the device as non-zoned, the SCSI layer never got a chance to initialize the mutex for zone revalidation, so the warning was triggered. To fix the issues, call the helper function disk_has_partitions() in place of the disk->part_tbl emptiness check. Since the function was removed with commit a33df75c, reimplement it to walk through the entries in the xarray disk->part_tbl.

Fixes: a33df75c ("block: use an xarray for disk->part_tbl")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org # v5.14+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211026060115.753746-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
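A minimal sketch of how the reimplemented helper might walk the xarray, assuming bdev_is_partition() is used to skip the always-present part0 entry (the body is inferred from the description, not quoted from the patch):

```c
static bool disk_has_partitions(struct gendisk *disk)
{
	struct block_device *part;
	unsigned long idx;
	bool ret = false;

	rcu_read_lock();
	/* disk->part_tbl always holds part0, so only real partitions count */
	xa_for_each(&disk->part_tbl, idx, part) {
		if (bdev_is_partition(part)) {
			ret = true;
			break;
		}
	}
	rcu_read_unlock();

	return ret;
}
```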
-
- Aug 23, 2021
Christoph Hellwig authored
Replace the magic lookup through the kobject tree with an explicit backpointer, given that the device model links are set up and torn down at times when I/O is still possible, leading to potential NULL or invalid pointer dereferences.

Fixes: edb0872f ("block: move the bdi from the request_queue to the gendisk")
Reported-by: syzbot <syzbot+aa0801b6b32dca9dda82@syzkaller.appspotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/20210816134624.GA24234@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
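A sketch of the idea under stated assumptions (field and helper names are illustrative, not quoted from the patch): the queue carries an explicit pointer to its disk, so lookups no longer depend on sysfs link lifetime.

```c
struct request_queue {
	/* ... */
	struct gendisk *disk;	/* set when the disk is allocated, cleared on release */
};

/*
 * Before: walking up the kobject tree to find the gendisk races with
 * device model link setup/teardown. After: a direct dereference that
 * stays valid for as long as I/O is possible.
 */
static inline struct gendisk *queue_to_disk(struct request_queue *q)
{
	return q->disk;
}
```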
-
- Aug 09, 2021
Christoph Hellwig authored
The backing device information only makes sense for file system I/O, and thus belongs in the gendisk, not the lower-level request_queue structure. Move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
.. and rename the function to disk_update_readahead. This is in preparation for moving the BDI from the request_queue to the gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Aug 02, 2021
Christoph Hellwig authored
Printk ->disk_name directly for the disk and use the %pg format specifier for the block device, which is equivalent to a bdevname call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210727062518.122108-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
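For illustration, %pg lets printk format a struct block_device name directly, replacing the on-stack bdevname() buffer (the message text is made up):

```c
static void report_misalignment(struct block_device *bdev)
{
	/* before: format the name into an on-stack buffer */
	char b[BDEVNAME_SIZE];
	pr_warn("%s: partition misaligned\n", bdevname(bdev, b));

	/* after: let the printk core format the block_device name via %pg */
	pr_warn("%pg: partition misaligned\n", bdev);
}
```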
-
- May 09, 2021
Jens Axboe authored
This reverts commit cd2c7545. Alex reports that the commit causes corruption with LUKS on ext4. Revert it for now so that this can be investigated properly.

Link: https://lore.kernel.org/linux-block/1620493841.bxdq8r5haw.none@localhost/
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- May 07, 2021
Matthew Wilcox (Oracle) authored
My UEK-derived config has 1030 files depending on pagemap.h before this change. Afterwards, just 326 files need to be rebuilt when I touch pagemap.h. I think blkdev.h is probably included too widely, but untangling that dependency is harder and this solves my problem. x86 allmodconfig builds, but there may be implicit include problems on other architectures.

Link: https://lkml.kernel.org/r/20210309195747.283796-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Dan Williams <dan.j.williams@intel.com> [nvdimm]
Acked-by: Jens Axboe <axboe@kernel.dk> [block]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Martin K. Petersen <martin.petersen@oracle.com> [scsi]
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- May 03, 2021
Changheun Lee authored
Bio size can grow up to 4GB when the multi-page bvec is enabled, but sometimes this leads to inefficient behavior. In the case of large-chunk direct I/O, e.g. a 32MB chunk read in user space, all pages for the 32MB are merged into one bio structure if the pages' physical addresses are contiguous, and submission is delayed until the merge completes. Bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace, the current kernel behavior in the do_direct_IO() loop looks like this timeline:

| bio merge for 32MB. total 8,192 pages are merged.
| total elapsed time is over 2ms.
|------------------ ... ----------------------->|
                                                | 8,192 pages merged into a bio.
                                                | at this time, the first bio submit is done.
                                                | 1 bio is split into 32 read requests and issued.
                                                |--------------->
                                                 |--------------->
                                                   ......
                                                              |--------------->
                                                               |--------------->|
                        total 19ms elapsed to complete the 32MB read from the device.

If bio max size is limited to 1MB, the behavior changes as below:

| bio merge for 1MB. 256 pages are merged for each bio.
| total 32 bios will be made.
| total elapsed time is over 2ms. it's the same.
| but the first bio submit timing is fast: about 100us.
|--->|--->|--->|---> ... -->|--->|--->|--->|--->|
     | 256 pages merged into a bio.
     | at this time, the first bio submit is done,
     | and 1 read request is issued for 1 bio.
     |--------------->
          |--------------->
               ......
                                   |--------------->
                                        |--------------->|
total 17ms elapsed to complete the 32MB read from the device.

As a result, read request issue timing is faster if bio max size is limited. With multi-page bvecs, the current kernel can create super-large bios, which delays the first I/O request issue.

Signed-off-by: Changheun Lee <nanich.lee@samsung.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210503095203.29076-1-nanich.lee@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
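A minimal sketch of the capping idea from this change (which was later reverted, see the May 09, 2021 entry above); bio_max_size() is a hypothetical helper standing in for whatever per-queue limit the patch used:

```c
/*
 * Hypothetical cap on bio growth; without it the only hard limit is
 * UINT_MAX via the 32-bit bi_iter.bi_size field.
 */
static inline unsigned int bio_max_size(struct bio *bio)
{
	return 1024 * 1024;	/* e.g. 1MB, matching the timelines above */
}

/* during page merging in the direct I/O path: stop growing the bio early */
static bool bio_may_add_len(struct bio *bio, unsigned int len)
{
	return bio->bi_iter.bi_size + len <= bio_max_size(bio);
}
```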
-
- Apr 06, 2021
Christoph Hellwig authored
Get rid of all the PFN arithmetic and just use an enum for the two remaining options, and use PageHighMem for the actual bounce decision. Add a fast path to entirely avoid the call for the common case of a queue not using the legacy bouncing code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210331073001.46776-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
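A sketch of what the enum-based setup might look like, with names taken from the description and the bodies assumed:

```c
enum blk_bounce {
	BLK_BOUNCE_NONE,	/* never bounce (the common case) */
	BLK_BOUNCE_HIGH,	/* bounce pages the device cannot address */
};

/* fast path: skip the bounce machinery unless the queue opted in */
static inline bool blk_queue_may_bounce(struct request_queue *q)
{
	return IS_ENABLED(CONFIG_BOUNCE) &&
	       q->limits.bounce == BLK_BOUNCE_HIGH;
}

/* the per-page decision is now simply "does this page live in highmem?" */
static bool page_needs_bounce(struct page *page)
{
	return PageHighMem(page);
}
```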
-
Christoph Hellwig authored
Remove the BLK_BOUNCE_ISA support now that all users are gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210331073001.46776-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 24, 2021
Mikulas Patocka authored
We get I/O errors when we run md-raid1 on top of dm-integrity on top of a ramdisk:

device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
device-mapper: integrity: Bio not aligned on 8 sectors: 0x8048, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0x8147, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0x8246, 0xff
device-mapper: integrity: Bio not aligned on 8 sectors: 0x8345, 0xbb

The ramdisk device has logical_block_size 512 and max_sectors 255. The dm-integrity device uses logical_block_size 4096 and does not affect the max_sectors value - thus, it inherits 255 from the ramdisk. So, we have a device with max_sectors not aligned on logical_block_size. The md-raid device sees that the underlying leg has max_sectors 255, and it will split the bios on a 255-sector boundary, making the bios unaligned on logical_block_size. In order to fix the bug, we round down max_sectors to logical_block_size.

Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
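A sketch of the rounding fix (the function name is hypothetical), assuming it applies to the stacked limits and that the logical block size is a power of two, as the block layer requires:

```c
static void blk_limits_align_max_sectors(struct queue_limits *t)
{
	/* round_down() requires a power-of-2 divisor; block sizes are */
	unsigned int lbs_sectors = t->logical_block_size >> SECTOR_SHIFT;

	/* keep the split boundaries whole multiples of the logical block */
	t->max_sectors = round_down(t->max_sectors, lbs_sectors);
	t->max_hw_sectors = round_down(t->max_hw_sectors, lbs_sectors);
}
```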
-
- Feb 10, 2021
Damien Le Moal authored
Introduce the internal function blk_queue_clear_zone_settings() to clean up all limits and resources related to zoned block devices. This new function is called from blk_queue_set_zoned() when a disk's zoned model is set to BLK_ZONED_NONE. This particular case can happen when a partition is created on a host-aware scsi disk.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
Per the ZBC and ZAC specifications, host-managed SMR hard-disks mandate that all writes into sequential write required zones be aligned to the device physical block size. However, NVMe ZNS does not have this constraint and allows write operations into sequential zones to be aligned to the device logical block size. This inconsistency does not help with software portability across device types. To solve this, introduce the zone_write_granularity queue limit to indicate the alignment constraint, in bytes, of write operations into zones of a zoned block device. This new limit is exported as a read-only sysfs queue attribute, and the helper blk_queue_zone_write_granularity() is introduced for drivers to set this limit. The function blk_queue_set_zoned() is modified to set this new limit to the device logical block size by default. NVMe ZNS devices as well as zoned nullb devices use this default value as is. The scsi disk driver is modified to call the blk_queue_zone_write_granularity() helper to set the zone write granularity of host-managed SMR disks to the disk physical block size. The accessor functions queue_zone_write_granularity() and bdev_zone_write_granularity() are also introduced.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
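A sketch of the new helper and accessor, with validation details assumed from the description:

```c
void blk_queue_zone_write_granularity(struct request_queue *q,
				      unsigned int size)
{
	/* the limit only makes sense for zoned devices */
	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
		return;

	q->limits.zone_write_granularity = size;
}

static inline unsigned int
queue_zone_write_granularity(struct request_queue *q)
{
	return q->limits.zone_write_granularity;
}
```

Per the text, blk_queue_set_zoned() would default the limit to the logical block size, while sd would override it with the physical block size for host-managed SMR disks.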
-
- Jan 25, 2021
Christoph Hellwig authored
Now that no fast-path lookups in the partition table are left, there is no point in micro-optimizing the data structure for it. Just use a bog-standard xarray.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Dec 08, 2020
Damien Le Moal authored
Block device drivers do not have to call blk_queue_max_hw_sectors() to set a limit on request size if the default limit BLK_SAFE_MAX_SECTORS is acceptable. However, this limit (255 sectors) may not be aligned to the device logical block size, in which case it cannot be used as is for a request maximum size. This is the case for the null_blk device driver. Modify blk_queue_max_hw_sectors() to make sure that the request size limits specified by the max_hw_sectors and max_sectors queue limits are always aligned to the device logical block size. Additionally, to avoid introducing a dependence on the execution order of this function with blk_queue_logical_block_size(), also modify blk_queue_logical_block_size() to perform the same alignment when the logical block size is set after max_hw_sectors.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
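A sketch of the alignment applied by the modified setters, shown here for blk_queue_logical_block_size(), which re-aligns the sector limits in case they were set first (simplified; details assumed):

```c
void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
{
	struct queue_limits *limits = &q->limits;

	limits->logical_block_size = size;

	/* re-align the request size limits in case they were set first */
	limits->max_hw_sectors =
		round_down(limits->max_hw_sectors, size >> SECTOR_SHIFT);
	limits->max_sectors =
		round_down(limits->max_sectors, size >> SECTOR_SHIFT);
}
```

blk_queue_max_hw_sectors() would perform the mirror-image rounding, so the invariant holds regardless of which setter runs last.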
-
- Dec 01, 2020
Mike Snitzer authored
Commit 22ada802 ("block: use lcm_not_zero() when stacking chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must reflect the most limited of all devices in the IO stack, otherwise malformed IO may result. E.g.: prior to this fix, ->chunk_sectors = lcm_not_zero(8, 128) would result in blk_max_size_offset() splitting IO at 128 sectors rather than the required, more restrictive 8 sectors. And since commit 07d098e6 ("block: allow 'chunk_sectors' to be non-power-of-2"), care must be taken to properly stack chunk_sectors so as to be compatible with the possibility that a non-power-of-2 chunk_sectors may be stacked. This is why gcd() is used instead of reverting back to using min_not_zero().

Fixes: 22ada802 ("block: use lcm_not_zero() when stacking chunk_sectors")
Fixes: 07d098e6 ("block: allow 'chunk_sectors' to be non-power-of-2")
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
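A sketch of the corrected stacking step (the wrapper function is illustrative): gcd() honors the most restrictive boundary and also works for non-power-of-2 values, e.g. gcd(8, 128) = 8, whereas lcm_not_zero(8, 128) = 128 would split IO too coarsely.

```c
#include <linux/gcd.h>

/*
 * Stack two chunk_sectors values. gcd(0, b) == b, so an unset top-level
 * limit simply inherits the bottom one.
 */
static unsigned int stack_chunk_sectors(unsigned int t, unsigned int b)
{
	if (!b)
		return t;
	return gcd(t, b);	/* e.g. gcd(8, 128) == 8 */
}
```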
-
- Sep 24, 2020
Christoph Hellwig authored
Drivers shouldn't really mess with the readahead size, as that is a VM concept. Instead set it based on the optimal I/O size by lifting the algorithm from the md driver when registering the disk. Also set bdi->io_pages there as well by applying the same scheme based on max_sectors. To ensure the limits work well for stacking drivers, a new helper is added to update the readahead limits from the block limits, which is also called from disk_stack_limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
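A sketch of the lifted algorithm, assuming the helper mirrors what md did (read ahead at least twice the optimal I/O size, io_pages following max_sectors); the helper name is taken from the scheme, not quoted from the patch:

```c
void blk_queue_update_readahead(struct request_queue *q)
{
	/*
	 * For read-ahead of large files to be effective, read ahead at
	 * least twice the optimal I/O size.
	 */
	q->backing_dev_info->ra_pages =
		max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);

	/* io_pages follows the hardware limit rather than the VM default */
	q->backing_dev_info->io_pages =
		queue_max_sectors(q) >> (PAGE_SHIFT - 9);
}
```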
-
- Sep 23, 2020
Mike Snitzer authored
It is possible, albeit unlikely, for a block device to have a non-power-of-2 chunk_sectors (e.g. a 10+2 RAID6 with 128K chunk_sectors, which results in a full-stripe size of 1280K; this causes the RAID6's io_opt to be advertised as 1280K, and a stacked device _could_ then be made to use a blocksize, aka chunk_sectors, that matches the non-power-of-2 io_opt of the underlying RAID6 -- resulting in the stacked device's chunk_sectors being a non-power-of-2). Update blk_queue_chunk_sectors() and blk_max_size_offset() to accommodate drivers that need a non-power-of-2 chunk_sectors.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
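A sketch of how blk_max_size_offset() can support both cases: keep the fast bitmask path for power-of-2 chunks and fall back to a division otherwise (body assumed from the description):

```c
static inline unsigned int blk_max_size_offset(struct request_queue *q,
					       sector_t offset)
{
	unsigned int chunk_sectors = q->limits.chunk_sectors;

	if (!chunk_sectors)
		return q->limits.max_sectors;

	/* how far is it to the next chunk boundary? */
	if (likely(is_power_of_2(chunk_sectors)))
		chunk_sectors -= offset & (chunk_sectors - 1);	/* fast mask */
	else
		chunk_sectors -= sector_div(offset, chunk_sectors); /* division */

	return min(q->limits.max_sectors, chunk_sectors);
}
```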
-
Mike Snitzer authored
Like 'io_opt', blk_stack_limits() should stack 'chunk_sectors' using lcm_not_zero() rather than min_not_zero() -- otherwise the final 'chunk_sectors' could result in sub-optimal alignment of IO to component devices in the IO stack. Also, if 'chunk_sectors' isn't a multiple of 'physical_block_size' then it is a bug in the driver and the device should be flagged as 'misaligned'.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Sep 16, 2020
Damien Le Moal authored
When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as regular disks. In this case, ensure that command completion is correctly executed by changing sd_zbc_complete() to return good_bytes instead of 0, which caused a hang during device probe (endless retries). When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to have partitions, it will be used as a regular disk. In this case, make sure to not do anything in sd_zbc_revalidate_zones() as that triggers warnings. Since all these different cases result in subtle settings of the disk queue zoned model, introduce the block layer helper function blk_queue_set_zoned() to generically implement setting up the effective zoned model according to the disk type, the presence of partitions on the disk, and the CONFIG_BLK_DEV_ZONED configuration.

Link: https://lore.kernel.org/r/20200915073347.832424-2-damien.lemoal@wdc.com
Fixes: b7205307 ("block: allow partitions on host aware zone devices")
Cc: <stable@vger.kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
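A sketch of the helper's decision logic for the cases the text lists (simplified; the exact checks are assumptions):

```c
void blk_queue_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
{
	switch (model) {
	case BLK_ZONED_HM:
		/* host-managed: zoned support is mandatory */
		WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED));
		break;
	case BLK_ZONED_HA:
		/*
		 * Host-aware disks can be treated as regular disks: do so
		 * if zoned support is disabled or partitions are present.
		 */
		if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED) ||
		    disk_has_partitions(disk))
			model = BLK_ZONED_NONE;
		break;
	case BLK_ZONED_NONE:
	default:
		model = BLK_ZONED_NONE;
		break;
	}

	disk->queue->limits.zoned = model;
}
```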
-
- Jul 20, 2020
Christoph Hellwig authored
This function is just a tiny wrapper around blk_stack_limits. Open code it in the two callers.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This function is just a tiny wrapper around blk_stack_limits and has two callers. Simplify the stack a bit by open coding it in the two callers.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Lift the code from device mapper into blk_stack_limits to inherit the stacking limitations. This ensures we do the right thing for all stacked zoned block devices.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- May 13, 2020
Keith Busch authored
Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned block device. This is a no-merge write operation.

A zone append write BIO must:
* Target a zoned block device
* Have a sector position indicating the start sector of the target zone
* Target a zone that is a sequential write zone
* Not cross a zone boundary
* Not be split, to ensure that a single range of LBAs is written with a single command

Implement these checks in generic_make_request_checks() using the helper function blk_check_zone_append(). To avoid write append BIO splitting, introduce the new max_zone_append_sectors queue limit attribute and ensure that a BIO size is always lower than this limit. Export this new limit through sysfs and check these limits in bio_full().

Also, when an LLDD can't dispatch a request to a specific zone, it will return BLK_STS_ZONE_RESOURCE, indicating this request needs to be delayed, e.g. because the zone it will be dispatched to is still write-locked. If this happens, set the request aside in a local list and continue trying to dispatch other requests, such as READ requests or WRITE/ZONE_APPEND requests targeting other zones. This way we can keep a high queue depth without starving other requests, even if one request can't be served due to zone write-locking.

Finally, make sure that the bio sector position indicates the actual write position as indicated by the device on completion.

Signed-off-by: Keith Busch <kbusch@kernel.org>
[ jth: added zone-append specific add_page and merge_page helpers ]
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
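A condensed sketch of the checks in the list above, with the helper name from the text and the body assumed from the description:

```c
static inline blk_status_t blk_check_zone_append(struct request_queue *q,
						 struct bio *bio)
{
	sector_t pos = bio->bi_iter.bi_sector;
	unsigned int nr_sectors = bio_sectors(bio);

	/* only applicable to zoned block devices */
	if (!blk_queue_is_zoned(q))
		return BLK_STS_NOTSUPP;

	/* the sector must point at the start of a sequential zone */
	if (pos & (blk_queue_zone_sectors(q) - 1) ||
	    !blk_queue_zone_is_seq(q, pos))
		return BLK_STS_IOERR;

	/* must not cross a zone boundary and must never be split */
	if (nr_sectors > q->limits.chunk_sectors ||
	    nr_sectors > q->limits.max_zone_append_sectors)
		return BLK_STS_IOERR;

	bio->bi_opf |= REQ_NOMERGE;	/* zone append is a no-merge write */
	return BLK_STS_OK;
}
```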
-
- Apr 22, 2020
Christoph Hellwig authored
Don't burden the common block code with specifics of the libata DMA draining mechanism. Instead move most of the code to the scsi midlayer. That also means the nr_phys_segments adjustments in the blk-mq fast path can go away entirely, given that SCSI never looks at nr_phys_segments after mapping the request to a scatterlist.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 27, 2020
Christoph Hellwig authored
Current make_request based drivers use either blk_alloc_queue_node or blk_alloc_queue to allocate a queue, and then set up the make_request_fn function pointer and a few parameters using the blk_queue_make_request helper. Simplify this by passing the make_request pointer to blk_alloc_queue, and while at it merge the _node variant into the main helper by always passing a node_id, and remove the superfluous gfp_mask parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 17, 2020
Konstantin Khlebnikov authored
Field bdi->io_pages, added in commit 9491ae4a ("mm: don't cap request size based on read-ahead setting"), removes unneeded splits of read requests. Stacked drivers do not call blk_queue_max_hw_sectors(). Instead they set the limits of their devices by blk_set_stacking_limits() + disk_stack_limits(). Field bdi->io_pages stays zero until the user sets max_sectors_kb via sysfs. This patch updates io_pages after merging limits in disk_stack_limits(). Commit c6d6e9b0 ("dm: do not allow readahead to limit IO size") fixed the same problem for device-mapper devices; this one fixes MD RAIDs.

Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Song Liu <songliubraving@fb.com>
-
- Jan 16, 2020
Mikulas Patocka authored
Logical block size has type unsigned short. That means it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64) and on these architectures, it may be possible to create block devices with a 64k block size. For example (run this on an architecture with 64k pages), mount will fail with this error because it tries to read the superblock using 2-sector access:

device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow.

Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
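A sketch of the widened field, with a note on why the old type cannot represent a 64k block size:

```c
struct queue_limits {
	/* ... */
	unsigned int		logical_block_size;	/* was: unsigned short */
	/* ... */
};

/*
 * Why: unsigned short holds at most 65535, so assigning 65536 (a 64k
 * block size) wraps to 0. Since block sizes are powers of two, the
 * largest representable size in a 16-bit field is 32768; unsigned int
 * comfortably holds 65536.
 */
```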
-
- Sep 06, 2019
Damien Le Moal authored
Introduce the definition of elevator features through the elevator_features flags in the elevator_type structure. Each flag can represent a feature supported by an elevator. The first feature defined by this patch is support for the zoned block device sequential write constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented by the mq-deadline elevator using zone write locking. Other possible features are IO priorities, write hints, latency targets, or single-LUN dual-actuator disks (for which the elevator could maintain one LBA-ordered list per actuator). The required_elevator_features field is also added to the request_queue structure to allow a device driver to specify elevator feature flags that an elevator must support for the correct operation of the device (e.g. device drivers for zoned block devices can have the ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature). The helper function blk_queue_required_elevator_features() is defined for setting this new field. With these two new fields in place, the elevator functions elevator_match() and elevator_find() are modified to allow a user to set only an elevator with a set of features that satisfies the device's required features. Elevators not matching the device requirements are not shown in the device sysfs queue/scheduler file to prevent their use. The "none" elevator can always be selected as before.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
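A sketch of the two new fields and how an elevator and a driver might use them (the flag value and the driver function are illustrative):

```c
#define ELEVATOR_F_ZBD_SEQ_WRITE	(1U << 0)	/* illustrative value */

struct elevator_type {
	/* ... */
	unsigned int elevator_features;	/* features this elevator supports */
};

/* mq-deadline advertises zoned sequential-write support */
static struct elevator_type mq_deadline = {
	/* ... */
	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
};

/* a zoned device driver declares what its elevator must support */
static void foo_init_queue(struct request_queue *q)
{
	blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
}
```

elevator_match() would then reject any scheduler whose elevator_features does not contain all of the queue's required_elevator_features bits.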
-
- Sep 03, 2019
Yoshihiro Shimoda authored
This patch adds a helper function for checking whether a queue can merge segments by the DMA MAP layer (e.g. via IOMMU).

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- Aug 29, 2019
Tejun Heo authored
wbt already gets queue depth changed notification through wbt_set_queue_depth(). Generalize it into rq_qos_ops->queue_depth_changed() so that other rq_qos policies can easily hook into the events too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jul 26, 2019
Christoph Hellwig authored
We should only set the max segment size to unlimited if we actually have a virt boundary. Otherwise we accidentally clear that limit when called from the SCSI midlayer, which always calls blk_queue_virt_boundary, even if that mask is 0.

Fixes: 7ad388d8 ("scsi: core: add a host / host template field for the virt boundary")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
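A sketch of the guarded setter described here: the unlimited segment size is only applied when the boundary mask is non-zero.

```c
void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask)
{
	q->limits.virt_boundary_mask = mask;

	/*
	 * Devices with a virt boundary don't use segment sizes as such,
	 * but a zero mask coming from the SCSI midlayer must not clobber
	 * an existing max_segment_size limit.
	 */
	if (mask)
		q->limits.max_segment_size = UINT_MAX;
}
```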
-
- May 23, 2019
Christoph Hellwig authored
We currently fail to update the front/back segment size in the bio when deciding to allow an otherwise gappy segment to a device with a virt boundary. The reason why this did not cause problems is that devices with a virt boundary fundamentally don't use segments as we know it and thus don't care. Make that assumption formal by forcing an unlimited segment size in this case.

Fixes: f6970f83 ("block: don't check if adjacent bvecs in one bio can be mergeable")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Apr 30, 2019
Christoph Hellwig authored
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Various block layer files do not have any licensing information at all. Add SPDX tags for the default kernel GPLv2 license to those.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
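For reference, such a tag is a single comment on the first line of each source file; for the default kernel license it reads:

```c
// SPDX-License-Identifier: GPL-2.0
```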
-
- Feb 09, 2019
Jens Axboe authored
We have various helpers for setting/clearing this flag, and also a helper to check if the queue supports queueable flushes or not. But nobody uses them anymore, kill it with fire.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Dec 19, 2018
Christoph Hellwig authored
Now that the SCSI layer replaced the use of the cluster flag with segment size limits and the DMA boundary, we can remove the cluster flag from the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- Nov 15, 2018
Christoph Hellwig authored
->queue_flags is generally not set or cleared in the fast path, and also generally set or cleared one flag at a time. Make use of the normal atomic bitops for it so that we don't need to take the queue_lock, which is otherwise mostly unused in the core block layer now.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
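A sketch of flag helpers built on the normal atomic bitops, so no queue_lock is needed (names follow the kernel's blk_queue_flag_* convention; bodies assumed):

```c
void blk_queue_flag_set(unsigned int flag, struct request_queue *q)
{
	set_bit(flag, &q->queue_flags);	/* atomic, no queue_lock needed */
}

void blk_queue_flag_clear(unsigned int flag, struct request_queue *q)
{
	clear_bit(flag, &q->queue_flags);
}

bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q)
{
	return test_and_set_bit(flag, &q->queue_flags);
}
```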
-
- Nov 07, 2018
Jens Axboe authored
With the legacy path gone, all we do is funnel it through the mq_ops->complete() operation.

Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
The only user of legacy timing now is BSG, which is invoked from the mq timeout handler. Kill the legacy code, and rename q->rq_timed_out_fn to q->bsg_job_timeout_fn.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-