- Oct 18, 2021
-
Pavel Begunkov authored
Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached queue pointer and so is faster.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/addf6ea988c04213697ba3684c853e4ed7642a39.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
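A sketch of the helper after the change, assuming the cached-pointer layout described above (a minimal sketch, not the full kernel definitions):

```c
struct block_device {
	struct request_queue	*bd_queue;	/* cached at bdev setup time */
	struct gendisk		*bd_disk;
	/* ... */
};

static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
{
	return bdev->bd_queue;	/* one load, no hop through bd_disk */
}
```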
-
Jens Axboe authored
Even if no policies are defined, we spend ~2% of the total IO time checking. Move the fast path inline.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
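A minimal sketch of the pattern (field and helper names taken from blk-throttle of that era, simplified): the cheap "any rules at all?" test runs inline, so most bios never pay for the out-of-line slow path.

```c
static inline bool blk_throtl_bio(struct bio *bio)
{
	struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);

	/* fast path: no throttling rules for this direction, nothing to do */
	if (!tg->has_rules[bio_data_dir(bio)])
		return false;

	return __blk_throtl_bio(bio);	/* slow path stays out of line */
}
```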
-
- Sep 07, 2021
-
Li Jinlin authored
The pending timer has been set up in blk_throtl_init(). However, the timer is not deleted in blk_throtl_exit(). This means that the timer handler may still be running after freeing the timer, which would result in a use-after-free. Fix by calling del_timer_sync() to delete the timer in blk_throtl_exit().
Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
Link: https://lore.kernel.org/r/20210907121242.2885564-1-lijinlin3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
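A sketch of the fix, assuming the rough structure of blk_throtl_exit() at the time (simplified): the timer must be cancelled synchronously before the throtl_data that embeds it is freed.

```c
void blk_throtl_exit(struct request_queue *q)
{
	BUG_ON(!q->td);
	/* the fix: wait for any running handler, then remove the timer */
	del_timer_sync(&q->td->service_queue.pending_timer);
	throtl_shutdown_wq(q);
	blkcg_deactivate_policy(q, &blkcg_policy_throtl);
	/* ... free per-cpu latency buckets ... */
	kfree(q->td);
}
```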
-
- Aug 15, 2021
-
Chunguang Xu authored
After patch 54efd50b ("block: make generic_make_request handle arbitrarily sized bios"), the IO through io-throttle may be larger, and these IOs may be further split into smaller ones. However, the IOPS throttle does not seem to be aware of this change, which makes the IOPS accounting of large IOs incomplete, resulting in disk-side IOPS that does not meet expectations. We should fix this problem. We can reproduce it by setting max_sectors_kb of the disk to 128, setting blkio.write_iops_throttle to 100, running a dd instance inside blkio, and using iostat to watch IOPS:

  dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct

Without this change the average IOPS is 1995; with this change it is 98.
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/65869aaad05475797d63b4c3fed4f529febe3c26.1627876014.git.brookxu@tencent.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
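A hypothetical sketch of the idea (helper name and field usage assumed, not the exact patch): when the block layer splits a large bio, charge the extra fragment to the owning throttle group, so the IOPS budget reflects what the device actually receives.

```c
/* Hypothetical sketch; names assumed for illustration. */
static void blk_throtl_charge_bio_split(struct throtl_grp *tg,
					struct bio *split)
{
	bool rw = bio_data_dir(split);

	/* count the extra fragment against the current slice's IO budget */
	tg->io_disp[rw]++;
}
```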
-
- Jan 25, 2021
-
Christoph Hellwig authored
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows directly looking up all information related to partition remapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
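A sketch of the struct change and the extra indirection it introduces (surrounding fields omitted; the accessor is a hypothetical helper for illustration, not a kernel API):

```c
struct bio {
	struct block_device	*bi_bdev; /* was: struct gendisk *bi_disk + u8 bi_partno */
	/* ... */
};

/* hypothetical helper showing the one extra hop to reach the gendisk */
static inline struct gendisk *bio_disk(struct bio *bio)
{
	return bio->bi_bdev->bd_disk;
}
```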
-
- Dec 02, 2020
-
Yu Kuai authored
blk-throttle: don't check whether or not lower limit is valid if CONFIG_BLK_DEV_THROTTLING_LOW is off

blk_throtl_update_limit_valid() will search all descendants to see if 'LIMIT_LOW' of bps/iops for READ/WRITE is nonzero. However, these are always zero if CONFIG_BLK_DEV_THROTTLING_LOW is not set, so a lot of time is wasted iterating descendants. Thus do nothing in blk_throtl_update_limit_valid() in that situation.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
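A minimal sketch, assuming the fix is the straightforward config gating the description implies: keep the descendant walk when the option is on, and compile in an empty stub when it is off.

```c
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
static void blk_throtl_update_limit_valid(struct throtl_data *td)
{
	/* walk all descendant groups looking for a nonzero LIMIT_LOW ... */
}
#else
static inline void blk_throtl_update_limit_valid(struct throtl_data *td)
{
	/* LIMIT_LOW is always zero without the config option: nothing to do */
}
#endif
```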
-
- Oct 08, 2020
-
Baolin Wang authored
Re-use throtl_set_slice_end() to remove duplicate code.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
The __throtl_de/enqueue_tg() functions are only called by throtl_de/enqueue_tg(), thus we can just open code them to make the code more readable.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
throtl_schedule_next_dispatch() validates that the service queue is non-empty before calling update_min_dispatch_time(), and update_min_dispatch_time() calls throtl_rb_first(), which validates the service queue again. Thus we can move the service queue validation out of throtl_rb_first() to remove the redundant validation in the fast path.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
We should move the list operation after validation.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
throtl_adjusted_limit() cannot scale the limit up if bps or iops is set to 1, which will cause an IO hang when the low limit is enabled. Thus we should treat 1 as an illegal value to avoid this issue.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
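Why 1 can never scale up: the scaling step in blk-throttle at the time looked essentially like the sketch below (simplified; the td->scale bookkeeping is omitted), and the shift is what gets stuck.

```c
static uint64_t throtl_adjusted_limit(uint64_t low, struct throtl_data *td)
{
	/* with low == 1, (low >> 1) == 0, so the result stays 1 forever */
	return low + (low >> 1) * td->scale;
}
```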
-
Baolin Wang authored
The IO latency tracking is only for the LOW limit, so we should add a check to avoid redundant latency tracking if the LOW limit is not valid.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
We only update tg->last_finish_time when the low limitation is enabled, so we can move the tg->last_finish_time validation a little forward to avoid getting an unnecessary current time stamp if the low limitation is not enabled.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
throtl_downgrade_state() is always used to change to the LIMIT_LOW limitation, thus remove the latter, meaningless parameter which indicates the limitation index.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Sep 15, 2020
-
Baolin Wang authored
There is no need to check the bps or iops limitation if bps or iops is unlimited.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
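A sketch of the short-circuit, assuming the helper shape used in blk-throttle at the time (U64_MAX meaning "no bps limit configured"; the real budget math is elided):

```c
static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
				 u64 bps_limit, unsigned long *wait)
{
	/* unlimited: skip the whole byte-budget calculation */
	if (bps_limit == U64_MAX) {
		if (wait)
			*wait = 0;
		return true;
	}

	/* ... otherwise compute bytes allowed in the current slice ... */
	return true;	/* placeholder for the real budget check */
}
```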
-
Baolin Wang authored
The tg_may_dispatch() will call tg_with_in_bps_limit() and tg_with_in_iops_limit() to check if we can dispatch a bio or not, which will calculate bps/iops limitation multiple times. But tg_may_dispatch() is always called under queue lock, which means the bps/iops limitation will not change in tg_may_dispatch(). So we can calculate the bps/iops limitation only once, and pass them to tg_with_in_bps_limit() and tg_with_in_iops_limit() to avoid calculating bps/iops limitation repeatedly.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
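A sketch of the refactor (signatures assumed from the description; the real tg_may_dispatch() also manages slices, which is omitted here): compute each limit once under the queue lock and hand the values down.

```c
static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
			    unsigned long *wait)
{
	/* computed once per call instead of once per helper */
	u64 bps_limit = tg_bps_limit(tg, bio_data_dir(bio));
	u32 iops_limit = tg_iops_limit(tg, bio_data_dir(bio));

	return tg_with_in_bps_limit(tg, bio, bps_limit, wait) &&
	       tg_with_in_iops_limit(tg, bio, iops_limit, wait);
}
```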
-
Baolin Wang authored
The 'throtl_grp_quantum' and 'throtl_quantum' are both read-only variables, so it is better to use readable macros instead of static variables, which also saves some space in the .bss area.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
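The change as described, sketched with the blk-throttle default values:

```c
/* before:
 *   static int throtl_grp_quantum = 8;
 *   static int throtl_quantum = 32;
 */
#define THROTL_GRP_QUANTUM	8	/* bios dispatched per group per round */
#define THROTL_QUANTUM		32	/* total bios dispatched per round */
```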
-
Baolin Wang authored
Use readable READ/WRITE macros instead of magic numbers.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
Fix some typos in comments.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jul 01, 2020
-
Christoph Hellwig authored
generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
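A heavily simplified sketch of the relationship the new name encodes (the real submit_bio() does more; the accounting call shown is illustrative):

```c
blk_qc_t submit_bio(struct bio *bio)
{
	/* per-task IO accounting and a few checks live up here */
	if (op_is_write(bio_op(bio)))
		task_io_account_write(bio->bi_iter.bi_size);

	return submit_bio_noacct(bio);	/* "submit_bio minus accounting" */
}
```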
-
- Jun 29, 2020
-
Christoph Hellwig authored
bios must have a valid block cgroup by the time they are submitted.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blkcg_bio_issue_check is a giant inline function that does three entirely different things. Factor out the blk-cgroup related bio initialization into a new helper, and open code the sequence in the only caller, relying on the fact that all the actual functionality is stubbed out for non-cgroup builds.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The only thing in blkcg_bio_issue_check that needs to be under rcu_read_lock is blk_throtl_bio, so move the locking there.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
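A sketch of the narrowed locking scope (simplified; the throttling logic itself is elided): the RCU read section now lives inside blk_throtl_bio() instead of wrapping the whole blkcg_bio_issue_check() sequence.

```c
bool blk_throtl_bio(struct bio *bio)
{
	bool throttled = false;	/* placeholder for the real decision */

	rcu_read_lock();	/* protects the blkg lookup/deref below */
	/* ... throttle-group lookup and budget checks ... */
	rcu_read_unlock();

	return throttled;
}
```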
-
- May 29, 2020
-
Guoqing Jiang authored
After blk_throtl_drain is removed, there is no caller of tg_drain_bios, so remove it as well.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Guoqing Jiang authored
After the commit 5addeae1 ("blk-cgroup: remove blkcg_drain_queue"), there is no caller of blk_throtl_drain, so let's remove it.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Nov 07, 2019
-
Tejun Heo authored
blkg_rwstat is now only used by bfq-iosched and blk-throtl when on cgroup1. Let's move it into its own files and gate it behind a config option.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
When used on cgroup1, blk-throtl uses the blkg->stat_bytes and ->stat_ios from blk-cgroup core to populate four stat knobs. blk-cgroup core is moving away from blkg_rwstat to improve scalability and won't be able to support this usage. It isn't like the sharing gains all that much. Let's break them out to dedicated rwstat counters which are updated when on cgroup1.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Sep 15, 2019
-
Hou Tao authored
Currently rq->data_len will be decreased by partial completion or zeroed by completion, so when blk_stat_add() is invoked, data_len will be zero and there will never be samples in poll_cb, because blk_mq_poll_stats_bkt() will return -1 if data_len is zero. We could move blk_stat_add() back to __blk_mq_complete_request(), but that would make the effort of trying to call ktime_get_ns() once in vain. Instead we can reuse the throtl_size field, use it for both block stats and block throttle, and adjust the logic in blk_mq_poll_stats_bkt() accordingly.
Fixes: 4bc6339a ("block: move blk_stat_add() to __blk_mq_end_request()")
Tested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Aug 29, 2019
-
Tejun Heo authored
Instead of @node, pass in @q and @blkcg so that the alloc function has more context. This doesn't cause any behavior change and will be used by the io.weight implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
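A sketch of the blkcg policy allocation hook after the change (typedef name from blk-cgroup; previously the hook took only `gfp_t gfp, int node`, which told the allocator nothing about which queue or cgroup the data was for):

```c
typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp,
						struct request_queue *q,
						struct blkcg *blkcg);
```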
-
- Jul 10, 2019
-
Konstantin Khlebnikov authored
After commit 991f61fe ("Blk-throttle: reduce tail io latency when iops limit is enforced") the wait time could be zero even if the group is throttled and cannot issue requests right now. As a result throtl_select_dispatch() turns into a busy-loop under the irq-safe queue spinlock. The fix is simple: always round up the target time to the next throttle slice.
Fixes: 991f61fe ("Blk-throttle: reduce tail io latency when iops limit is enforced")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- May 31, 2019
-
Bart Van Assche authored
Commit e99e88a9 renamed a function argument without updating the corresponding kernel-doc header. Update the kernel-doc header.
Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Fixes: e99e88a9 ("treewide: setup_timer() -> timer_setup()") # v4.15
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Dec 08, 2018
-
Dennis Zhou authored
bio_issue_init among other things initializes the timestamp for an IO. Rather than have this logic handled by policies, this consolidates it to be on the init paths (normal, clone, bounce clone).
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
Previously, blkg association was handled by controller-specific code in blk-throttle and blk-iolatency. However, because a blkg represents a relationship between a blkcg and a request_queue, it makes sense to keep blkg->q and bio->bi_disk->queue consistent. This patch moves association into the bio_set_dev() macro. This should cover the majority of cases where the device is set/changed, keeping the two pointers consistent. Fallback code is added to blkcg_bio_issue_check() to catch any missing paths.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
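A minimal sketch of the consolidation, assuming the bio_set_dev() shape of that era (bi_disk/bi_partno; the bi_bdev pointer came later, and details like flag clearing are omitted): setting the device also (re)associates the bio with a blkg, keeping bio->bi_blkg consistent with bio->bi_disk->queue.

```c
#define bio_set_dev(bio, bdev)				\
do {							\
	(bio)->bi_disk = (bdev)->bd_disk;		\
	(bio)->bi_partno = (bdev)->bd_partno;		\
	bio_associate_blkg(bio);	/* the consolidated association */ \
} while (0)
```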
-
Dennis Zhou authored
There are 3 ways blkg association can happen: association with the current css, with the page css (swap), or from the wbc css (writeback). This patch handles how association is done for the first case, where we are associating based on the current css. If there is already a blkg associated, the css will be reused and association will be redone as the request_queue may have changed.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
There are several scenarios where blkg_lookup_create() can fail, such as the blkcg dying, the request_queue dying, or simply being OOM. Most handle this by simply falling back to the q->root_blkg and calling it a day. This patch implements the notion of the closest blkg. During blkg_lookup_create(), if it fails to create, return the closest blkg found or the q->root_blkg. blkg_try_get_closest() is introduced and used during association so a bio is always attached to a blkg.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
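A sketch of the "closest blkg" idea, close to the helper the message describes: walk toward the root until a blkg accepts a reference, so association lands on the nearest live ancestor.

```c
static inline struct blkcg_gq *blkg_tryget_closest(struct blkcg_gq *blkg)
{
	/* climb toward the root until a live blkg takes a reference */
	while (blkg && !blkg_tryget(blkg))
		blkg = blkg->parent;

	return blkg;	/* callers fall back to q->root_blkg on NULL */
}
```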
-
- Nov 16, 2018
-
Jens Axboe authored
Various spots check for q->mq_ops being non-NULL, but provide a helper to do this instead. Where the ->mq_ops != NULL check is redundant, remove it. Since mq == rq-based now that legacy is gone, get rid of the queue_is_rq_based() and just use queue_is_mq() everywhere.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
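The helper is small enough to show in full; a sketch matching the shape described:

```c
static inline bool queue_is_mq(struct request_queue *q)
{
	return q->mq_ops;	/* non-NULL ops means a blk-mq queue */
}
```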
-
- Nov 15, 2018
-
Christoph Hellwig authored
With the legacy request path gone there is no good reason to keep queue_lock as a pointer, we can always use the embedded lock now.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Fixed floppy and blk-cgroup missing conversions and half done edits.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
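The mechanical shape of the conversion, shown in an illustrative caller (function name hypothetical):

```c
static void throtl_example_update(struct request_queue *q)
{
	/* was: spin_lock_irq(q->queue_lock) — a pointer, possibly a
	 * driver-supplied lock; now always the embedded lock */
	spin_lock_irq(&q->queue_lock);
	/* ... touch state protected by the queue lock ... */
	spin_unlock_irq(&q->queue_lock);
}
```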
-
Christoph Hellwig authored
The only remaining user unconditionally drops and reacquires the lock, which means we really don't need any additional (conditional) annotation.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Unused since the removal of the legacy request code.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Nov 02, 2018
-
Dennis Zhou authored
This reverts a series committed earlier due to a null pointer exception bug report in [1]. It seems there are edge-case interactions that I did not consider and will need some time to understand what causes the adverse interactions. The original series can be found in [2] with a follow-up series in [3].

[1] https://www.spinics.net/lists/cgroups/msg20719.html
[2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
[3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

This reverts the following commits: d459d853, b2c3fa54, 101246ec, b3b9f24f, e2b09899, f0fcb3ec, c839e7a0, bdc24917, 74b7c02a, 5bf9a1f3, a7b39b4e, 07b05bcc, 49f4c2dc, 27e6fa99
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-