- Dec 26, 2022
-
-
Yu Kuai authored
Commit 64dc8c73 ("block, bfq: fix possible uaf for 'bfqq->bic'") will access 'bic->bfqq' in bic_set_bfqq(), however, bfq_exit_icq_bfqq() can free bfqq first, and then call bic_set_bfqq(), which will cause uaf. Fix the problem by moving bfq_exit_bfqq() behind bic_set_bfqq(). Fixes: 64dc8c73 ("block, bfq: fix possible uaf for 'bfqq->bic'") Reported-by:
Yi Zhang <yi.zhang@redhat.com> Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20221226030605.1437081-1-yukuai1@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 25, 2022
-
-
Steven Rostedt (Google) authored
Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed. The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case. This was created by using a coccinelle script and the following commands: $ cat timer.cocci @@ expression ptr, slab; identifier timer, rfield; @@ ( - del_timer(&ptr->timer); + timer_shutdown(&ptr->timer); | - del_timer_sync(&ptr->timer); + timer_shutdown_sync(&ptr->timer); ) ... when strict when != ptr->timer ( kfree_rcu(ptr, rfield); | kmem_cache_free(slab, ptr); | kfree(ptr); ) $ spatch timer.cocci . > /tmp/t.patch $ patch -p1 < /tmp/t.patch Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/ Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ] Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ] Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ] Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 15, 2022
-
-
Ming Lei authored
For blk-mq, queue release handler is usually called after blk_mq_freeze_queue_wait() returns. However, the q_usage_counter->release() handler may not be run yet at that time, so this can cause a use-after-free. Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu(). Since ->release() is called with rcu read lock held, it is agreed that the race should be covered in caller per discussion from the two links. Reported-by:
Zhang Wensheng <zhangwensheng@huaweicloud.com> Reported-by:
Zhong Jinghua <zhongjinghua@huawei.com> Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/ Cc: Hillf Danton <hdanton@sina.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Dennis Zhou <dennis@kernel.org> Fixes: 2b0d3d3e ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yuwei Guan authored
The 'bfqd->num_groups_with_pending_reqs' is used when CONFIG_BFQ_GROUP_IOSCHED is enabled, so let the variables and processes take effect when CONFIG_BFQ_GROUP_IOSCHED is enabled. Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Yuwei Guan <Yuwei.Guan@zeekrlife.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20221110112622.389332-1-Yuwei.Guan@zeekrlife.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 14, 2022
-
-
Tejun Heo authored
When a gendisk is successfully initialized but add_disk() fails such as when a loop device has invalid number of minor device numbers specified, blkcg_init_disk() is called during init and then blkcg_exit_disk() during error handling. Unfortunately, iolatency gets initialized in the former but doesn't get cleaned up in the latter. This is because, in non-error cases, the cleanup is performed by del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos policies, iolatency being one of them, can only be activated once the disk is fully registered and visible. That assumption is true for wbt and iocost, but not so for iolatency as it gets initialized before add_disk() is called. It is desirable to lazy-init rq_qos policies because they are optional features and add to hot path overhead once initialized - each IO has to walk all the registered rq_qos policies. So, we want to switch iolatency to lazy init too. However, that's a bigger change. As a fix for the immediate problem, let's just add an extra call to rq_qos_exit() in blkcg_exit_disk(). This is safe because duplicate calls to rq_qos_exit() become noop's. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
<darklight2357@icloud.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Fixes: d7067512 ("block: introduce blk-iolatency io controller") Cc: stable@vger.kernel.org # v4.19+ Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/Y5TQ5gm3O4HXrXR3@slm.duckdns.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jiri Slaby (SUSE) authored
Since gcc13, each member of an enum has the same type as the enum [1]. And that is inherited from its members. Provided: VTIME_PER_SEC_SHIFT = 37, VTIME_PER_SEC = 1LLU << VTIME_PER_SEC_SHIFT, ... AUTOP_CYCLE_NSEC = 10LLU * NSEC_PER_SEC, the named type is unsigned long. This generates warnings with gcc-13: block/blk-iocost.c: In function 'ioc_weight_prfill': block/blk-iocost.c:3037:37: error: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'long unsigned int' block/blk-iocost.c: In function 'ioc_weight_show': block/blk-iocost.c:3047:34: error: format '%u' expects argument of type 'unsigned int', but argument 3 has type 'long unsigned int' So split the anonymous enum with large values to a separate enum, so that they don't affect other members. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113 Cc: Martin Liska <mliska@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: cgroups@vger.kernel.org Cc: linux-block@vger.kernel.org Signed-off-by:
Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20221213120826.17446-1-jirislaby@kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Just to make the code a litter cleaner, there are no functional changes. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221214033155.3455754-3-yukuai1@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
The return value is not used, hence remove it. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221214033155.3455754-2-yukuai1@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Our test report a uaf for 'bfqq->bic' in 5.10: ================================================================== BUG: KASAN: use-after-free in bfq_select_queue+0x378/0xa30 CPU: 6 PID: 2318352 Comm: fsstress Kdump: loaded Not tainted 5.10.0-60.18.0.50.h602.kasan.eulerosv2r11.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220320_160524-szxrtosci10000 04/01/2014 Call Trace: bfq_select_queue+0x378/0xa30 bfq_dispatch_request+0xe8/0x130 blk_mq_do_dispatch_sched+0x62/0xb0 __blk_mq_sched_dispatch_requests+0x215/0x2a0 blk_mq_sched_dispatch_requests+0x8f/0xd0 __blk_mq_run_hw_queue+0x98/0x180 __blk_mq_delay_run_hw_queue+0x22b/0x240 blk_mq_run_hw_queue+0xe3/0x190 blk_mq_sched_insert_requests+0x107/0x200 blk_mq_flush_plug_list+0x26e/0x3c0 blk_finish_plug+0x63/0x90 __iomap_dio_rw+0x7b5/0x910 iomap_dio_rw+0x36/0x80 ext4_dio_read_iter+0x146/0x190 [ext4] ext4_file_read_iter+0x1e2/0x230 [ext4] new_sync_read+0x29f/0x400 vfs_read+0x24e/0x2d0 ksys_read+0xd5/0x1b0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Commit 3bc5e683 ("bfq: Split shared queues on move between cgroups") changes that move process to a new cgroup will allocate a new bfqq to use, however, the old bfqq and new bfqq can point to the same bic: 1) Initial state, two process with io in the same cgroup. Process 1 Process 2 (BIC1) (BIC2) | Λ | Λ | | | | V | V | bfqq1 bfqq2 2) bfqq1 is merged to bfqq2. Process 1 Process 2 (BIC1) (BIC2) | | \-------------\| V bfqq1 bfqq2(coop) 3) Process 1 exit, then issue new io(denoce IOA) from Process 2. (BIC2) | Λ | | V | bfqq2(coop) 4) Before IOA is completed, move Process 2 to another cgroup and issue io. Process 2 (BIC2) Λ |\--------------\ | V bfqq2 bfqq3 Now that BIC2 points to bfqq3, while bfqq2 and bfqq3 both point to BIC2. If all the requests are completed, and Process 2 exit, BIC2 will be freed while there is no guarantee that bfqq2 will be freed before BIC2. Fix the problem by clearing bfqq->bic while bfqq is detached from bic. Fixes: 3bc5e683 ("bfq: Split shared queues on move between cgroups") Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221214030430.3304151-1-yukuai1@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 08, 2022
-
-
Luca Boccassi authored
Usually when closing a crypto device (eg: dm-crypt with LUKS) the volume key is not required, as it requires root privileges anyway, and root can deny access to a disk in many ways regardless. Requiring the volume key to lock the device is a peculiarity of the OPAL specification. Given we might already have saved the key if the user requested it via the 'IOC_OPAL_SAVE' ioctl, we can use that key to lock the device if no key was provided here and the locking range matches, and the user sets the appropriate flag with 'IOC_OPAL_SAVE'. This allows integrating OPAL with tools and libraries that are used to the common behaviour and do not ask for the volume key when closing a device. Callers can always pass a non-zero key and it will be used regardless, as before. Suggested-by:
Štěpán Horáček <stepan.horacek@gmail.com> Signed-off-by:
Luca Boccassi <bluca@debian.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20221206092913.4625-1-luca.boccassi@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Replace assocating with associating. Replace intiailized with initialized. Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Acked-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221206093307.378249-1-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 06, 2022
-
-
Christoph Hellwig authored
With the pktcdvdv removal, bio_copy_data_iter is unused now. Fold the logic into bio_copy_data and remove the separate lower level function. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221206144407.722049-1-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 05, 2022
-
-
Kemeng Shi authored
There is no need to update tg->slice_start[rw] to start when they are equal already. So remove "eq" part of check before update slice_start. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-10-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
There is no need to check elapsed time from last upgrade for each node in hierarchy. Move this check before traversing as throtl_can_upgrade do to remove repeat check. Reported-by:
kernel test robot <lkp@intel.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-9-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Function tg_last_low_overflow_time is called with intermediate node as following: throtl_hierarchy_can_downgrade throtl_tg_can_downgrade tg_last_low_overflow_time throtl_hierarchy_can_upgrade throtl_tg_can_upgrade tg_last_low_overflow_time throtl_hierarchy_can_downgrade/throtl_hierarchy_can_upgrade will traverse from leaf node to sub-root node and pass traversed intermediate node to tg_last_low_overflow_time. No such limit could be found from context and implementation of tg_last_low_overflow_time, so remove this limit in comment. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-8-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
lapsed time -> elapsed time Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-7-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Commit c79892c5 ("blk-throttle: add upgrade logic for LIMIT_LOW state") added upgrade logic for low limit and methioned that 1. "To determine if a cgroup exceeds its limitation, we check if the cgroup has pending request. Since cgroup is throttled according to the limit, pending request means the cgroup reaches the limit." 2. "If a cgroup has limit set for both read and write, we consider the combination of them for upgrade. The reason is read IO and write IO can interfere with each other. If we do the upgrade based in one direction IO, the other direction IO could be severly harmed." Besides, we also determine that cgroup reaches low limit if low limit is 0, see comment in throtl_tg_can_upgrade. Collect the information above, the desgin of upgrade check is as following: 1.The low limit is reached if limit is zero or io is already queued. 2.Cgroup will pass upgrade check if low limits of READ and WRITE are both reached. Simpfy the check code described above to removce repeat check and improve readability. There is no functional change. Detail equivalence proof is as following: All replaced conditions to return true are as following: condition 1 (!read_limit && !write_limit) condition 2 read_limit && sq->nr_queued[READ] && (!write_limit || sq->nr_queued[WRITE]) condition 3 write_limit && sq->nr_queued[WRITE] && (!read_limit || sq->nr_queued[READ]) Transferring condition 2 as following: (read_limit && sq->nr_queued[READ]) && (!write_limit || sq->nr_queued[WRITE]) is equivalent to (read_limit && sq->nr_queued[READ]) && (!write_limit || (write_limit && sq->nr_queued[WRITE])) is equivalent to condition 2.1 (read_limit && sq->nr_queued[READ] && !write_limit) || condition 2.2 (read_limit && sq->nr_queued[READ] && (write_limit && sq->nr_queued[WRITE])) Transferring condition 3 as following: write_limit && sq->nr_queued[WRITE] && (!read_limit || sq->nr_queued[READ]) is equivalent to (write_limit && sq->nr_queued[WRITE]) && (!read_limit || (read_limit && sq->nr_queued[READ])) is equivalent to condition 3.1 ((write_limit && sq->nr_queued[WRITE]) && !read_limit) || condition 3.2 ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) Condition 3.2 is the same as condition 2.2, so all conditions we get to return are as following: (!read_limit && !write_limit) (1) (!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1) ((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1) ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) (2.2) As we can extract conditions "(a1 || a2) && (b1 || b2)" to: a1 && b1 a1 && b2 a2 && b1 ab && b2 Considering that: a1 = !read_limit a2 = read_limit && sq->nr_queued[READ] b1 = !write_limit b2 = write_limit && sq->nr_queued[WRITE] We can pack replaced conditions to (!read_limit || (read_limit && sq->nr_queued[READ])) && (!write_limit || (write_limit && sq->nr_queued[WRITE])) which is equivalent to (!read_limit || sq->nr_queued[READ]) && (!write_limit || sq->nr_queued[WRITE]) Reported-by:
kernel test robot <lkp@intel.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-6-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
In C language, When executing "if (expression1 && expression2)" and expression1 return false, the expression2 may not be executed. For "tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) && tg_within_iops_limit(tg, bio, iops_limit, &iops_wait))", if bps is limited, tg_within_bps_limit will return false and tg_within_iops_limit will not be called. So even bps and iops are both limited, iops_wait will not be calculated and is always zero. So wait time of iops is always ignored. Fix this by always calling tg_within_bps_limit and tg_within_iops_limit to get wait time for both bps and iops. Observed that: 1. Wait time in tg_within_iops_limit/tg_within_bps_limit need always be stored as wait argument is always passed. 2. wait time is stored to zero if iops/bps is limited otherwise non-zero is stored. Simpfy tg_within_iops_limit/tg_within_bps_limit by removing wait argument and return wait time directly. Caller tg_may_dispatch checks if wait time is zero to find if iops/bps is limited. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-5-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Ignore cgroup without io queued in blk_throtl_cancel_bios for two reasons: 1. Save cpu cycle for trying to dispatch cgroup which is no io queued. 2. Avoid non-consistent state that cgroup is inserted to service queue without THROTL_TG_PENDING set as tg_update_disptime will unconditional re-insert cgroup to service queue. If we are on the default hierarchy, IO dispatched from child in tg_dispatch_one_bio will trigger inserting cgroup to service queue without erase first and ruin the tree. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-4-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Consider situation as following (on the default hierarchy): HDD | root (bps limit: 4k) | child (bps limit :8k) | fio bs=8k Rate of fio is supposed to be 4k, but result is 8k. Reason is as following: Size of single IO from fio is larger than bytes allowed in one throtl_slice in child, so IOs are always queued in child group first. When queued IOs in child are dispatched to parent group, BIO_BPS_THROTTLED is set and these IOs will not be limited by tg_within_bps_limit anymore. Fix this by only set BIO_BPS_THROTTLED when the bio traversed the entire tree. There patch has no influence on situation which is not on the default hierarchy as each group is a single root group without parent. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-3-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
On the default hierarchy (cgroup2), the throttle interface files don't exist in the root cgroup, so the ablity to limit the whole system by configuring root group is not existing anymore. In general, cgroup doesn't wanna be in the business of restricting resources at the system level, so correct the stale comment that we can limit whole system to we can only limit subtree. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-2-shikemeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 03, 2022
-
-
Greg Kroah-Hartman authored
With the removal of the pktcdvd driver, there are no in-kernel users of the devnode callback in struct block_device_operations, so it can be safely removed. If it is needed for new block drivers in the future, it can be brought back. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20221203140747.1942969-1-gregkh@linuxfoundation.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 02, 2022
-
-
Yang Li authored
Make the description of @gendisk to @disk in blkcg_schedule_throttle() to clear the below warnings: block/blk-cgroup.c:1850: warning: Function parameter or member 'disk' not described in 'blkcg_schedule_throttle' block/blk-cgroup.c:1850: warning: Excess function parameter 'gendisk' description in 'blkcg_schedule_throttle' Fixes: de185b56 ("blk-cgroup: pass a gendisk to blkcg_schedule_throttle") Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3338 Reported-by:
Abaci Robot <abaci@linux.alibaba.com> Signed-off-by:
Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20221202011713.14834-1-yang.lee@linux.alibaba.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 01, 2022
-
-
Tianjia Zhang authored
SM4 is a symmetric cipher algorithm widely used in China. The SM4-XTS variant is used to encrypt length-preserving data. This is the mandatory algorithm in some special scenarios. Add support for the algorithm to block inline encryption. This is needed for the inlinecrypt mount option to be supported via blk-crypto-fallback, as it is for the other fscrypt modes. Signed-off-by:
Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by:
Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221201125819.36932-2-tianjia.zhang@linux.alibaba.com
-
Randy Dunlap authored
Use only one hyphen in kernel-doc notation between the function name and its short description. The is the documented kerenl-doc format. It also fixes the HTML presentation to be consistent with other functions. Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20221201070331.25685-1-rdunlap@infradead.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
There is no iocg_pd_init function. The pd_alloc_fn function pointer of iocost policy is set with ioc_pd_init. Just correct it. Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Acked-by:
Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-6-shikemeng@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
If we trace vtime_base_rate instead of vtime_rate, there is nowhere which accesses now->vrate except function ioc_now using now->vrate locally. Just remove it. Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Acked-by:
Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-5-shikemeng@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Since commit ac33e91e ("blk-iocost: implement vtime loss compensation") rename original vtime_rate to vtime_base_rate and current vtime_rate is original vtime_rate with compensation. The current rate showed in tracepoint is mixed with vtime_rate and vtime_base_rate: 1) In function ioc_adjust_base_vrate, the first trace_iocost_ioc_vrate_adj shows vtime_rate, the second trace_iocost_ioc_vrate_adj shows vtime_base_rate. 2) In function iocg_activate shows vtime_rate by calling TRACE_IOCG_PATH(iocg_activate... 3) In function ioc_check_iocgs shows vtime_rate by calling TRACE_IOCG_PATH(iocg_idle... Trace vtime_base_rate instead of vtime_rate as: 1) Before commit ac33e91e ("blk-iocost: implement vtime loss compensation"), the traced rate is without compensation, so still show rate without compensation. 2) The vtime_base_rate is more stable while vtime_rate heavily depends on excess budeget on current period which may change abruptly in next period. Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Acked-by:
Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-4-shikemeng@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Since commit ac33e91e("blk-iocost: implement vtime loss compensation") split vtime_rate into vtime_rate and vtime_base_rate, we need reset both vtime_base_rate and vtime_rate when device parameters are refreshed. If vtime_base_rate is no reset here, vtime_rate will be overwritten with old vtime_base_rate soon in ioc_refresh_vrate. Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Acked-by:
Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-3-shikemeng@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
soley -> solely Signed-off-by:
Kemeng Shi <shikemeng@huawei.com> Acked-by:
Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221018121932.10792-2-shikemeng@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
Since commit 10c70d95 ("block: remove the bd_openers checks in blk_drop_partitions") we allow rereading of partition table although there are users of the block device. This has an undesirable consequence that e.g. if sda and sdb are assembled to a RAID1 device md0 with partitions, BLKRRPART ioctl on sda will rescan partition table and create sda1 device. This partition device under a raid device confuses some programs (such as libstorage-ng used for initial partitioning for distribution installation) leading to failures. Fix the problem refusing to rescan partitions if there is another user that has the block device exclusively open. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3 Fixes: 10c70d95 ("block: remove the bd_openers checks in blk_drop_partitions") Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221130175653.24299-1-jack@suse.cz [axboe: fold in followup fix] Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 30, 2022
-
-
Christoph Hellwig authored
We can't just say that the last reference release may block, as any reference dropped could be the last one. So move the might_sleep() from blk_free_queue to blk_put_queue and update the documentation. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-6-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The kobject embedded into the request_queue is used for the queue directory in sysfs, but that is a child of the gendisks directory and is intimately tied to it. Move this kobject to the gendisk and use a refcount_t in the request_queue for the actual request_queue refcounting that is completely unrelated to the device model. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-5-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blk_register_queue fails to handle errors from blk_mq_sysfs_register, leaks various resources on errors and accidentally sets queue refs percpu refcount to percpu mode on kobject_add failure. Fix all that by properly unwinding on errors. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-4-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Split the debugfs removal from blk_unregister_queue into a helper so that the it can be reused for blk_register_queue error handling. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221114042637.1009333-3-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Prepare for changes to the block layer sysfs handling by passing the readily available gendisk to blk_crypto_sysfs_{,un}register. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221114042637.1009333-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit dae590a6. We've had a few reports on this causing a crash at boot time, because of a reference issue. While this problem seemginly did exist before the patch and needs solving separately, this patch makes it a lot easier to trigger. Link: https://lore.kernel.org/linux-block/CA+QYu4oxiRKC6hJ7F27whXy-PRBx=Tvb+-7TQTONN8qTtV3aDA@mail.gmail.com/ Link: https://lore.kernel.org/linux-block/69af7ccb-6901-c84c-0e95-5682ccfb750c@acm.org/ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 29, 2022
-
-
Jinlong Chen authored
We have bool type now, update the old signature. Signed-off-by:
Jinlong Chen <nickyc975@zju.edu.cn> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/0db0a0298758d60d0f4df8b7126ac6a381e5a5bb.1669736350.git.nickyc975@zju.edu.cn Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jinlong Chen authored
The "pointer + offset" pattern is more resonable. Signed-off-by:
Jinlong Chen <nickyc975@zju.edu.cn> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/d9beaee71b14f7b2a39ab0db6458dc0f7d961ceb.1669736350.git.nickyc975@zju.edu.cn Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jinlong Chen authored
Printing e->elevator_name in all cases improves the readability, and 'e' and 'cur' are identical in this branch. Suggested-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jinlong Chen <nickyc975@zju.edu.cn> Link: https://lore.kernel.org/r/4bae180ffbac608ea0cf46ffa9739ce0973b60aa.1669736350.git.nickyc975@zju.edu.cn Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-