  1. Nov 29, 2024
    • block, bfq: fix bfqq uaf in bfq_limit_depth() · e8b8344d
      Yu Kuai authored
      
      Setting a newly allocated bfqq in the bic and removing a freed bfqq
      from the bic are both protected by bfqd->lock. However,
      bfq_limit_depth() dereferences bfqq from the bic without holding the
      lock, which can lead to a UAF if the io_context is shared by multiple
      tasks.

      For example, testing bfq with io_uring can trigger the following UAF
      in v6.6:
      
      ==================================================================
      BUG: KASAN: slab-use-after-free in bfqq_group+0x15/0x50
      
      Call Trace:
       <TASK>
       dump_stack_lvl+0x47/0x80
       print_address_description.constprop.0+0x66/0x300
       print_report+0x3e/0x70
       kasan_report+0xb4/0xf0
       bfqq_group+0x15/0x50
       bfqq_request_over_limit+0x130/0x9a0
       bfq_limit_depth+0x1b5/0x480
       __blk_mq_alloc_requests+0x2b5/0xa00
       blk_mq_get_new_requests+0x11d/0x1d0
       blk_mq_submit_bio+0x286/0xb00
       submit_bio_noacct_nocheck+0x331/0x400
       __block_write_full_folio+0x3d0/0x640
       writepage_cb+0x3b/0xc0
       write_cache_pages+0x254/0x6c0
       write_cache_pages+0x254/0x6c0
       do_writepages+0x192/0x310
       filemap_fdatawrite_wbc+0x95/0xc0
       __filemap_fdatawrite_range+0x99/0xd0
       filemap_write_and_wait_range.part.0+0x4d/0xa0
       blkdev_read_iter+0xef/0x1e0
       io_read+0x1b6/0x8a0
       io_issue_sqe+0x87/0x300
       io_wq_submit_work+0xeb/0x390
       io_worker_handle_work+0x24d/0x550
       io_wq_worker+0x27f/0x6c0
       ret_from_fork_asm+0x1b/0x30
       </TASK>
      
      Allocated by task 808602:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       __kasan_slab_alloc+0x83/0x90
       kmem_cache_alloc_node+0x1b1/0x6d0
       bfq_get_queue+0x138/0xfa0
       bfq_get_bfqq_handle_split+0xe3/0x2c0
       bfq_init_rq+0x196/0xbb0
       bfq_insert_request.isra.0+0xb5/0x480
       bfq_insert_requests+0x156/0x180
       blk_mq_insert_request+0x15d/0x440
       blk_mq_submit_bio+0x8a4/0xb00
       submit_bio_noacct_nocheck+0x331/0x400
       __blkdev_direct_IO_async+0x2dd/0x330
       blkdev_write_iter+0x39a/0x450
       io_write+0x22a/0x840
       io_issue_sqe+0x87/0x300
       io_wq_submit_work+0xeb/0x390
       io_worker_handle_work+0x24d/0x550
       io_wq_worker+0x27f/0x6c0
       ret_from_fork+0x2d/0x50
       ret_from_fork_asm+0x1b/0x30
      
      Freed by task 808589:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       kasan_save_free_info+0x27/0x40
       __kasan_slab_free+0x126/0x1b0
       kmem_cache_free+0x10c/0x750
       bfq_put_queue+0x2dd/0x770
       __bfq_insert_request.isra.0+0x155/0x7a0
       bfq_insert_request.isra.0+0x122/0x480
       bfq_insert_requests+0x156/0x180
       blk_mq_dispatch_plug_list+0x528/0x7e0
       blk_mq_flush_plug_list.part.0+0xe5/0x590
       __blk_flush_plug+0x3b/0x90
       blk_finish_plug+0x40/0x60
       do_writepages+0x19d/0x310
       filemap_fdatawrite_wbc+0x95/0xc0
       __filemap_fdatawrite_range+0x99/0xd0
       filemap_write_and_wait_range.part.0+0x4d/0xa0
       blkdev_read_iter+0xef/0x1e0
       io_read+0x1b6/0x8a0
       io_issue_sqe+0x87/0x300
       io_wq_submit_work+0xeb/0x390
       io_worker_handle_work+0x24d/0x550
       io_wq_worker+0x27f/0x6c0
       ret_from_fork+0x2d/0x50
       ret_from_fork_asm+0x1b/0x30
      
      Fix the problem by protecting bic_to_bfqq() with bfqd->lock.
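      
      As a minimal sketch of the resulting pattern (illustrative only, not
      the actual patch: the in-tree bic_to_bfqq() also takes an actuator
      index, and the depth computation is reduced here to a hypothetical
      helper):
      
      static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
      {
              struct bfq_data *bfqd = data->q->elevator->elevator_data;
              struct bfq_io_cq *bic = bfq_bic_lookup(data->q);
              struct bfq_queue *bfqq;
              unsigned long flags;
      
              if (!bic)
                      return;
      
              /*
               * bic->bfqq[] is updated under bfqd->lock, so hold the same
               * lock while dereferencing the bfqq; otherwise another task
               * sharing the io_context can free it underneath us.
               */
              spin_lock_irqsave(&bfqd->lock, flags);
              bfqq = bic_to_bfqq(bic, op_is_sync(opf));       /* simplified signature */
              if (bfqq)
                      bfq_apply_depth_limit(bfqq, data);      /* hypothetical helper */
              spin_unlock_irqrestore(&bfqd->lock, flags);
      }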
      
      CC: Jan Kara <jack@suse.cz>
      Fixes: 76f1df88 ("bfq: Limit number of requests consumed by each cgroup")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20241129091509.2227136-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Nov 26, 2024
    • mq-deadline: don't call req_get_ioprio from the I/O completion handler · 1b0cab32
      Christoph Hellwig authored
      
      req_get_ioprio looks at req->bio to find the I/O priority, but
      req->bio is no longer set when completing requests whose bios the
      driver has fully iterated through.
      
      Stash away the dd_per_prio in the elevator private data instead of looking
      it up again to optimize the code a bit while fixing the regression from
      removing the per-request ioprio value.
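      
      A rough sketch of the idea, using clearly hypothetical function names
      (the real mq-deadline hooks and accounting differ in detail): remember
      the dd_per_prio in the request's elevator private data when the
      request is inserted, and use that stashed pointer at completion
      instead of calling req_get_ioprio():
      
      static void dd_insert_sketch(struct request *rq)
      {
              struct deadline_data *dd = rq->q->elevator->elevator_data;
              enum dd_prio prio = ioprio_class_to_prio[dd_rq_ioclass(rq)];
      
              /* rq->bio is still valid here, so the priority can be read */
              rq->elv.priv[0] = &dd->per_prio[prio];
      }
      
      static void dd_finish_sketch(struct request *rq)
      {
              /* completion no longer needs req_get_ioprio() or rq->bio */
              struct dd_per_prio *per_prio = rq->elv.priv[0];
      
              if (per_prio)
                      dd_count_completion(per_prio);  /* hypothetical accounting hook */
      }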
      
      Fixes: 6975c1a4 ("block: remove the ioprio field from struct request")
      Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
      Reported-by: Sam Protsenko <semen.protsenko@linaro.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
      Tested-by: Sam Protsenko <semen.protsenko@linaro.org>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20241126102136.619067-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Prevent potential deadlock in blk_revalidate_disk_zones() · 0b83c86b
      Damien Le Moal authored
      
      The function blk_revalidate_disk_zones() calls the function
      disk_update_zone_resources() after freezing the device queue. In turn,
      disk_update_zone_resources() calls queue_limits_start_update() which
      takes a queue limits mutex lock, resulting in the ordering:
      q->q_usage_counter check -> q->limits_lock. However, the usual ordering
      is to always take a queue limit lock before freezing the queue to commit
      the limits updates, e.g., the code pattern:
      
      lim = queue_limits_start_update(q);
      ...
      blk_mq_freeze_queue(q);
      ret = queue_limits_commit_update(q, &lim);
      blk_mq_unfreeze_queue(q);
      
      Thus, blk_revalidate_disk_zones() introduces a potential circular
      locking dependency deadlock that lockdep sometimes catches with the
      splat:
      
      [   51.934109] ======================================================
      [   51.935916] WARNING: possible circular locking dependency detected
      [   51.937561] 6.12.0+ #2107 Not tainted
      [   51.938648] ------------------------------------------------------
      [   51.940351] kworker/u16:4/157 is trying to acquire lock:
      [   51.941805] ffff9fff0aa0bea8 (&q->limits_lock){+.+.}-{4:4}, at: disk_update_zone_resources+0x86/0x170
      [   51.944314]
                     but task is already holding lock:
      [   51.945688] ffff9fff0aa0b890 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_revalidate_disk_zones+0x15f/0x340
      [   51.948527]
                     which lock already depends on the new lock.
      
      [   51.951296]
                     the existing dependency chain (in reverse order) is:
      [   51.953708]
                     -> #1 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
      [   51.956131]        blk_queue_enter+0x1c9/0x1e0
      [   51.957290]        blk_mq_alloc_request+0x187/0x2a0
      [   51.958365]        scsi_execute_cmd+0x78/0x490 [scsi_mod]
      [   51.959514]        read_capacity_16+0x111/0x410 [sd_mod]
      [   51.960693]        sd_revalidate_disk.isra.0+0x872/0x3240 [sd_mod]
      [   51.962004]        sd_probe+0x2d7/0x520 [sd_mod]
      [   51.962993]        really_probe+0xd5/0x330
      [   51.963898]        __driver_probe_device+0x78/0x110
      [   51.964925]        driver_probe_device+0x1f/0xa0
      [   51.965916]        __driver_attach_async_helper+0x60/0xe0
      [   51.967017]        async_run_entry_fn+0x2e/0x140
      [   51.968004]        process_one_work+0x21f/0x5a0
      [   51.968987]        worker_thread+0x1dc/0x3c0
      [   51.969868]        kthread+0xe0/0x110
      [   51.970377]        ret_from_fork+0x31/0x50
      [   51.970983]        ret_from_fork_asm+0x11/0x20
      [   51.971587]
                     -> #0 (&q->limits_lock){+.+.}-{4:4}:
      [   51.972479]        __lock_acquire+0x1337/0x2130
      [   51.973133]        lock_acquire+0xc5/0x2d0
      [   51.973691]        __mutex_lock+0xda/0xcf0
      [   51.974300]        disk_update_zone_resources+0x86/0x170
      [   51.975032]        blk_revalidate_disk_zones+0x16c/0x340
      [   51.975740]        sd_zbc_revalidate_zones+0x73/0x160 [sd_mod]
      [   51.976524]        sd_revalidate_disk.isra.0+0x465/0x3240 [sd_mod]
      [   51.977824]        sd_probe+0x2d7/0x520 [sd_mod]
      [   51.978917]        really_probe+0xd5/0x330
      [   51.979915]        __driver_probe_device+0x78/0x110
      [   51.981047]        driver_probe_device+0x1f/0xa0
      [   51.982143]        __driver_attach_async_helper+0x60/0xe0
      [   51.983282]        async_run_entry_fn+0x2e/0x140
      [   51.984319]        process_one_work+0x21f/0x5a0
      [   51.985873]        worker_thread+0x1dc/0x3c0
      [   51.987289]        kthread+0xe0/0x110
      [   51.988546]        ret_from_fork+0x31/0x50
      [   51.989926]        ret_from_fork_asm+0x11/0x20
      [   51.991376]
                     other info that might help us debug this:
      
      [   51.994127]  Possible unsafe locking scenario:
      
      [   51.995651]        CPU0                    CPU1
      [   51.996694]        ----                    ----
      [   51.997716]   lock(&q->q_usage_counter(queue)#3);
      [   51.998817]                                lock(&q->limits_lock);
      [   52.000043]                                lock(&q->q_usage_counter(queue)#3);
      [   52.001638]   lock(&q->limits_lock);
      [   52.002485]
                      *** DEADLOCK ***
      
      Prevent this issue by moving the calls to blk_mq_freeze_queue() and
      blk_mq_unfreeze_queue() around the call to queue_limits_commit_update()
      in disk_update_zone_resources(). In case of revalidation failure, the
      call to disk_free_zone_resources() in blk_revalidate_disk_zones()
      is still done with the queue frozen as before.
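      
      In other words, disk_update_zone_resources() now follows the same
      pattern shown above; roughly (a simplified sketch, with error handling
      and the actual zone-limit updates omitted):
      
              struct queue_limits lim;
              int ret;
      
              lim = queue_limits_start_update(q);     /* takes q->limits_lock first */
              /* ... adjust the zone-related limits in 'lim' ... */
              blk_mq_freeze_queue(q);
              ret = queue_limits_commit_update(q, &lim);
              blk_mq_unfreeze_queue(q);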
      
      Fixes: 843283e9 ("block: Fake max open zones limit when there is no limit")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20241126104705.183996-1-dlemoal@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. Nov 20, 2024
    • block: req->bio is always set in the merge code · 81314bfb
      Christoph Hellwig authored
      
      As smatch, which is a lot smarter than me, noticed: req->bio is always
      set in the merge code. So remove the checks for it, and condense the
      remaining checks a bit, including the comments stating the obvious.
      
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: John Garry <john.g.garry@oracle.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Link: https://lore.kernel.org/r/20241119161157.1328171-3-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: blk-mq: fix uninit-value in blk_rq_prep_clone and refactor · dcbb598e
      Suraj Sonawane authored
      Fix an issue detected by the `smatch` tool:
      
      block/blk-mq.c:3314 blk_rq_prep_clone() error: uninitialized
      symbol 'bio'.
      
      This patch refactors `blk_rq_prep_clone()` to improve code
      readability and ensure safety by addressing potential misuse of
      the `bio` variable:
      
      - Move the bio_put(bio); call to the bio_ctr error handling block,
        which is the only place where it can be triggered.
      - Move the bio variable into the __rq_for_each_bio loop scope.
        This change removes the need to set bio to NULL at the loop's
        end.
      
      Discussion on why bio remains uninitialized:
      https://lore.kernel.org/lkml/20241004141037.43277-1-surajsonawane0215@gmail.com
      
      Summary of above discussion:
      - I pointed out that `bio` can remain uninitialized if the
        allocation with `bio_alloc_clone` fails.
      - Keith Busch explained that `bio` is initialized to `NULL` when
        `bio_alloc_clone()` fails, preventing uninitialized usage.
      - John Garry questioned whether `rq_src->bio` being `NULL` could
        leave `bio` uninitialized. Keith clarified that in such cases,
        `bio` is not referenced, so it does not need initialization.
      - Christoph Hellwig recommended code improvements:
       - move the bio_put to the bio_ctr error handling, which is the only
         case where it can happen
       - move the bio variable into the __rq_for_each_bio scope, which
         also removed the need to zero it at the end of the loop
      
      These changes enhance code clarity, address static analysis tool
      warnings, and make the function more maintainable.
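      
      A condensed sketch of the resulting loop shape (simplified: 'bdev',
      'gfp_mask' and 'bs' stand in for the real arguments, and the chaining
      of the clone onto the request is omitted):
      
              struct bio *bio_src;
      
              __rq_for_each_bio(bio_src, rq_src) {
                      /* 'bio' is scoped to one iteration, so it can no
                       * longer be seen uninitialized after the loop. */
                      struct bio *bio = bio_alloc_clone(bdev, bio_src, gfp_mask, bs);
      
                      if (!bio)
                              goto free_and_out;
      
                      if (bio_ctr && bio_ctr(bio, bio_src, data)) {
                              /* the only failure path that still owns 'bio' */
                              bio_put(bio);
                              goto free_and_out;
                      }
      
                      /* append the clone to 'rq' as the real function does */
              }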
      
      Thread of the previous patch version's discussion:
      https://lore.kernel.org/lkml/20241004100842.9052-1-surajsonawane0215@gmail.com
      
      
      
      Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20241119164412.37609-1-surajsonawane0215@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()" · cf5a60d9
      Zach Wade authored
      
      This reverts commit bc3b1e9e.
      
      The bic is still associated with sync_bfqq, so bfq_release_process_ref()
      cannot be folded into bfq_put_cooperator(): dropping the process
      reference there can free a bfqq that the bic still points to, which
      bic_set_bfqq() then dereferences, as the KASAN report below shows.
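      
      A heavily simplified sketch of why the split matters (illustrative
      only; the in-tree callers also pass an actuator index to
      bic_set_bfqq() and do more work): the process reference must be
      dropped only after the bic has stopped pointing at the queue, so it
      cannot be dropped inside bfq_put_cooperator():
      
              /* drop only the cooperator references here */
              bfq_put_cooperator(sync_bfqq);
      
              /* detach the bic while sync_bfqq is still alive */
              bic_set_bfqq(bic, NULL, true);
      
              /* only now drop the process reference, which may free it */
              bfq_release_process_ref(bfqd, sync_bfqq);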
      
      kasan report:
      [  400.347277] ==================================================================
      [  400.347287] BUG: KASAN: slab-use-after-free in bic_set_bfqq+0x200/0x230
      [  400.347420] Read of size 8 at addr ffff88881cab7d60 by task dockerd/5800
      [  400.347430]
      [  400.347436] CPU: 24 UID: 0 PID: 5800 Comm: dockerd Kdump: loaded Tainted: G E 6.12.0 #32
      [  400.347450] Tainted: [E]=UNSIGNED_MODULE
      [  400.347454] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.20192059.B64.2207280713 07/28/2022
      [  400.347460] Call Trace:
      [  400.347464]  <TASK>
      [  400.347468]  dump_stack_lvl+0x5d/0x80
      [  400.347490]  print_report+0x174/0x505
      [  400.347521]  kasan_report+0xe0/0x160
      [  400.347541]  bic_set_bfqq+0x200/0x230
      [  400.347549]  bfq_bic_update_cgroup+0x419/0x740
      [  400.347560]  bfq_bio_merge+0x133/0x320
      [  400.347584]  blk_mq_submit_bio+0x1761/0x1e20
      [  400.347625]  __submit_bio+0x28b/0x7b0
      [  400.347664]  submit_bio_noacct_nocheck+0x6b2/0xd30
      [  400.347690]  iomap_readahead+0x50c/0x680
      [  400.347731]  read_pages+0x17f/0x9c0
      [  400.347785]  page_cache_ra_unbounded+0x366/0x4a0
      [  400.347795]  filemap_fault+0x83d/0x2340
      [  400.347819]  __xfs_filemap_fault+0x11a/0x7d0 [xfs]
      [  400.349256]  __do_fault+0xf1/0x610
      [  400.349270]  do_fault+0x977/0x11a0
      [  400.349281]  __handle_mm_fault+0x5d1/0x850
      [  400.349314]  handle_mm_fault+0x1f8/0x560
      [  400.349324]  do_user_addr_fault+0x324/0x970
      [  400.349337]  exc_page_fault+0x76/0xf0
      [  400.349350]  asm_exc_page_fault+0x26/0x30
      [  400.349360] RIP: 0033:0x55a480d77375
      [  400.349384] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 3b 66 10 0f 86 ae 02 00 00 55 48 89 e5 48 83 ec 58 48 8b 10 <83> 7a 10 00 0f 84 27 02 00 00 44 0f b6 42 28 44 0f b6 4a 29 41 80
      [  400.349392] RSP: 002b:00007f18c37fd8b8 EFLAGS: 00010216
      [  400.349401] RAX: 00007f18c37fd9d0 RBX: 0000000000000000 RCX: 0000000000000000
      [  400.349407] RDX: 000055a484407d38 RSI: 000000c000e8b0c0 RDI: 0000000000000000
      [  400.349412] RBP: 00007f18c37fd910 R08: 000055a484017f60 R09: 000055a484066f80
      [  400.349417] R10: 0000000000194000 R11: 0000000000000005 R12: 0000000000000008
      [  400.349422] R13: 0000000000000000 R14: 000000c000476a80 R15: 0000000000000000
      [  400.349430]  </TASK>
      [  400.349452]
      [  400.349454] Allocated by task 5800:
      [  400.349459]  kasan_save_stack+0x30/0x50
      [  400.349469]  kasan_save_track+0x14/0x30
      [  400.349475]  __kasan_slab_alloc+0x89/0x90
      [  400.349482]  kmem_cache_alloc_node_noprof+0xdc/0x2a0
      [  400.349492]  bfq_get_queue+0x1ef/0x1100
      [  400.349502]  __bfq_get_bfqq_handle_split+0x11a/0x510
      [  400.349511]  bfq_insert_requests+0xf55/0x9030
      [  400.349519]  blk_mq_flush_plug_list+0x446/0x14c0
      [  400.349527]  __blk_flush_plug+0x27c/0x4e0
      [  400.349534]  blk_finish_plug+0x52/0xa0
      [  400.349540]  _xfs_buf_ioapply+0x739/0xc30 [xfs]
      [  400.350246]  __xfs_buf_submit+0x1b2/0x640 [xfs]
      [  400.350967]  xfs_buf_read_map+0x306/0xa20 [xfs]
      [  400.351672]  xfs_trans_read_buf_map+0x285/0x7d0 [xfs]
      [  400.352386]  xfs_imap_to_bp+0x107/0x270 [xfs]
      [  400.353077]  xfs_iget+0x70d/0x1eb0 [xfs]
      [  400.353786]  xfs_lookup+0x2ca/0x3a0 [xfs]
      [  400.354506]  xfs_vn_lookup+0x14e/0x1a0 [xfs]
      [  400.355197]  __lookup_slow+0x19c/0x340
      [  400.355204]  lookup_one_unlocked+0xfc/0x120
      [  400.355211]  ovl_lookup_single+0x1b3/0xcf0 [overlay]
      [  400.355255]  ovl_lookup_layer+0x316/0x490 [overlay]
      [  400.355295]  ovl_lookup+0x844/0x1fd0 [overlay]
      [  400.355351]  lookup_one_qstr_excl+0xef/0x150
      [  400.355357]  do_unlinkat+0x22a/0x620
      [  400.355366]  __x64_sys_unlinkat+0x109/0x1e0
      [  400.355375]  do_syscall_64+0x82/0x160
      [  400.355384]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [  400.355393]
      [  400.355395] Freed by task 5800:
      [  400.355400]  kasan_save_stack+0x30/0x50
      [  400.355407]  kasan_save_track+0x14/0x30
      [  400.355413]  kasan_save_free_info+0x3b/0x70
      [  400.355422]  __kasan_slab_free+0x4f/0x70
      [  400.355429]  kmem_cache_free+0x176/0x520
      [  400.355438]  bfq_put_queue+0x67e/0x980
      [  400.355447]  bfq_bic_update_cgroup+0x407/0x740
      [  400.355454]  bfq_bio_merge+0x133/0x320
      [  400.355460]  blk_mq_submit_bio+0x1761/0x1e20
      [  400.355467]  __submit_bio+0x28b/0x7b0
      [  400.355473]  submit_bio_noacct_nocheck+0x6b2/0xd30
      [  400.355480]  iomap_readahead+0x50c/0x680
      [  400.355490]  read_pages+0x17f/0x9c0
      [  400.355498]  page_cache_ra_unbounded+0x366/0x4a0
      [  400.355505]  filemap_fault+0x83d/0x2340
      [  400.355514]  __xfs_filemap_fault+0x11a/0x7d0 [xfs]
      [  400.356204]  __do_fault+0xf1/0x610
      [  400.356213]  do_fault+0x977/0x11a0
      [  400.356221]  __handle_mm_fault+0x5d1/0x850
      [  400.356230]  handle_mm_fault+0x1f8/0x560
      [  400.356238]  do_user_addr_fault+0x324/0x970
      [  400.356248]  exc_page_fault+0x76/0xf0
      [  400.356258]  asm_exc_page_fault+0x26/0x30
      [  400.356266]
      [  400.356269] The buggy address belongs to the object at ffff88881cab7bc0
                      which belongs to the cache bfq_queue of size 576
      [  400.356276] The buggy address is located 416 bytes inside of
                      freed 576-byte region [ffff88881cab7bc0, ffff88881cab7e00)
      [  400.356285]
      [  400.356287] The buggy address belongs to the physical page:
      [  400.356292] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88881cab0b00 pfn:0x81cab0
      [  400.356300] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      [  400.356323] flags: 0x50000000000040(head|node=1|zone=2)
      [  400.356331] page_type: f5(slab)
      [  400.356340] raw: 0050000000000040 ffff88880a00c280 dead000000000122 0000000000000000
      [  400.356347] raw: ffff88881cab0b00 00000000802e0025 00000001f5000000 0000000000000000
      [  400.356354] head: 0050000000000040 ffff88880a00c280 dead000000000122 0000000000000000
      [  400.356359] head: ffff88881cab0b00 00000000802e0025 00000001f5000000 0000000000000000
      [  400.356365] head: 0050000000000003 ffffea002072ac01 ffffffffffffffff 0000000000000000
      [  400.356370] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
      [  400.356378] page dumped because: kasan: bad access detected
      [  400.356381]
      [  400.356383] Memory state around the buggy address:
      [  400.356387]  ffff88881cab7c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  400.356392]  ffff88881cab7c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  400.356397] >ffff88881cab7d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  400.356400]                                                        ^
      [  400.356405]  ffff88881cab7d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  400.356409]  ffff88881cab7e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  400.356413] ==================================================================
      
      Cc: stable@vger.kernel.org
      Fixes: bc3b1e9e ("block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()")
      Signed-off-by: Zach Wade <zachwade.k@gmail.com>
      Cc: Ding Hui <dinghui@sangfor.com.cn>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20241119153410.2546-1-zachwade.k@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. Nov 19, 2024
    • block: Support atomic writes limits for stacked devices · d7f36dc4
      John Garry authored
      
      Allow stacked devices to support atomic writes by aggregating the minimum
      capability of all bottom devices.
      
      Flag BLK_FEAT_ATOMIC_WRITES_STACKED is set for stacked devices which
      have been enabled to support atomic writes.
      
      Some things to note on the implementation:
      - For simplicity, all bottom devices must have the same atomic write
        boundary value (if any)
      - The atomic write boundary must be a power-of-2 already, but this
        restriction could be relaxed. Furthermore, it is now required that the
        chunk sectors for a top device must be aligned with this boundary.
      - If a bottom device atomic write unit min/max are not aligned with the
        top device chunk sectors, the top device atomic write unit min/max are
        reduced to a value which works for the chunk sectors.
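      
      As a hedged sketch of the aggregation itself (field names follow
      struct queue_limits, but the real stacking helper also handles the
      boundary and chunk-sector alignment rules listed above), stacking top
      limits 't' over bottom limits 'b' roughly means:
      
              if (!(t->features & BLK_FEAT_ATOMIC_WRITES_STACKED))
                      return;         /* stacking driver has not opted in */
      
              /* the stacked device can only offer what every bottom device can do */
              t->atomic_write_hw_max = min(t->atomic_write_hw_max,
                                           b->atomic_write_hw_max);
              t->atomic_write_hw_unit_max = min(t->atomic_write_hw_unit_max,
                                                b->atomic_write_hw_unit_max);
              t->atomic_write_hw_unit_min = max(t->atomic_write_hw_unit_min,
                                                b->atomic_write_hw_unit_min);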
      
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: John Garry <john.g.garry@oracle.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Link: https://lore.kernel.org/r/20241118105018.1870052-3-john.g.garry@oracle.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Add extra checks in blk_validate_atomic_write_limits() · d00eea91
      John Garry authored
      
      So far it is expected that the limits passed in are valid.

      In the future, atomic writes will be supported for stacked block
      devices, and calculating the limits there will be complicated, so add
      extra sanity checks to ensure that the values are always valid.
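      
      A sketch of the kind of checks meant here (illustrative; the exact
      in-tree set of checks in blk_validate_atomic_write_limits() may
      differ), jumping to a hypothetical 'unsupported' label that clears the
      atomic write limits:
      
              if (!lim->atomic_write_hw_max)
                      goto unsupported;
      
              if (!is_power_of_2(lim->atomic_write_hw_unit_min) ||
                  !is_power_of_2(lim->atomic_write_hw_unit_max))
                      goto unsupported;
      
              if (lim->atomic_write_hw_unit_min > lim->atomic_write_hw_unit_max)
                      goto unsupported;
      
              if (lim->atomic_write_hw_unit_max > lim->atomic_write_hw_max)
                      goto unsupported;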
      
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: John Garry <john.g.garry@oracle.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Link: https://lore.kernel.org/r/20241118105018.1870052-2-john.g.garry@oracle.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Drop granularity check in queue_limit_discard_alignment() · e924da7d
      John Garry authored
      
      lim->discard_granularity is always at least SECTOR_SIZE, so drop the
      pointless check for granularity less than SECTOR_SIZE.
      
      Signed-off-by: John Garry <john.g.garry@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20241112092144.4059847-1-john.g.garry@oracle.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix uaf for flush rq while iterating tags · 3802f73b
      Yu Kuai authored
      
      blk_mq_clear_flush_rq_mapping() is skipped during scsi probe by
      checking blk_queue_init_done(). However, QUEUE_FLAG_INIT_DONE is
      cleared in del_gendisk by commit aec89dc5 ("block: keep
      q_usage_counter in atomic mode after del_gendisk"); hence for disks
      like scsi, the following blk_mq_destroy_queue() will not clear the
      flush rq from tags->rqs[] either, causing the following UAF, found by
      our syzkaller on v6.6:
      
      ==================================================================
      BUG: KASAN: slab-use-after-free in blk_mq_find_and_get_req+0x16e/0x1a0 block/blk-mq-tag.c:261
      Read of size 4 at addr ffff88811c969c20 by task kworker/1:2H/224909
      
      CPU: 1 PID: 224909 Comm: kworker/1:2H Not tainted 6.6.0-ga836a5060850 #32
      Workqueue: kblockd blk_mq_timeout_work
      Call Trace:
      
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106
      print_address_description.constprop.0+0x66/0x300 mm/kasan/report.c:364
      print_report+0x3e/0x70 mm/kasan/report.c:475
      kasan_report+0xb8/0xf0 mm/kasan/report.c:588
      blk_mq_find_and_get_req+0x16e/0x1a0 block/blk-mq-tag.c:261
      bt_iter block/blk-mq-tag.c:288 [inline]
      __sbitmap_for_each_set include/linux/sbitmap.h:295 [inline]
      sbitmap_for_each_set include/linux/sbitmap.h:316 [inline]
      bt_for_each+0x455/0x790 block/blk-mq-tag.c:325
      blk_mq_queue_tag_busy_iter+0x320/0x740 block/blk-mq-tag.c:534
      blk_mq_timeout_work+0x1a3/0x7b0 block/blk-mq.c:1673
      process_one_work+0x7c4/0x1450 kernel/workqueue.c:2631
      process_scheduled_works kernel/workqueue.c:2704 [inline]
      worker_thread+0x804/0xe40 kernel/workqueue.c:2785
      kthread+0x346/0x450 kernel/kthread.c:388
      ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
      ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:293
      
      Allocated by task 942:
      kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
      kasan_set_track+0x25/0x30 mm/kasan/common.c:52
      ____kasan_kmalloc mm/kasan/common.c:374 [inline]
      __kasan_kmalloc mm/kasan/common.c:383 [inline]
      __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:380
      kasan_kmalloc include/linux/kasan.h:198 [inline]
      __do_kmalloc_node mm/slab_common.c:1007 [inline]
      __kmalloc_node+0x69/0x170 mm/slab_common.c:1014
      kmalloc_node include/linux/slab.h:620 [inline]
      kzalloc_node include/linux/slab.h:732 [inline]
      blk_alloc_flush_queue+0x144/0x2f0 block/blk-flush.c:499
      blk_mq_alloc_hctx+0x601/0x940 block/blk-mq.c:3788
      blk_mq_alloc_and_init_hctx+0x27f/0x330 block/blk-mq.c:4261
      blk_mq_realloc_hw_ctxs+0x488/0x5e0 block/blk-mq.c:4294
      blk_mq_init_allocated_queue+0x188/0x860 block/blk-mq.c:4350
      blk_mq_init_queue_data block/blk-mq.c:4166 [inline]
      blk_mq_init_queue+0x8d/0x100 block/blk-mq.c:4176
      scsi_alloc_sdev+0x843/0xd50 drivers/scsi/scsi_scan.c:335
      scsi_probe_and_add_lun+0x77c/0xde0 drivers/scsi/scsi_scan.c:1189
      __scsi_scan_target+0x1fc/0x5a0 drivers/scsi/scsi_scan.c:1727
      scsi_scan_channel drivers/scsi/scsi_scan.c:1815 [inline]
      scsi_scan_channel+0x14b/0x1e0 drivers/scsi/scsi_scan.c:1791
      scsi_scan_host_selected+0x2fe/0x400 drivers/scsi/scsi_scan.c:1844
      scsi_scan+0x3a0/0x3f0 drivers/scsi/scsi_sysfs.c:151
      store_scan+0x2a/0x60 drivers/scsi/scsi_sysfs.c:191
      dev_attr_store+0x5c/0x90 drivers/base/core.c:2388
      sysfs_kf_write+0x11c/0x170 fs/sysfs/file.c:136
      kernfs_fop_write_iter+0x3fc/0x610 fs/kernfs/file.c:338
      call_write_iter include/linux/fs.h:2083 [inline]
      new_sync_write+0x1b4/0x2d0 fs/read_write.c:493
      vfs_write+0x76c/0xb00 fs/read_write.c:586
      ksys_write+0x127/0x250 fs/read_write.c:639
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x70/0x120 arch/x86/entry/common.c:81
      entry_SYSCALL_64_after_hwframe+0x78/0xe2
      
      Freed by task 244687:
      kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
      kasan_set_track+0x25/0x30 mm/kasan/common.c:52
      kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522
      ____kasan_slab_free mm/kasan/common.c:236 [inline]
      __kasan_slab_free+0x12a/0x1b0 mm/kasan/common.c:244
      kasan_slab_free include/linux/kasan.h:164 [inline]
      slab_free_hook mm/slub.c:1815 [inline]
      slab_free_freelist_hook mm/slub.c:1841 [inline]
      slab_free mm/slub.c:3807 [inline]
      __kmem_cache_free+0xe4/0x520 mm/slub.c:3820
      blk_free_flush_queue+0x40/0x60 block/blk-flush.c:520
      blk_mq_hw_sysfs_release+0x4a/0x170 block/blk-mq-sysfs.c:37
      kobject_cleanup+0x136/0x410 lib/kobject.c:689
      kobject_release lib/kobject.c:720 [inline]
      kref_put include/linux/kref.h:65 [inline]
      kobject_put+0x119/0x140 lib/kobject.c:737
      blk_mq_release+0x24f/0x3f0 block/blk-mq.c:4144
      blk_free_queue block/blk-core.c:298 [inline]
      blk_put_queue+0xe2/0x180 block/blk-core.c:314
      blkg_free_workfn+0x376/0x6e0 block/blk-cgroup.c:144
      process_one_work+0x7c4/0x1450 kernel/workqueue.c:2631
      process_scheduled_works kernel/workqueue.c:2704 [inline]
      worker_thread+0x804/0xe40 kernel/workqueue.c:2785
      kthread+0x346/0x450 kernel/kthread.c:388
      ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
      ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:293
      
      Other than blk_mq_clear_flush_rq_mapping(), the flag is only used in
      blk_register_queue() on the initialization path, hence it's safe not
      to clear the flag in del_gendisk. And since QUEUE_FLAG_REGISTERED
      already makes sure that the queue is only registered once, there is no
      need to test the flag there either.
      
      Fixes: 6cfeadbf ("blk-mq: don't clear flush_rq from tags->rqs[]")
      Depends-on: commit aec89dc5 ("block: keep q_usage_counter in atomic mode after del_gendisk")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20241104110005.1412161-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>