- Jan 24, 2024
-
-
Maksim Kiselev authored
Move set_capacity() outside of the section procected by (&d->lock). To avoid possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- [1] lock(&bdev->bd_size_lock); local_irq_disable(); [2] lock(&d->lock); [3] lock(&bdev->bd_size_lock); <Interrupt> [4] lock(&d->lock); *** DEADLOCK *** Where [1](&bdev->bd_size_lock) hold by zram_add()->set_capacity(). [2]lock(&d->lock) hold by aoeblk_gdalloc(). And aoeblk_gdalloc() is trying to acquire [3](&bdev->bd_size_lock) at set_capacity() call. In this situation an attempt to acquire [4]lock(&d->lock) from aoecmd_cfg_rsp() will lead to deadlock. So the simplest solution is breaking lock dependency [2](&d->lock) -> [3](&bdev->bd_size_lock) by moving set_capacity() outside. Signed-off-by:
Maksim Kiselev <bigunclemax@gmail.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240124072436.3745720-2-bigunclemax@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 21, 2024
-
-
Ilya Dryomov authored
The running list is supposed to contain requests that are pinning the exclusive lock, i.e. those that must be flushed before exclusive lock is released. When wake_lock_waiters() is called to handle an error, requests on the acquiring list are failed with that error and no flushing takes place. Briefly moving them to the running list is not only pointless but also harmful: if exclusive lock gets acquired before all of their state machines are scheduled and go through rbd_lock_del_request(), we trigger rbd_assert(list_empty(&rbd_dev->running_list)); in rbd_try_acquire_lock(). Cc: stable@vger.kernel.org Fixes: 637cd060 ("rbd: new exclusive lock wait/wake code") Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Dongsheng Yang <dongsheng.yang@easystack.cn>
-
Christophe JAILLET authored
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, while that of ida_alloc_max() is inclusive, so 1 has been subtracted. [ idryomov: tweak changelog ] Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- Jan 18, 2024
-
-
Christoph Hellwig authored
__loop_update_dio only checks the alignment requirement for block backed file systems, but misses them for the case where the loop device is created directly on top of another block device. Due to this creating a loop device with default option plus the direct I/O flag on a > 512 byte sector size file system will lead to incorrect I/O being submitted to the lower block device and a lot of error from the lock layer. This can be seen with xfstests generic/563. Fix the code in __loop_update_dio by factoring the alignment check into a helper, and calling that also for the struct block_device of a block device inode. Also remove the TODO comment talking about dynamically switching between buffered and direct I/O, which is a would be a recipe for horrible performance and occasional data loss. Fixes: 2e5ab5f3 ("block: loop: prepare for supporing direct IO") Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240117175901.871796-1-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 17, 2024
-
-
Eric Dumazet authored
syzbot complains that msg->msg_get_inq value can be uninitialized [1] struct msghdr got many new fields recently, we should always make sure their values is zero by default. [1] BUG: KMSAN: uninit-value in tcp_recvmsg+0x686/0xac0 net/ipv4/tcp.c:2571 tcp_recvmsg+0x686/0xac0 net/ipv4/tcp.c:2571 inet_recvmsg+0x131/0x580 net/ipv4/af_inet.c:879 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg+0x12b/0x1e0 net/socket.c:1066 __sock_xmit+0x236/0x5c0 drivers/block/nbd.c:538 nbd_read_reply drivers/block/nbd.c:732 [inline] recv_work+0x262/0x3100 drivers/block/nbd.c:863 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x104e/0x1e70 kernel/workqueue.c:2700 worker_thread+0xf45/0x1490 kernel/workqueue.c:2781 kthread+0x3ed/0x540 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 Local variable msg created at: __sock_xmit+0x4c/0x5c0 drivers/block/nbd.c:513 nbd_read_reply drivers/block/nbd.c:732 [inline] recv_work+0x262/0x3100 drivers/block/nbd.c:863 CPU: 1 PID: 7465 Comm: kworker/u5:1 Not tainted 6.7.0-rc7-syzkaller-00041-gf016f7547aee #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Workqueue: nbd5-recv recv_work Fixes: f94fd25c ("tcp: pass back data left in socket after receive") Reported-by:
syzbot <syzkaller@googlegroups.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: stable@vger.kernel.org Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: nbd@other.debian.org Reviewed-by:
Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240112132657.647112-1-edumazet@google.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 15, 2024
-
-
Li RongQing authored
virtqueue_enable_cb() will call virtqueue_poll() which will check if queue is broken at beginning, so remove the virtqueue_is_broken() call Acked-by:
Jason Wang <jasowang@redhat.com> Signed-off-by:
Li RongQing <lirongqing@baidu.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 14, 2024
-
-
Christophe JAILLET authored
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/bf257b1078475a415cdc3344c6a750842946e367.1705222845.git.christophe.jaillet@wanadoo.fr Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 08, 2024
-
-
Kirill A. Shutemov authored
commit 23baf831 ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Jan 04, 2024
-
-
liyouhong authored
Fix spelling typo in comment. Reported-by:
k2ci <kernel-bot@kylinos.cn> Signed-off-by:
liyouhong <liyouhong@kylinos.cn> Reviewed-by:
Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20231226095701.172080-1-liyouhong@kylinos.cn Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 29, 2023
-
-
Christoph Hellwig authored
The discard granularity now defaults to a single sector, so don't set that value explicitly. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231228075545.362768-8-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The discard granularity now defaults to a single sector, so don't set that value explicitly. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231228075545.362768-7-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The discard granularity now defaults to a single sector, so don't set that value explicitly. Also don't bother clearing it as a discard granularity without discard_sectors doesn't mean anything. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231228075545.362768-6-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 27, 2023
-
-
Christoph Hellwig authored
BLK_DEF_MAX_SECTORS despite the confusing name is the default cap for the max_sectors limits. Don't use it to initialize max_hw_setors, which is a hardware / driver capacility. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227092305.279567-4-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
BLK_DEF_MAX_SECTORS despite the confusing name is the default cap for the max_sectors limits. Don't use it to initialize max_hw_setors, which is a hardware / driver capacility. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227092305.279567-3-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
null_blk has some rather odd capping of the max_hw_sectors value to BLK_DEF_MAX_SECTORS, which doesn't make sense - max_hw_sector is the hardware limit, and BLK_DEF_MAX_SECTORS despite the confusing name is the default cap for the max_sectors field used for normal file system I/O. Remove all the capping, and simply leave it to the block layer or user to take up or not all of that for file system I/O. Fixes: ea17fd35 ("null_blk: Allow controlling max_hw_sectors limit") Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227092305.279567-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
loop_set_status doesn't change anything relevant to the discard and write_zeroes setting, so don't bother calling loop_config_discard. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227082020.249427-1-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 22, 2023
-
-
Randy Dunlap authored
Fix all kernel-doc warnings in drbd_actlog.c: drbd_actlog.c:963: warning: No description found for return value of 'drbd_rs_begin_io' drbd_actlog.c:1015: warning: Function parameter or member 'peer_device' not described in 'drbd_try_rs_begin_io' drbd_actlog.c:1015: warning: Excess function parameter 'device' description in 'drbd_try_rs_begin_io' drbd_actlog.c:1015: warning: No description found for return value of 'drbd_try_rs_begin_io' drbd_actlog.c:1197: warning: No description found for return value of 'drbd_rs_del_all' Fix one spelling error (s/ore/or/). Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: <drbd-dev@lists.linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <linux-block@vger.kernel.org> Link: https://lore.kernel.org/r/20231222061909.8791-1-rdunlap@infradead.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 20, 2023
-
-
Christoph Hellwig authored
Only use disk_set_zoned to actually enable zoned device support. For clearing it, call disk_clear_zoned, which is renamed from disk_clear_zone_settings and now directly clears the zoned flag as well. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-5-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
When zones were first added the SCSI and ATA specs, two different models were supported (in addition to the drive managed one that is invisible to the host): - host managed where non-conventional zones there is strict requirement to write at the write pointer, or else an error is returned - host aware where a write point is maintained if writes always happen at it, otherwise it is left in an under-defined state and the sequential write preferred zones behave like conventional zones (probably very badly performing ones, though) Not surprisingly this lukewarm model didn't prove to be very useful and was finally removed from the ZBC and SBC specs (NVMe never implemented it). Due to to the easily disappearing write pointer host software could never rely on the write pointer to actually be useful for say recovery. Fortunately only a few HDD prototypes shipped using this model which never made it to mass production. Drop the support before it is too late. Note that any such host aware prototype HDD can still be used with Linux as we'll now treat it as a conventional HDD. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
virtblk_revalidate_zones is called unconditionally from virtblk_config_changed_work from the virtio config_changed callback. virtblk_revalidate_zones is a bit odd in that it re-clears the zoned state for host aware or non-zoned devices, which isn't needed unless the zoned mode changed - but a zone mode change to a host managed model isn't handled at all, and virtio_blk also doesn't handle any other config change except for a capacity change is handled (and even if it was the upper layers above virtio_blk wouldn't handle it very well). But even the useful case of a size change that would add or remove zones isn't handled properly as blk_revalidate_disk_zones expects the device capacity to cover all zones, but the capacity is only updated after virtblk_revalidate_zones. As this code appears to be entirely untested and is getting in the way remove it for now, but it can be readded in a fixed version with proper test coverage if needed. Fixes: 95bfec41 ("virtio-blk: add support for zoned block devices") Fixes: f1ba4e67 ("virtio-blk: fix to match virtio spec") Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-3-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Move reading and checking the zoned model from virtblk_probe_zoned_device into the caller, leaving only the code to perform the actual setup for host managed zoned devices in virtblk_probe_zoned_device. This allows to share the model reading and sharing between builds with and without CONFIG_BLK_DEV_ZONED, and improve it for the !CONFIG_BLK_DEV_ZONED case. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 13, 2023
-
-
Kees Cook authored
Since "dev_search_path" can technically be as large as PATH_MAX, there was a risk of truncation when copying it and a second string into "full_path" since it was also PATH_MAX sized. The W=1 builds were reporting this warning: drivers/block/rnbd/rnbd-srv.c: In function 'process_msg_open.isra': drivers/block/rnbd/rnbd-srv.c:616:51: warning: '%s' directive output may be truncated writing up to 254 bytes into a region of size between 0 and 4095 [-Wformat-truncation=] 616 | snprintf(full_path, PATH_MAX, "%s/%s", | ^~ In function 'rnbd_srv_get_full_path', inlined from 'process_msg_open.isra' at drivers/block/rnbd/rnbd-srv.c:721:14: drivers/block/rnbd/rnbd-srv.c:616:17: note: 'snprintf' output between 2 and 4351 bytes into a destination of size 4096 616 | snprintf(full_path, PATH_MAX, "%s/%s", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 617 | dev_search_path, dev_name); | ~~~~~~~~~~~~~~~~~~~~~~~~~~ To fix this, unconditionally check for truncation (as was already done for the case where "%SESSNAME%" was present). Reported-by:
kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312100355.lHoJPgKy-lkp@intel.com/ Cc: Md. Haris Iqbal <haris.iqbal@ionos.com> Cc: Jack Wang <jinpu.wang@ionos.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <linux-block@vger.kernel.org> Signed-off-by:
Kees Cook <keescook@chromium.org> Acked-by:
Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by:
Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20231212214738.work.169-kees@kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 12, 2023
-
-
Pavel Begunkov authored
linux/io_uring.h is slowly becoming a rubbish bin where we put anything exposed to other subsystems. For instance, the task exit hooks and io_uring cmd infra are completely orthogonal and don't need each other's definitions. Start cleaning it up by splitting out all command bits into a new header file. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7ec50bae6e21f371d3850796e716917fc141225a.1701391955.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 11, 2023
-
-
Sergey Senozhatsky authored
Use kmap_local_page() instead of kmap_atomic() which has been deprecated. Link: https://lkml.kernel.org/r/20231128083845.848008-1-senozhatsky@chromium.org Signed-off-by:
Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by:
Minchan Kim <minchan@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Sergey Senozhatsky authored
Writeback is for incompressible and idle zram pages. Link: https://lkml.kernel.org/r/20231115024223.4133148-2-senozhatsky@chromium.org Signed-off-by:
Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dmytro Maluka <dmaluka@chromium.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Sergey Senozhatsky authored
ZRAM_MEMORY_TRACKING enables two features: - per-entry ac-time tracking - debugfs interface The latter one is the reason why memory-tracking depends on DEBUG_FS, while the former one is used far beyond debugging these days. Namely ac-time is used for fine grained writeback of idle entries (pages). Move ac-time tracking under its own config option so that it can be enabled (along with writeback) on systems without DEBUG_FS. [senozhatsky@chromium.org: ifdef fixup, per Dmytro] Link: https://lkml.kernel.org/r/20231117013543.540280-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20231115024223.4133148-1-senozhatsky@chromium.org Signed-off-by:
Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dmytro Maluka <dmaluka@chromium.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Dec 05, 2023
-
-
Jens Axboe authored
With the removal of the 'iov' argument to import_single_range(), the two functions are now fully identical. Convert the import_single_range() callers to import_ubuf(), and remove the former fully. Signed-off-by:
Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-3-axboe@kernel.dk Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Jens Axboe authored
It is entirely unused, just get rid of it. Signed-off-by:
Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Dec 04, 2023
-
-
Commit 4e040052 ("virtio-blk: support polling I/O") triggers the following gcc 13 W=1 warnings: drivers/block/virtio_blk.c: In function ‘init_vq’: drivers/block/virtio_blk.c:1077:68: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 7 [-Wformat-truncation=] 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~ drivers/block/virtio_blk.c:1077:58: note: directive argument in the range [-2147483648, 65534] 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~~~~~~~~~~~~ drivers/block/virtio_blk.c:1077:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a false positive because the lower bound -2147483648 is incorrect. The true range of i is [0, num_vqs - 1] where 0 < num_vqs < 65536. The code mixes int, unsigned short, and unsigned int types in addition to using "%d" for an unsigned value. Use unsigned short and "%u" consistently to solve the compiler warning. Cc: Suwan Kim <suwan.kim027@gmail.com> Reported-by:
kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312041509.DIyvEt9h-lkp@intel.com/ Signed-off-by:
Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20231204140743.1487843-1-stefanha@redhat.com> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-
- Nov 27, 2023
-
-
Supriti Singh authored
While printing error, replace %ld by %pe. %pe prints a string whereas %ld would print an error code. Signed-off-by:
Supriti Singh <supriti.singh@ionos.com> Signed-off-by:
Jack Wang <jinpu.wang@ionos.com> Signed-off-by:
Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by:
Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20231124213422.113449-3-haris.iqbal@ionos.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Santosh Pradhan authored
Remove REQ_OP_WRITE_SAME in favour of REQ_OP_WRITE_ZEROES. Signed-off-by:
Santosh Pradhan <santosh.pradhan@ionos.com> Reviewed-by:
Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by:
Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by:
Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20231124213422.113449-2-haris.iqbal@ionos.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 24, 2023
-
-
Amir Goldstein authored
All the callers of vfs_iter_write() call file_start_write() just before calling vfs_iter_write() except for target_core_file's fd_do_rw(). Move file_start_write() from the callers into vfs_iter_write(). fd_do_rw() calls vfs_iter_write() with a non-regular file, so file_start_write() is a no-op. This is needed for fanotify "pre content" events. Suggested-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-11-amir73il@gmail.com Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Nov 21, 2023
-
-
Li Nan authored
If a socket is processing ioctl 'NBD_SET_SOCK', config->socks might be krealloc in nbd_add_socket(), and a garbage request is received now, a UAF may occurs. T1 nbd_ioctl __nbd_ioctl nbd_add_socket blk_mq_freeze_queue T2 recv_work nbd_read_reply sock_xmit krealloc config->socks def config->socks Pass nbd_sock to nbd_read_reply(). And introduce a new function sock_xmit_recv(), which differs from sock_xmit only in the way it get socket. ================================================================== BUG: KASAN: use-after-free in sock_xmit+0x525/0x550 Read of size 8 at addr ffff8880188ec428 by task kworker/u12:1/18779 Workqueue: knbd4-recv recv_work Call Trace: __dump_stack dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 sock_xmit+0x525/0x550 nbd_read_reply+0xfe/0x2c0 recv_work+0x1c2/0x750 process_one_work+0x6b6/0xf10 worker_thread+0xdd/0xd80 kthread+0x30a/0x410 ret_from_fork+0x22/0x30 Allocated by task 18784: kasan_save_stack+0x1b/0x40 kasan_set_track set_alloc_info __kasan_kmalloc __kasan_kmalloc.constprop.0+0xf0/0x130 slab_post_alloc_hook slab_alloc_node slab_alloc __kmalloc_track_caller+0x157/0x550 __do_krealloc krealloc+0x37/0xb0 nbd_add_socket +0x2d3/0x880 __nbd_ioctl nbd_ioctl+0x584/0x8e0 __blkdev_driver_ioctl blkdev_ioctl+0x2a0/0x6e0 block_ioctl+0xee/0x130 vfs_ioctl __do_sys_ioctl __se_sys_ioctl+0x138/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Freed by task 18784: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x40 __kasan_slab_free.part.0+0x13f/0x1b0 slab_free_hook slab_free_freelist_hook slab_free kfree+0xcb/0x6c0 krealloc+0x56/0xb0 nbd_add_socket+0x2d3/0x880 __nbd_ioctl nbd_ioctl+0x584/0x8e0 __blkdev_driver_ioctl blkdev_ioctl+0x2a0/0x6e0 block_ioctl+0xee/0x130 vfs_ioctl __do_sys_ioctl __se_sys_ioctl+0x138/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Signed-off-by:
Li Nan <linan122@huawei.com> Reviewed-by:
Yu Kuai <yukuai3@huawei.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230911023308.3467802-1-linan666@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 20, 2023
-
-
Chengming Zhou authored
When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq() would return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for the request, which has been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request(). Then null_queue_rqs() put these requests in the rqlist, return back to the block layer core, which would try to queue them individually again, so the warning in blk_mq_start_request() triggered. Fix it by splitting the null_queue_rq() into two parts: the first is the preparation of request, the second is the handling of request. We put the blk_mq_start_request() after the preparation part, which may fail and return back to the block layer core. The throttling also belongs to the preparation part, so move it before blk_mq_start_request(). And change the return type of null_handle_cmd() to void, since it always return BLK_STS_OK now. Reported-by:
<syzbot+fcc47ba2476570cbbeb0@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/all/0000000000000e6aac06098aee0c@google.com/ Fixes: d78bfa13 ("block/null_blk: add queue_rqs() support") Suggested-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20231120032521.1012037-1-chengming.zhou@linux.dev Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Li Nan authored
Memory reordering may occur in nbd_genl_connect(), causing config_refs to be set to 1 while nbd->config is still empty. Opening nbd at this time will cause null-ptr-dereference. T1 T2 nbd_open nbd_get_config_unlocked nbd_genl_connect nbd_alloc_and_init_config //memory reordered refcount_set(&nbd->config_refs, 1) // 2 nbd->config ->null point nbd->config = config // 1 Fix it by adding smp barrier to guarantee the execution sequence. Signed-off-by:
Li Nan <linan122@huawei.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-4-linan666@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Li Nan authored
There are no functional changes, just to make code cleaner and prepare to fix null-ptr-dereference while accessing 'nbd->config'. Signed-off-by:
Li Nan <linan122@huawei.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-3-linan666@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Li Nan authored
There are no functional changes, make the code cleaner and prepare to fix null-ptr-dereference while accessing 'nbd->config'. Signed-off-by:
Li Nan <linan122@huawei.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-2-linan666@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 07, 2023
-
-
Li Lingfeng authored
Commit 4af5f2e0 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") cleans up disk by blk_cleanup_disk() and it won't set disk->private_data as NULL as before. UAF may be triggered in nbd_open() if someone tries to open nbd device right after nbd_put() since nbd has been free in nbd_dev_remove(). Fix this by implementing ->free_disk and free private data in it. Fixes: 4af5f2e0 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by:
Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231107103435.2074904-1-lilingfeng@huaweicloud.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 01, 2023
-
-
The following codes have an implicit conversion from size_t to u32: (u32)max_size = (size_t)virtio_max_dma_size(vdev); This may lead overflow, Ex (size_t)4G -> (u32)0. Once virtio_max_dma_size() has a larger size than U32_MAX, use U32_MAX instead. Signed-off-by:
zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20230904061045.510460-1-pizhenwei@bytedance.com> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-
- Oct 28, 2023
-
-
Christoph Hellwig authored
disk_check_media_change is mostly called from ->open where it makes little sense to mark the file system on the device as dead, as we are just opening it. So instead of calling bdev_mark_dead from disk_check_media_change move it into the few callers that are not in an open instance. This avoid calling into bdev_mark_dead and thus taking s_umount with open_mutex held. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231017184823.1383356-4-hch@lst.de Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christian Brauner <brauner@kernel.org> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-