- Feb 25, 2025
-
-
Icenowy Zheng authored
The PCI P2PDMA code will register the CMB block to the memory hot-plugging subsystem, which has an alignment requirement. Memory blocks that do not satisfy this alignment requirement (usually 2MB) will lead to a WARNING from memory hotplugging. Verify the CMB block's address and size against the alignment and only try to send CMB blocks compatible with it to prevent this warning. Tested on Intel DC D4502 SSD, which has a 512K CMB block that is too small for memory hotplugging (thus PCI P2PDMA). Signed-off-by:
Icenowy Zheng <uwu@icenowy.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
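The alignment check described in this commit can be modeled in a few lines. This is a userspace Python sketch, not the driver code: the function name `cmb_is_hotplug_compatible` and the fixed 2 MiB constant are assumptions for illustration, and the kernel takes the real alignment from the memory hot-plug subsystem.

```python
# Assumed hot-plug alignment (2 MiB); the real value depends on the architecture.
ALIGN = 2 * 1024 * 1024


def cmb_is_hotplug_compatible(addr, size, align=ALIGN):
    """Return True only if both the start address and the size are multiples of align."""
    return size > 0 and addr % align == 0 and size % align == 0
```

Under these assumptions a 512K block, like the one on the tested SSD, is rejected whatever its address, because its size is not a multiple of the alignment.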
-
Icenowy Zheng authored
CMB decoding should get disabled when the CMB block isn't successfully registered to the P2P DMA subsystem. Clean up the CMBMSC register in this error handling codepath to disable CMB decoding (and CMBLOC/CMBSZ registers). Signed-off-by:
Icenowy Zheng <uwu@icenowy.me> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
- Feb 24, 2025
-
-
Sagi Grimberg authored
nvme_tcp_poll() may race with the send path error handler because it may complete the request while it is actively being polled for completion, resulting in a UAF panic [1]:

We should make sure to stop polling when we see an error when trying to read from the socket. Hence make sure to propagate the error so that the block layer breaks the polling cycle.

[1]:
--
  [35665.692310] nvme nvme2: failed to send request -13
  [35665.702265] nvme nvme2: unsupported pdu type (3)
  [35665.702272] BUG: kernel NULL pointer dereference, address: 0000000000000000
  [35665.702542] nvme nvme2: queue 1 receive failed: -22
  [35665.703209] #PF: supervisor write access in kernel mode
  [35665.703213] #PF: error_code(0x0002) - not-present page
  [35665.703214] PGD 8000003801cce067 P4D 8000003801cce067 PUD 37e6f79067 PMD 0
  [35665.703220] Oops: 0002 [#1] SMP PTI
  [35665.703658] nvme nvme2: starting error recovery
  [35665.705809] Hardware name: Inspur aaabbb/YZMB-00882-104, BIOS 4.1.26 09/22/2022
  [35665.705812] Workqueue: kblockd blk_mq_requeue_work
  [35665.709172] RIP: 0010:_raw_spin_lock+0xc/0x30
  [35665.715788] Call Trace:
  [35665.716201] <TASK>
  [35665.716613] ? show_trace_log_lvl+0x1c1/0x2d9
  [35665.717049] ? show_trace_log_lvl+0x1c1/0x2d9
  [35665.717457] ? blk_mq_request_bypass_insert+0x2c/0xb0
  [35665.717950] ? __die_body.cold+0x8/0xd
  [35665.718361] ? page_fault_oops+0xac/0x140
  [35665.718749] ? blk_mq_start_request+0x30/0xf0
  [35665.719144] ? nvme_tcp_queue_rq+0xc7/0x170 [nvme_tcp]
  [35665.719547] ? exc_page_fault+0x62/0x130
  [35665.719938] ? asm_exc_page_fault+0x22/0x30
  [35665.720333] ? _raw_spin_lock+0xc/0x30
  [35665.720723] blk_mq_request_bypass_insert+0x2c/0xb0
  [35665.721101] blk_mq_requeue_work+0xa5/0x180
  [35665.721451] process_one_work+0x1e8/0x390
  [35665.721809] worker_thread+0x53/0x3d0
  [35665.722159] ? process_one_work+0x390/0x390
  [35665.722501] kthread+0x124/0x150
  [35665.722849] ? set_kthread_struct+0x50/0x50
  [35665.723182] ret_from_fork+0x1f/0x30

Reported-by:
Zhang Guanghui <zhang.guanghui@cestc.cn> Signed-off-by:
Sagi Grimberg <sagi@grimberg.me> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
- Feb 21, 2025
-
-
Jens Axboe authored
Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.14
  - FC controller state check fixes (Daniel)
  - PCI Endpoint fixes (Damien)
  - TCP connection failure fix (Caleb)
  - TCP handling C2HTermReq PDU (Maurizio)
  - RDMA queue state check (Ruozhu)
  - Apple controller fixes (Hector)
  - Target crash on disabled namespace (Hannes)"

* tag 'nvme-6.14-2025-02-20' of git://git.infradead.org/nvme:
  nvme: only allow entering LIVE from CONNECTING state
  nvme-fc: rely on state transitions to handle connectivity loss
  apple-nvme: Support coprocessors left idle
  apple-nvme: Release power domains when probe fails
  nvmet: Use enum definitions instead of hardcoded values
  nvme: Cleanup the definition of the controller config register fields
  nvme/ioctl: add missing space in err message
  nvme-tcp: fix connect failure on receiving partial ICResp PDU
  nvme: tcp: Fix compilation warning with W=1
  nvmet: pci-epf: Avoid RCU stalls under heavy workload
  nvmet: pci-epf: Do not uselessly write the CSTS register
  nvmet: pci-epf: Correctly initialize CSTS when enabling the controller
  nvmet-rdma: recheck queue state is LIVE in state lock in recv done
  nvmet: Fix crash when a namespace is disabled
  nvme-tcp: add basic support for the C2HTermReq PDU
  nvme-pci: quirk Acer FA100 for non-uniqueue identifiers
-
- Feb 20, 2025
-
-
Daniel Wagner authored
The fabric transports and the PCI transport never enter the LIVE state from NEW or RESETTING. Restricting the transition makes the state machine stricter and allows catching unsupported state transitions, e.g. directly switching from RESETTING to LIVE. Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Daniel Wagner <wagi@kernel.org> Signed-off-by:
Keith Busch <kbusch@kernel.org>
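The restriction described in this commit amounts to a transition table lookup. The following is a reduced Python model, not the kernel's state machine: the `ALLOWED_FROM` table lists only a few states and transitions chosen for illustration, and `change_state` is a hypothetical helper name.

```python
from enum import Enum, auto


class State(Enum):
    NEW = auto()
    LIVE = auto()
    RESETTING = auto()
    CONNECTING = auto()


# Hypothetical, reduced table: target state -> set of states it may be entered from.
ALLOWED_FROM = {
    State.LIVE: {State.CONNECTING},  # the rule this commit enforces
    State.RESETTING: {State.NEW, State.LIVE},
    State.CONNECTING: {State.NEW, State.RESETTING},
}


def change_state(old, new):
    """Return the new state if the transition is allowed, otherwise keep the old one."""
    return new if old in ALLOWED_FROM.get(new, set()) else old
```

In this model a direct RESETTING to LIVE request is rejected and the state stays at RESETTING, which is the kind of unsupported transition the commit wants to surface.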
-
Daniel Wagner authored
It's not possible to call nvme_change_ctrl_state() while holding a spin lock, because nvme_change_ctrl_state() calls cancel_delayed_work_sync() when fastfail is enabled. Instead of syncing the ASSOC_FLAG and state transitions using a lock, it's possible to rely only on the state machine transitions. That means nvme_fc_ctrl_connectivity_loss() should unconditionally call nvme_reset_ctrl(), which avoids the read race on the ctrl state variable. Actually, it's not necessary to test which state the ctrl is in; the reset work will only be scheduled when the state machine is in the LIVE state. In nvme_fc_create_association(), the LIVE state can only be entered if the previous state was CONNECTING. If this is not possible, then the reset handler got triggered. Thus just error out here. Fixes: ee59e382 ("nvme-fc: do not ignore connectivity loss during connecting") Closes: https://lore.kernel.org/all/denqwui6sl5erqmz2gvrwueyxakl5txzbbiu3fgebryzrfxunm@iwxuthct377m/ Reported-by:
Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by:
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Daniel Wagner <wagi@kernel.org> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Jens Axboe authored
Merge tag 'md-6.14-20250218' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.14 Pull MD fix from Yu: "This patch, by Bart Van Assche, fixes queue limits error handling for raid0, raid1 and raid10." * tag 'md-6.14-20250218' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: md/raid*: Fix the set_queue_limits implementations
-
- Feb 18, 2025
-
-
Hector Martin authored
iBoot on at least some firmwares/machines leaves ANS2 running, requiring a wake command instead of a CPU boot (and if we reset ANS2 in that state, everything breaks). Only stop the CPU if RTKit was running, and only do the reset dance if the CPU is stopped.

Normal shutdown handoff:
  - RTKit not yet running
  - CPU detected not running
  - Reset
  - CPU powerup
  - RTKit boot wait

ANS2 left running/idle:
  - RTKit not yet running
  - CPU detected running
  - RTKit wake message

Sleep/resume cycle:
  - RTKit shutdown
  - CPU stopped
  - (sleep here)
  - CPU detected not running
  - Reset
  - CPU powerup
  - RTKit boot wait

Shutdown or device removal:
  - RTKit shutdown
  - CPU stopped

Therefore, the CPU running bit serves as a consistent flag of whether the coprocessor is fully stopped or just idle. Signed-off-by:
Hector Martin <marcan@marcan.st> Reviewed-by:
Neal Gompa <neal@gompa.dev> Reviewed-by:
Sven Peter <sven@svenpeter.dev> Signed-off-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by:
Keith Busch <kbusch@kernel.org>
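The scenarios listed in this commit reduce to two small decisions keyed on the "CPU running" bit and on whether RTKit was running. A Python sketch of that decision logic follows; the function names and the step labels are illustrative stand-ins, not driver symbols.

```python
def startup_steps(cpu_running):
    """Steps to bring the coprocessor up, chosen from the CPU running bit."""
    if cpu_running:
        # Coprocessor was left idle by the bootloader: wake it, never reset it.
        return ["rtkit_wake"]
    # Coprocessor is fully stopped: do the full reset and boot sequence.
    return ["reset", "cpu_powerup", "rtkit_boot_wait"]


def shutdown_steps(rtkit_running):
    """Steps to shut down; the CPU is only stopped if RTKit was running."""
    if rtkit_running:
        return ["rtkit_shutdown", "cpu_stop"]
    return []
```

For example, `startup_steps(True)` models the "ANS2 left running/idle" case, and `startup_steps(False)` covers both the normal handoff and the resume path.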
-
Hector Martin authored
Signed-off-by:
Hector Martin <marcan@marcan.st> Reviewed-by:
Neal Gompa <neal@gompa.dev> Reviewed-by:
Sven Peter <sven@svenpeter.dev> Signed-off-by:
Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Damien Le Moal authored
Change the definition of the inline functions nvmet_cc_en(), nvmet_cc_css(), nvmet_cc_mps(), nvmet_cc_ams(), nvmet_cc_shn(), nvmet_cc_iosqes(), and nvmet_cc_iocqes() to use the enum definitions in include/linux/nvme.h instead of hardcoded values. Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Damien Le Moal authored
Reorganize the enum used to define the fields of the controller configuration (CC) register in include/linux/nvme.h to:
  1) Group together all the values defined for each field.
  2) Add the missing field mask definitions.
  3) Add comments to describe the enum and each field.
Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Caleb Sander Mateos authored
nvme_validate_passthru_nsid() logs an err message whose format string is split over 2 lines. There is a missing space between the two pieces, resulting in log lines like "... does not match nsid (1)of namespace". Add the missing space between ")" and "of". Also combine the format string pieces onto a single line to make the err message easier to grep. Fixes: e7d4b549 ("nvme: factor out a nvme_validate_passthru_nsid helper") Signed-off-by:
Caleb Sander Mateos <csander@purestorage.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Keith Busch <kbusch@kernel.org>
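The bug pattern is easy to reproduce outside the kernel, because Python joins adjacent string literals the same way C does. The sketch below uses only the message fragment quoted in the commit; `BROKEN_FMT`, `FIXED_FMT` and `render` are illustrative names.

```python
# Adjacent literals are concatenated with nothing in between, as in C.
BROKEN_FMT = ("does not match nsid (%d)"
              "of namespace")

# The fix: add the space and keep the string on one line so it is easy to grep.
FIXED_FMT = "does not match nsid (%d) of namespace"


def render(fmt, nsid):
    """Format the message fragment for a given namespace ID."""
    return fmt % nsid
```

Keeping the string on one line also means a search for the exact log text finds the source line, which a split literal would hide.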
-
Caleb Sander Mateos authored
nvme_tcp_init_connection() attempts to receive an ICResp PDU but only checks that the return value from recvmsg() is non-negative. If the sender closes the TCP connection or sends fewer than 128 bytes, this check will pass even though the full PDU wasn't received. Ensure the full ICResp PDU is received by checking that recvmsg() returns the expected 128 bytes. Additionally set the MSG_WAITALL flag for recvmsg(), as a sender could split the ICResp over multiple TCP frames. Without MSG_WAITALL, recvmsg() could return prematurely with only part of the PDU. Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by:
Caleb Sander Mateos <csander@purestorage.com> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Keith Busch <kbusch@kernel.org>
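The same "non-negative is not enough" pitfall exists with userspace sockets. A minimal Python sketch of the corrected check, assuming a Unix-like platform where `socket.MSG_WAITALL` is available; `recv_icresp` is a hypothetical helper, and the 128-byte length is taken from the commit message.

```python
import socket

ICRESP_LEN = 128  # ICResp PDU size stated in the commit message


def recv_icresp(sock):
    """Receive exactly ICRESP_LEN bytes or raise ConnectionError.

    MSG_WAITALL asks recv() to block until the full length has arrived,
    so a PDU split over several segments is still read in one call.
    """
    data = sock.recv(ICRESP_LEN, socket.MSG_WAITALL)
    if len(data) != ICRESP_LEN:
        # Peer closed the connection or sent fewer bytes than a full PDU.
        raise ConnectionError(
            "short ICResp: got %d of %d bytes" % (len(data), ICRESP_LEN))
    return data
```

Even with MSG_WAITALL the length check is still needed, since the call returns fewer bytes when the peer closes the connection early.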
-
Damien Le Moal authored
When compiling with W=1, a warning results for the function nvme_tcp_set_queue_io_cpu():

  host/tcp.c:1578: warning: Function parameter or struct member 'queue' not described in 'nvme_tcp_set_queue_io_cpu'
  host/tcp.c:1578: warning: expecting prototype for Track the number of queues assigned to each cpu using a global per(). Prototype was for nvme_tcp_set_queue_io_cpu() instead

Avoid this warning by using the regular comment format for the function nvme_tcp_set_queue_io_cpu() instead of the kdoc comment format.

Fixes: 32193789 ("nvme-tcp: Fix I/O queue cpu spreading for multiple controllers")
Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Damien Le Moal authored
The delayed work item function nvmet_pci_epf_poll_sqs_work() polls all submission queues and keeps running in a loop as long as commands are being submitted by the host. Depending on the preemption configuration of the kernel, under heavy command workload, this function can thus run for more than RCU_CPU_STALL_TIMEOUT seconds, leading to an RCU stall:

  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: 5-....: (20998 ticks this GP) idle=4244/1/0x4000000000000000 softirq=301/301 fqs=5132
  rcu: (t=21000 jiffies g=-443 q=12 ncpus=8)
  CPU: 5 UID: 0 PID: 82 Comm: kworker/5:1 Not tainted 6.14.0-rc2 #1
  Hardware name: Radxa ROCK 5B (DT)
  Workqueue: events nvmet_pci_epf_poll_sqs_work [nvmet_pci_epf]
  pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : dw_edma_device_tx_status+0xb8/0x130
  lr : dw_edma_device_tx_status+0x9c/0x130
  sp : ffff800080b5bbb0
  x29: ffff800080b5bbb0 x28: ffff0331c5c78400 x27: ffff0331c1cd1960
  x26: ffff0331c0e39010 x25: ffff0331c20e4000 x24: ffff0331c20e4a90
  x23: 0000000000000000 x22: 0000000000000001 x21: 00000000005aca33
  x20: ffff800080b5bc30 x19: ffff0331c123e370 x18: 000000000ab29e62
  x17: ffffb2a878c9c118 x16: ffff0335bde82040 x15: 0000000000000000
  x14: 000000000000017b x13: 00000000ee601780 x12: 0000000000000018
  x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000040
  x8 : 00000000ee601780 x7 : 0000000105c785c0 x6 : ffff0331c1027d80
  x5 : 0000000001ee7ad6 x4 : ffff0335bdea16c0 x3 : ffff0331c123e438
  x2 : 00000000005aca33 x1 : 0000000000000000 x0 : ffff0331c123e410
  Call trace:
   dw_edma_device_tx_status+0xb8/0x130 (P)
   dma_sync_wait+0x60/0xbc
   nvmet_pci_epf_dma_transfer+0x128/0x264 [nvmet_pci_epf]
   nvmet_pci_epf_poll_sqs_work+0x2a0/0x2e0 [nvmet_pci_epf]
   process_one_work+0x144/0x390
   worker_thread+0x27c/0x458
   kthread+0xe8/0x19c
   ret_from_fork+0x10/0x20

The solution for this is simply to explicitly allow rescheduling using cond_resched(). However, since doing so for every loop of nvmet_pci_epf_poll_sqs_work() significantly degrades performance (for 4K random reads using 4 I/O queues, the maximum IOPS goes down from 137 KIOPS to 110 KIOPS), call cond_resched() every second to avoid the RCU stalls.

Fixes: 0faa0fe6 ("nvmet: New NVMe PCI endpoint function target driver")
Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
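The trade-off described in this commit, yielding on a time budget rather than on every iteration, can be modeled in userspace. This Python sketch is only an analogy: `poll_once` and `yield_cpu` are hypothetical callbacks standing in for the queue polling and for cond_resched(), and the clock is injectable so the behaviour can be checked deterministically.

```python
import time


def poll_loop(poll_once, yield_cpu, now=time.monotonic, interval=1.0):
    """Call poll_once() until it returns False.

    yield_cpu() is invoked at most once per `interval` seconds of polling,
    instead of on every iteration, to keep the hot loop cheap.
    """
    last_yield = now()
    while poll_once():
        if now() - last_yield >= interval:
            yield_cpu()
            last_yield = now()
```

With a simulated clock that advances 0.25 s per poll, ten polls trigger exactly two yields, at the 1 s and 2 s marks.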
-
Damien Le Moal authored
The function nvmet_pci_epf_poll_cc_work() will do nothing if there are no changes to the controller configuration (CC) register. However, even in that case, this function still calls nvmet_update_cc() and uselessly writes the CSTS register. Avoid this by simply rescheduling the poll_cc work if the CC register has not changed. Also reschedule the poll_cc work if the function nvmet_pci_epf_enable_ctrl() fails, to give the host a chance to try enabling the controller again. While at it, since there is no point in trying to handle the CC register as quickly as possible, change the poll_cc work scheduling interval to 10 ms (from 5 ms), to avoid excessive read accesses to that register. Fixes: 0faa0fe6 ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Damien Le Moal authored
The function nvmet_pci_epf_poll_cc_work() sets the NVME_CSTS_RDY bit of the controller status register (CSTS) when nvmet_pci_epf_enable_ctrl() returns success. However, since this function can be called several times (e.g. if the host reboots), instead of setting the bit in ctrl->csts, initialize this field to only have NVME_CSTS_RDY set. Conversely, if nvmet_pci_epf_enable_ctrl() fails, make sure to clear all bits from ctrl->csts. To simplify nvmet_pci_epf_poll_cc_work(), initialize ctrl->csts to NVME_CSTS_RDY directly inside nvmet_pci_epf_enable_ctrl() and clear this field in that function as well in case of a failure. To be consistent, move clearing the NVME_CSTS_RDY bit from ctrl->csts when the controller is being disabled from nvmet_pci_epf_poll_cc_work() into nvmet_pci_epf_disable_ctrl(). Fixes: 0faa0fe6 ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Ruozhu Li authored
The queue state check in nvmet_rdma_recv_done() is not done under the queue state lock. The queue state can change to LIVE in the CM establish handler between the state check and taking the state lock, causing a silent drop of the nvme connect command. Recheck whether the queue state is LIVE while holding the state lock to prevent this issue. Signed-off-by:
Ruozhu Li <david.li@jaguarmicro.com> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
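The fix follows the usual "check, lock, check again" pattern. The following Python model is a sketch only: the `Queue` class, the string state names, and the `recv_done` signature are illustrative, and the handling of states other than LIVE and CONNECTING is an assumption.

```python
import threading


class Queue:
    def __init__(self):
        self.state = "CONNECTING"
        self.state_lock = threading.Lock()
        self.wait_list = []


def recv_done(queue, cmd, handle):
    """Handle cmd now if the queue is LIVE, otherwise park it.

    The state is rechecked under the lock because another thread may have
    switched the queue to LIVE between the unlocked check and the lock.
    """
    if queue.state == "LIVE":  # unlocked fast path
        handle(cmd)
        return
    with queue.state_lock:
        if queue.state == "LIVE":  # recheck under the lock
            run_now = True
        elif queue.state == "CONNECTING":
            queue.wait_list.append(cmd)  # replayed once the queue goes LIVE
            run_now = False
        else:
            return  # assumed: any other state drops the command
    if run_now:
        handle(cmd)
```

Without the recheck, a command arriving in the window just before the state flips could be neither handled nor queued in a way that is ever replayed, which matches the silent drop the commit describes.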
-
Hannes Reinecke authored
The namespace percpu counter protects pending I/O, and we can only safely disable the namespace once the counter drops to zero. Otherwise we end up with a crash when running blktests/nvme/058 (e.g. for the loop transport):

  [ 2352.930426] [ T53909] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
  [ 2352.930431] [ T53909] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
  [ 2352.930434] [ T53909] CPU: 3 UID: 0 PID: 53909 Comm: kworker/u16:5 Tainted: G W 6.13.0-rc6 #232
  [ 2352.930438] [ T53909] Tainted: [W]=WARN
  [ 2352.930440] [ T53909] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
  [ 2352.930443] [ T53909] Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop]
  [ 2352.930449] [ T53909] RIP: 0010:blkcg_set_ioprio+0x44/0x180

as the queue is already torn down when calling submit_bio(). So we need to init the percpu counter in nvmet_ns_enable(), and wait for it to drop to zero in nvmet_ns_disable() to avoid having I/O pending after the namespace has been disabled.

Fixes: 74d16965 ("nvmet-loop: avoid using mutex in IO hotpath")
Signed-off-by:
Hannes Reinecke <hare@kernel.org> Reviewed-by:
Nilay Shroff <nilay@linux.ibm.com> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Tested-by:
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Maurizio Lombardi authored
Previously, the NVMe/TCP host driver did not handle the C2HTermReq PDU, instead printing "unsupported pdu type (3)" when received. This patch adds support for processing the C2HTermReq PDU, allowing the driver to print the Fatal Error Status field. Example of output: nvme nvme4: Received C2HTermReq (FES = Invalid PDU Header Field) Signed-off-by:
Maurizio Lombardi <mlombard@redhat.com> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Christopher Lentocha authored
In order for two Acer FA100 SSDs to work in one PC (in my case, a Lenovo Legion T5 28IMB05), instead of only one drive showing up and the drives sometimes being mixed up at random, these two lines of code need to be added. With them, both SSDs show up and do not conflict when booting off one of them. If you boot up your computer with both SSDs installed without this patch, you may also randomly get a kernel panic (if the initrd is not set up) or get stuck in the initrd "/init" process (if it is set up). If you do apply this patch, there should be no problems with booting or with seeing the contents of both drives. Tested with the btrfs filesystem in a RAID configuration that combines two 256GB Acer FA100 SSDs into a 512GB root drive '/'.

Kernel Logs with patch applied (`dmesg -t | grep -i nvm`):
```
...
nvme 0000:04:00.0: platform quirk: setting simple suspend
nvme nvme0: pci function 0000:04:00.0
nvme 0000:05:00.0: platform quirk: setting simple suspend
nvme nvme1: pci function 0000:05:00.0
nvme nvme1: missing or invalid SUBNQN field.
nvme nvme1: allocated 64 MiB host memory buffer.
nvme nvme0: missing or invalid SUBNQN field.
nvme nvme0: allocated 64 MiB host memory buffer.
nvme nvme1: 8/0/0 default/read/poll queues
nvme nvme1: Ignoring bogus Namespace Identifiers
nvme nvme0: 8/0/0 default/read/poll queues
nvme nvme0: Ignoring bogus Namespace Identifiers
nvme0n1: p1 p2
...
```

Kernel Logs with patch not applied (`dmesg -t | grep -i nvm`):
```
...
nvme 0000:04:00.0: platform quirk: setting simple suspend
nvme nvme0: pci function 0000:04:00.0
nvme 0000:05:00.0: platform quirk: setting simple suspend
nvme nvme1: pci function 0000:05:00.0
nvme nvme0: missing or invalid SUBNQN field.
nvme nvme1: missing or invalid SUBNQN field.
nvme nvme0: allocated 64 MiB host memory buffer.
nvme nvme1: allocated 64 MiB host memory buffer.
nvme nvme0: 8/0/0 default/read/poll queues
nvme nvme1: 8/0/0 default/read/poll queues
nvme nvme1: globally duplicate IDs for nsid 1
nvme nvme1: VID:DID 1dbe:5216 model:Acer SSD FA100 256GB firmware:1.Z.J.2X
nvme0n1: p1 p2
...
```

Signed-off-by:
Christopher Lentocha <christopherericlentocha@gmail.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
- Feb 17, 2025
-
-
Ming Lei authored
The block layer's internal flush request may not have a bio attached, so the request iterator has to be initialized from a valid req->bio; otherwise a NULL pointer dereference is triggered. Cc: Christoph Hellwig <hch@lst.de> Reported-and-tested-by:
Cheyenne Wills <cheyenne.wills@gmail.com> Fixes: b7175e24 ("block: add a dma mapping iterator") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250217031626.461977-1-ming.lei@redhat.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Feb 14, 2025
-
-
Caleb Sander Mateos authored
In bvec_split_segs(), max_bytes is an unsigned, so it must be less than or equal to UINT_MAX. Remove the unnecessary min(). Prior to commit 67927d22 ("block/merge: count bytes instead of sectors"), the min() was with UINT_MAX >> 9, so it did have an effect. Signed-off-by:
Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250214193637.234702-1-csander@purestorage.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
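The argument in this commit is plain integer arithmetic and can be checked directly. The Python sketch below assumes a 32-bit `unsigned int`; `clamp_bytes` and `clamp_sectors` are illustrative names for the current and the historical form of the bound.

```python
UINT_MAX = 2**32 - 1  # assuming a 32-bit unsigned int


def clamp_bytes(max_bytes):
    """Current form: min() against UINT_MAX never changes an unsigned value."""
    return min(max_bytes, UINT_MAX)


def clamp_sectors(max_sectors):
    """Historical form: the bound was UINT_MAX >> 9, which could really clamp."""
    return min(max_sectors, UINT_MAX >> 9)
```

Any value that fits in the unsigned type passes through `clamp_bytes` unchanged, while `clamp_sectors` caps large inputs at `UINT_MAX >> 9`, which is why the min() only mattered before the switch from sectors to bytes.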
-
Jann Horn authored
Fix several issues in partition probing:
  - The bailout for a bad partoffset must use put_dev_sector(), since the preceding read_part_sector() succeeded.
  - If the partition table claims a silly sector size like 0xfff bytes (which results in partition table entries straddling sector boundaries), bail out instead of accessing out-of-bounds memory.
  - We must not assume that the partition table contains proper NUL termination - use strnlen() and strncmp() instead of strlen() and strcmp().
Cc: stable@vger.kernel.org Signed-off-by:
Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20250214-partition-mac-v1-1-c1c626dffbd5@google.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
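The third point, never trusting on-disk strings to be NUL-terminated, can be illustrated with a bounded length helper. This is a Python sketch of the idea behind strnlen(), not of the partition parser itself; the 32-byte field width used in the usage note and the function names are assumptions.

```python
def field_strnlen(buf, maxlen):
    """Length of the string in buf, never looking past maxlen bytes."""
    limit = min(maxlen, len(buf))
    end = buf.find(b"\0", 0, limit)
    return limit if end == -1 else end


def field_equals(buf, maxlen, expected):
    """Compare a fixed-size, possibly unterminated field against expected."""
    return buf[:field_strnlen(buf, maxlen)] == expected
```

With a 32-byte field that contains no NUL at all, `field_strnlen` stops at 32 instead of running into whatever bytes follow the field, which is the out-of-bounds read the commit guards against.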
-
- Feb 13, 2025
-
-
Bart Van Assche authored
queue_limits_cancel_update() must only be called if queue_limits_start_update() is called first. Remove the queue_limits_cancel_update() calls from the raid*_set_limits() functions because there is no corresponding queue_limits_start_update() call. Cc: Christoph Hellwig <hch@lst.de> Fixes: c6e56cf6 ("block: move integrity information into queue_limits") Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-raid/20250212171108.3483150-1-bvanassche@acm.org/ Signed-off-by:
Yu Kuai <yukuai@kernel.org>
-
Jens Axboe authored
The conditions for whether or not a request may be added to a completion batch are a bit hard to read, and they also have a few issues. One is that ioerror may indeed be a random value on passthrough, and it is being checked regardless of whether the given request is a passthrough request. Rewrite the conditions to be separate for easier reading, and only check ioerror for non-passthrough requests. This fixes an issue with bio unmapping on passthrough, where the request fails to get added to a batch. This leads to suboptimal performance and may trigger a schedule-under-atomic condition for polled passthrough IO. Fixes: f794f335 ("block: add support for blk_mq_end_request_batch()") Link: https://lore.kernel.org/r/20575f0a-656e-4bb3-9d82-dec6c7e3a35c@kernel.dk Signed-off-by:
Jens Axboe <axboe@kernel.dk>
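The shape of the rewritten check, separate early returns with the ioerror test scoped to non-passthrough requests, might look like the Python sketch below. It models only the two conditions named in the commit message; the real helper checks more, and `can_add_to_batch` with its parameters is a hypothetical stand-in.

```python
def can_add_to_batch(has_batch, is_passthrough, ioerror):
    """Decide whether a completed request may join a completion batch."""
    # No batch container: nothing to add to.
    if not has_batch:
        return False
    # ioerror is only meaningful for non-passthrough requests; for
    # passthrough it may hold an unrelated value and must be ignored.
    if not is_passthrough and ioerror != 0:
        return False
    return True
```

Writing each exclusion as its own `if` keeps the passthrough exemption visible, which is harder to see when everything is folded into one boolean expression.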
-
- Feb 07, 2025
-
-
Jens Axboe authored
Merge tag 'md-6.14-20250206' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.14 Pull MD fix from Song: "This patch, by Bart Van Assche, fixes an error handling path for md-linear." * tag 'md-6.14-20250206' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: md: Fix linear_set_limits()
-
- Feb 03, 2025
-
-
Jens Axboe authored
Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.14
  - Connection fixes for fibre channel transport (Daniel)
  - Endian fixes (Keith, Christoph)
  - Cleanup fix for host memory buffer (Francis)
  - Platform specific power quirks (Georg)
  - Target memory leak (Sagi)
  - Use appropriate controller state accessor (Daniel)"

* tag 'nvme-6.14-2025-01-31' of git://git.infradead.org/nvme:
  nvme-fc: use ctrl state getter
  nvme: make nvme_tls_attrs_group static
  nvmet: add a missing endianess conversion in nvmet_execute_admin_connect
  nvmet: the result field in nvmet_alloc_ctrl_args is little endian
  nvmet: fix a memory leak in controller identify
  nvme-fc: do not ignore connectivity loss during connecting
  nvme: handle connectivity loss in nvme_set_queue_count
  nvme-fc: go straight to connecting state when initializing
  nvme-pci: Add TUXEDO IBP Gen9 to Samsung sleep quirk
  nvme-pci: Add TUXEDO InfinityFlex to Samsung sleep quirk
  nvme-pci: remove redundant dma frees in hmb
  nvmet: fix rw control endian access
-
Stephen Rothwell authored
My sparc64 defconfig build failed like this:

  drivers/block/sunvdc.c: In function 'vdc_queue_drain':
  drivers/block/sunvdc.c:1130:9: error: too many arguments to function 'blk_mq_unquiesce_queue'
   1130 | blk_mq_unquiesce_queue(q, memflags);
        | ^~~~~~~~~~~~~~~~~~~~~~
  In file included from drivers/block/sunvdc.c:10:
  include/linux/blk-mq.h:895:6: note: declared here
    895 | void blk_mq_unquiesce_queue(struct request_queue *q);
        | ^~~~~~~~~~~~~~~~~~~~~~
  drivers/block/sunvdc.c:1131:9: error: too few arguments to function 'blk_mq_unfreeze_queue'
   1131 | blk_mq_unfreeze_queue(q);
        | ^~~~~~~~~~~~~~~~~~~~~
  In file included from drivers/block/sunvdc.c:10:
  include/linux/blk-mq.h:914:1: note: declared here
    914 | blk_mq_unfreeze_queue(struct request_queue *q, unsigned int memflags)
        | ^~~~~~~~~~~~~~~~~~~~~

Fixes: 1e1a9cec ("block: force noio scope in blk_mq_freeze_queue")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 31, 2025
-
-
Bart Van Assche authored
queue_limits_cancel_update() must only be called if queue_limits_start_update() is called first. Remove the queue_limits_cancel_update() call from linear_set_limits() because there is no corresponding queue_limits_start_update() call. This bug was discovered by annotating all mutex operations with clang thread-safety attributes and by building the kernel with clang and -Wthread-safety. Cc: Yu Kuai <yukuai3@huawei.com> Cc: Coly Li <colyli@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Fixes: 127186cf ("md: reintroduce md-linear") Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250129225636.2667932-1-bvanassche@acm.org Signed-off-by:
Song Liu <song@kernel.org>
-
Daniel Wagner authored
Do not access the state variable directly; instead use proper synchronization so that no stale data is read. Fixes: e6e7f7ac ("nvme: ensure reset state check ordering") Signed-off-by:
Daniel Wagner <wagi@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Keith Busch authored
To suppress the compiler "warning: symbol 'nvme_tls_attrs_group' was not declared. Should it be static?" Fixes: 1e48b34c ("nvme: split off TLS sysfs attributes into a separate group") Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
Christoph Hellwig authored
When block drivers or the core block code perform allocations with a frozen queue, this could try to recurse into the block device to reclaim memory and deadlock. Thus all allocations done by a process that froze a queue need to be done without __GFP_IO and __GFP_FS. Instead of trying to track all of them down, force a noio scope as part of freezing the queue. Note that nvme is a bit of a mess here due to the non-owner freezes, and they will be addressed separately. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 29, 2025
-
-
Nilay Shroff authored
The nr_hw_queue update could potentially race with disk addition/removal while registering/unregistering hctx sysfs files. __blk_mq_update_nr_hw_queues() runs with q->tag_list_lock held, so to avoid it racing with disk addition/removal we should acquire q->tag_list_lock while registering/unregistering hctx sysfs files. With this patch, blk_mq_sysfs_register() (called during disk addition) and blk_mq_sysfs_unregister() (called during disk removal) now run with q->tag_list_lock held, which avoids racing with __blk_mq_update_nr_hw_queues(). Signed-off-by:
Nilay Shroff <nilay@linux.ibm.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20250128143436.874357-3-nilay@linux.ibm.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Nilay Shroff authored
The request queue uses ->sysfs_dir_lock for protecting the addition/deletion of kobject entries under sysfs while we register/unregister blk-mq. However, kobject addition/deletion is already protected by kernfs/sysfs internal synchronization primitives, so the use of q->sysfs_dir_lock seems redundant. Moreover, q->sysfs_dir_lock is also used at a few other callsites along with q->sysfs_lock for protecting the addition/deletion of kobjects. One such example is when we register with sysfs a set of independent access ranges for a disk. Here as well we could get rid of q->sysfs_dir_lock and only use q->sysfs_lock. The only variable which q->sysfs_dir_lock appears to protect is q->mq_sysfs_init_done, which is set/unset while registering/unregistering blk-mq with sysfs. But the use of q->mq_sysfs_init_done could easily be replaced by the queue registered bit QUEUE_FLAG_REGISTERED. So with this patch we remove q->sysfs_dir_lock from each callsite and replace q->mq_sysfs_init_done with QUEUE_FLAG_REGISTERED. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Nilay Shroff <nilay@linux.ibm.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20250128143436.874357-2-nilay@linux.ibm.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 28, 2025
-
-
Christoph Hellwig authored
The kato field is little endian on the wire, but native endian in the in-core structure. Add the missing byte swap. Fixes: 62027831 ("nvmet: Improve nvmet_alloc_ctrl() interface and implementation") Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
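What the missing conversion means in practice can be shown by decoding the same four bytes with and without an explicit byte order. This is a Python illustration, assuming a 32-bit field; `kato_from_wire` is a hypothetical helper and the sample value is made up.

```python
import struct


def kato_from_wire(raw):
    """Decode a 32-bit little-endian wire value into a native integer."""
    return struct.unpack("<I", raw)[0]


def kato_misread_big_endian(raw):
    """What a big-endian host would see if it skipped the conversion."""
    return struct.unpack(">I", raw)[0]
```

For the wire bytes `88 13 00 00` the explicit little-endian decode gives 5000, while reading them as big endian gives 0x88130000, so the bug is only visible on big-endian hosts.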
-
Christoph Hellwig authored
So use the __le32 type for it. Fixes: 62027831 ("nvmet: Improve nvmet_alloc_ctrl() interface and implementation") Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Keith Busch <kbusch@kernel.org>
-
- Jan 27, 2025
-
-
Christoph Hellwig authored
LOOP_SET_STATUS{,64} can set a lot more flags than it is supposed to clear (the LOOP_SET_STATUS_CLEARABLE_FLAGS vs LOOP_SET_STATUS_SETTABLE_FLAGS defines should have been a hint..). Fix this by only clearing the bits in LOOP_SET_STATUS_CLEARABLE_FLAGS. Fixes: ae074d07 ("loop: move updating lo_flags out of loop_set_status_from_info") Reported-by:
kernel test robot <oliver.sang@intel.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250127143045.538279-1-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
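The difference between the "settable" and "clearable" masks is easiest to see with concrete bits. In this Python sketch the flag values and the contents of the two masks are illustrative assumptions, not copied from the kernel headers, and `apply_set_status` is a hypothetical helper.

```python
# Illustrative bit values; treat them as assumptions.
READ_ONLY = 0x1
AUTOCLEAR = 0x4
PARTSCAN = 0x8
DIRECT_IO = 0x10

SETTABLE = AUTOCLEAR | PARTSCAN  # flags a caller may turn on
CLEARABLE = AUTOCLEAR            # flags a caller may turn off


def apply_set_status(current, requested):
    """Return the new flag word after a set-status style update."""
    current &= ~CLEARABLE            # clear only what may be cleared
    current |= requested & SETTABLE  # set only what may be set
    return current
```

Clearing with the settable mask instead would drop PARTSCAN whenever a caller omitted it, even though PARTSCAN is not meant to be clearable through this path.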
-
- Jan 24, 2025
-
-
Jens Axboe authored
Merge tag 'md-6.14-20250124' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.14 Pull MD fix from Song: "Fix a md-cluster regression introduced in the 6.12 release." * tag 'md-6.14-20250124' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime
-
Yu Kuai authored
After commit ec6bb299 ("md/md-bitmap: add 'sync_size' into struct md_bitmap_stats"), the following panic is reported:

  Oops: general protection fault, probably for non-canonical address
  RIP: 0010:bitmap_get_stats+0x2b/0xa0
  Call Trace:
   <TASK>
   md_seq_show+0x2d2/0x5b0
   seq_read_iter+0x2b9/0x470
   seq_read+0x12f/0x180
   proc_reg_read+0x57/0xb0
   vfs_read+0xf6/0x380
   ksys_read+0x6c/0xf0
   do_syscall_64+0x82/0x170
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

The root cause is that bitmap_get_stats() can be called at any time while mddev is still there, even if the bitmap is destroyed or not fully initialized. Dereferencing the bitmap in this case can crash the kernel. Meanwhile, the above commit starts dereferencing bitmap->storage, which makes the problem easier to trigger. Fix the problem by protecting bitmap_get_stats() with bitmap_info.mutex.

Cc: stable@vger.kernel.org # v6.12+
Fixes: 32a7627c ("[PATCH] md: optimised resync using Bitmap based intent logging")
Reported-and-tested-by:
Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Closes: https://lore.kernel.org/linux-raid/ca3a91a2-50ae-4f68-b317-abd9889f3907@oracle.com/T/#m6e5086c95201135e4941fe38f9efa76daf9666c5 Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250124092055.4050195-1-yukuai1@huaweicloud.com Signed-off-by:
Song Liu <song@kernel.org>
-