- Feb 01, 2024
-
-
Jens Axboe authored
If we use IORING_OP_RECV with provided buffers and pass in '0' as the length of the request, the length is retrieved from the selected buffer. If MSG_WAITALL is also set and we get a short receive, then we may hit the retry path which decrements sr->len and increments the buffer for a retry. However, the length is still zero at this point, which means that sr->len now becomes huge and import_ubuf() will cap it to MAX_RW_COUNT and subsequently return -EFAULT for the range as a whole. Fix this by always assigning sr->len once the buffer has been selected. Cc: stable@vger.kernel.org Fixes: 7ba89d2a ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 29, 2024
-
-
Jens Axboe authored
If we have multiple clients and some/all are flooding the receives to such an extent that we can retry a LOT handling multishot receives, then we can be starving some clients and hence serving traffic in an imbalanced fashion. Limit multishot retry attempts to some arbitrary value, whose only purpose serves to ensure that we don't keep serving a single connection for way too long. We default to 32 retries, which should be more than enough to provide fairness, yet not so small that we'll spend too much time requeuing rather than handling traffic. Cc: stable@vger.kernel.org Depends-on: 704ea888 ("io_uring/poll: add requeue return code from poll multishot handling") Depends-on: 1e5d765a82f ("io_uring/net: un-indent mshot retry path in io_recv_finish()") Depends-on: e84b01a8 ("io_uring/poll: move poll execution helpers higher up") Fixes: b3fdea6e ("io_uring: multishot recv") Fixes: 9bb66906 ("io_uring: support multishot in recvmsg") Link: https://github.com/axboe/liburing/issues/1043 Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
In preparation for putting some retry logic in there, have the done path just skip straight to the end rather than have too much nesting in here. No functional changes in this patch. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 03, 2023
-
-
Jens Axboe authored
io_uring does non-blocking connection attempts, which can yield some unexpected results if a connect request is re-attempted by an an application. This is equivalent to the following sync syscall sequence: sock = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_TCP); connect(sock, &addr, sizeof(addr); ret == -1 and errno == EINPROGRESS expected here. Now poll for POLLOUT on sock, and when that returns, we expect the socket to be connected. But if we follow that procedure with: connect(sock, &addr, sizeof(addr)); you'd expect ret == -1 and errno == EISCONN here, but you actually get ret == 0. If we attempt the connection one more time, then we get EISCON as expected. io_uring used to do this, but turns out that bluetooth fails with EBADFD if you attempt to re-connect. Also looks like EISCONN _could_ occur with this sequence. Retain the ->in_progress logic, but work-around a potential EISCONN or EBADFD error and only in those cases look at the sock_error(). This should work in general and avoid the odd sequence of a repeated connect request returning success when the socket is already connected. This is all a side effect of the socket state being in a CONNECTING state when we get EINPROGRESS, and only a re-connect or other related operation will turn that into CONNECTED. Cc: stable@vger.kernel.org Fixes: 3fb1bd68 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT") Link: https://github.com/axboe/liburing/issues/980 Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 14, 2023
-
-
Pavel Begunkov authored
When using selected buffer feature, io_uring delays data iter setup until later. If io_setup_async_msg() is called before that it might see not correctly setup iterator. Pre-init nr_segs and judge from its state whether we repointing. Cc: stable@vger.kernel.org Reported-by:
<syzbot+a4c6e5ef999b68b26ed1@syzkaller.appspotmail.com> Fixes: 0455d4cc ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0000000000002770be06053c7757@google.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 11, 2023
-
-
Pavel Begunkov authored
Now all callers of io_aux_cqe() set allow_overflow to false, remove the parameter and not allow overflowing auxilary multishot cqes. When CQ is full the function callers and all multishot requests in general are expected to complete the request. That prevents indefinite in-background grows of the overflow list and let's the userspace to handle the backlog at its own pace. Resubmitting a request should also be faster than accounting a bunch of overflows, so it should be better for perf when it happens, but a well behaving userspace should be trying to avoid overflows in any case. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bb20d14d708ea174721e58bb53786b0521e4dd6d.1691757663.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Don't allow overflowing multishot recv CQEs, it might get out of hand, hurt performance, and in the worst case scenario OOM the task. Cc: stable@vger.kernel.org Fixes: b3fdea6e ("io_uring: multishot recv") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0b295634e8f1b71aa764c984608c22d85f88f75c.1691757663.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Don't allow overflowing multishot accept CQEs, we want to limit the grows of the overflow list. Cc: stable@vger.kernel.org Fixes: 4e86a2c9 ("io_uring: implement multishot mode for accept") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7d0d749649244873772623dd7747966f516fe6e2.1691757663.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 27, 2023
-
-
Jens Axboe authored
struct msghdr->msg_inq is a signed type, yet we attempt to store what is essentially an unsigned bitmask in there. We only really need to know if the field was stored or not, but let's use the proper type to avoid any misunderstandings on what is being attempted here. Link: https://lore.kernel.org/io-uring/CAHk-=wjKb24aSe6fE4zDH-eh8hr-FB9BbukObUVSMGOrsBHCRQ@mail.gmail.com/ Reported-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 21, 2023
-
-
Jens Axboe authored
Rather than assign the user pointer to msghdr->msg_control, assign it to msghdr->msg_control_user to make sparse happy. They are in a union so the end result is the same, but let's avoid new sparse warnings and squash this one. Reported-by:
kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/ Fixes: cac9e441 ("io_uring/net: save msghdr->msg_control for retries") Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We cannot sanely handle partial retries for recvmsg if we have cmsg attached. If we don't, then we'd just be overwriting the initial cmsg header on retries. Alternatively we could increment and handle this appropriately, but it doesn't seem worth the complication. Move the MSG_WAITALL check into the non-multishot case while at it, since MSG_WAITALL is explicitly disabled for multishot anyway. Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/ Cc: stable@vger.kernel.org # 5.10+ Reported-by:
Stefan Metzmacher <metze@samba.org> Reviewed-by:
Stefan Metzmacher <metze@samba.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we have cmsg attached AND we transferred partial data at least, clear msg_controllen on retry so we don't attempt to send that again. Cc: stable@vger.kernel.org # 5.10+ Fixes: cac9e441 ("io_uring/net: save msghdr->msg_control for retries") Reported-by:
Stefan Metzmacher <metze@samba.org> Reviewed-by:
Stefan Metzmacher <metze@samba.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 14, 2023
-
-
Jens Axboe authored
If the application sets ->msg_control and we have to later retry this command, or if it got queued with IOSQE_ASYNC to begin with, then we need to retain the original msg_control value. This is due to the net stack overwriting this field with an in-kernel pointer, to copy it in. Hitting that path for the second time will now fail the copy from user, as it's attempting to copy from a non-user address. Cc: stable@vger.kernel.org # 5.10+ Link: https://github.com/axboe/liburing/issues/880 Reported-and-tested-by:
Marek Majkowski <marek@cloudflare.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 07, 2023
-
-
Jens Axboe authored
Everybody is passing in the request, so get rid of the io_ring_ctx and explicit user_data pass-in. Both the ctx and user_data can be deduced from the request at hand. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 24, 2023
-
-
David Howells authored
Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a network protocol that it should splice pages from the source iterator rather than copying the data if it can. This flag is added to a list that is cleared by sendmsg syscalls on entry. This is intended as a replacement for the ->sendpage() op, allowing a way to splice in several multipage folios in one go. Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Willem de Bruijn <willemb@google.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
- May 17, 2023
-
-
Jens Axboe authored
If we're doing multishot receives, then we always end up doing two trips through sock_recvmsg(). For protocols that sanely set msghdr->msg_inq, then we don't need to waste time picking a new buffer and attempting a new receive if there's nothing there. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Rather than have this logic in both io_recv() and io_recvmsg_multishot(), push it into the handler they both call when finishing a receive operation. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We can't currently tell if ->msg_inq was set when we ask for msg_get_inq, initialize it to -1U so we can tell apart if it was set and there's no data left, or if it just wasn't set at all by the protocol. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We only need to clear the input fields on the first invocation, not when potentially doing a retry. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 30, 2023
-
-
Jens Axboe authored
This returns a pointer to the current iovec entry in the iterator. Only useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF and ITER_IOVEC identically for the first segment. Rename struct iov_iter->iov to iov_iter->__iov to find any potentially troublesome spots, and also to prevent anyone from adding new code that accesses iter->iov directly. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 21, 2023
-
-
Jens Axboe authored
Since io_uring does nonblocking connect requests, if we do two repeated ones without having a listener, the second will get -ECONNABORTED rather than the expected -ECONNREFUSED. Treat -ECONNABORTED like a normal retry condition if we're nonblocking, if we haven't already seen it. Cc: stable@vger.kernel.org Fixes: 3fb1bd68 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT") Link: https://github.com/axboe/liburing/issues/828 Reported-by:
Hui, Chunyang <sanqian.hcy@antgroup.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Feb 24, 2023
-
-
David Lamparter authored
MSG_NOSIGNAL is not applicable for the receiving side, SIGPIPE is generated when trying to write to a "broken pipe". AF_PACKET's packet_recvmsg() does enforce this, giving back EINVAL when MSG_NOSIGNAL is set - making it unuseable in io_uring's recvmsg. Remove MSG_NOSIGNAL from io_recvmsg_prep(). Cc: stable@vger.kernel.org # v5.10+ Signed-off-by:
David Lamparter <equinox@diac24.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by:
Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230224150123.128346-1-equinox@diac24.net Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 29, 2023
-
-
Dylan Yudaken authored
Some requests require being run async as they do not support non-blocking. Instead of trying to issue these requests, getting -EAGAIN and then queueing them for async issue, rather just force async upfront. Add WARN_ON_ONCE to make sure surprising code paths do not come up, however in those cases the bug would end up being a blocking io_uring_enter(2) which should not be critical. Signed-off-by:
Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20230127135227.3646353-3-dylany@meta.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 23, 2023
-
-
Jens Axboe authored
If we're using ring provided buffers with multishot receive, and we end up doing an io-wq based issue at some points that also needs to select a buffer, we'll lose the initially assigned buffer group as io_ring_buffer_select() correctly clears the buffer group list as the issue isn't serialized by the ctx uring_lock. This is fine for normal receives as the request puts the buffer and finishes, but for multishot, we will re-arm and do further receives. On the next trigger for this multishot receive, the receive will try and pick from a buffer group whose value is the same as the buffer ID of the las receive. That is obviously incorrect, and will result in a premature -ENOUFS error for the receive even if we had available buffers in the correct group. Cache the buffer group value at prep time, so we can restore it for future receives. This only needs doing for the above mentioned case, but just do it by default to keep it easier to read. Cc: stable@vger.kernel.org Fixes: b3fdea6e ("io_uring: multishot recv") Fixes: 9bb66906 ("io_uring: support multishot in recvmsg") Cc: Dylan Yudaken <dylany@meta.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 09, 2023
-
-
Jens Axboe authored
This is more efficient than iter_iov. Signed-off-by:
Jens Axboe <axboe@kernel.dk> [merged to 6.2] Signed-off-by:
Keith Busch <kbusch@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
- Dec 19, 2022
-
-
Pavel Begunkov authored
Don't access io_async_msghdr io_netmsg_recycle(), it may be reallocated. Cc: stable@vger.kernel.org Fixes: 9bb66906 ("io_uring: support multishot in recvmsg") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9e326f4ad4046ddadf15bf34bf3fa58c6372f6b5.1671461985.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we're not allocating the vectors because the count is below UIO_FASTIOV, we still do need to properly clear ->free_iov to prevent an erronous free of on-stack data. Reported-by:
Jiri Slaby <jirislaby@gmail.com> Fixes: 4c17a496 ("io_uring/net: fix cleanup double free free_iov init") Cc: stable@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 07, 2022
-
-
Pavel Begunkov authored
Multishot are posting CQEs outside of the normal request completion path, which is usually done from within a task work handler. However, it might be not the case when it's yet to be polled but has been punted to io-wq. Make it abide ->task_complete and push it to the polling path when executed by io-wq. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7714aaff583096769a0f26e8e747759e556feb1.1670384893.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 25, 2022
-
-
Al Viro authored
READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Dylan Yudaken authored
Use the just introduced deferred post cqe completion state when possible in io_aux_cqe. If not possible fallback to io_post_aux_cqe. This introduces a complication because of allow_overflow. For deferred completions we cannot know without locking the completion_lock if it will overflow (and even if we locked it, another post could sneak in and cause this cqe to be in overflow). However since overflow protection is mostly a best effort defence in depth to prevent infinite loops of CQEs for poll, just checking the overflow bit is going to be good enough and will result in at most 16 (array size of deferred cqes) overflows. Suggested-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-6-dylany@meta.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 21, 2022
-
-
Dylan Yudaken authored
With commit aa1df3a3 ("io_uring: fix CQE reordering"), there are stronger guarantees for overflow ordering. Specifically ensuring that userspace will not receive out of order receive CQEs. Therefore this is not needed any more for recv/recvmsg. Signed-off-by:
Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221107125236.260132-4-dylany@meta.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Dylan Yudaken authored
This is no longer needed after commit aa1df3a3 ("io_uring: fix CQE reordering"), since all reordering is now taken care of. This reverts commit cbd25748 ("io_uring: fix multishot accept ordering"). Signed-off-by:
Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221107125236.260132-2-dylany@meta.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Xinghui Li authored
Fixes two errors: "ERROR: do not use assignment in if condition 130: FILE: io_uring/net.c:130: + if (!(issue_flags & IO_URING_F_UNLOCKED) && ERROR: do not use assignment in if condition 599: FILE: io_uring/poll.c:599: + } else if (!(issue_flags & IO_URING_F_UNLOCKED) &&" reported by checkpatch.pl in net.c and poll.c . Signed-off-by:
Xinghui Li <korantli@tencent.com> Reported-by:
kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20221102082503.32236-1-korantwork@gmail.com [axboe: style tweaks] Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We can also move mm accounting to the extended callbacks. It removes a few cycles from the hot path including skipping one function call and setting io_req_task_complete as a callback directly. For user backed I/O it shouldn't make any difference taking into considering atomic mm accounting and page pinning. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1062f270273ad11c1b7b45ec59a6a317533d5e64.1667557923.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Add custom tw and notif callbacks on top of usual bits also handling zc reporting. That moves it from the hot path. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/40de4a6409042478e1f35adc4912e23226cb1b5c.1667557923.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Stefan Metzmacher authored
It might be useful for applications to detect if a zero copy transfer with SEND[MSG]_ZC was actually possible or not. The application can fallback to plain SEND[MSG] in order to avoid the overhead of two cqes per request. Or it can generate a log message that could indicate to an administrator that no zero copy was possible and could explain degraded performance. Cc: stable@vger.kernel.org # 6.1 Link: https://lore.kernel.org/io-uring/fb6a7599-8a9b-15e5-9b64-6cd9d01c6ff4@gmail.com/T/#m2b0d9df94ce43b0e69e6c089bdff0ce6babbdfaa Signed-off-by:
Stefan Metzmacher <metze@samba.org> Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8945b01756d902f5d5b0667f20b957ad3f742e5e.1666895626.git.metze@samba.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 17, 2022
-
-
Pavel Begunkov authored
Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 1300ebb20286b ("io_uring: multishot recv") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 390ed29b ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 22, 2022
-
-
Pavel Begunkov authored
The previous patch fails zerocopy send requests for protocols that don't support it, do the same for zerocopy sendmsg. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0854e7bb4c3d810a48ec8b5853e2f61af36a0467.1666346426.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
If a protocol doesn't support zerocopy it will silently fall back to copying. This type of behaviour has always been a source of troubles so it's better to fail such requests instead. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-