  1. Jul 07, 2023
    • io_uring: Use io_schedule* in cqring wait · 8a796565
      Andres Freund authored
      
      I observed poor performance of io_uring compared to synchronous IO. That
      turns out to be caused by deeper CPU idle states entered with io_uring,
      due to io_uring using plain schedule(), whereas synchronous IO uses
      io_schedule().
      
      The losses due to this are substantial. On my cascade lake workstation,
      t/io_uring from the fio repository e.g. yields regressions between 20%
      and 40% with the following command:
      ./t/io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S$use_sync -R 0 /mnt/t2/fio/write.0.0
      
      This is reproducible with different filesystems, with raw block
      devices, and with different block devices.
      
      Use io_schedule_prepare() / io_schedule_finish() in
      io_cqring_wait_schedule() to address the difference.
      
      After that using io_uring is on par or surpassing synchronous IO (using
      registered files etc makes it reliably win, but arguably is a less fair
      comparison).
      
      There are other calls to schedule() in io_uring/, but none
      immediately jumps out as similarly situated, so I did not touch
      them. Similarly, it's possible that mutex_lock_io() should be used
      somewhere, but it's not clear whether there are cases where that
      matters.
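
      As a minimal sketch of the pattern (assuming the general shape of
      io_cqring_wait_schedule() rather than quoting the actual diff):

          /*
           * Hedged sketch, not the verbatim kernel change: the existing
           * sleep is bracketed with io_schedule_prepare() /
           * io_schedule_finish(), which flag the task as blocked on IO
           * the same way io_schedule() does internally, so cpuidle and
           * cpufreq governors account the sleep as iowait.
           */
          static int cqring_wait_schedule_sketch(void)
          {
                  int token;

                  token = io_schedule_prepare();  /* sets current->in_iowait */
                  schedule();                     /* the sleep itself is unchanged */
                  io_schedule_finish(token);      /* restores previous iowait state */
                  return 0;
          }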
      
      Cc: stable@vger.kernel.org # 5.10+
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: io-uring@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Andres Freund <andres@anarazel.de>
      Link: https://lore.kernel.org/r/20230707162007.194068-1-andres@anarazel.de
      
      
      [axboe: minor style fixup]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Jun 28, 2023
    • io_uring: flush offloaded and delayed task_work on exit · dfbe5561
      Jens Axboe authored
      
      io_uring offloads task_work for cancelation purposes when the task is
      exiting. This is conceptually fine, but we should be nicer and actually
      wait for that work to complete before returning.
      
      Add an argument to io_fallback_tw() telling it to flush the deferred
      work once it's all queued up, and have it flush the ctx left behind
      whenever the ctx changes.
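
      As a hedged sketch of the resulting shape (simplified from the
      description above; the ctx reference pinning in the real change is
      omitted, and the details are illustrative):

          static void io_fallback_tw_sketch(struct io_uring_task *tctx, bool sync)
          {
                  struct llist_node *node = llist_del_all(&tctx->task_list);
                  struct io_ring_ctx *last_ctx = NULL;
                  struct io_kiocb *req;

                  while (node) {
                          req = container_of(node, struct io_kiocb,
                                             io_task_work.node);
                          node = node->next;
                          /* flush the ctx left behind whenever the ctx changes */
                          if (sync && last_ctx != req->ctx) {
                                  if (last_ctx)
                                          flush_delayed_work(&last_ctx->fallback_work);
                                  last_ctx = req->ctx;
                          }
                          /* queue the request on its ctx's fallback list as before */
                          if (llist_add(&req->io_task_work.node,
                                        &req->ctx->fallback_llist))
                                  schedule_delayed_work(&req->ctx->fallback_work, 1);
                  }
                  /* flush the final ctx once everything is queued up */
                  if (sync && last_ctx)
                          flush_delayed_work(&last_ctx->fallback_work);
          }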
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. Jun 14, 2023
    • io_uring/io-wq: clear current->worker_private on exit · adeaa3f2
      Jens Axboe authored
      
      A recent fix stopped clearing PF_IO_WORKER from current->flags on
      exit, which means that the worker running inc/dec accounting can
      now be called on the worker after it has been removed, if it ends
      up scheduling in/out as part of exit.

      If this happens after an RCU grace period has passed, then the
      struct pointed to by current->worker_private may have been freed,
      and we may be accessing freed memory.
      
      Ensure this doesn't happen by clearing the task worker_private field.
      Both io_wq_worker_running() and io_wq_worker_sleeping() check this
      field before going any further, and no accounting or other work
      needs to be done after the worker has exited.
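
      In sketch form (assumed shape, not the verbatim diff):

          static void io_wq_worker_exit_sketch(struct io_worker *worker)
          {
                  /* ... final accounting and teardown happen above ... */
                  current->worker_private = NULL; /* hooks below now skip this task */
                  do_exit(0);
          }

          void io_wq_worker_running(struct task_struct *tsk)
          {
                  struct io_worker *worker = tsk->worker_private;

                  if (!worker) /* worker already torn down: nothing to account */
                          return;
                  /* ... inc running accounting ... */
          }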
      
      Fixes: fd37b884 ("io_uring/io-wq: don't clear PF_IO_WORKER on exit")
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring/net: save msghdr->msg_control for retries · cac9e441
      Jens Axboe authored

      If the application sets ->msg_control and we have to later retry this
      command, or if it got queued with IOSQE_ASYNC to begin with, then we
      need to retain the original msg_control value. This is because the
      net stack overwrites this field with an in-kernel pointer in order
      to copy the data in. Hitting that path a second time will then fail
      the copy from user, as it's attempting to copy from a non-user
      address.
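
      A hedged sketch of the idea (the io_sr_msg field follows the commit
      title; the helper names here are illustrative, not the kernel's):

          /* on first prep: stash the original user-space control pointer */
          static void io_save_msg_control_sketch(struct io_sr_msg *sr,
                                                 struct msghdr *msg)
          {
                  sr->msg_control = msg->msg_control; /* still the user pointer */
          }

          /* before a retry: restore it so the copy from user works again */
          static void io_restore_msg_control_sketch(struct io_sr_msg *sr,
                                                    struct msghdr *msg)
          {
                  msg->msg_control = sr->msg_control;
          }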
      
      Cc: stable@vger.kernel.org # 5.10+
      Link: https://github.com/axboe/liburing/issues/880
      
      
      Reported-and-tested-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. Jun 12, 2023
    • io_uring/io-wq: don't clear PF_IO_WORKER on exit · fd37b884
      Jens Axboe authored
      
      A recent commit gated the core dumping task exit logic on current->flags
      remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
      This exposed a problem with the io-wq handling of that, which explicitly
      clears PF_IO_WORKER before calling do_exit().
      
      The reason for this manual clearing of PF_IO_WORKER is historical:
      io-wq used to potentially trigger a sleep on exit. As the io-wq
      thread is exiting, it should not participate in any further
      accounting. But these days we don't need to rely on current->flags
      anymore, so we can safely remove the PF_IO_WORKER clearing.
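
      In sketch form (illustrative, not the exact diff):

          static void io_worker_exit_sketch(void)
          {
                  /* removed: current->flags &= ~PF_IO_WORKER; */
                  do_exit(0);
          }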
      
      Reported-by: Zorro Lang <zlang@redhat.com>
      Reported-by: Dave Chinner <david@fromorbit.com>
      Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/
      
      
      Fixes: f9010dbd ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • io_uring: wait interruptibly for request completions on exit · 4826c594
      Jens Axboe authored

      When the ring exits, cleanup is done and the final cancelation and
      waiting on completions is performed by io_ring_exit_work(). That
      function is invoked by a kworker, which doesn't take any signals.
      Because of that, it doesn't really matter whether we wait for
      completions in TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state.
      However, it does matter to the hung task detector!
      
      Normally we expect cancelations and completions to happen rather
      quickly. Some test cases, however, will exit the ring while the
      owning task is stopped (e.g. via SIGSTOP). If the owning task needs
      to run task_work to complete requests, then io_ring_exit_work won't
      make any
      progress until the task is runnable again. Hence io_ring_exit_work can
      trigger the hung task detection, which is particularly problematic if
      panic-on-hung-task is enabled.
      
      As the ring exit doesn't take signals to begin with, have it wait
      interruptibly rather than uninterruptibly. io_uring has a separate
      stuck-exit warning that triggers independently anyway, so we're not
      really missing anything by making this switch.
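
      A minimal sketch of the switch (the loop shape and interval are
      illustrative; wait_for_completion_interruptible_timeout() is the
      stock kernel primitive, and the retry helper is a hypothetical
      stand-in for the periodic cancelation attempts):

          static void ring_exit_wait_sketch(struct io_ring_ctx *ctx)
          {
                  /*
                   * TASK_INTERRUPTIBLE sleeps are not sampled by the hung
                   * task detector; kworkers take no signals, so behavior
                   * is otherwise unchanged.
                   */
                  do {
                          io_retry_cancelations_sketch(ctx); /* hypothetical */
                          cond_resched();
                  } while (!wait_for_completion_interruptible_timeout(&ctx->ref_comp,
                                                                      HZ / 20));
          }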
      
      Cc: stable@vger.kernel.org # 5.10+
      Link: https://lore.kernel.org/r/b0e4aaef-7088-56ce-244c-976edeac0e66@kernel.dk
      
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • fsnotify: move fsnotify_open() hook into do_dentry_open() · 7b8c9d7b
      Amir Goldstein authored and Jan Kara committed
      
      The fsnotify_open() hook is called only from the high-level system
      call context, and not from the many helpers that open files.

      This may make sense for many of the special file open cases, but it
      is inconsistent with the fsnotify_close() hook, which is called on
      every last fput() of a file object with FMODE_OPENED.
      
      As a result, it is possible to observe ACCESS, MODIFY and CLOSE events
      without ever observing an OPEN event.
      
      Fix this inconsistency by replacing all the fsnotify_open() hooks with
      a single hook inside do_dentry_open().
      
      If there are special cases that would like to opt out of the
      possible overhead of the fsnotify() call in fsnotify_open(), they
      would probably also want to avoid the overhead of the fsnotify()
      call in the rest of the fsnotify hooks, so they should open that
      file with the __FMODE_NONOTIFY flag.

      However, in the majority of those cases, the s_fsnotify_connectors
      optimization in fsnotify_parent() would be sufficient to avoid the
      overhead of the fsnotify() call anyway.
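
      In sketch form (simplified; backend_open() is a hypothetical
      stand-in for the real open steps in do_dentry_open()):

          static int do_dentry_open_sketch(struct file *f)
          {
                  int error = backend_open(f); /* hypothetical stand-in */

                  if (error)
                          return error;
                  fsnotify_open(f); /* one OPEN event for every FMODE_OPENED file */
                  return 0;
          }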
      
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Message-Id: <20230611122429.1499617-1-amir73il@gmail.com>