  1. Nov 28, 2023
    • io_uring: use fget/fput consistently · 73363c26
      Jens Axboe authored
      Normally within a syscall it's fine to use fdget/fdput for grabbing a
      file from the file table, and it's fine within io_uring as well. We do
      that via io_uring_enter(2), io_uring_register(2), and then also for
      cancel, which is invoked from the latter. io_uring cannot close its own
      file descriptors as that is explicitly rejected, and for the cancel
      side of things, the file itself is just used as a lookup cookie.
      
      However, it is more prudent to ensure that full references are always
      grabbed. For anything threaded, either explicitly in the application
      itself or through use of the io-wq worker threads, this is what happens
      anyway. Generalize it and use fget/fput throughout.
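
      As an illustration only (not part of the patch), here is a sketch of the
      two kernel-internal patterns being contrasted; 'ring_fd' is a
      placeholder descriptor:

          struct file *f;

          /* fdget(): lightweight; may elide the refcount bump when the file
           * table isn't shared, so the reference is only valid within the
           * current syscall. */
          struct fd fd = fdget(ring_fd);
          if (fd.file) {
                  /* ... use fd.file inside this syscall only ... */
                  fdput(fd);
          }

          /* fget(): always grabs a full reference, safe to hand to other
           * threads such as io-wq workers. */
          f = fget(ring_fd);
          if (f) {
                  /* ... f holds a full reference ... */
                  fput(f);
          }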
      
      Also see the link below for more details.
      
      Link: https://lore.kernel.org/io-uring/CAG48ez1htVSO3TqmrF8QcX2WFuYTRM-VZ_N10i-VZgbtg=NNqw@mail.gmail.com/
      
      
      Suggested-by: Jann Horn <jannh@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Sep 29, 2023
    • io_uring: add support for futex wake and wait · 194bb58c
      Jens Axboe authored
      
      Add support for FUTEX_WAKE/WAIT primitives.
      
      IORING_OP_FUTEX_WAKE is a mix of FUTEX_WAKE and FUTEX_WAKE_BITSET, as
      it supports passing in a bitset.

      Similarly, IORING_OP_FUTEX_WAIT is a mix of FUTEX_WAIT and
      FUTEX_WAIT_BITSET.

      Both of them use the futex2 interface.
      
      FUTEX_WAKE is straightforward, as wakes can always be done directly from
      the io_uring submission without needing async handling. For FUTEX_WAIT,
      things are a bit more complicated. If the futex isn't ready, then we
      rely on a callback via futex_queue->wake() when someone wakes up the
      futex. From that callback, we queue up task_work with the original task,
      which will post a CQE and wake it, if necessary.
      
      Cancelations are supported, both from the application's point of view
      and to be able to cancel pending waits if the ring exits before all
      events have occurred. The return value of futex_unqueue() is used to
      gate who wins the potential race between cancelation and futex wakeup:
      whoever gets a 'ret == 1' return from it claims ownership of the
      io_uring futex request.
      
      This is just the barebones wait/wake support. PI and REQUEUE support
      are not added at this point; it is unclear whether we will look into
      that later.
      
      Likewise, explicit timeouts are not supported either. Users that need
      timeouts are expected to set them via the usual io_uring mechanism of
      linked timeouts.
      
      The SQE format is as follows:
      
      `addr`		Address of futex
      `fd`		futex2(2) FUTEX2_* flags
      `futex_flags`	io_uring-specific command flags. None are valid right now.
      `addr2`		Value of futex
      `addr3`		Mask to wake/wait
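
      As a usage sketch (not part of the commit), here is a wait and a wake on
      the same futex word issued through liburing's io_uring_prep_futex_wait()
      and io_uring_prep_futex_wake() helpers; this assumes a liburing recent
      enough to carry them (2.5+) and a kernel with these opcodes:

          #include <liburing.h>
          #include <linux/futex.h>
          #include <stdint.h>
          #include <stdio.h>

          int main(void)
          {
                  struct io_uring ring;
                  struct io_uring_sqe *sqe;
                  struct io_uring_cqe *cqe;
                  uint32_t futex_word = 0;
                  int i;

                  if (io_uring_queue_init(8, &ring, 0) < 0)
                          return 1;

                  /* Async wait: arms while futex_word still holds the
                   * expected value 0, completes when the word is woken. */
                  sqe = io_uring_get_sqe(&ring);
                  io_uring_prep_futex_wait(sqe, &futex_word, 0,
                                           FUTEX_BITSET_MATCH_ANY,
                                           FUTEX2_SIZE_U32, 0);
                  sqe->user_data = 1;

                  /* Wake up to one waiter on the same word. Normally the
                   * waker is another thread; both are submitted together
                   * here only to keep the sketch self-contained. */
                  sqe = io_uring_get_sqe(&ring);
                  io_uring_prep_futex_wake(sqe, &futex_word, 1,
                                           FUTEX_BITSET_MATCH_ANY,
                                           FUTEX2_SIZE_U32, 0);
                  sqe->user_data = 2;

                  io_uring_submit(&ring);

                  for (i = 0; i < 2; i++) {
                          if (io_uring_wait_cqe(&ring, &cqe))
                                  break;
                          printf("op %llu: res=%d\n",
                                 (unsigned long long) cqe->user_data, cqe->res);
                          io_uring_cqe_seen(&ring, cqe);
                  }
                  io_uring_queue_exit(&ring);
                  return 0;
          }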
      
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. Sep 21, 2023
    • io_uring: add IORING_OP_WAITID support · f31ecf67
      Jens Axboe authored
      
      This adds support for a fully async version of waitid(2). If an event
      isn't immediately available, we wait for a callback to trigger a retry.
      
      The format of the sqe is as follows:
      
      sqe->len		The 'which', the idtype being queried/waited for.
      sqe->fd			The 'pid' (or id) being waited for.
      sqe->file_index		The 'options' being set.
      sqe->addr2		A pointer to siginfo_t, if any, being filled in.
      
      buf_index, addr3, and waitid_flags are reserved/unused for now.
      waitid_flags will be used for options for this request type. One
      interesting use case may be to add multi-shot support, so that the
      request stays armed and posts a notification every time a monitored
      process state change occurs.
      
      Note that this does not support rusage, on Arnd's recommendation.
      
      See the waitid(2) man page for details on the arguments.
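
      As a usage sketch (not part of the commit), here is how a child's exit
      might be awaited with liburing's io_uring_prep_waitid() helper; this
      assumes a liburing recent enough to carry it and a kernel with
      IORING_OP_WAITID:

          #include <liburing.h>
          #include <sys/wait.h>
          #include <signal.h>
          #include <stdio.h>
          #include <unistd.h>

          int main(void)
          {
                  struct io_uring ring;
                  struct io_uring_sqe *sqe;
                  struct io_uring_cqe *cqe;
                  siginfo_t info;
                  pid_t pid;

                  if (io_uring_queue_init(8, &ring, 0) < 0)
                          return 1;

                  pid = fork();
                  if (pid == 0) {
                          usleep(10000);  /* child exits shortly */
                          _exit(42);
                  }

                  /* idtype/id land in sqe->len/sqe->fd, options in
                   * sqe->file_index, and &info in sqe->addr2, matching the
                   * layout described above. */
                  sqe = io_uring_get_sqe(&ring);
                  io_uring_prep_waitid(sqe, P_PID, pid, &info, WEXITED, 0);
                  io_uring_submit(&ring);

                  io_uring_wait_cqe(&ring, &cqe);
                  printf("res=%d, child exit status=%d\n",
                         cqe->res, info.si_status);
                  io_uring_cqe_seen(&ring, cqe);
                  io_uring_queue_exit(&ring);
                  return 0;
          }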
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. Sep 21, 2022
    • io_uring: add IORING_SETUP_DEFER_TASKRUN · c0e0d6ba
      Dylan Yudaken authored
      Allow deferring async tasks until the user calls io_uring_enter(2) with
      the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag at
      io_uring_setup time. This functionality requires that the later
      io_uring_enter(2) be called from the same submission task; the flag is
      therefore restricted to work only when IORING_SETUP_SINGLE_ISSUER is
      also set.
      
      Being able to hand-pick when task work is run prevents the problem
      where task work runs anyway, even though there is current work to be
      done.
      
      For example, a common workload obtains a batch of CQEs and processes
      each one. Interrupting this to run additional task work would add
      latency but not gain anything. If instead task work is deferred until
      just before more CQEs are obtained, then no additional latency is added.
      
      The way this is implemented is by trying to keep task work local to an
      io_ring_ctx, rather than to the submission task. This is required, as the
      application will want to wake up only a single io_ring_ctx at a time to
      process work, and so the lists of work have to be kept separate.
      
      This has some other benefits, like not having to continually check the
      task in handle_tw_list (and potentially unlock/lock it), and reducing
      locking in the submit and process-completions paths.
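
      As an illustration (not part of the commit), a minimal liburing setup
      enabling this mode; it assumes liburing 2.3+ and a 6.1+ kernel:

          #include <liburing.h>
          #include <stdio.h>

          int main(void)
          {
                  struct io_uring ring;
                  int ret;

                  /* DEFER_TASKRUN requires SINGLE_ISSUER; task work is then
                   * only run when this task waits for completions. */
                  ret = io_uring_queue_init(64, &ring,
                                            IORING_SETUP_SINGLE_ISSUER |
                                            IORING_SETUP_DEFER_TASKRUN);
                  if (ret < 0) {
                          fprintf(stderr, "queue_init: %d\n", ret);
                          return 1;
                  }

                  /* ... submit requests; deferred task work is processed
                   * inside io_uring_submit_and_wait()/io_uring_wait_cqe() ... */

                  io_uring_queue_exit(&ring);
                  return 0;
          }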
      
      There are networking cases where using this option can reduce request
      latency by 50%. For example, a contrived benchmark using [1], where the
      client sends 2k of data and receives the same data back while doing some
      system calls (to trigger task work), shows this reduction. The reason
      ends up being that if sending responses is delayed by processing task
      work, the client side sits idle, whereas reordering the sends first
      means that the client runs its workload in parallel with the local task
      work.
      
      [1]:
      Using https://github.com/DylanZA/netbench/tree/defer_run
      
      
      Client:
      ./netbench  --client_only 1 --control_port 10000 --host <host> --tx "epoll --threads 16 --per_thread 1 --size 2048 --resp 2048 --workload 1000"
      Server:
      ./netbench  --server_only 1 --control_port 10000  --rx "io_uring --defer_taskrun 0 --workload 100"   --rx "io_uring  --defer_taskrun 1 --workload 100"
      
      Signed-off-by: Dylan Yudaken <dylany@fb.com>
      Link: https://lore.kernel.org/r/20220830125013.570060-5-dylany@fb.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>