.\" Copyright (C) 2019 Jens Axboe <axboe@kernel.dk>
.\" Copyright (C) 2019 Jon Corbet <corbet@lwn.net>
.\" Copyright (C) 2019 Red Hat, Inc.
.\"
.\" SPDX-License-Identifier: LGPL-2.0-or-later
.\"
.TH IO_URING_SETUP 2 2019-01-29 "Linux" "Linux Programmer's Manual"
.SH NAME
io_uring_setup \- set up a context for performing asynchronous I/O
.SH SYNOPSIS
.nf
.BR "#include <linux/io_uring.h>"
.PP
.BI "int io_uring_setup(u32 " entries ", struct io_uring_params *" p );
.fi
.PP
.SH DESCRIPTION
.PP
The io_uring_setup() system call sets up a submission queue (SQ) and
completion queue (CQ) with at least
.I entries
entries, and returns a file descriptor which can be used to perform
subsequent operations on the io_uring instance.  The submission and
completion queues are shared between userspace and the kernel, which
eliminates the need to copy data when initiating and completing I/O.
.PP
.I p
is used by the application to pass options to the kernel, and by the
kernel to convey information about the ring buffers.
.PP
.in +4n
.EX
struct io_uring_params {
    __u32 sq_entries;
    __u32 cq_entries;
    __u32 flags;
    __u32 sq_thread_cpu;
    __u32 sq_thread_idle;
    __u32 features;
    __u32 resv[4];
    struct io_sqring_offsets sq_off;
    struct io_cqring_offsets cq_off;
};
.EE
.in
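.PP
There is no glibc wrapper for this system call, so applications either use
a library such as liburing or invoke it through
.BR syscall (2).
The following is a minimal sketch of the latter, assuming kernel headers
that define
.B __NR_io_uring_setup
and
.IR "struct io_uring_params" ;
the helper names sys_io_uring_setup() and create_default_ring() are
hypothetical.  Zeroing the whole structure requests an interrupt-driven
ring with no flags and satisfies the requirement that
.I resv
be zero.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>

/* Hypothetical helper: glibc has no io_uring_setup() wrapper,
 * so issue the raw system call through syscall(2). */
static int sys_io_uring_setup(unsigned int entries,
                              struct io_uring_params *p)
{
    return (int) syscall(__NR_io_uring_setup, entries, p);
}

/* Hypothetical helper: create an interrupt-driven ring (no flags).
 * On success the kernel has filled in sq_entries, cq_entries,
 * features, sq_off and cq_off; returns the ring fd, or -1 with
 * errno set as described under ERRORS. */
static int create_default_ring(unsigned int entries,
                               struct io_uring_params *p)
{
    memset(p, 0, sizeof(*p));   /* flags == 0 and resv[] all zero */
    return sys_io_uring_setup(entries, p);
}
```

On success, the values the kernel wrote into
.I p
are the ones needed for the
.BR mmap (2)
calls described below.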
.PP
The
.IR flags ,
.IR sq_thread_cpu ,
and
.I sq_thread_idle
fields are used to configure the io_uring instance.
.I flags
is a bit mask of 0 or more of the following values ORed
together:
.TP
.B IORING_SETUP_IOPOLL
Perform busy-waiting for I/O completions, as opposed to getting
notifications via an asynchronous IRQ (Interrupt Request).  The file
system (if any) and block device must support polling in order for
this to work.  Busy-waiting provides lower latency, but may consume
more CPU resources than interrupt-driven I/O.  Currently, this feature
is usable only on a file descriptor opened using the
.B O_DIRECT
flag.  When a read or write is submitted to a polled context, the
application must poll for completions on the CQ ring by calling
.BR io_uring_enter (2).
It is illegal to mix and match polled and non-polled I/O on an io_uring
instance.
.TP
.B IORING_SETUP_SQPOLL
When this flag is specified, a kernel thread is created to perform
submission queue polling.  An io_uring instance configured in this way
enables an application to issue I/O without ever context switching
into the kernel.  By using the submission queue to fill in new
submission queue entries and watching for completions on the
completion queue, the application can submit and reap I/Os without
doing a single system call.
.IP
If the kernel thread is idle for more than
.I sq_thread_idle
milliseconds, it will set the
.B IORING_SQ_NEED_WAKEUP
bit in the
.I flags
field of the submission queue ring (at offset
.I sq_off.flags
in the mapped SQ ring, described below).
When this happens, the application must call
.BR io_uring_enter (2)
to wake the kernel thread.  As long as the application keeps submitting
I/O, the kernel thread will not go to sleep.  An application making use
of this feature will need to guard the
.BR io_uring_enter (2)
call with the following code sequence:
.IP
.in +4n
.EX
/*
 * Ensure that the wakeup flag is read after the tail pointer has been
 * written (a full memory barrier, like the kernel's smp_mb()).
 */
__atomic_thread_fence(__ATOMIC_SEQ_CST);
if (*sq_ring->flags & IORING_SQ_NEED_WAKEUP)
    io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP, NULL);
.EE
.in
.IP
where
.I sq_ring
is a submission queue ring set up using the
.I struct io_sqring_offsets
described below.
.IP
To successfully use this feature, the application must register a set of files
to be used for I/O through
.BR io_uring_register (2)
using the
.B IORING_REGISTER_FILES
opcode.  Failure to do so will result in submitted I/O failing with
.BR EBADF .
.TP
.B IORING_SETUP_SQ_AFF
If this flag is specified, then the poll thread will be bound to the
CPU set in the
.I sq_thread_cpu
field of the
.IR "struct io_uring_params" .
This flag is only meaningful when
.B IORING_SETUP_SQPOLL
is specified.
.TP
.B IORING_SETUP_CQSIZE
Create the completion queue with
.I cq_entries
entries, as given in
.IR "struct io_uring_params" .
The value must be greater than
.IR entries ,
and may be rounded up to the next power of two.
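.PP
As an illustration of how the setup flags and their parameters fit
together, the following sketch fills in
.I struct io_uring_params
for an
.B IORING_SETUP_SQPOLL
ring pinned to one CPU, and for a ring with a custom completion queue
size.  The helper names are hypothetical; the structure is passed to
io_uring_setup() afterwards.

```c
#include <string.h>
#include <linux/io_uring.h>

/* Hypothetical helper: request an SQPOLL ring whose kernel thread is
 * pinned to one CPU and sleeps after more than idle_ms milliseconds
 * without work.  IORING_SETUP_SQ_AFF is only meaningful together with
 * IORING_SETUP_SQPOLL, so the two flags are always set together. */
static void prepare_sqpoll_params(struct io_uring_params *p,
                                  unsigned int cpu, unsigned int idle_ms)
{
    memset(p, 0, sizeof(*p));   /* unused fields and resv[] stay zero */
    p->flags = IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF;
    p->sq_thread_cpu = cpu;
    p->sq_thread_idle = idle_ms;
}

/* Hypothetical helper: ask for a larger completion queue than the
 * default.  cq_entries should exceed the entries argument later passed
 * to io_uring_setup(); the kernel may round it up to a power of two
 * and reports the final size back in p->cq_entries. */
static void prepare_cqsize_params(struct io_uring_params *p,
                                  unsigned int cq_entries)
{
    memset(p, 0, sizeof(*p));
    p->flags = IORING_SETUP_CQSIZE;
    p->cq_entries = cq_entries;
}
```

A request for
.B IORING_SETUP_SQPOLL
may still fail with
.B EPERM
if the caller lacks sufficient privileges (see ERRORS).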
.PP
If no flags are specified, the io_uring instance is set up for
interrupt-driven I/O.  I/O may be submitted using
.BR io_uring_enter (2)
and can be reaped by polling the completion queue.
.PP
The
.I resv
array must be initialized to zero.
.PP
The
.I features
field is filled in by the kernel and specifies the features supported
by the running kernel.  The following feature flags may be set:
.TP
.B IORING_FEAT_SINGLE_MMAP
If this flag is set, the SQ and CQ rings can be mapped with a single
.BR mmap (2)
call.  The SQEs must still be mapped separately.  This reduces the number of
.BR mmap (2)
calls needed from three to two.
.TP
.B IORING_FEAT_NODROP
If this flag is set, io_uring supports never dropping completion events.
If a completion event occurs and the CQ ring is full, the kernel stores
the event internally until such a time that the CQ ring has room for more
entries.  If this overflow condition is entered, attempts to submit more
I/O will fail with the error
.B EBUSY
if the kernel cannot flush the overflowed events to the CQ ring.  If this
happens, the application must reap events from the CQ ring and attempt the
submission again.
.TP
.B IORING_FEAT_SUBMIT_STABLE
If this flag is set, applications can be certain that any data for
async offload has been consumed when the kernel has consumed the SQE.
.TP
.B IORING_FEAT_RW_CUR_POS
If this flag is set, applications can specify
.I offset
== \-1 with
.BR IORING_OP_{READV,WRITEV} ,
.BR IORING_OP_{READ,WRITE}_FIXED ,
and
.B IORING_OP_{READ,WRITE}
to mean the current file position, which behaves like
.BR preadv2 (2)
and
.BR pwritev2 (2)
with
.I offset
== \-1.  The current file position is both used and updated.  This comes
with the caveat that if the application has multiple reads or writes in flight,
the end result may not be as expected.  This is similar to threads sharing
a file descriptor and doing I/O using the current file position.
.TP
.B IORING_FEAT_CUR_PERSONALITY
If this flag is set, then io_uring guarantees that both sync and async
execution of a request assumes the credentials of the task that called
.BR io_uring_enter (2)
to queue the requests.  If this flag isn't set, then requests are issued with
the credentials of the task that originally set up the io_uring instance.  If only
one task is using a ring, then this flag doesn't matter as the credentials
will always be the same.  Note that this is only the default behavior; tasks can
still register different personalities through
.BR io_uring_register (2)
with
.B IORING_REGISTER_PERSONALITY
and specify the personality to use in the SQE.
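.PP
A minimal sketch, assuming the kernel has already filled in
.I p
through a successful call: it computes the lengths of the ring mappings
described below, taking
.B IORING_FEAT_SINGLE_MMAP
into account.  When that feature is present, the single mapping at
.B IORING_OFF_SQ_RING
has to cover whichever ring is larger, which is the approach liburing
takes.  The helper names ring_map_sizes() and ring_mmap_calls() are
hypothetical.

```c
#include <stddef.h>
#include <linux/io_uring.h>

/* Hypothetical helper mirroring the size arithmetic of the mmap(2)
 * calls for the SQ and CQ rings.  With IORING_FEAT_SINGLE_MMAP both
 * rings live in one mapping at IORING_OFF_SQ_RING, so that mapping
 * must be large enough for whichever ring is bigger. */
static void ring_map_sizes(const struct io_uring_params *p,
                           size_t *sq_ring_sz, size_t *cq_ring_sz)
{
    size_t sq = p->sq_off.array + p->sq_entries * sizeof(__u32);
    size_t cq = p->cq_off.cqes +
                p->cq_entries * sizeof(struct io_uring_cqe);

    if (p->features & IORING_FEAT_SINGLE_MMAP) {
        if (cq > sq)
            sq = cq;
        cq = sq;        /* the same mapping serves both rings */
    }
    *sq_ring_sz = sq;
    *cq_ring_sz = cq;
}

/* Number of mmap(2) calls needed: SQ ring, CQ ring and SQE array,
 * or two when the rings can share a single mapping. */
static int ring_mmap_calls(__u32 features)
{
    return (features & IORING_FEAT_SINGLE_MMAP) ? 2 : 3;
}
```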
.PP
The rest of the fields in the
.I struct io_uring_params
are filled in by the kernel, and provide the information necessary to
memory map the submission queue, completion queue, and the array of
submission queue entries.
.I sq_entries
specifies the number of submission queue entries allocated.
.I sq_off
describes the offsets of various ring buffer fields:
.PP
.in +4n
.EX
struct io_sqring_offsets {
    __u32 head;
    __u32 tail;
    __u32 ring_mask;
    __u32 ring_entries;
    __u32 flags;
    __u32 dropped;
    __u32 array;
    __u32 resv[3];
};
.EE
.in
.PP
Taken together,
.I sq_entries
and
.I sq_off
provide all of the information necessary for accessing the submission
queue ring buffer and the submission queue entry array.  The
submission queue can be mapped with a call like:
.PP
.in +4n
.EX
ptr = mmap(0, sq_off.array + sq_entries * sizeof(__u32),
           PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE,
           ring_fd, IORING_OFF_SQ_RING);
.EE
.in
.PP
where
.I sq_off
is the
.I io_sqring_offsets
structure, and
.I ring_fd
is the file descriptor returned from
.BR io_uring_setup (2).
The addition of
.I sq_off.array
to the length of the region accounts for the fact that the ring is
located at the end of the data structure.  As an example, the ring
buffer head pointer can be accessed by adding
.I sq_off.head
to the address returned from
.BR mmap (2):
.PP
.in +4n
.EX
head = (unsigned *)((char *)ptr + sq_off.head);
.EE
.in
.PP
The
.I flags
field is used by the kernel to communicate state information to the
application.  Currently, it is used to inform the application when a
call to
.BR io_uring_enter (2)
is necessary.  See the documentation for the
.B IORING_SETUP_SQPOLL
flag above.
The
.I dropped
member is incremented for each invalid submission queue entry
encountered in the ring buffer.
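.PP
For an
.B IORING_SETUP_SQPOLL
ring, the wakeup check can be wrapped in a small helper.  This sketch
assumes
.I sq_flags
points at the flags word of the mapped SQ ring (the
.BR mmap (2)
address plus
.IR sq_off.flags )
and uses GCC/Clang atomic built-ins for the full memory barrier that the
IORING_SETUP_SQPOLL code sequence above requires; sqpoll_needs_wakeup()
is a hypothetical name.

```c
#include <stdbool.h>
#include <linux/io_uring.h>

/* Hypothetical helper for an IORING_SETUP_SQPOLL ring.  Call it after
 * the new SQ tail has been stored: the full fence orders that store
 * before the load of the flags word written by the kernel. */
static bool sqpoll_needs_wakeup(const unsigned *sq_flags)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    return (__atomic_load_n(sq_flags, __ATOMIC_RELAXED) &
            IORING_SQ_NEED_WAKEUP) != 0;
}
```

When the helper returns true, the application calls
.BR io_uring_enter (2)
with
.B IORING_ENTER_SQ_WAKEUP
to wake the kernel thread.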
.PP
The head and tail track the ring buffer state.  The tail is
incremented by the application when submitting new I/O, and the head
is incremented by the kernel when the I/O has been successfully
submitted.  Determining the index of the head or tail into the ring is
accomplished by applying a mask:
.PP
.in +4n
.EX
index = tail & ring_mask;
.EE
.in
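.PP
The head and tail values are free-running counters, so only the masked
value selects a slot, and the difference tail minus head is the number of
entries in use.  The following sketch of the application side of
submission assumes the pointers were derived as in the
.I head
example above; struct sq_ring_view and sq_push() are hypothetical, and
the acquire/release ordering shown is one reasonable choice using
GCC/Clang atomic built-ins.

```c
/* Hypothetical view of the mapped SQ ring: each pointer is the ring
 * base address plus the matching sq_off field. */
struct sq_ring_view {
    unsigned *head;
    unsigned *tail;
    unsigned *ring_mask;
    unsigned *ring_entries;
    unsigned *array;
};

/* Publish SQE number sqe_index to the kernel.  Returns 0, or -1 if
 * the ring is full.  Unsigned subtraction handles counter wrap. */
static int sq_push(struct sq_ring_view *sq, unsigned sqe_index)
{
    unsigned head = __atomic_load_n(sq->head, __ATOMIC_ACQUIRE);
    unsigned tail = *sq->tail;   /* only the application writes tail */

    if (tail - head == *sq->ring_entries)
        return -1;               /* every slot is in use */

    sq->array[tail & *sq->ring_mask] = sqe_index;
    /* Release: the kernel must see the array slot (and the SQE it
     * names) before it sees the new tail. */
    __atomic_store_n(sq->tail, tail + 1, __ATOMIC_RELEASE);
    return 0;
}
```

After filling in SQE number i of the SQE array, an application would
call sq_push() with i and then
.BR io_uring_enter (2)
to submit, unless a polling kernel thread picks the entry up.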
.PP
The array of submission queue entries is mapped with:
.PP
.in +4n
.EX
sqentries = mmap(0, sq_entries * sizeof(struct io_uring_sqe),
                 PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE,
                 ring_fd, IORING_OFF_SQES);
.EE
.in
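.PP
Putting the two mappings together, the following sketch maps the SQ ring
and the SQE array for a ring file descriptor.  It assumes
.I p
holds the values returned by a successful io_uring_setup() call;
struct sq_maps, map_sq() and sq_field() are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>

/* Hypothetical bundle of the two SQ-side mappings. */
struct sq_maps {
    void *ring;                 /* SQ ring: head, tail, ..., array */
    size_t ring_sz;
    struct io_uring_sqe *sqes;  /* SQE array */
    size_t sqes_sz;
};

/* Map the SQ ring and the SQE array of ring_fd, using the sizes and
 * offsets the kernel returned in *p.  Returns 0, or -1 with errno set. */
static int map_sq(int ring_fd, const struct io_uring_params *p,
                  struct sq_maps *m)
{
    m->ring_sz = p->sq_off.array + p->sq_entries * sizeof(__u32);
    m->ring = mmap(NULL, m->ring_sz, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_POPULATE, ring_fd, IORING_OFF_SQ_RING);
    if (m->ring == MAP_FAILED)
        return -1;

    m->sqes_sz = p->sq_entries * sizeof(struct io_uring_sqe);
    m->sqes = mmap(NULL, m->sqes_sz, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_POPULATE, ring_fd, IORING_OFF_SQES);
    if (m->sqes == MAP_FAILED) {
        int saved = errno;      /* keep the mmap error for the caller */
        munmap(m->ring, m->ring_sz);
        errno = saved;
        return -1;
    }
    return 0;
}

/* Read one 32-bit ring field, e.g. sq_field(m, p->sq_off.ring_entries). */
static unsigned sq_field(const struct sq_maps *m, __u32 off)
{
    return *(const unsigned *)((const char *)m->ring + off);
}
```

Both mappings are released with
.BR munmap (2)
before the ring file descriptor is closed.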
.PP
The completion queue is described by
.I cq_entries
and
.I cq_off
shown here:
.PP
.in +4n
.EX
struct io_cqring_offsets {
    __u32 head;
    __u32 tail;
    __u32 ring_mask;
    __u32 ring_entries;
    __u32 overflow;
    __u32 cqes;
    __u32 flags;
    __u32 resv[3];
};
.EE
.in
.PP
The completion queue is simpler, since the entries are not separated
from the queue itself, and can be mapped with:
.PP
.in +4n
.EX
ptr = mmap(0, cq_off.cqes + cq_entries * sizeof(struct io_uring_cqe),
           PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, ring_fd,
           IORING_OFF_CQ_RING);
.EE
.in
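.PP
Completions are consumed the other way around: the kernel advances the
CQ tail, and the application reads entries from head to tail and then
stores the new head.  A minimal sketch, assuming the pointers were derived
from the mapping above by adding the corresponding
.I cq_off
fields; struct completion, struct cq_ring_view and cq_reap() are
hypothetical, and the memory ordering uses GCC/Clang atomic built-ins.

```c
#include <linux/io_uring.h>

/* Hypothetical copy of the CQE fields an application usually needs. */
struct completion {
    __u64 user_data;   /* value from the matching SQE */
    __s32 res;         /* result, e.g. bytes transferred or -errno */
    __u32 flags;
};

/* Hypothetical view of the mapped CQ ring: each pointer is the address
 * returned by mmap(2) plus the matching cq_off field. */
struct cq_ring_view {
    unsigned *head;
    unsigned *tail;
    unsigned *ring_mask;
    struct io_uring_cqe *cqes;
};

/* Consume up to max completions and return how many were copied. */
static unsigned cq_reap(struct cq_ring_view *cq,
                        struct completion *out, unsigned max)
{
    unsigned head = *cq->head;   /* only the application advances head */
    /* Acquire: CQEs written by the kernel are visible once tail is. */
    unsigned tail = __atomic_load_n(cq->tail, __ATOMIC_ACQUIRE);
    unsigned n = 0;

    while (head != tail && n < max) {
        const struct io_uring_cqe *cqe = &cq->cqes[head & *cq->ring_mask];
        out[n].user_data = cqe->user_data;
        out[n].res = cqe->res;
        out[n].flags = cqe->flags;
        n++;
        head++;
    }
    /* Release: finish reading the CQEs before the kernel may reuse
     * their slots. */
    __atomic_store_n(cq->head, head, __ATOMIC_RELEASE);
    return n;
}
```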
.PP
Closing the file descriptor returned by
.BR io_uring_setup (2)
will free all resources associated with the io_uring context.
.PP
.SH RETURN VALUE
.BR io_uring_setup (2)
returns a new file descriptor on success.  The application may then
pass the file descriptor to a subsequent
.BR mmap (2)
call to map the submission and completion queues, or to the
.BR io_uring_register (2)
or
.BR io_uring_enter (2)
system calls.
.PP
On error, \-1 is returned and
.I errno
is set to indicate the error.
.PP
.SH ERRORS
.TP
.B EFAULT
.I p
points outside your accessible address space.
.TP
.B EINVAL
The resv array contains non-zero data, p.flags contains an unsupported
flag,
.I entries
is out of bounds,
.B IORING_SETUP_SQ_AFF
was specified, but
.B IORING_SETUP_SQPOLL
was not, or
.B IORING_SETUP_CQSIZE
was specified, but
.I io_uring_params.cq_entries
was invalid.
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been
reached (see the description of
.B RLIMIT_NOFILE
in
.BR getrlimit (2)).
.TP
.B ENFILE
The system-wide limit on the total number of open files has been
reached.
.TP
.B ENOMEM
Insufficient kernel resources are available.
.TP
.B EPERM
.B IORING_SETUP_SQPOLL
was specified, but the effective user ID of the caller did not have sufficient
privileges.
.SH SEE ALSO
.BR io_uring_register (2),
.BR io_uring_enter (2)