.\" Copyright (C) 2003 Davide Libenzi
.\"
.\" %%%LICENSE_START(GPLv2+_SW_3_PARA)
.\" This program is free software; you can redistribute it and/or modify
.\" it under the terms of the GNU General Public License as published by
.\" the Free Software Foundation; either version 2 of the License, or
.\" (at your option) any later version.
.\"
.\" This program is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END
.\"
.\" Davide Libenzi <davidel@xmailserver.org>
.\"
.TH EPOLL 7 2017-09-15 "Linux" "Linux Programmer's Manual"
.SH NAME
epoll \- I/O event notification facility
.SH SYNOPSIS
.B #include <sys/epoll.h>
.SH DESCRIPTION
The
.B epoll
API performs a similar task to
.BR poll (2):
monitoring multiple file descriptors to see if I/O is possible on any of them.
The
.B epoll
API can be used either as an edge-triggered or a level-triggered
interface and scales well to large numbers of watched file descriptors.
The following system calls are provided to
create and manage an
.B epoll
instance:
.IP * 3
.BR epoll_create (2)
creates a new
.B epoll
instance and returns a file descriptor referring to that instance.
(The more recent
.BR epoll_create1 (2)
extends the functionality of
.BR epoll_create (2).)
.IP *
Interest in particular file descriptors is then registered via
.BR epoll_ctl (2).
The set of file descriptors currently registered on an
.B epoll
instance is sometimes called an
.I epoll
set.
.IP *
.BR epoll_wait (2)
waits for I/O events,
blocking the calling thread if no events are currently available.
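.PP
As a rough illustration of how the three calls fit together,
the following sketch waits for a single file descriptor to become readable.
The helper name
.I wait_readable()
is hypothetical and is not part of the
.B epoll
API; error handling is kept to a minimum.
.PP
.in +4n
.EX
```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for an
   input event on fd. Returns 1 if an event was reported, 0 on
   timeout, and -1 on error. */
int
wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev, out;
    int epfd, n;

    epfd = epoll_create1(0);            /* create the epoll instance */
    if (epfd == -1)
        return -1;

    ev.events = EPOLLIN;                /* level-triggered by default */
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return -1;
    }

    n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);                        /* also drops the registration */
    return n;
}
```
.EE
.in
.PP
A long-lived program would normally create the
.B epoll
instance once and reuse it, rather than creating one per wait as this
sketch does for brevity.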
.SS Level-triggered and edge-triggered
The
.B epoll
event distribution interface is able to behave both as edge-triggered
(ET) and as level-triggered (LT).
The difference between the two mechanisms
can be described as follows.
Suppose that
this scenario happens:
.IP 1. 3
The file descriptor that represents the read side of a pipe
.RI ( rfd )
is registered on the
.B epoll
instance.
.IP 2.
A pipe writer writes 2\ kB of data on the write side of the pipe.
.IP 3.
A call to
.BR epoll_wait (2)
is made, which returns
.I rfd
as a ready file descriptor.
.IP 4.
The pipe reader reads 1\ kB of data from
.IR rfd .
.IP 5.
A call to
.BR epoll_wait (2)
is made.
.PP
If the
.I rfd
file descriptor has been added to the
.B epoll
interface using the
.B EPOLLET
(edge-triggered)
flag, the call to
.BR epoll_wait (2)
done in step
.B 5
will probably hang despite the available data still present in the file
input buffer;
meanwhile the remote peer might be expecting a response based on the
data it already sent.
The reason for this is that edge-triggered mode
delivers events only when changes occur on the monitored file descriptor.
So, in step
.B 5
the caller might end up waiting for some data that is already present inside
the input buffer.
In the above example, an event on
.I rfd
will be generated because of the write done in
.B 2
and the event is consumed in
.BR 3 .
Since the read operation done in
.B 4
does not consume the whole buffer data, the call to
.BR epoll_wait (2)
done in step
.B 5
might block indefinitely.
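.PP
The five steps above can be replayed on a real pipe.
The following is a minimal sketch, not a definitive test;
the function name
.I second_wait_events()
is hypothetical, and a zero timeout is used in step 5 so that the
sketch returns instead of hanging.
.PP
.in +4n
.EX
```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Replays steps 1 to 5 on a pipe. extra_flags is 0 for
   level-triggered or EPOLLET for edge-triggered operation.
   Returns the number of events reported by the second
   epoll_wait() (step 5), or -1 on error. */
int
second_wait_events(unsigned int extra_flags)
{
    char buf[2048];
    int pfd[2], epfd = -1, n = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto done;

    ev.events = EPOLLIN | extra_flags;              /* step 1 */
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto done;

    memset(buf, 'x', sizeof(buf));
    if (write(pfd[1], buf, 2048) != 2048)           /* step 2 */
        goto done;
    if (epoll_wait(epfd, &out, 1, 0) != 1)          /* step 3 */
        goto done;
    if (read(pfd[0], buf, 1024) != 1024)            /* step 4 */
        goto done;
    n = epoll_wait(epfd, &out, 1, 0);               /* step 5 */

done:
    if (epfd != -1)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```
.EE
.in
.PP
Called with 0, the function is expected to return 1,
because 1\ kB is still unread;
called with
.BR EPOLLET ,
it is expected to return 0,
because no new change has occurred on the pipe since step 3.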
.PP
An application that employs the
.B EPOLLET
flag should use nonblocking file descriptors to avoid having a blocking
read or write starve a task that is handling multiple file descriptors.
The suggested way to use
.B epoll
as an edge-triggered
.RB ( EPOLLET )
interface is as follows:
.RS
.TP 4
.B i
with nonblocking file descriptors; and
.TP
.B ii
by waiting for an event only after
.BR read (2)
or
.BR write (2)
return
.BR EAGAIN .
.RE
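.PP
The two rules above might be implemented along the following lines.
This is a sketch under stated assumptions: the helper names
.I set_nonblocking()
and
.I drain_fd()
are hypothetical, and the data that is read is simply counted
rather than processed.
.PP
.in +4n
.EX
```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Rule (i): put the descriptor in nonblocking mode. */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Rule (ii): read until read() reports EAGAIN, and only then go
   back to epoll_wait(). Returns the number of bytes consumed, or
   -1 on a real error. End-of-file also ends the loop. */
long
drain_fd(int fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;         /* a real handler would use buf here */
        } else if (n == 0) {
            return total;       /* end-of-file */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;       /* drained: safe to wait again */
        } else if (errno != EINTR) {
            return -1;
        }
    }
}
```
.EE
.in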
.PP
By contrast, when used as a level-triggered interface
(the default, when
.B EPOLLET
is not specified),
.B epoll
is simply a faster
.BR poll (2),
and can be used wherever the latter is used since it shares the
same semantics.
.PP
Since even with edge-triggered
.BR epoll ,
multiple events can be generated upon receipt of multiple chunks of data,
the caller has the option to specify the
.B EPOLLONESHOT
flag, to tell
.B epoll
to disable the associated file descriptor after the receipt of an event with
.BR epoll_wait (2).
When the
.B EPOLLONESHOT
flag is specified,
it is the caller's responsibility to rearm the file descriptor using
.BR epoll_ctl (2)
with
.BR EPOLL_CTL_MOD .
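.PP
The one-shot life cycle described above can be sketched as follows.
The names
.IR rearm_oneshot() ,
.IR poll_once() ,
and
.I oneshot_demo()
are hypothetical helpers written for this illustration.
.PP
.in +4n
.EX
```c
#include <sys/epoll.h>
#include <unistd.h>

/* Re-enable a descriptor that was registered with EPOLLONESHOT
   and has already delivered an event. */
int
rearm_oneshot(int epfd, int fd, unsigned int events)
{
    struct epoll_event ev;

    ev.events = events | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Number of ready events right now, without blocking. */
static int
poll_once(int epfd)
{
    struct epoll_event out;

    return epoll_wait(epfd, &out, 1, 0);
}

/* Walks through the life cycle on a pipe. Returns 0 if the
   descriptor was reported once, then stayed silent, then was
   reported again after rearming; returns -1 otherwise. */
int
oneshot_demo(void)
{
    int pfd[2], epfd, ok;
    struct epoll_event ev;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pfd[0];
    ok = epfd != -1
        && epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0
        && write(pfd[1], "x", 1) == 1
        && poll_once(epfd) == 1     /* delivered; fd now disabled */
        && poll_once(epfd) == 0     /* still readable, but silent */
        && rearm_oneshot(epfd, pfd[0], EPOLLIN) == 0
        && poll_once(epfd) == 1;    /* reported again */
    if (epfd != -1)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return ok ? 0 : -1;
}
```
.EE
.in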
.SS Interaction with autosleep
If the system is in
.B autosleep
mode via
.I /sys/power/autosleep
and an event happens which wakes the device from sleep, the device
driver will keep the device awake only until that event is queued.
To keep the device awake until the event has been processed,
it is necessary to use the
.BR epoll_ctl (2)
.B EPOLLWAKEUP
flag.
.PP
When the
.B EPOLLWAKEUP
flag is set in the
.B events
field for a
.IR "struct epoll_event" ,
the system will be kept awake from the moment the event is queued,
through the
.BR epoll_wait (2)
call which returns the event until the subsequent
.BR epoll_wait (2)
call.
If the event should keep the system awake beyond that time,
then a separate
.I wake_lock
should be taken before the second
.BR epoll_wait (2)
call.
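.PP
Setting the flag might look like the sketch below.
The helper name
.I add_with_wakeup()
is hypothetical.
As documented in
.BR epoll_ctl (2),
the flag is silently ignored if the caller lacks the
.B CAP_BLOCK_SUSPEND
capability, so a successful return does not by itself show that the
system will be kept awake.
.PP
.in +4n
.EX
```c
#include <sys/epoll.h>

/* Hypothetical helper: register fd for input events and request
   that the system stay awake while such an event is pending. */
int
add_with_wakeup(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLWAKEUP;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```
.EE
.in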
.SS /proc interfaces
The following interfaces can be used to limit the amount of
kernel memory consumed by epoll:
.\" Following was added in 2.6.28, but then removed in 2.6.29
.\" .TP
.\" .IR /proc/sys/fs/epoll/max_user_instances " (since Linux 2.6.28)"
.\" This specifies an upper limit on the number of epoll instances
.\" that can be created per real user ID.
.TP
.IR /proc/sys/fs/epoll/max_user_watches " (since Linux 2.6.28)"
This specifies a limit on the total number of
file descriptors that a user can register across
all epoll instances on the system.
The limit is per real user ID.
Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel,
and roughly 160 bytes on a 64-bit kernel.
Currently,
.\" 2.6.29 (in 2.6.28, the default was 1/32 of lowmem)
the default value for
.I max_user_watches
is 1/25 (4%) of the available low memory,
divided by the registration cost in bytes.
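.PP
The arithmetic behind the default can be worked through with a small
function.
This is only an estimate based on the figures given above;
the function name
.I estimate_max_user_watches()
is hypothetical, and the kernel's exact computation may differ in detail.
.PP
.in +4n
.EX
```c
/* Rough estimate of the default max_user_watches value: 1/25 (4%)
   of low memory, divided by the per-registration cost in bytes. */
long long
estimate_max_user_watches(long long lowmem_bytes, int cost_bytes)
{
    return lowmem_bytes / 25 / cost_bytes;
}
```
.EE
.in
.PP
For example, with 1\ GiB (1073741824 bytes) of low memory and a cost of
160 bytes per registration, the estimate is 268435 watches.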
.SS Example for suggested usage
While the usage of
.B epoll
when employed as a level-triggered interface does have the same
semantics as
.BR poll (2),
the edge-triggered usage requires more clarification to avoid stalls
in the application event loop.
In this example,
.I listen_sock
is a
nonblocking socket on which
.BR listen (2)
has been called.
The function
.I do_use_fd()
uses the new ready file descriptor until
.B EAGAIN
is returned by either
.BR read (2)
or
.BR write (2).
An event-driven state machine application should, after having received
.BR EAGAIN ,
record its current state so that at the next call to
.I do_use_fd()
it will continue to
.BR read (2)
or
.BR write (2)
from where it stopped before.
.PP
.in +4n
.EX
#define MAX_EVENTS 10
struct epoll_event ev, events[MAX_EVENTS];
int listen_sock, conn_sock, nfds, epollfd, n;
struct sockaddr_storage addr;   /* peer address, filled by accept() */
socklen_t addrlen;

/* Code to set up listening socket, \(aqlisten_sock\(aq,
   (socket(), bind(), listen()) omitted.
   setnonblocking() and do_use_fd() are supplied by the application. */

epollfd = epoll_create1(0);
if (epollfd == \-1) {
    perror("epoll_create1");
    exit(EXIT_FAILURE);
}

ev.events = EPOLLIN;
ev.data.fd = listen_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == \-1) {
    perror("epoll_ctl: listen_sock");
    exit(EXIT_FAILURE);
}

for (;;) {
    nfds = epoll_wait(epollfd, events, MAX_EVENTS, \-1);
    if (nfds == \-1) {
        perror("epoll_wait");
        exit(EXIT_FAILURE);
    }

    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listen_sock) {
            addrlen = sizeof(addr);     /* reset before each accept() */
            conn_sock = accept(listen_sock,
                               (struct sockaddr *) &addr, &addrlen);
            if (conn_sock == \-1) {
                perror("accept");
                exit(EXIT_FAILURE);
            }
            setnonblocking(conn_sock);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = conn_sock;
            if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                        &ev) == \-1) {
                perror("epoll_ctl: conn_sock");
                exit(EXIT_FAILURE);
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}
.EE
.in
.PP
When used as an edge-triggered interface, for performance reasons, it is
possible to add the file descriptor to the
.B epoll
interface
.RB ( EPOLL_CTL_ADD )
once by specifying
.RB ( EPOLLIN | EPOLLOUT ).
This allows you to avoid
continuously switching between
.B EPOLLIN
and
.B EPOLLOUT
by calling
.BR epoll_ctl (2)
with
.BR EPOLL_CTL_MOD .
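.PP
A single registration covering both directions might be written as in
the following sketch.
The names
.I add_edge_rw()
and
.I first_events_demo()
are hypothetical; the demonstration uses a UNIX domain socket pair only
because it is easy to create.
.PP
.in +4n
.EX
```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register fd once, edge-triggered, for both input and output. */
int
add_edge_rw(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Demonstration on a connected socket pair: a fresh socket is
   writable but has nothing to read. Returns the event mask
   reported by the first wait, or 0 on error. */
unsigned int
first_events_demo(void)
{
    int sv[2], epfd;
    struct epoll_event out;
    unsigned int mask = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        if (add_edge_rw(epfd, sv[0]) == 0
                && epoll_wait(epfd, &out, 1, 0) == 1)
            mask = out.events;
        close(epfd);
    }
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```
.EE
.in
.PP
With this registration the application tracks readability and
writability itself, treating each direction as ready until the
corresponding call returns
.BR EAGAIN .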
.SS Questions and answers
.TP 4
.B Q0
What is the key used to distinguish the file descriptors registered in an
.B epoll
set?
.TP
.B A0
The key is the combination of the file descriptor number and
the open file description
(also known as an "open file handle",
the kernel's internal representation of an open file).
.TP
.B Q1
What happens if you register the same file descriptor on an
.B epoll
instance twice?
.TP
.B A1
You will probably get
.BR EEXIST .
However, it is possible to add a duplicate
.RB ( dup (2),
.BR dup2 (2),
.BR fcntl (2)
.BR F_DUPFD )
file descriptor to the same
.B epoll
instance.
.\" But a file descriptor duplicated by fork(2) can't be added to the
.\" set, because the [file *, fd] pair is already in the epoll set.
.\" That is a somewhat ugly inconsistency. On the one hand, a child process
.\" cannot add the duplicate file descriptor to the epoll set. (In every
.\" other case that I can think of, file descriptors duplicated by fork have
.\" similar semantics to file descriptors duplicated by dup() and friends.) On
.\" the other hand, the very fact that the child has a duplicate of the
.\" file descriptor means that even if the parent closes its file descriptor,
.\" then epoll_wait() in the parent will continue to receive notifications for
.\" that file descriptor because of the duplicated file descriptor in the child.
.\"
.\" See http://thread.gmane.org/gmane.linux.kernel/596462/
.\" "epoll design problems with common fork/exec patterns"
.\"
.\" mtk, Feb 2008
This can be a useful technique for filtering events,
if the duplicate file descriptors are registered with different
.I events
masks.
.TP
.B Q2
Can two
.B epoll
instances wait for the same file descriptor?
If so, are events reported to both
.B epoll
file descriptors?
.TP
.B A2
Yes, and events would be reported to both.
However, careful programming may be needed to do this correctly.
.TP
.B Q3
Is the
.B epoll
file descriptor itself poll/epoll/selectable?
.TP
.B A3
Yes.
If an
.B epoll
file descriptor has events waiting, then it will
be indicated as being readable.
.TP
.B Q4
What happens if one attempts to put an
.B epoll
file descriptor into its own file descriptor set?
.TP
.B A4
The
.BR epoll_ctl (2)
call fails
.RB ( EINVAL ).
However, you can add an
.B epoll
file descriptor inside another
.B epoll
file descriptor set.
.TP
.B Q5
Can I send an
.B epoll
file descriptor over a UNIX domain socket to another process?
.TP
.B A5
Yes, but it does not make sense to do this, since the receiving process
would not have copies of the file descriptors in the
.B epoll
set.
.TP
.B Q6
Will closing a file descriptor cause it to be removed from all
.B epoll
sets automatically?
.TP
.B A6
Yes, but be aware of the following point.
A file descriptor is a reference to an open file description (see
.BR open (2)).
Whenever a file descriptor is duplicated via
.BR dup (2),
.BR dup2 (2),
.BR fcntl (2)
.BR F_DUPFD ,
or
.BR fork (2),
a new file descriptor referring to the same open file description is
created.
An open file description continues to exist until all
file descriptors referring to it have been closed.
A file descriptor is removed from an
.B epoll
set only after all the file descriptors referring to the underlying
open file description have been closed
(or before if the file descriptor is explicitly removed using
.BR epoll_ctl (2)
.BR EPOLL_CTL_DEL ).
This means that even after a file descriptor that is part of an
.B epoll
set has been closed,
events may be reported for that file descriptor if other file
descriptors referring to the same underlying file description remain open.
.TP
.B Q7
If more than one event occurs between
.BR epoll_wait (2)
calls, are they combined or reported separately?
.TP
.B A7
They will be combined.
.TP
.B Q8
Does an operation on a file descriptor affect the
already collected but not yet reported events?
.TP
.B A8
You can do two operations on an existing file descriptor:
remove
.RB ( EPOLL_CTL_DEL )
and modify
.RB ( EPOLL_CTL_MOD ).
Remove would be meaningless for
this case.
Modify will reread available I/O.
.TP
.B Q9
Do I need to continuously read/write a file descriptor
until
.B EAGAIN
when using the
.B EPOLLET
flag (edge-triggered behavior)?
.TP
.B A9
Receiving an event from
.BR epoll_wait (2)
should suggest to you that the
file descriptor is ready for the requested I/O operation.
You must consider it ready until the next (nonblocking)
read/write yields
.BR EAGAIN .
When and how you will use the file descriptor is entirely up to you.
.IP
For packet/token-oriented files (e.g., datagram socket,
terminal in canonical mode),
the only way to detect the end of the read/write I/O space
is to continue to read/write until
.BR EAGAIN .
.IP
For stream-oriented files (e.g., pipe, FIFO, stream socket), the
condition that the read/write I/O space is exhausted can also be detected by
checking the amount of data read from / written to the target file
descriptor.
For example, if you call
.BR read (2)
by asking to read a certain amount of data and
.BR read (2)
returns a lower number of bytes, you
can be sure of having exhausted the read I/O space for the file
descriptor.
The same is true when writing using
.BR write (2).
(Avoid this latter technique if you cannot guarantee that
the monitored file descriptor always refers to a stream-oriented file.)
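.PP
The point made in A6 can be observed with a pipe and a duplicated
descriptor.
The following is a minimal sketch; the function name
.I events_after_close()
is hypothetical.
.PP
.in +4n
.EX
```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe, duplicates it, and closes the
   registered descriptor. Returns the number of events that a later
   epoll_wait() still reports for the closed descriptor, or -1 on
   error. */
int
events_after_close(void)
{
    int pfd[2], dupfd, epfd, n = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    dupfd = dup(pfd[0]);        /* same open file description */

    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epfd != -1 && dupfd != -1
            && epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0) {
        close(pfd[0]);          /* registration survives via dupfd */
        pfd[0] = -1;
        if (write(pfd[1], "x", 1) == 1)
            n = epoll_wait(epfd, &out, 1, 0);
    }

    if (pfd[0] != -1)
        close(pfd[0]);
    if (dupfd != -1)
        close(dupfd);
    if (epfd != -1)
        close(epfd);
    close(pfd[1]);
    return n;
}
```
.EE
.in
.PP
The function is expected to return 1.
Calling
.BR epoll_ctl (2)
with
.B EPOLL_CTL_DEL
before
.BR close (2)
avoids the stale registration.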
.SS Possible pitfalls and ways to avoid them
.TP
.B o Starvation (edge-triggered)
.PP
If there is a large amount of I/O space,
it is possible that by trying to drain
it, the other files will not get processed, causing starvation.
(This problem is not specific to
.BR epoll .)
.PP
The solution is to maintain a ready list
and mark the file descriptor as ready
in its associated data structure, thereby allowing the application to
remember which files need to be processed but still round robin amongst
all the ready files.
This also supports ignoring subsequent events you
receive for file descriptors that are already ready.
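.PP
One possible shape for such a ready list is sketched below.
The structure
.I struct conn
and the function
.I service_ready()
are hypothetical, and the per-turn budget of 4096 bytes is an arbitrary
choice made for illustration.
.PP
.in +4n
.EX
```c
#include <errno.h>
#include <unistd.h>

#define CHUNK 4096              /* per-turn budget, chosen arbitrarily */

struct conn {                   /* hypothetical per-descriptor state */
    int fd;
    int ready;                  /* set when epoll_wait() reports fd */
    long consumed;
};

/* Give every ready connection one bounded turn. A connection stays
   ready until read() reports EAGAIN, end-of-file, or an error.
   Returns the number of connections that are still ready. */
int
service_ready(struct conn *conns, int nconns)
{
    char buf[CHUNK];
    int i, still_ready = 0;
    ssize_t n;

    for (i = 0; i < nconns; i++) {
        if (!conns[i].ready)
            continue;
        n = read(conns[i].fd, buf, sizeof(buf));
        if (n > 0)
            conns[i].consumed += n;
        else if (n == 0 || errno != EINTR)
            conns[i].ready = 0;     /* EOF, EAGAIN, or error */
        still_ready += conns[i].ready;
    }
    return still_ready;
}
```
.EE
.in
.PP
The event loop would call
.BR epoll_wait (2)
with a zero timeout while any connection is still ready,
and block only when the ready list is empty.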
.TP
.B o If using an event cache...
.PP
If you use an event cache or store all the file descriptors returned from
.BR epoll_wait (2),
then make sure to provide a way to mark
a file descriptor's closure dynamically (i.e., caused by
a previous event's processing).
Suppose you receive 100 events from
.BR epoll_wait (2),
and in event #47 a condition causes event #13 to be closed.
If you remove the structure and
.BR close (2)
the file descriptor for event #13, then your
event cache might still say there are events waiting for that
file descriptor, causing confusion.
.PP
One solution for this is to call, during the processing of event #47,
.BR epoll_ctl ( EPOLL_CTL_DEL )
to delete file descriptor 13 and
.BR close (2),
then mark its associated
data structure as removed and link it to a cleanup list.
If you find another
event for file descriptor 13 in your batch processing,
you will discover the file descriptor had been
previously removed and there will be no confusion.
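.PP
The removal scheme described above might be sketched as follows.
The structure
.I struct conn
and the functions
.I conn_remove()
and
.I process_batch()
are hypothetical; the sketch assumes that the
.I data.ptr
field of each registered event points to the connection's structure.
.PP
.in +4n
.EX
```c
#include <sys/epoll.h>
#include <unistd.h>

struct conn {                   /* hypothetical per-descriptor state */
    int fd;
    int removed;                /* set once the descriptor is closed */
    struct conn *next_cleanup;  /* link in the cleanup list */
};

/* Deregister and close c, then park it on the cleanup list instead
   of freeing it, so that later events in the same batch can still
   inspect c->removed safely. */
void
conn_remove(int epfd, struct conn *c, struct conn **cleanup)
{
    if (c->removed)
        return;
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);
    close(c->fd);
    c->removed = 1;
    c->next_cleanup = *cleanup;
    *cleanup = c;
}

/* Process one batch returned by epoll_wait(). Returns the number of
   events actually handled; stale events are skipped. */
int
process_batch(struct epoll_event *events, int n)
{
    int i, handled = 0;

    for (i = 0; i < n; i++) {
        struct conn *c = events[i].data.ptr;

        if (c->removed)
            continue;           /* descriptor closed earlier in batch */
        handled++;              /* a real handler would do I/O here and
                                   might call conn_remove() */
    }
    return handled;
}
```
.EE
.in
.PP
The structures on the cleanup list can be freed once the whole batch
has been processed.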
.SH VERSIONS
The
.B epoll
API was introduced in Linux kernel 2.5.44.
.\" Its interface should be finalized in Linux kernel 2.5.66.
Support was added to glibc in version 2.3.2.
.SH CONFORMING TO
The
.B epoll
API is Linux-specific.
Some other systems provide similar
mechanisms, for example, FreeBSD has
.IR kqueue ,
and Solaris has
.IR /dev/poll .
.SH NOTES
The set of file descriptors that is being monitored via
an epoll file descriptor can be viewed via the entry for
the epoll file descriptor in the process's
.I /proc/[pid]/fdinfo
directory.
See
.BR proc (5)
for further details.
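.PP
As an illustration, the following sketch counts the registered file
descriptors by reading the
.I fdinfo
entry of the calling process.
It assumes that each registration appears as a line beginning with
"tfd:", as described in
.BR proc (5);
the function names
.I count_registered()
and
.I fdinfo_demo()
are hypothetical.
.PP
.in +4n
.EX
```c
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Count the descriptors registered on epfd by counting the "tfd:"
   lines in /proc/self/fdinfo/<epfd>. Returns -1 if the file cannot
   be opened. */
int
count_registered(int epfd)
{
    char path[64], line[256];
    FILE *fp;
    int count = 0;

    snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", epfd);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL)
        if (strncmp(line, "tfd:", 4) == 0)
            count++;
    fclose(fp);
    return count;
}

/* Demonstration: register one pipe descriptor and count it. */
int
fdinfo_demo(void)
{
    int pfd[2], epfd, count = -1;
    struct epoll_event ev;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epfd != -1 && epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0)
        count = count_registered(epfd);
    if (epfd != -1)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return count;
}
```
.EE
.in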
.PP
The
.BR kcmp (2)
.B KCMP_EPOLL_TFD
operation can be used to test whether a file descriptor
is present in an epoll instance.
.SH SEE ALSO
.BR epoll_create (2),
.BR epoll_create1 (2),
.BR epoll_ctl (2),
.BR epoll_wait (2),
.BR poll (2),
.BR select (2)
.SH COLOPHON
This page is part of release 4.15 of the Linux
.I man-pages
project.
A description of the project,
information about reporting bugs,
and the latest version of this page,
can be found at
\%https://www.kernel.org/doc/man\-pages/.