This is a tutorial document for libaio, as my blah post of 2024-01-30 (http://ftp.rodents-montreal.org/mouse/blah/2024-01-30-1.html) mentions. It is a work in progress and is doubtless incomplete. I welcome thoughts on it.

libaio was originally designed for doing I/O, which is where the "io" part of the name comes from. The "a" comes from "asynchronous", which is only partially true - the I/O is not really asynchronous, but the resulting programming paradigm is somewhat similar.

There really are two (or arguably three) pieces to libaio.

The first piece is the poll loop support. This is a top-level event loop built around poll(2). There are calls to register file descriptors the program is interested in performing I/O on; when I/O is possible, the library calls a user-provided callback. For example, where a conventional event loop might do something like

    struct pollfd *pfds;
    int pfdx;

    setup()
    {
        ...set up somefd...
    }

    mainloop()
    {
        while (1) {
            ...
            pfds[pfdx].fd = somefd;
            pfds[pfdx].events = POLLIN | POLLRDNORM;
            ...
            poll(pfds,pfdx,...);
            ...
            if (pfds[pfdx].revents & (POLLIN | POLLRDNORM | POLLHUP | POLLERR | POLLNVAL)) {
                input_somefd();
            }
            ...
        }
    }

    main()
    {
        setup();
        mainloop();
    }

the libaio version would look like

    main()
    {
        aio_poll_init();
        ...set up somefd...
        aio_add_poll(somefd,&aio_rwtest_always,&aio_rwtest_never,&input_somefd,0,0);
        ...
        aio_event_loop();
        return(1);
    }

In the simplest cases, you don't need more than that. aio_add_poll takes a descriptor, interest test functions, callbacks, and a callback argument. In the simple example above, we're always interested in doing read I/O and never interested in doing write I/O, so our read interest function is aio_rwtest_always, a utility function provided by the library that always indicates interest, and our write interest function is aio_rwtest_never, similar except that it never indicates interest.
The read function is input_somefd (but it needs a small change - see below). The write function doesn't matter, because the write interest function never expresses interest, so we pass a nil pointer - that's the first 0. The callback argument isn't useful either (input_somefd() gets everything it needs from global variables), so we make that a nil pointer too - the second 0. This means input_somefd has to accept a void * argument, which is not useful in this case but is necessary to conform to the API; it would probably be a dummy, as in

    void input_somefd(void *dummy)
    {
        (void)dummy; // shut up "unused argument" compiler warnings
        ...
    }

Then aio_event_loop loops forever, calling poll() and, when it indicates that somefd is readable, calling input_somefd().

If we want to make it slightly more complicated, we might have something to send back in some cases. In a very simple version, we might want to operate in lock-step, so that we don't read when we have anything to send. It might look something like

    main()
    {
        aio_poll_init();
        ...set up somefd...
        aio_add_poll(somefd,&somefd_not_sending,&somefd_sending,
                     &input_somefd,&output_somefd,0);
        ...
        aio_event_loop();
        return(1);
    }

Then somefd_sending() would return true when there is something to send and somefd_not_sending() would return true when there is nothing to send. input_somefd() would then receive input, process it, and, when it has anything to send back, set whatever is necessary so that somefd_not_sending() returns zero and somefd_sending() returns nonzero. output_somefd() would then write data; once it's all written, it would reset so that somefd_not_sending() returns nonzero and somefd_sending() returns zero.

Here's a complete example, different from the above only in that it reads from 0 and writes to 1 instead of sending and receiving on the same descriptor. This would be suitable for use with inetd or the like. (Error handling is very rudimentary in this version.)
    #include <unistd.h>
    #include <stdlib.h>
    #include <libaio.h>

    static char rbuf[8192];
    static int rfill = 0;

    static int nothing_pending(void *dummy)
    {
        (void)dummy;
        return(rfill==0);
    }

    static int something_pending(void *dummy)
    {
        (void)dummy;
        return(rfill!=0);
    }

    static void rd_in(void *dummy)
    {
        (void)dummy;
        rfill = read(0,&rbuf[0],sizeof(rbuf));
        if (rfill < 0) {
            // XXX handle error
            exit(1);
        }
        if (rfill == 0) {
            // XXX handle EOF
            exit(0);
        }
    }

    static void wr_out(void *dummy)
    {
        int w;
        int o;

        (void)dummy;
        o = 0;
        while (o < rfill) {
            w = write(1,&rbuf[o],rfill-o);
            if (w < 0) {
                // XXX handle error
                exit(1);
            }
            o += w;
        }
        rfill = 0;
    }

    int main(void)
    {
        aio_poll_init();
        aio_add_poll(0,&nothing_pending,&aio_rwtest_never,&rd_in,0,0);
        aio_add_poll(1,&aio_rwtest_never,&something_pending,0,&wr_out,0);
        aio_event_loop();
        // XXX handle error
        return(1);
    }

At this point, you may be wondering why we're writing two functions which just return complementary booleans. This is because we're writing a very simple introductory program. If we wanted to handle input at the same time as having output queued, they would get more complex. We could do that ourselves, allocating blocks of output data and managing the queue. But this is exactly what libaio's output queues are for, so I'll take the opportunity to introduce them.

The central type here is AIO_OQ. An AIO_OQ represents a queue of data waiting to be processed (typically, sent). There are calls to append data blocks to the tail of the queue and calls to pull data off the head of the queue and process it. There are also a few other ancillary calls.
Here's a modified version using an AIO_OQ, with fairly minimal other changes:

    #include <unistd.h>
    #include <stdlib.h>
    #include <libaio.h>

    static AIO_OQ oq;

    static int want_write(void *dummy)
    {
        (void)dummy;
        return(aio_oq_nonempty(&oq));
    }

    static void rd_in(void *dummy)
    {
        char rbuf[512];
        int nr;

        (void)dummy;
        nr = read(0,&rbuf[0],sizeof(rbuf));
        if (nr < 0) {
            // XXX handle error
            exit(1);
        }
        if (nr == 0) {
            // XXX handle EOF
            exit(0);
        }
        aio_oq_queue_copy(&oq,&rbuf[0],nr);
    }

    static void wr_out(void *dummy)
    {
        int w;

        (void)dummy;
        w = aio_oq_writev(&oq,1,-1);
        if (w < 0) {
            // XXX handle error
            exit(1);
        }
        aio_oq_dropdata(&oq,w);
    }

    int main(void)
    {
        aio_poll_init();
        aio_oq_init(&oq);
        aio_add_poll(0,&aio_rwtest_always,&aio_rwtest_never,&rd_in,0,0);
        aio_add_poll(1,&aio_rwtest_never,&want_write,0,&wr_out,0);
        aio_event_loop();
        // XXX handle error
        return(1);
    }

This version has some issues, though. In particular, if output is slow to accept data, eventually buffers will fill up and the write operation buried in aio_oq_writev() will block. Sometimes, of course, this is what you want. But more often it isn't, especially if you're trying to handle multiple data streams in a single process.

Side note here: some people would use threads to do this. I don't, because I don't like threads in C. (In other languages, ones threading works well in, threads are fine. libaio is for C.) libaio does not play nice with threading. Provided you use a given poll loop or output queue from at most one thread at a time, it works fine, but any mutual exclusion between threads has to be handled by the caller - libaio has no facilities to help. libaio is really for event-driven programs, the kind of thing that, when written with threads, spends almost all of its time with all threads blocked waiting for I/O.

What I find best for this is to set the relevant file descriptors non-blocking and then special-case EWOULDBLOCK errors in the read and write functions.
(I usually special-case EINTR as well, though, unless the program uses signals, that's just paranoia.) Here's what that last program looks like with non-blocking mode turned on:

    #include <unistd.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <libaio.h>

    static AIO_OQ oq;

    static void set_nb(int fd)
    {
        fcntl(fd,F_SETFL,fcntl(fd,F_GETFL,0)|O_NONBLOCK);
    }

    static int want_write(void *dummy)
    {
        (void)dummy;
        return(aio_oq_nonempty(&oq));
    }

    static void rd_in(void *dummy)
    {
        char rbuf[512];
        int nr;

        (void)dummy;
        nr = read(0,&rbuf[0],sizeof(rbuf));
        if (nr < 0) {
            switch (errno) {
                case EINTR:
                case EWOULDBLOCK:
                    return;
                    break;
            }
            // XXX handle error
            exit(1);
        }
        if (nr == 0) {
            // XXX handle EOF
            exit(0);
        }
        aio_oq_queue_copy(&oq,&rbuf[0],nr);
    }

    static void wr_out(void *dummy)
    {
        int w;

        (void)dummy;
        w = aio_oq_writev(&oq,1,-1);
        if (w < 0) {
            switch (errno) {
                case EINTR:
                case EWOULDBLOCK:
                    return;
                    break;
            }
            // XXX handle error
            exit(1);
        }
        aio_oq_dropdata(&oq,w);
    }

    int main(void)
    {
        aio_poll_init();
        aio_oq_init(&oq);
        set_nb(0);
        set_nb(1);
        aio_add_poll(0,&aio_rwtest_always,&aio_rwtest_never,&rd_in,0,0);
        aio_add_poll(1,&aio_rwtest_never,&want_write,0,&wr_out,0);
        aio_event_loop();
        // XXX handle error
        return(1);
    }

A little bigger (20 lines bigger, if I've counted right), but not much. With just one more change, this version is ready to handle many streams instead of just one. That change is straightforward: this version has only one queue of pending data, and, if we're going to be handling many streams, we want many queues. This also exemplifies the kind of need that leads to that void * callback argument that's been a dummy all this time.
Here's a version that still just copies from stdin to stdout, but does it in a multi-stream-ready way:

    #include <unistd.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <libaio.h>

    typedef struct datastream DATASTREAM;
    struct datastream {
        int ifd;
        int ofd;
        AIO_OQ oq;
    };

    static void set_nb(int fd)
    {
        fcntl(fd,F_SETFL,fcntl(fd,F_GETFL,0)|O_NONBLOCK);
    }

    static int want_write(void *sv)
    {
        return(aio_oq_nonempty(&((DATASTREAM *)sv)->oq));
    }

    static void rd_in(void *sv)
    {
        DATASTREAM *s;
        char rbuf[512];
        int nr;

        s = sv;
        nr = read(s->ifd,&rbuf[0],sizeof(rbuf));
        if (nr < 0) {
            switch (errno) {
                case EINTR:
                case EWOULDBLOCK:
                    return;
                    break;
            }
            // XXX handle error
            exit(1);
        }
        if (nr == 0) {
            // XXX handle EOF
            exit(0);
        }
        aio_oq_queue_copy(&s->oq,&rbuf[0],nr);
    }

    static void wr_out(void *sv)
    {
        DATASTREAM *s;
        int w;

        s = sv;
        w = aio_oq_writev(&s->oq,s->ofd,-1);
        if (w < 0) {
            switch (errno) {
                case EINTR:
                case EWOULDBLOCK:
                    return;
                    break;
            }
            // XXX handle error
            exit(1);
        }
        aio_oq_dropdata(&s->oq,w);
    }

    static void setup_stream(int i, int o)
    {
        DATASTREAM *s;

        s = malloc(sizeof(DATASTREAM));
        // XXX check for malloc failure
        set_nb(i);
        set_nb(o);
        s->ifd = i;
        s->ofd = o;
        aio_oq_init(&s->oq);
        aio_add_poll(i,&aio_rwtest_always,&aio_rwtest_never,&rd_in,0,s);
        aio_add_poll(o,&aio_rwtest_never,&want_write,0,&wr_out,s);
    }

    int main(void)
    {
        aio_poll_init();
        setup_stream(0,1);
        aio_event_loop();
        // XXX handle error
        return(1);
    }

This version needs just one more call to setup_stream(), once the file descriptors exist, to have another stream going.

Note that the call to enqueue data, above, is aio_oq_queue_copy(). This variant copies the data, so the buffer passed in does not have to remain valid past the point when aio_oq_queue_copy() returns.
There are other variants. aio_oq_queue_point() just uses the data pointer passed in, which is suitable when the data has static storage duration and won't change until the write finishes (an example is a string literal). aio_oq_queue_free() is just like aio_oq_queue_point() except that it calls free() on the data buffer once the write finishes (suitable when you have a malloc()ed buffer you want to write but don't need after it's written). aio_oq_queue_printf() is like fprintf() except that, rather than writing to a FILE *, it queues the generated data on the AIO_OQ. There are others, but these are the common ones.

That's the core of it. There are numerous frills - one look at the list of calls libaio provides will give you some idea just how many - but there are only a few more pieces I want to mention here.

The first is IDs. Every aio_add_poll or aio_add_block call (see below about block functions) returns a nonnegative ID. This ID can be used to deregister that function later; for example, if we were to turn the last data-copier program above into a true multi-stream copier, we would probably want to be able to shut down one stream without affecting any others. This would involve saving the poll IDs returned by the aio_add_poll calls and using aio_remove_poll to deregister them. (It would also involve other things, such as closing the file descriptors and freeing memory.)

Another is block functions. The library has the ability to call functions whenever it is about to do a potentially blocking poll() call. These can be used to do things like background processing, checking conditions, or the like. (But for background processing, be careful to limit the amount of processing you do per call; if a block function takes three seconds to run, the poll loop will lock up for those three seconds.) A block function also returns a value which can influence the poll loop's behaviour.
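As a sketch of the three common queueing variants in use: the (queue, pointer, length) argument order here is taken from the aio_oq_queue_copy() calls above, and I'm assuming aio_oq_queue_point() and aio_oq_queue_free() match it and that aio_oq_queue_printf() takes printf-style arguments after the queue; check libaio's header for the real signatures.

```c
#include <string.h>
#include <stdlib.h>

static AIO_OQ oq;	/* assumed already set up with aio_oq_init() */

static void queue_greeting(const char *name)
{
	char *copy;

	/* A string literal has static storage and never changes,
	   so no copy is needed and nothing must be freed. */
	aio_oq_queue_point(&oq,"hello, ",7);
	/* malloc()ed data we won't need once it's written:
	   let the queue free() it after the write finishes. */
	copy = strdup(name);
	if (copy) aio_oq_queue_free(&oq,copy,strlen(copy));
	/* Formatted output, generated straight into the queue. */
	aio_oq_queue_printf(&oq,"! (%d bytes of name)\n",(int)strlen(name));
}
```

The choice among them is purely a question of who owns the buffer and for how long; the data ends up on the same queue either way.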
It can return an integer not less than zero, in which case the poll call will block for at most that long (the units of that integer are the units of poll()'s third argument, milliseconds). It can return AIO_BLOCK_LOOP, in which case that iteration of the loop short-circuits, going back to the top of the loop instead of calling poll() at all. And, finally, it can return AIO_BLOCK_NIL, in which case it doesn't affect anything; the loop carries on as if that particular block function registration hadn't happened at all. The use cases these are designed for:

- The nonnegative integer case is designed for a program with timeouts, where the block function computes and returns the remaining timeout or, if the timeout has already passed, does whatever is appropriate and returns either a new timeout or one of the other two values.

- AIO_BLOCK_LOOP is for the case where the block function registers or unregisters a poll or block function, or changes global state in a way that affects the return value of a read or write interest test function.

- AIO_BLOCK_NIL is for the case where the block function checks and finds it has nothing to do. (An example might be an interactive program which has multiple ways to exit; they could simply set a global variable, with a block function that checks that variable and either exits or returns AIO_BLOCK_NIL. Another example might be a program using Xlib, with flushing handled by a block function which calls XFlush and returns AIO_BLOCK_NIL.)

The last is priority queues. An AIO_PQ is just like an AIO_OQ except that it represents a queue of packets, not of bytes, and that it represents multiple queues, not just one. The queues are prioritized: when done writing a packet, the next packet comes from the highest-priority queue that has anything queued. (It has to be a stream of packets, instead of a stream of bytes, because the library has to know where in the data stream it can switch priorities.)
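Returning to block functions for a moment, here's a sketch of the timeout case. I'm assuming a block function takes the usual void * callback argument and returns an int, and that aio_add_block() takes a function pointer plus that argument by analogy with aio_add_poll(); check libaio's header for the real signatures. The deadline variable and handle_timeout() are hypothetical.

```c
#include <time.h>

static time_t deadline;		/* hypothetical: when the timeout expires */

static int check_timeout(void *dummy)
{
	time_t now;

	(void)dummy;
	now = time(0);
	if (now >= deadline) {
		handle_timeout();	/* hypothetical "it timed out" action */
		return(AIO_BLOCK_LOOP);	/* state changed; redo the interest tests */
	}
	/* Otherwise, bound the poll() sleep by the time remaining
	   (poll()'s timeout argument is in milliseconds). */
	return((int)(deadline-now)*1000);
}

/* Registered once, before entering aio_event_loop():
	aio_add_block(&check_timeout,0);
*/
```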
I find that AIO_PQ gets significantly less use than AIO_OQ, but when you want it, you really want it.