public inbox for libc-help@sourceware.org
* Missing <aio_misc.h> exported header?
@ 2019-08-22 11:53 Xavier Roche
  2019-09-04 19:28 ` Adhemerval Zanella
From: Xavier Roche @ 2019-08-22 11:53 UTC (permalink / raw)
  To: libc-help

Dear glibc users,

librt's <aio_misc.h> header (found in sysdeps/pthread/aio_misc.h)
does not seem to be exported as a public header file (the way <aio.h>
is).

Is there a reason behind this?

The LIO_DSYNC/LIO_SYNC operations are implemented in glibc's librt and
seem to work perfectly (providing much better performance, specifically
when syncing a lot of small files):

/* Extend the operation enum.  */
enum
{
  LIO_DSYNC = LIO_READ + 1,
  LIO_SYNC,
  LIO_READ64 = LIO_READ | 128,
  LIO_WRITE64 = LIO_WRITE | 128
};
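
For instance, a minimal sketch of how these opcodes plug into
lio_listio() once the definitions above are in place (a sketch only,
since LIO_SYNC is an internal value and not a public API; error
handling elided; the sync_all helper is purely illustrative):

#include <aio.h>
#include <signal.h> /* SIGEV_NONE */
#include <vector>

/* Queue one LIO_SYNC request per descriptor, wait for all of them. */
static int sync_all(const std::vector<int>& fds)
{
    std::vector<struct aiocb> cbs(fds.size());  /* zero-initialized */
    std::vector<struct aiocb*> pcbs(fds.size());
    for (size_t i = 0; i < fds.size(); i++) {
        cbs[i].aio_fildes = fds[i];
        cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
        cbs[i].aio_lio_opcode = LIO_SYNC;  /* internal opcode from above */
        pcbs[i] = &cbs[i];
    }
    /* LIO_WAIT blocks until every request has completed. */
    return lio_listio(LIO_WAIT, pcbs.data(), static_cast<int>(pcbs.size()),
                      nullptr);
}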

[ Also note that the sysdeps/generic/aio_misc.h version includes the
erroneous LIO_DSYNC definition (b8744bea9bf, fixed by Roland McGrath a
long time ago). ]

Regards,
Xavier Roche


* Re: Missing <aio_misc.h> exported header?
  2019-08-22 11:53 Missing <aio_misc.h> exported header? Xavier Roche
@ 2019-09-04 19:28 ` Adhemerval Zanella
  2019-09-06 11:26   ` Xavier Roche
From: Adhemerval Zanella @ 2019-09-04 19:28 UTC (permalink / raw)
  To: libc-help



On 22/08/2019 08:53, Xavier Roche wrote:
> Dear glibc users,
> 
> librt's <aio_misc.h> header (found in sysdeps/pthread/aio_misc.h)
> does not seem to be exported as a public header file (the way <aio.h>
> is).
> 
> Is there a reason behind this?
> 
> The LIO_DSYNC/LIO_SYNC operations are implemented in glibc's librt and
> seem to work perfectly (providing much better performance, specifically
> when syncing a lot of small files):
> 
> /* Extend the operation enum.  */
> enum
> {
>   LIO_DSYNC = LIO_READ + 1,
>   LIO_SYNC,
>   LIO_READ64 = LIO_READ | 128,
>   LIO_WRITE64 = LIO_WRITE | 128
> };
> 
> [ Also note that the sysdeps/generic/aio_misc.h version includes the
> erroneous LIO_DSYNC definition (b8744bea9bf, fixed by Roland McGrath a
> long time ago). ]

Because it is an internal-only implementation file.  The Linux one, for
instance, calls syscalls directly using internal macros (INTERNAL_SYSCALL)
which are not meant to be exported.

LIO_DSYNC/LIO_SYNC are used to call the internal __aio_enqueue_request;
external code should use aio_fsync instead.  By "providing much better
performance when syncing a lot of small files", which usage pattern
exactly are you referring to?
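
For reference, the supported pattern is one aio_fsync per descriptor
followed by aio_suspend, along these lines (a minimal sketch with error
handling mostly elided; async_fsync_and_wait is just an illustrative
helper):

---
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>

/* Queue an asynchronous fsync for one descriptor and wait for it.
   Pass O_DSYNC instead of O_SYNC to get fdatasync behavior.  */
static int async_fsync_and_wait(int fd)
{
    struct aiocb cb = {};
    cb.aio_fildes = fd;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;
    if (aio_fsync(O_SYNC, &cb) != 0)
        return -1;
    const struct aiocb* list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, nullptr);  /* block until completion */
    return aio_return(&cb);
}
---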


* Re: Missing <aio_misc.h> exported header?
  2019-09-04 19:28 ` Adhemerval Zanella
@ 2019-09-06 11:26   ` Xavier Roche
  2019-09-07 18:56     ` Adhemerval Zanella
From: Xavier Roche @ 2019-09-06 11:26 UTC (permalink / raw)
  To: libc-help

[-- Attachment #1: Type: text/plain, Size: 1462 bytes --]

Hi!

On Wed, Sep 4, 2019 at 9:28 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> Because it is an internal-only implementation file.  The Linux one, for
> instance, calls syscalls directly using internal macros (INTERNAL_SYSCALL)
> which are not meant to be exported.

My understanding is that on Linux the kernel version is bypassed, and
the pure glibc version is used?

Could the glibc version expose this feature?  The fsync operation is
surprisingly missing from the lio_listio() flavor.

> LIO_DSYNC/LIO_SYNC are used to call the internal __aio_enqueue_request;
> external code should use aio_fsync instead.  By "providing much better
> performance when syncing a lot of small files", which usage pattern
> exactly are you referring to?

The attached program is a small benchmark that, despite its rather
basic nature, makes it possible to spot differences between several
strategies:
- sync every file with fsync or fdatasync
- use AIO
- sync with a global syncfs call

For example, with 10 files of 10 KB each:
g++ -Wall -Wextra -std=c++17 -O3 -g3 fsynctest.cpp -o fsynctest -lrt
./fsynctest 10000 10

I get the following consistent results with a standard consumer SSD:
AVERAGE fsync: 22ms
AVERAGE fdatasync: 21ms
AVERAGE parallel(fsync): 3ms
AVERAGE parallel(fdatasync): 3ms
AVERAGE syncfs: 5ms
AVERAGE sync: 5ms

The idea is that a single lio_listio() call should beat several aio_fsync() calls.

Regards,


-- 
Xavier Roche -
xavier.roche@algolia.com
0675167036

[-- Attachment #2: fsynctest.cpp --]
[-- Type: text/x-c++src, Size: 5775 bytes --]

// fsynctest.cpp
// fsync of multiple files pattern test sample
// Xavier Roche, Algolia, 2019
// This file is covered by the MIT License

#include <iostream>
#include <string>
#include <vector>
#include <functional>
#include <chrono>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
#include <signal.h> /* for SIGEV_NONE */
#include <aio.h>
#include <string.h>

// g++ -Wall -Wextra -std=c++17 -O3 -g3 fsynctest.cpp -o fsynctest -lrt

/* Missing <aio_misc.h> definition. */
#ifndef LIO_DSYNC
enum
{
    LIO_DSYNC = LIO_NOP + 1,
    LIO_SYNC,
    LIO_READ64 = LIO_READ | 128,
    LIO_WRITE64 = LIO_WRITE | 128
};
#endif

enum class SyncMode
{
    Fsync,
    Fdatasync,
    ParallelFsync,
    ParallelFdatasync,
    Syncfs,
    Sync,
    _MAX = Sync,
};

template<typename T>
class Wrapped
{
public:
    Wrapped(T& ref, std::function<void(T&)> fun)
      : _ref(ref)
      , _fun(fun)
    {}

    ~Wrapped() { _fun(_ref); }

private:
    T& _ref;
    std::function<void(T&)> _fun;
};

struct File
{
    File(const std::string& file, int fd)
      : file(file)
      , fd(fd)
    {}

    std::string file;
    int fd;
};

#define BENCH_COUNT 10

int main(int argc, char** argv)
{
    if (argc != 3) {
        std::cout << "Usage: " << argv[0] << " <file_size> <nbfiles>"
                  << "\n";
        return 1; // exit instead of dereferencing missing argv entries
    }
    const size_t size = std::stoi(argv[1]);
    const size_t nbfiles = std::stoi(argv[2]);
    assert(nbfiles > 0);

    long microseconds[static_cast<size_t>(SyncMode::_MAX) + 1] = {};

    for (size_t bench = 0; bench < BENCH_COUNT; bench++) {
        for (SyncMode mode = static_cast<SyncMode>(0);
             static_cast<size_t>(mode) <= static_cast<size_t>(SyncMode::_MAX);
             mode = static_cast<SyncMode>(static_cast<size_t>(mode) + 1)) {
            std::vector<char> data;
            data.resize(size);

            std::vector<File> files;
            const Wrapped<decltype(files)> wrap(files, (std::function<void(std::vector<File>&)>)[](auto& files) {
                for (const auto& file : files) {
                    close(file.fd);
                    unlink(file.file.c_str());
                }
            });

            const auto start = std::chrono::steady_clock::now();

            for (size_t i = 0; i < nbfiles; i++) {
                const std::string testfile = std::to_string(i);
                const int fd = open(testfile.c_str(),
                                    O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC,
                                    S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
                assert(fd != -1);
                const ssize_t nw = write(fd, data.data(), data.size());
                assert(nw == static_cast<ssize_t>(data.size()));

                files.push_back(File(testfile, fd));
            }

            const char* smode = nullptr;
            switch (mode) {
                case SyncMode::Fsync:
                    smode = "fsync";
                    for (const auto& file : files) {
                        if (fsync(file.fd) != 0) {
                            assert(!"fsync failed");
                        }
                    }
                    break;
                case SyncMode::Fdatasync:
                    smode = "fdatasync";
                    for (const auto& file : files) {
                        if (fdatasync(file.fd) != 0) {
                            assert(!"fdatasync failed");
                        }
                    }
                    break;
                case SyncMode::ParallelFsync:
                case SyncMode::ParallelFdatasync: {
                    const bool isFsync = mode == SyncMode::ParallelFsync;
                    smode = isFsync ? "parallel(fsync)" : "parallel(fdatasync)";

                    std::vector<struct aiocb> syncs;
                    syncs.resize(nbfiles);

                    std::vector<struct aiocb*> psyncs;
                    psyncs.resize(nbfiles);

                    for (size_t i = 0; i < nbfiles; i++) {
                        syncs[i].aio_fildes = files[i].fd;
                        syncs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
                        syncs[i].aio_lio_opcode = isFsync ? LIO_SYNC : LIO_DSYNC;
                        psyncs[i] = &syncs[i];
                    }

                    // Wait for all IO to complete (LIO_WAIT); a -1/EIO return
                    // means at least one request failed, which is checked per
                    // request below
                    lio_listio(LIO_WAIT, psyncs.data(), psyncs.size(), nullptr);
                    for (const auto& sync : psyncs) {
                        if (aio_error(sync) != 0) {
                            assert(!"aio_error");
                        }
                    }

                } break;
                case SyncMode::Syncfs:
                    smode = "syncfs";
                    if (syncfs(files[0].fd) != 0) {
                        assert(!"syncfs");
                    }
                    break;
                case SyncMode::Sync:
                    smode = "sync";
                    sync();
                    break;
                default:
                    assert(!"oops");
            }

            const auto end = std::chrono::steady_clock::now();
            const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
            microseconds[static_cast<size_t>(mode)] += elapsed.count();

            if (bench < BENCH_COUNT - 1) {
                std::cout << smode << ": " << (elapsed.count() / 1000) << "ms\n";
            } else {
                std::cout << "AVERAGE " << smode << ": "
                          << ((microseconds[static_cast<size_t>(mode)] / BENCH_COUNT) / 1000) << "ms\n";
            }
        }
        std::cout << "\n";
    }

    return 0;
}


* Re: Missing <aio_misc.h> exported header?
  2019-09-06 11:26   ` Xavier Roche
@ 2019-09-07 18:56     ` Adhemerval Zanella
  2019-09-09  7:06       ` Xavier Roche
From: Adhemerval Zanella @ 2019-09-07 18:56 UTC (permalink / raw)
  To: libc-help



On 06/09/2019 08:25, Xavier Roche wrote:
> Hi!
> 
> On Wed, Sep 4, 2019 at 9:28 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>> Because it is an internal-only implementation file.  The Linux one, for
>> instance, calls syscalls directly using internal macros (INTERNAL_SYSCALL)
>> which are not meant to be exported.
> 
> My understanding is that on Linux the kernel version is bypassed, and
> the pure glibc version is used?

Internally, the POSIX AIO code is organized so that an environment which
supports it natively can reimplement the required internal API instead of
using the generic pthread-based one.  But currently only Hurd and Linux are
actually supported, which means only the pthread version is used.  The Linux
version also assumes NPTL, so for some primitives it calls futex operations
directly as an optimization.

As for the code organization, the Linux aio_misc.h implementation uses
include_next, which in turn includes sysdeps/nptl/aio_misc.h, which then
includes sysdeps/pthread/aio_misc.h (this ordering is set by the sysdeps
mechanism through the Implies files).
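
Schematically (a sketch of the chain, not the verbatim headers):

---
/* In sysdeps/unix/sysv/linux/aio_misc.h (sketch): after the
   Linux-specific definitions, the header defers to the next
   aio_misc.h in the sysdeps search order.  */
#include_next <aio_misc.h>  /* finds sysdeps/nptl/aio_misc.h, which in
                               turn include_next's the pthread one */
---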

> 
> Could the glibc version expose this feature?  The fsync operation is
> surprisingly missing from the lio_listio() flavor.

It would be possible to include them as a GNU extension, but since our
current implementation is just synchronous IO dispatched to helper threads,
you won't gain much without proper kernel support, which is currently
lacking (aio(7) briefly describes the issues with a userland AIO
implementation such as glibc's).

Even the kernel support has its drawbacks [1], and I am not sure how much it
has matured over these 3 years.  You can use it through libaio [2], and if I
recall correctly some RH-based distros applied out-of-tree patches to add
some support to the glibc POSIX AIO implementation.
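
If you want to experiment with the kernel-backed route, something along
these lines should work through libaio (an untested sketch; note that
kernel AIO fsync needs a fairly recent kernel and may still fail with
EINVAL depending on the filesystem):

---
#include <libaio.h>

/* Submit a kernel AIO fsync for one descriptor and wait for it.  */
static int kernel_aio_fsync(int fd)
{
    io_context_t ctx = 0;
    if (io_queue_init(1, &ctx) < 0)
        return -1;
    struct iocb cb;
    struct iocb* cbs[1] = { &cb };
    io_prep_fsync(&cb, fd);  /* libaio helper, zeroes the iocb */
    int rc = io_submit(ctx, 1, cbs);
    if (rc == 1) {
        struct io_event ev;
        rc = io_getevents(ctx, 1, 1, &ev, nullptr);  /* wait */
        rc = (rc == 1 && ev.res == 0) ? 0 : -1;
    } else {
        rc = -1;
    }
    io_queue_release(ctx);
    return rc;
}
---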


> 
>> LIO_DSYNC/LIO_SYNC are used to call the internal __aio_enqueue_request;
>> external code should use aio_fsync instead.  By "providing much better
>> performance when syncing a lot of small files", which usage pattern
>> exactly are you referring to?
> 
> The attached program is a small benchmark that, despite its rather
> basic nature, makes it possible to spot differences between several
> strategies:
> - sync every file with fsync or fdatasync
> - use AIO
> - sync with a global syncfs call
> 
> For example, with 10 files of 10 KB each:
> g++ -Wall -Wextra -std=c++17 -O3 -g3 fsynctest.cpp -o fsynctest -lrt
> ./fsynctest 10000 10
> 
> I get the following consistent results with a standard consumer SSD:
> AVERAGE fsync: 22ms
> AVERAGE fdatasync: 21ms
> AVERAGE parallel(fsync): 3ms
> AVERAGE parallel(fdatasync): 3ms
> AVERAGE syncfs: 5ms
> AVERAGE sync: 5ms
> 
> The idea is that a single lio_listio() call should beat several aio_fsync() calls.

Which is not true for the glibc implementation; both calls are essentially
the same.  For instance, using your benchmark as a base, with the following
extra strategy:

---
                case SyncMode::ParallelAioFsync:
                case SyncMode::ParallelAioFdatasync: {
                    const bool isFsync = mode == SyncMode::ParallelAioFsync;
                    smode = isFsync ? "aio_parallel(fsync)" : "aio_parallel(fdatasync)";

                    std::vector<struct aiocb> syncs;
                    syncs.resize(nbfiles);

                    std::vector<struct aiocb*> psyncs;
                    psyncs.resize(nbfiles);

                    for (size_t i = 0; i < nbfiles; i++) {
                        syncs[i].aio_fildes = files[i].fd;
                        syncs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
                        if (aio_fsync (isFsync ? O_SYNC : O_DSYNC, &syncs[i]) < 0) {
                            assert (!"aio_fsync failed");
                        }
                        psyncs[i] = &syncs[i];
                    }

                    bool go_on;
                    do {
                        aio_suspend(psyncs.data(), psyncs.size(), nullptr);
                        go_on = false;
                        for (const auto& sync : psyncs) {
                            if (aio_error(sync) == EINPROGRESS) {
                                go_on = true;
                            } else {
                                if (aio_error (sync) != 0) {
                                    assert(!"aio_error");
                                }
                                aio_return (sync);  /* reap the request */
                            }
                        }
                    } while (go_on);
                } break;
---

I see on my machine (also with a consumer grade SSD):

AVERAGE fsync: 75ms
AVERAGE fdatasync: 81ms
AVERAGE parallel(fsync): 8ms
AVERAGE parallel(fdatasync): 8ms
AVERAGE aio_parallel(fsync): 8ms
AVERAGE aio_parallel(fdatasync): 8ms
AVERAGE syncfs: 9ms
AVERAGE sync: 8ms

Basically, in the glibc implementation you have:

---
lio_listio (...) {
  return lio_listio_internal (...);
}

static int
lio_listio_internal (int mode, struct aiocb *const list[], int nent,
                     struct sigevent *sig)
{
  [...]

  /* Request the mutex.  */
  pthread_mutex_lock (&__aio_requests_mutex);

  for (cnt = 0; cnt < nent; ++cnt)
    if (list[cnt] != NULL && list[cnt]->aio_lio_opcode != LIO_NOP)
      { 
	[...]
	requests[cnt] = __aio_enqueue_request (...)
	[...]
      }

  [...]

  struct waitlist waitlist[nent];

  [...]

  else if (LIO_MODE (mode) == LIO_WAIT)
    {
      for (cnt = 0; cnt < nent; ++cnt)
	/* Add each element to the waitlist.  */

      AIO_MISC_WAIT (result, total, NULL, 0);
    }

  /* Release the mutex.  */
  pthread_mutex_unlock (&__aio_requests_mutex);

  return result;
}
---

While aio_fsync and aio_suspend are:

---
int
aio_fsync (int op, struct aiocb *aiocbp)
{
  [...]
  return (__aio_enqueue_request (...)
          ? -1 : 0);
}

int
aio_suspend (const struct aiocb *const list[], int nent,
             const struct timespec *timeout)
{
  struct waitlist waitlist[nent];
  [...]

  pthread_mutex_lock (&__aio_requests_mutex);

  [...]

  for (cnt = 0; cnt < nent; ++cnt)
    if (list[cnt] != NULL)
      {
	/* Update waitlist.  */

  result = do_aio_misc_wait (&cntr, timeout);
  \_ AIO_MISC_WAIT (result, *cntr, timeout, 1);

    /* Release the mutex.  */
  pthread_mutex_unlock (&__aio_requests_mutex);
---

You might see slightly less contention in the lio_listio case, because
__aio_enqueue_request may require less synchronization there (although the
mutex is recursive, it might issue fewer atomic operations), plus slightly
less CPU time due to fewer function calls; but for IO-bound workloads I
think it won't matter.

I really don't think it is worth the development time to add a sync
operation extension to lio_listio in its current state.  You may try to
bring it to the Austin Group, but I think POSIX AIO will only be a viable
API once we get proper kernel support.

[1] https://lwn.net/Articles/671649/
[2] https://pagure.io/libaio


* Re: Missing <aio_misc.h> exported header?
  2019-09-07 18:56     ` Adhemerval Zanella
@ 2019-09-09  7:06       ` Xavier Roche
  2019-09-09 17:15         ` Adhemerval Zanella
From: Xavier Roche @ 2019-09-09  7:06 UTC (permalink / raw)
  To: libc-help

Hi folks!

On Sat, Sep 7, 2019 at 8:56 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> It would be possible to include them as a GNU extension, but since our
> current implementation is just synchronous IO dispatched to helper threads,
> you won't gain much without proper kernel support, which is currently
> lacking (aio(7) briefly describes the issues with a userland AIO
> implementation such as glibc's).

True. But this may ease the transition if the kernel does implement proper
AIO primitives one day. If that ever happens, of course (it may never, as
the arguments in the LWN article you cite seem sound).

> Which is not true for the glibc implementation; both calls are essentially the same.

Thanks, I was fooled by my micro-benchmarks; I'll revert to not using
lio_listio() in that case.

But it might be sound to have a /consistent/ API anyway (i.e. excluding the
fsync primitives from lio_listio() is kind of a surprising choice?).

> I really don't think it is worth the development time to add a sync
> operation extension to lio_listio in its current state.

For the pthread version it does already "work" (if you define the special
constants yourself), but I admit it would require an official API
change/extension, documentation & testing...

> You may try to bring it to the Austin Group, but I think POSIX AIO will
> only be a viable API once we get proper kernel support.

My understanding is that POSIX won't be interested at all until the
interface actually exists somewhere (i.e. in glibc); the group typically
merges existing features into the standard, never original ones, as far as
I know. But maybe this is not true for extensions?

Thanks a lot for the detailed and thorough explanations!

-- 
Xavier Roche -


* Re: Missing <aio_misc.h> exported header?
  2019-09-09  7:06       ` Xavier Roche
@ 2019-09-09 17:15         ` Adhemerval Zanella
From: Adhemerval Zanella @ 2019-09-09 17:15 UTC (permalink / raw)
  To: libc-help



On 09/09/2019 04:06, Xavier Roche wrote:
> Hi folks!
> 
> On Sat, Sep 7, 2019 at 8:56 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>> It would be possible to include them as a GNU extension, but since our
>> current implementation is just synchronous IO dispatched to helper threads,
>> you won't gain much without proper kernel support, which is currently
>> lacking (aio(7) briefly describes the issues with a userland AIO
>> implementation such as glibc's).
> 
> True. But this may ease the transition if the kernel does implement proper
> AIO primitives one day. If that ever happens, of course (it may never, as
> the arguments in the LWN article you cite seem sound).

It would assume the kernel also provides similar functionality that does
not require another layer of indirection.  Fedora-like distributions used
to provide librtkaio [1], an out-of-tree patch to librt that enables the
Linux AIO primitives behind the POSIX AIO interfaces.

The issue is that Linux AIO was not initially designed to be POSIX AIO
compatible, and as the previous link indicates, "it never provided full
POSIX conformance and worked only with a small subset of the required POSIX
realtime features, those supported by KAIO".

This is different from other systems, like FreeBSD, which provide direct
kernel support (lio_listio is a syscall on FreeBSD, for instance).  I can't
speak to its advantages or drawbacks compared either to its other I/O APIs
or to other systems, though.

> 
>> Which is not true for the glibc implementation; both calls are essentially the same.
> 
> Thanks, I was fooled by my micro-benchmarks; I'll revert to not using
> lio_listio() in that case.
> 

Another problem with using LIO_DSYNC/LIO_SYNC with lio_listio is that it
abuses the interface by relying on an internal implementation detail.  It
is unlikely, but still possible, that a future version changes the internal
LIO_DSYNC/LIO_SYNC values, and your program would stop working.

> But it might be sound to have a /consistent/ API anyway (i.e. excluding the
> fsync primitives from lio_listio() is kind of a surprising choice?).

I don't disagree, but even systems that do provide POSIX AIO natively
(FreeBSD) do not support it as an extension.  In fact, I am not aware of
any system that supports it as an extension.

> 
>> I really don't think it is worth the development time to add a sync
>> operation extension to lio_listio in its current state.
> 
> For the pthread version it does already "work" (if you define the special
> constants yourself), but I admit it would require an official API
> change/extension, documentation & testing...
> 
>> You may try to bring it to the Austin Group, but I think POSIX AIO will
>> only be a viable API once we get proper kernel support.
> 
> My understanding is that POSIX won't be interested at all until the
> interface actually exists somewhere (i.e. in glibc); the group typically
> merges existing features into the standard, never original ones, as far as
> I know. But maybe this is not true for extensions?

Indeed, this is the usual path, although for interfaces already in place
the standardized version might differ from the existing one if it shows
deficiencies (for instance the f_owner_ex type, where defining it as an
enum is not the best option).

However, to actually add an existing feature we usually require some
initial plan on how the new interface would be useful, the deficiencies it
is meant to overcome, and the advantages it might bring.  My wild guess is
that up to now there hasn't been a strong demand for such an extension
(based also on other POSIX-like systems not providing this functionality),
and in that case I don't see much gain in adding it.

But I am not against adding this extension, so feel free to stir up
discussion on libc-alpha with a proof-of-concept implementation.

> 
> Thanks a lot for the detailed and thorough explanations!
> 

[1] https://fedoraproject.org/wiki/Changes/GLIBC223_librtkaio_removal

