public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Jonathon Anderson <janderson@rice.edu>, libc-alpha@sourceware.org
Cc: John Mellor-Crummey <johnmc@rice.edu>
Subject: Re: [PATCH v6 4/5] io: Add closefrom [BZ #10353]
Date: Mon, 5 Jul 2021 17:26:37 -0300	[thread overview]
Message-ID: <f58c689d-2b33-3e1a-5357-24d0c08105e3@linaro.org> (raw)
In-Reply-To: <b3473d33-fa2a-78e4-f51c-415f39858aea@rice.edu>



On 03/07/2021 11:45, Jonathon Anderson via Libc-alpha wrote:
> Hello,
> 
> I'm part of a team at Rice University developing HPCToolkit, a fine-grained sampling-based performance analysis tool. Recently we have been sharing our experiences using the LD_AUDIT framework with the community, this patch series came to our attention as the subject of a long-standing issue we have run into multiple times with our tool.
> 
> The performance measurement portion of our tool runs as a library preloaded into the application, sharing the address space and (more importantly) file descriptors. As mentioned previously in this thread there are a number of projects which at times close all open file descriptors not known to the application (often via iteration over /proc/self/fd). We have found that in most cases this will also close our own file descriptors with no warning, causing us to corrupt and lose valuable measurement data, especially for highly multi-threaded applications.
> 
> In response to this problem we have considered reserving file descriptors above the RLIMIT_NOFILE limit, a technique used by Valgrind [7,8] where file descriptors are opened early in the process and then the limit is lowered below them (Valgrind intercepts getrlimit to report modified limits to the application, we would use setrlimit to lower the limit instead). While this does not remove their entries from /proc/self/fd (they are still valid open file descriptors), a number of projects that close all open file descriptors support this technique by only closing descriptors within the RLIMIT_NOFILE bound (obtained directly or indirectly through getdtablesize) [1][3][4][6].
> 
> Two of the three implementations of closefrom provided by this patch *do not* support this technique (specific locations marked below). Not only is this inconsistent, it directly affects in-process tools like HPCToolkit as applications shift to using modern Glibc extensions.
> 
> We recommend that this be adjusted before this patch is accepted into upstream Glibc, and that support for this technique be tested in tst-closefrom.c.

My initial idea is follow the the current closefrom() semantic provided by
some BSDs (FreeBSD and OpenBSD) which do not consult the current limit and
just use the higher limit for the syscall.

And my understanding of other libraries and programs that use RLIMIT_NOFILE
to get the higher file limit is only for *fallback* implementations and to 
to have a somewhat bounded execution time (the cpython have a comment 
stating it [1]) 

The systemd code you posted for instance will *only* check against
get_max_fd() iff "/proc/self/fd" can not be opened, otherwise it will
interact over *all* opened files regardless.  This is also the same
for cypthon [2].  I can't really comment on the Rust implementation,
but if it only relying to RLIMIT_NOFILE it will most likely suffer
for file descriptor leakage in some scenarios (if a library does the
same trick as you described).

So sorry, but I think this not the best semantic for closefrom() and it also
does not help a program that uses close_range. In fact, this specific semantic
might indeed make users prefer close_range instead.

Valgrind can overwrite it because it make is invisible to user program and 
libc, and for your tool I think the best course of action would be to 
interpose closefrom() and clone_range() and exclude the file descriptors 
you want to keep it opened.


[1] https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L213
[2] https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L281
> 
> -Jonathon
> 
> [1] https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/fd-util.c#L203-L204
> [3] https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L196-L200
> [4] https://github.com/rust-lang/rust/blob/5f47c0613ed4eb46fca3633c1297364c09e5e451/src/libstd/sys/unix/process2.rs#L161-L164
> [6] https://hg.mozilla.org/mozilla-central/file/tip/ipc/chromium/src/base/process_util_posix.cc#l152
> [7] https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_main.c;hb=393732dda164c1cc0fc511eadc0b8f06008ade4f#l1125
> [8] https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_syswrap/syswrap-generic.c;hb=393732dda164c1cc0fc511eadc0b8f06008ade4f#l3609
> 
> On 6/23/21 1:51 PM, Adhemerval Zanella via Libc-alpha wrote:
>> diff --git a/sysdeps/unix/sysv/linux/closefrom.c b/sysdeps/unix/sysv/linux/closefrom.c
>> new file mode 100644
>> index 0000000000..f5d7342c2c
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/closefrom.c
>> @@ -0,0 +1,35 @@
>> +/* Close a range of file descriptors.  Linux version.
>> +   Copyright (C) 2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <stdio.h>
>> +#include <sys/param.h>
>> +#include <unistd.h>
>> +
>> +void
>> +__closefrom (int lowfd)
>> +{
>> +  int l = MAX (0, lowfd);
>> +
>> +  int r = __close_range (l, ~0U, 0);
> The upper limit `~0U` should be changed to `__getdtablesize()` or similar.
>> +  if (r == 0)
>> +    return;
>> +
>> +  if (!__closefrom_fallback (l))
>> +    __fortify_fail ("closefrom failed to close a file descriptor");
>> +}
>> +weak_alias (__closefrom, closefrom)
>> diff --git a/sysdeps/unix/sysv/linux/closefrom_fallback.c b/sysdeps/unix/sysv/linux/closefrom_fallback.c
>> new file mode 100644
>> index 0000000000..61e71d388d
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/closefrom_fallback.c
>> @@ -0,0 +1,97 @@
>> +/* Close a range of file descriptors.  Linux version.
>> +   Copyright (C) 2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <arch-fd_to_filename.h>
>> +#include <dirent.h>
>> +#include <not-cancel.h>
>> +#include <stdbool.h>
>> +
>> +/* Fallback code: iterates over /proc/self/fd, closing each file descriptor
>> +   that fall on the criteria.  */
>> +_Bool
>> +__closefrom_fallback (int from)
>> +{
>> +  bool ret = false;
>> +
>> +  int dirfd = __open_nocancel (FD_TO_FILENAME_PREFIX, O_RDONLY | O_DIRECTORY,
>> +                               0);
>> +  if (dirfd == -1)
>> +    {
>> +      /* The closefrom should work even when process can't open new files.  */
>> +      if (errno == ENOENT)
>> +        goto err;
>> +
>> +      for (int i = from; i < INT_MAX; i++)
>> +        {
>> +          int r = __close_nocancel (i);
>> +          if (r == 0 || (r == -1 && errno != EBADF))
>> +            break;
>> +        }
>> +
>> +      dirfd = __open_nocancel (FD_TO_FILENAME_PREFIX, O_RDONLY | O_DIRECTORY,
>> +                               0);
>> +      if (dirfd == -1)
>> +        goto err;
>> +    }
>> +
>> +  char buffer[1024];
>> +  while (true)
>> +    {
>> +      ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer));
>> +      if (ret == -1)
>> +        goto err;
>> +      else if (ret == 0)
>> +        break;
>> +
>> +      /* If any file descriptor is closed it resets the /proc/self position
>> +         read again from the start (to obtain any possible kernel update).  */
>> +      bool closed = false;
>> +      char *begin = buffer, *end = buffer + ret;
>> +      while (begin != end)
>> +        {
>> +          unsigned short int d_reclen;
>> +          memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen),
>> +                  sizeof (d_reclen));
>> +          const char *dname = begin + offsetof (struct dirent64, d_name);
>> +          begin += d_reclen;
>> +
>> +          if (dname[0] == '.')
>> +            continue;
>> +
>> +          int fd = 0;
>> +          for (const char *s = dname; (unsigned int) (*s) - '0' < 10; s++)
>> +            fd = 10 * fd + (*s - '0');
>> +
>> +          if (fd == dirfd || fd < from)
>> +            continue;
> An extra condition here should check that `fd < __getdtablesize()` or similar.
>> +
>> +          /* We ignore close errors because EBADF, EINTR, and EIO means the
>> +             descriptor has been released.  */
>> +          __close_nocancel (fd);
>> +          closed = true;
>> +        }
>> +
>> +      if (closed && __lseek (dirfd, 0, SEEK_SET) < 0)
>> +        goto err;
>> +    }
>> +
>> +  ret = true;
>> +err:
>> +  __close_nocancel (dirfd);
>> +  return ret;
>> +}

  reply	other threads:[~2021-07-05 20:26 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-23 18:51 [PATCH v6 0/5] Add close_range, closefrom, and posix_spawn_file_actions_closefrom_np Adhemerval Zanella
2021-06-23 18:51 ` [PATCH v6 1/5] support: Add support_stack_alloc Adhemerval Zanella
2021-06-24  9:15   ` Florian Weimer
2021-06-24 11:33     ` Adhemerval Zanella
2021-06-23 18:51 ` [PATCH v6 2/5] support: Add xclone Adhemerval Zanella
2021-06-24  9:15   ` Florian Weimer
2021-06-23 18:51 ` [PATCH v6 3/5] linux: Add close_range Adhemerval Zanella
2021-06-27 17:23   ` Florian Weimer
2021-06-23 18:51 ` [PATCH v6 4/5] io: Add closefrom [BZ #10353] Adhemerval Zanella
2021-07-03 14:45   ` Jonathon Anderson
2021-07-05 20:26     ` Adhemerval Zanella [this message]
2021-07-06 11:37       ` Florian Weimer
2021-06-23 18:51 ` [PATCH v6 5/5] posix: Add posix_spawn_file_actions_closefrom_np Adhemerval Zanella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f58c689d-2b33-3e1a-5357-24d0c08105e3@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=janderson@rice.edu \
    --cc=johnmc@rice.edu \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).