From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg1-x533.google.com (mail-pg1-x533.google.com [IPv6:2607:f8b0:4864:20::533]) by sourceware.org (Postfix) with ESMTPS id DE568385DC10 for ; Mon, 5 Jul 2021 20:26:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DE568385DC10 Received: by mail-pg1-x533.google.com with SMTP id f5so10015406pgv.3 for ; Mon, 05 Jul 2021 13:26:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=+f9FV2ZzmbesL2KeEQl3Gi5Yr/2no4V1rzMkLZgMV08=; b=NjgzMNet0VXe/VKQPVA3KP8LrKQvrPb1dUS0INvwzJEd2nuVesZX7eNTlVFB8EPITH 3ERsMtNcrm+9rc18LaNWmz7n2L0ziitj7f3gyzcKHTzLTc0I5zwdGh9s0CyvuNyNb7Ax Z4Wf22KwNypYFGgobSjXBJA7sLoYiQJ4uU0iGfCdXQtLw0Sxzpd0mvWRqcZx2ygXIBfn 7GKLG9FN8yxYHY/SBYQaXbfVNFMX+fZ0ANy5YMV5dteMAhAiiPiipX/gpb5U7l4qePEK Uws1+t/5pE4GBPN5PWmI5DqqX6bV8HtAbdMOmY3h+gQppTo5XZneQqLvON2aSj0JoarD 7SjQ== X-Gm-Message-State: AOAM533zBrrGiC7hrDwJjNy++qMoW3i5v2vT7GZCLxSnqJwkUYyLo1R+ QLbscy3QOTNHReta7GmTWTQCCQ== X-Google-Smtp-Source: ABdhPJzPzXxVTthDN+hQPFRum7qKtWk+7iuVEQhlw2MvtD80dIqmvZWCLDEO/Dftxw8Q9gMHZqCsdg== X-Received: by 2002:a65:4903:: with SMTP id p3mr17521023pgs.402.1625516800874; Mon, 05 Jul 2021 13:26:40 -0700 (PDT) Received: from [192.168.1.108] ([177.194.59.218]) by smtp.gmail.com with ESMTPSA id k8sm13847856pfa.142.2021.07.05.13.26.39 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 05 Jul 2021 13:26:40 -0700 (PDT) Subject: Re: [PATCH v6 4/5] io: Add closefrom [BZ #10353] To: Jonathon Anderson , libc-alpha@sourceware.org Cc: John Mellor-Crummey References: <20210623185115.395812-1-adhemerval.zanella@linaro.org> <20210623185115.395812-5-adhemerval.zanella@linaro.org> From: Adhemerval Zanella Message-ID: Date: Mon, 5 Jul 2021 17:26:37 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Jul 2021 20:26:44 -0000 On 03/07/2021 11:45, Jonathon Anderson via Libc-alpha wrote: > Hello, > > I'm part of a team at Rice University developing HPCToolkit, a fine-grained sampling-based performance analysis tool. Recently we have been sharing our experiences using the LD_AUDIT framework with the community, this patch series came to our attention as the subject of a long-standing issue we have run into multiple times with our tool. > > The performance measurement portion of our tool runs as a library preloaded into the application, sharing the address space and (more importantly) file descriptors. As mentioned previously in this thread there are a number of projects which at times close all open file descriptors not known to the application (often via iteration over /proc/self/fd). We have found that in most cases this will also close our own file descriptors with no warning, causing us to corrupt and lose valuable measurement data, especially for highly multi-threaded applications. > > In response to this problem we have considered reserving file descriptors above the RLIMIT_NOFILE limit, a technique used by Valgrind [7,8] where file descriptors are opened early in the process and then the limit is lowered below them (Valgrind intercepts getrlimit to report modified limits to the application, we would use setrlimit to lower the limit instead). While this does not remove their entries from /proc/self/fd (they are still valid open file descriptors), a number of projects that close all open file descriptors support this technique by only closing descriptors within the RLIMIT_NOFILE bound (obtained directly or indirectly through getdtablesize) [1][3][4][6]. > > Two of the three implementations of closefrom provided by this patch *do not* support this technique (specific locations marked below). Not only is this inconsistent, it directly affects in-process tools like HPCToolkit as applications shift to using modern Glibc extensions. > > We recommend that this be adjusted before this patch is accepted into upstream Glibc, and that support for this technique be tested in tst-closefrom.c. My initial idea is follow the the current closefrom() semantic provided by some BSDs (FreeBSD and OpenBSD) which do not consult the current limit and just use the higher limit for the syscall. And my understanding of other libraries and programs that use RLIMIT_NOFILE to get the higher file limit is only for *fallback* implementations and to to have a somewhat bounded execution time (the cpython have a comment stating it [1]) The systemd code you posted for instance will *only* check against get_max_fd() iff "/proc/self/fd" can not be opened, otherwise it will interact over *all* opened files regardless. This is also the same for cypthon [2]. I can't really comment on the Rust implementation, but if it only relying to RLIMIT_NOFILE it will most likely suffer for file descriptor leakage in some scenarios (if a library does the same trick as you described). So sorry, but I think this not the best semantic for closefrom() and it also does not help a program that uses close_range. In fact, this specific semantic might indeed make users prefer close_range instead. Valgrind can overwrite it because it make is invisible to user program and libc, and for your tool I think the best course of action would be to interpose closefrom() and clone_range() and exclude the file descriptors you want to keep it opened. [1] https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L213 [2] https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L281 > > -Jonathon > > [1] https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/fd-util.c#L203-L204 > [3] https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L196-L200 > [4] https://github.com/rust-lang/rust/blob/5f47c0613ed4eb46fca3633c1297364c09e5e451/src/libstd/sys/unix/process2.rs#L161-L164 > [6] https://hg.mozilla.org/mozilla-central/file/tip/ipc/chromium/src/base/process_util_posix.cc#l152 > [7] https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_main.c;hb=393732dda164c1cc0fc511eadc0b8f06008ade4f#l1125 > [8] https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_syswrap/syswrap-generic.c;hb=393732dda164c1cc0fc511eadc0b8f06008ade4f#l3609 > > On 6/23/21 1:51 PM, Adhemerval Zanella via Libc-alpha wrote: >> diff --git a/sysdeps/unix/sysv/linux/closefrom.c b/sysdeps/unix/sysv/linux/closefrom.c >> new file mode 100644 >> index 0000000000..f5d7342c2c >> --- /dev/null >> +++ b/sysdeps/unix/sysv/linux/closefrom.c >> @@ -0,0 +1,35 @@ >> +/* Close a range of file descriptors.  Linux version. >> +   Copyright (C) 2021 Free Software Foundation, Inc. >> +   This file is part of the GNU C Library. >> + >> +   The GNU C Library is free software; you can redistribute it and/or >> +   modify it under the terms of the GNU Lesser General Public >> +   License as published by the Free Software Foundation; either >> +   version 2.1 of the License, or (at your option) any later version. >> + >> +   The GNU C Library is distributed in the hope that it will be useful, >> +   but WITHOUT ANY WARRANTY; without even the implied warranty of >> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU >> +   Lesser General Public License for more details. >> + >> +   You should have received a copy of the GNU Lesser General Public >> +   License along with the GNU C Library; if not, see >> +   .  */ >> + >> +#include >> +#include >> +#include >> + >> +void >> +__closefrom (int lowfd) >> +{ >> +  int l = MAX (0, lowfd); >> + >> +  int r = __close_range (l, ~0U, 0); > The upper limit `~0U` should be changed to `__getdtablesize()` or similar. >> +  if (r == 0) >> +    return; >> + >> +  if (!__closefrom_fallback (l)) >> +    __fortify_fail ("closefrom failed to close a file descriptor"); >> +} >> +weak_alias (__closefrom, closefrom) >> diff --git a/sysdeps/unix/sysv/linux/closefrom_fallback.c b/sysdeps/unix/sysv/linux/closefrom_fallback.c >> new file mode 100644 >> index 0000000000..61e71d388d >> --- /dev/null >> +++ b/sysdeps/unix/sysv/linux/closefrom_fallback.c >> @@ -0,0 +1,97 @@ >> +/* Close a range of file descriptors.  Linux version. >> +   Copyright (C) 2021 Free Software Foundation, Inc. >> +   This file is part of the GNU C Library. >> + >> +   The GNU C Library is free software; you can redistribute it and/or >> +   modify it under the terms of the GNU Lesser General Public >> +   License as published by the Free Software Foundation; either >> +   version 2.1 of the License, or (at your option) any later version. >> + >> +   The GNU C Library is distributed in the hope that it will be useful, >> +   but WITHOUT ANY WARRANTY; without even the implied warranty of >> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU >> +   Lesser General Public License for more details. >> + >> +   You should have received a copy of the GNU Lesser General Public >> +   License along with the GNU C Library; if not, see >> +   .  */ >> + >> +#include >> +#include >> +#include >> +#include >> + >> +/* Fallback code: iterates over /proc/self/fd, closing each file descriptor >> +   that fall on the criteria.  */ >> +_Bool >> +__closefrom_fallback (int from) >> +{ >> +  bool ret = false; >> + >> +  int dirfd = __open_nocancel (FD_TO_FILENAME_PREFIX, O_RDONLY | O_DIRECTORY, >> +                               0); >> +  if (dirfd == -1) >> +    { >> +      /* The closefrom should work even when process can't open new files.  */ >> +      if (errno == ENOENT) >> +        goto err; >> + >> +      for (int i = from; i < INT_MAX; i++) >> +        { >> +          int r = __close_nocancel (i); >> +          if (r == 0 || (r == -1 && errno != EBADF)) >> +            break; >> +        } >> + >> +      dirfd = __open_nocancel (FD_TO_FILENAME_PREFIX, O_RDONLY | O_DIRECTORY, >> +                               0); >> +      if (dirfd == -1) >> +        goto err; >> +    } >> + >> +  char buffer[1024]; >> +  while (true) >> +    { >> +      ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer)); >> +      if (ret == -1) >> +        goto err; >> +      else if (ret == 0) >> +        break; >> + >> +      /* If any file descriptor is closed it resets the /proc/self position >> +         read again from the start (to obtain any possible kernel update).  */ >> +      bool closed = false; >> +      char *begin = buffer, *end = buffer + ret; >> +      while (begin != end) >> +        { >> +          unsigned short int d_reclen; >> +          memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen), >> +                  sizeof (d_reclen)); >> +          const char *dname = begin + offsetof (struct dirent64, d_name); >> +          begin += d_reclen; >> + >> +          if (dname[0] == '.') >> +            continue; >> + >> +          int fd = 0; >> +          for (const char *s = dname; (unsigned int) (*s) - '0' < 10; s++) >> +            fd = 10 * fd + (*s - '0'); >> + >> +          if (fd == dirfd || fd < from) >> +            continue; > An extra condition here should check that `fd < __getdtablesize()` or similar. >> + >> +          /* We ignore close errors because EBADF, EINTR, and EIO means the >> +             descriptor has been released.  */ >> +          __close_nocancel (fd); >> +          closed = true; >> +        } >> + >> +      if (closed && __lseek (dirfd, 0, SEEK_SET) < 0) >> +        goto err; >> +    } >> + >> +  ret = true; >> +err: >> +  __close_nocancel (dirfd); >> +  return ret; >> +}