From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-0010f301.pphosted.com (mx0b-0010f301.pphosted.com [148.163.153.244]) by sourceware.org (Postfix) with ESMTPS id A32723939C00 for ; Sat, 3 Jul 2021 14:45:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A32723939C00 Received: from pps.filterd (m0102859.ppops.net [127.0.0.1]) by mx0b-0010f301.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 163EiOOi032021 for ; Sat, 3 Jul 2021 09:45:24 -0500 Received: from mh3.mail.rice.edu (mh3.mail.rice.edu [128.42.199.10]) by mx0b-0010f301.pphosted.com with ESMTP id 39jk8xr74b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Sat, 03 Jul 2021 09:45:23 -0500 Received-X: from mh3.mail.rice.edu (localhost [127.0.0.1]) by mh3.mail.rice.edu (Postfix) with ESMTP id 6C33E604A72 for ; Sat, 3 Jul 2021 09:45:23 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by mh3.mail.rice.edu (Postfix) with ESMTP id 6B027604A6A; Sat, 3 Jul 2021 09:45:23 -0500 (CDT) X-Virus-Scanned: by amavis-2.12.1 at mh3.mail.rice.edu, auth channel Received: from mh3.mail.rice.edu ([127.0.0.1]) by localhost (mh3.mail.rice.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id MepXjhDDiDNB; Sat, 3 Jul 2021 09:45:13 -0500 (CDT) Received: from [98.200.161.233] (c-98-200-161-233.hsd1.tx.comcast.net [98.200.161.233]) (using TLSv1.2 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: jma14) by mh3.mail.rice.edu (Postfix) with ESMTPSA id 56E4822BA46; Sat, 3 Jul 2021 09:45:13 -0500 (CDT) Subject: Re: [PATCH v6 4/5] io: Add closefrom [BZ #10353] To: libc-alpha@sourceware.org References: <20210623185115.395812-1-adhemerval.zanella@linaro.org> <20210623185115.395812-5-adhemerval.zanella@linaro.org> Cc: John Mellor-Crummey From: Jonathon Anderson Message-ID: Date: Sat, 3 Jul 2021 09:45:12 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210623185115.395812-5-adhemerval.zanella@linaro.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US X-Proofpoint-GUID: e3y3meUqOwgA8SehbPgCij5APpRzb8-j X-Proofpoint-ORIG-GUID: e3y3meUqOwgA8SehbPgCij5APpRzb8-j X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-07-03_06:2021-07-02, 2021-07-03 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 mlxlogscore=999 mlxscore=0 bulkscore=0 lowpriorityscore=0 suspectscore=0 phishscore=0 malwarescore=0 priorityscore=1501 clxscore=1011 spamscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104190000 definitions=main-2107030094 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH, KAM_SHORT, NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jul 2021 14:45:26 -0000 Hello, I'm part of a team at Rice University developing HPCToolkit, a fine-grained sampling-based performance analysis tool. Recently we have been sharing our experiences using the LD_AUDIT framework with the community, this patch series came to our attention as the subject of a long-standing issue we have run into multiple times with our tool. The performance measurement portion of our tool runs as a library preloaded into the application, sharing the address space and (more importantly) file descriptors. As mentioned previously in this thread there are a number of projects which at times close all open file descriptors not known to the application (often via iteration over /proc/self/fd). We have found that in most cases this will also close our own file descriptors with no warning, causing us to corrupt and lose valuable measurement data, especially for highly multi-threaded applications. In response to this problem we have considered reserving file descriptors above the RLIMIT_NOFILE limit, a technique used by Valgrind [7,8] where file descriptors are opened early in the process and then the limit is lowered below them (Valgrind intercepts getrlimit to report modified limits to the application, we would use setrlimit to lower the limit instead). While this does not remove their entries from /proc/self/fd (they are still valid open file descriptors), a number of projects that close all open file descriptors support this technique by only closing descriptors within the RLIMIT_NOFILE bound (obtained directly or indirectly through getdtablesize) [1][3][4][6]. Two of the three implementations of closefrom provided by this patch *do not* support this technique (specific locations marked below). Not only is this inconsistent, it directly affects in-process tools like HPCToolkit as applications shift to using modern Glibc extensions. We recommend that this be adjusted before this patch is accepted into upstream Glibc, and that support for this technique be tested in tst-closefrom.c. -Jonathon [1] https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/fd-util.c#L203-L204 [3] https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L196-L200 [4] https://github.com/rust-lang/rust/blob/5f47c0613ed4eb46fca3633c1297364c09e5e451/src/libstd/sys/unix/process2.rs#L161-L164 [6] https://hg.mozilla.org/mozilla-central/file/tip/ipc/chromium/src/base/process_util_posix.cc#l152 [7] https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_main.c;hb=393732dda164c1cc0fc511eadc0b8f06008ade4f#l1125 [8] https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_syswrap/syswrap-generic.c;hb=393732dda164c1cc0fc511eadc0b8f06008ade4f#l3609 On 6/23/21 1:51 PM, Adhemerval Zanella via Libc-alpha wrote: > diff --git a/sysdeps/unix/sysv/linux/closefrom.c b/sysdeps/unix/sysv/linux/closefrom.c > new file mode 100644 > index 0000000000..f5d7342c2c > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/closefrom.c > @@ -0,0 +1,35 @@ > +/* Close a range of file descriptors. Linux version. > + Copyright (C) 2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#include > +#include > +#include > + > +void > +__closefrom (int lowfd) > +{ > + int l = MAX (0, lowfd); > + > + int r = __close_range (l, ~0U, 0); The upper limit `~0U` should be changed to `__getdtablesize()` or similar. > + if (r == 0) > + return; > + > + if (!__closefrom_fallback (l)) > + __fortify_fail ("closefrom failed to close a file descriptor"); > +} > +weak_alias (__closefrom, closefrom) > diff --git a/sysdeps/unix/sysv/linux/closefrom_fallback.c b/sysdeps/unix/sysv/linux/closefrom_fallback.c > new file mode 100644 > index 0000000000..61e71d388d > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/closefrom_fallback.c > @@ -0,0 +1,97 @@ > +/* Close a range of file descriptors. Linux version. > + Copyright (C) 2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#include > +#include > +#include > +#include > + > +/* Fallback code: iterates over /proc/self/fd, closing each file descriptor > + that fall on the criteria. */ > +_Bool > +__closefrom_fallback (int from) > +{ > + bool ret = false; > + > + int dirfd = __open_nocancel (FD_TO_FILENAME_PREFIX, O_RDONLY | O_DIRECTORY, > + 0); > + if (dirfd == -1) > + { > + /* The closefrom should work even when process can't open new files. */ > + if (errno == ENOENT) > + goto err; > + > + for (int i = from; i < INT_MAX; i++) > + { > + int r = __close_nocancel (i); > + if (r == 0 || (r == -1 && errno != EBADF)) > + break; > + } > + > + dirfd = __open_nocancel (FD_TO_FILENAME_PREFIX, O_RDONLY | O_DIRECTORY, > + 0); > + if (dirfd == -1) > + goto err; > + } > + > + char buffer[1024]; > + while (true) > + { > + ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer)); > + if (ret == -1) > + goto err; > + else if (ret == 0) > + break; > + > + /* If any file descriptor is closed it resets the /proc/self position > + read again from the start (to obtain any possible kernel update). */ > + bool closed = false; > + char *begin = buffer, *end = buffer + ret; > + while (begin != end) > + { > + unsigned short int d_reclen; > + memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen), > + sizeof (d_reclen)); > + const char *dname = begin + offsetof (struct dirent64, d_name); > + begin += d_reclen; > + > + if (dname[0] == '.') > + continue; > + > + int fd = 0; > + for (const char *s = dname; (unsigned int) (*s) - '0' < 10; s++) > + fd = 10 * fd + (*s - '0'); > + > + if (fd == dirfd || fd < from) > + continue; An extra condition here should check that `fd < __getdtablesize()` or similar. > + > + /* We ignore close errors because EBADF, EINTR, and EIO means the > + descriptor has been released. */ > + __close_nocancel (fd); > + closed = true; > + } > + > + if (closed && __lseek (dirfd, 0, SEEK_SET) < 0) > + goto err; > + } > + > + ret = true; > +err: > + __close_nocancel (dirfd); > + return ret; > +}