* fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) @ 2023-09-04 9:55 Mateusz Guzik 2023-09-04 10:08 ` Andreas Schwab 0 siblings, 1 reply; 13+ messages in thread From: Mateusz Guzik @ 2023-09-04 9:55 UTC (permalink / raw) To: libc-alpha; +Cc: Linus Torvalds, Adhemerval Zanella Commit 4d97cc8cf3da ("io: Remove xstat implementations") reimplemented fstat entry point on top of newfstatat (as opposed to newfstat). This comes with a significant performance penalty as it induces a lot of work on kernel side to handle the path, which is additionally slowed down on x86-64 due to SMAP handling. Here are sample results from calling newfstatat vs newfstat on Sapphire Rapids: newfstatat 5088199 newfstat 8540383 (+67%) Are there any problems switching it back to newfstat at least for x86-64? If you want to bench yourself you can grab will-it-scale + https://github.com/antonblanchard/will-it-scale/pull/35/files + patch up one of the testcases to call newfstat directly: int error = syscall(5, fd, &sb); Note if you bench yourself and have a CPU significantly impacted by mitigations (e.g., meltdown) the difference may be very small in your setup. -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-04 9:55 fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) Mateusz Guzik @ 2023-09-04 10:08 ` Andreas Schwab 2023-09-04 10:11 ` Mateusz Guzik 0 siblings, 1 reply; 13+ messages in thread From: Andreas Schwab @ 2023-09-04 10:08 UTC (permalink / raw) To: Mateusz Guzik via Libc-alpha; +Cc: Mateusz Guzik, Linus Torvalds On Sep 04 2023, Mateusz Guzik via Libc-alpha wrote: > Commit 4d97cc8cf3da ("io: Remove xstat implementations") reimplemented > fstat entry point on top of newfstatat (as opposed to newfstat). You are looking at the wrong commit. See 30f1c74394 instead. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-04 10:08 ` Andreas Schwab @ 2023-09-04 10:11 ` Mateusz Guzik 2023-09-05 13:01 ` Adhemerval Zanella Netto 0 siblings, 1 reply; 13+ messages in thread From: Mateusz Guzik @ 2023-09-04 10:11 UTC (permalink / raw) To: Andreas Schwab; +Cc: Mateusz Guzik via Libc-alpha, Linus Torvalds On 9/4/23, Andreas Schwab <schwab@suse.de> wrote: > On Sep 04 2023, Mateusz Guzik via Libc-alpha wrote: > >> Commit 4d97cc8cf3da ("io: Remove xstat implementations") reimplemented >> fstat entry point on top of newfstatat (as opposed to newfstat). > > You are looking at the wrong commit. See 30f1c74394 instead. > I still got the right person and the question stands. ;) -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-04 10:11 ` Mateusz Guzik @ 2023-09-05 13:01 ` Adhemerval Zanella Netto 2023-09-05 13:14 ` Mateusz Guzik 0 siblings, 1 reply; 13+ messages in thread From: Adhemerval Zanella Netto @ 2023-09-05 13:01 UTC (permalink / raw) To: Mateusz Guzik, Andreas Schwab, Rich Felker Cc: Mateusz Guzik via Libc-alpha, Linus Torvalds On 04/09/23 07:11, Mateusz Guzik via Libc-alpha wrote: > On 9/4/23, Andreas Schwab <schwab@suse.de> wrote: >> On Sep 04 2023, Mateusz Guzik via Libc-alpha wrote: >> >>> Commit 4d97cc8cf3da ("io: Remove xstat implementations") reimplemented >>> fstat entry point on top of newfstatat (as opposed to newfstat). >> >> You are looking at the wrong commit. See 30f1c74394 instead. >> > > I still got the right person and the question stands. ;) > The glibc has started to implement some symbols over more generic syscalls because it simplifies *a lot* the required code (just check how messy was the stat family implementation back then). So I would really like to avoid the need to get back on arch-specific syscall to fix very specific corner cases. If I understand correctly, the issue seems to be the usage of a empty string sentinel ("") to indicate the argument it not really used (which trigger all the SMAP issues since kernel will always try to copy the argument from userland). This also means on x86_64 you will also have this performance penalty on syscalls that use AT_EMPTY_PATH (i.e, execveat, name_to_handle_at, open_tree, faccessat, etc.). I really think it would better fixed in the kernel instead of adding extra constraints for the userland. CCing Rich, from musl, which seems to be also affected it and to have another view from a libc implementation. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 13:01 ` Adhemerval Zanella Netto @ 2023-09-05 13:14 ` Mateusz Guzik 2023-09-05 17:28 ` Adhemerval Zanella Netto 2023-09-05 17:29 ` Linus Torvalds 0 siblings, 2 replies; 13+ messages in thread From: Mateusz Guzik @ 2023-09-05 13:14 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha, Linus Torvalds On 9/5/23, Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote: > If I understand correctly, the issue seems to be the usage of a empty string > > sentinel ("") to indicate the argument it not really used (which trigger > all > the SMAP issues since kernel will always try to copy the argument from > userland). This also means on x86_64 you will also have this performance > penalty on syscalls that use AT_EMPTY_PATH (i.e, execveat, > name_to_handle_at, > open_tree, faccessat, etc.). I really think it would better fixed in the > kernel instead of adding extra constraints for the userland. > I completely agree this is a problem going way past fstat. One could be tempted to allow NULL with the flag, but that wont work -- I know of code out there which checks for statx availability by deliberately passing a NULL path. I would not be shocked if there was more of the sort and passing the AT_EMPTY_PATH flag on top. I am considering proposing a new flag which combined with NULL path would indicate there is only a fd lookup to perform -- fuly backwards compatible and avoiding the problem. Then syscalls could start supporting it over time as people get around to it. However, the fstab stub in glibc does not have to suffer it regardless of what happens with the above. -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 13:14 ` Mateusz Guzik @ 2023-09-05 17:28 ` Adhemerval Zanella Netto 2023-09-05 17:45 ` Linus Torvalds 2023-09-05 17:29 ` Linus Torvalds 1 sibling, 1 reply; 13+ messages in thread From: Adhemerval Zanella Netto @ 2023-09-05 17:28 UTC (permalink / raw) To: Mateusz Guzik Cc: Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha, Linus Torvalds On 05/09/23 10:14, Mateusz Guzik wrote: > On 9/5/23, Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote: >> If I understand correctly, the issue seems to be the usage of a empty string >> >> sentinel ("") to indicate the argument it not really used (which trigger >> all >> the SMAP issues since kernel will always try to copy the argument from >> userland). This also means on x86_64 you will also have this performance >> penalty on syscalls that use AT_EMPTY_PATH (i.e, execveat, >> name_to_handle_at, >> open_tree, faccessat, etc.). I really think it would better fixed in the >> kernel instead of adding extra constraints for the userland. >> > > I completely agree this is a problem going way past fstat. > > One could be tempted to allow NULL with the flag, but that wont work > -- I know of code out there which checks for statx availability by > deliberately passing a NULL path. I would not be shocked if there was > more of the sort and passing the AT_EMPTY_PATH flag on top. I though about it, but besides being a clear kABI breakage it does not help on older kernels (where fstat returns EFAULT for NULL). I am not sure about how this statx availability would work, passing NULL would returns EFAULT in both statx and old stat cases. > > I am considering proposing a new flag which combined with NULL path > would indicate there is only a fd lookup to perform -- fuly backwards > compatible and avoiding the problem. Then syscalls could start > supporting it over time as people get around to it> > However, the fstab stub in glibc does not have to suffer it regardless > of what happens with the above. I think we can still make it in a generic way, stat family would use more syscall (which some filters might complain) but it should be ok: diff --git a/sysdeps/unix/sysv/linux/fstatat64.c b/sysdeps/unix/sysv/linux/fstatat64.c index 3509d3ca6d..9fc7f82db2 100644 --- a/sysdeps/unix/sysv/linux/fstatat64.c +++ b/sysdeps/unix/sysv/linux/fstatat64.c @@ -91,20 +91,30 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf, int flag) { int r; + bool is_fstat = flag == AT_EMPTY_PATH && fd >= 0 && file[0] == '\0'; #if XSTAT_IS_XSTAT64 # ifdef __NR_newfstatat /* 64-bit kABI, e.g. aarch64, ia64, powerpc64*, s390x, riscv64, and x86_64. */ - r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, buf, flag); + if (is_fstat) + r = INTERNAL_SYSCALL_CALL (fstat, fd, buf); + else + r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, buf, flag); # elif defined __NR_fstatat64 # if STAT64_IS_KERNEL_STAT64 /* 64-bit kABI outlier, e.g. alpha */ - r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, buf, flag); + if (is_fstat) + r = INTERNAL_SYSCALL_CALL (fstat64, fd, buf); + else + r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, buf, flag); # else /* 64-bit kABI outlier, e.g. sparc64. */ struct kernel_stat64 kst64; - r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &kst64, flag); + if (is_fstat) + r = INTERNAL_SYSCALL_CALL (fstat64, fd, &kst64); + else + r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &kst64, flag); if (r == 0) __cp_stat64_kstat64 (buf, &kst64); # endif @@ -115,7 +125,10 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf, e.g. arm, csky, i386, hppa, m68k, microblaze, nios2, sh, powerpc32, and sparc32. */ struct stat64 st64; - r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &st64, flag); + if (is_fstat) + r = INTERNAL_SYSCALL_CALL (fstat64, fd, &st64); + else + r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &st64, flag); if (r == 0) { /* Clear both pad and reserved fields. */ @@ -138,7 +151,10 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf, # else /* 64-bit kabi outlier, e.g. mips64 and mips64-n32. */ struct kernel_stat kst; - r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, &kst, flag); + if (is_fstat) + r = INTERNAL_SYSCALL_CALL (fstat, fd, &kst); + else + r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, &kst, flag); if (r == 0) __cp_kstat_stat64_t64 (&kst, buf); # endif ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 17:28 ` Adhemerval Zanella Netto @ 2023-09-05 17:45 ` Linus Torvalds 2023-09-05 18:22 ` Adhemerval Zanella Netto 2023-09-05 21:42 ` Rich Felker 0 siblings, 2 replies; 13+ messages in thread From: Linus Torvalds @ 2023-09-05 17:45 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: Mateusz Guzik, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote: > > I think we can still make it in a generic way, stat family would use more > syscall (which some filters might complain) but it should be ok: Ugh. That patch of yours certainly seems to avoid the kernel overhead (where that "fetch a byte from user space just to check it is '\0'" is much more expensive than it is in user space), but why does glibc do that whole "turn fstat() into fstatat(), and then turn it back again"? It seems just stupid. If the user asked for 'fstat()', just give the user 'fstat()'. Your patch literally adds *complexity*, rather than take it away. Is this all just because glibc did its own 'struct stat' implementation way back when, and you want to have just one place where the stat conversion is done? Can't you just special case *that* case instead? It's long been irrelevant. Afaik, on 32-bit x86, you just want to use the 'stat64' system call family, and on 64-bit x86 you just use the regular 'stat' family (yes, called 'newfstatat' for that one system call for internal historial reasons), and there is no conversion needed. IOW, all those wrappers just for 'struct stat' conversion are pure overhead and silly garbage. Why do they exist? And if there is still some other reason for them to exist, can't you make that be a 'sysdeps' file of its own, and not penalize all the sane cases with 'copy stat structures around'? Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 17:45 ` Linus Torvalds @ 2023-09-05 18:22 ` Adhemerval Zanella Netto 2023-09-05 19:16 ` Adhemerval Zanella Netto 2023-09-05 21:42 ` Rich Felker 1 sibling, 1 reply; 13+ messages in thread From: Adhemerval Zanella Netto @ 2023-09-05 18:22 UTC (permalink / raw) To: Linus Torvalds Cc: Mateusz Guzik, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha On 05/09/23 14:45, Linus Torvalds wrote: > On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto > <adhemerval.zanella@linaro.org> wrote: >> >> I think we can still make it in a generic way, stat family would use more >> syscall (which some filters might complain) but it should be ok: > > Ugh. That patch of yours certainly seems to avoid the kernel overhead > (where that "fetch a byte from user space just to check it is '\0'" is > much more expensive than it is in user space), but why does glibc do > that whole "turn fstat() into fstatat(), and then turn it back again"? > > It seems just stupid. If the user asked for 'fstat()', just give the > user 'fstat()'. > > Your patch literally adds *complexity*, rather than take it away. > > Is this all just because glibc did its own 'struct stat' > implementation way back when, and you want to have just one place > where the stat conversion is done? > > Can't you just special case *that* case instead? It's long been > irrelevant. Afaik, on 32-bit x86, you just want to use the 'stat64' > system call family, and on 64-bit x86 you just use the regular 'stat' > family (yes, called 'newfstatat' for that one system call for internal > historial reasons), and there is no conversion needed. > > IOW, all those wrappers just for 'struct stat' conversion are pure > overhead and silly garbage. Why do they exist? And if there is still > some other reason for them to exist, can't you make that be a > 'sysdeps' file of its own, and not penalize all the sane cases with > 'copy stat structures around'? The old glibc stat code was a complete mess that was added prior symbol versioning, with multiple implementations, along with a static linkage that defines the current version (there is no need to describe it further). The consolidation also provide a much simpler way to support y2038. On x86_64 and some other 64 bits ABIs where glibc exported struct stat is the same as the kernel, there is no conversion whatsoever. It is only required on some ABI where due historical mistakes the exported user struct does not match kernel (sparc64, mips64) and on 32 bits where the no-LFS support is just routed to LFS interface to simplify things. So the fstat to fstatat approach is a way to simplify the code and put all the required syscall logic only at fstatat implementation. By adding the 'special' case back on fstat.c it would require to replicate all this logic plus when to use statx instead (FSTATAT_USE_STATX), which is far from ideal. It is doable, but not really simpler... ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 18:22 ` Adhemerval Zanella Netto @ 2023-09-05 19:16 ` Adhemerval Zanella Netto 2023-09-05 19:21 ` Linus Torvalds 0 siblings, 1 reply; 13+ messages in thread From: Adhemerval Zanella Netto @ 2023-09-05 19:16 UTC (permalink / raw) To: Linus Torvalds Cc: Mateusz Guzik, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha On 05/09/23 15:22, Adhemerval Zanella Netto wrote: > > > On 05/09/23 14:45, Linus Torvalds wrote: >> On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto >> <adhemerval.zanella@linaro.org> wrote: >>> >>> I think we can still make it in a generic way, stat family would use more >>> syscall (which some filters might complain) but it should be ok: >> >> Ugh. That patch of yours certainly seems to avoid the kernel overhead >> (where that "fetch a byte from user space just to check it is '\0'" is >> much more expensive than it is in user space), but why does glibc do >> that whole "turn fstat() into fstatat(), and then turn it back again"? >> >> It seems just stupid. If the user asked for 'fstat()', just give the >> user 'fstat()'. >> >> Your patch literally adds *complexity*, rather than take it away. >> >> Is this all just because glibc did its own 'struct stat' >> implementation way back when, and you want to have just one place >> where the stat conversion is done? >> >> Can't you just special case *that* case instead? It's long been >> irrelevant. Afaik, on 32-bit x86, you just want to use the 'stat64' >> system call family, and on 64-bit x86 you just use the regular 'stat' >> family (yes, called 'newfstatat' for that one system call for internal >> historial reasons), and there is no conversion needed. >> >> IOW, all those wrappers just for 'struct stat' conversion are pure >> overhead and silly garbage. Why do they exist? And if there is still >> some other reason for them to exist, can't you make that be a >> 'sysdeps' file of its own, and not penalize all the sane cases with >> 'copy stat structures around'? > > The old glibc stat code was a complete mess that was added prior symbol > versioning, with multiple implementations, along with a static linkage > that defines the current version (there is no need to describe it > further). The consolidation also provide a much simpler way to support > y2038. > > On x86_64 and some other 64 bits ABIs where glibc exported struct stat > is the same as the kernel, there is no conversion whatsoever. It is > only required on some ABI where due historical mistakes the exported > user struct does not match kernel (sparc64, mips64) and on 32 bits > where the no-LFS support is just routed to LFS interface to simplify > things. > > So the fstat to fstatat approach is a way to simplify the code and put > all the required syscall logic only at fstatat implementation. By adding > the 'special' case back on fstat.c it would require to replicate all this > logic plus when to use statx instead (FSTATAT_USE_STATX), which is far > from ideal. It is doable, but not really simpler... Below it is the stat logic implemented directly on stat64.c code. It avoids the "flag == AT_EMPTY_PATH ..." check, but it only handles LFS calls where statx is not required (so 32 bit architectures will still continue to use statx, but I don't think this would be a problem). It not really simpler than add the logic on fstatat64.c, but it should be ok. diff --git a/sysdeps/unix/sysv/linux/fstat64.c b/sysdeps/unix/sysv/linux/fstat64.c index 124384e57f..a7f4935a01 100644 --- a/sysdeps/unix/sysv/linux/fstat64.c +++ b/sysdeps/unix/sysv/linux/fstat64.c @@ -19,20 +19,53 @@ #define __fstat __redirect___fstat #define fstat __redirect_fstat #include <sys/stat.h> +#undef __fstat +#undef fstat #include <fcntl.h> -#include <kernel_stat.h> -#include <stat_t64_cp.h> +#include <internal-stat.h> #include <errno.h> int __fstat64_time64 (int fd, struct __stat64_t64 *buf) { +#if !FSTATAT_USE_STATX +# if XSTAT_IS_XSTAT64 +# ifdef __NR_newfstatat + /* 64-bit kABI, e.g. aarch64, ia64, powerpc64*, s390x, riscv64, and + x86_64. */ + return INLINE_SYSCALL_CALL (fstat, fd, buf); +# elif defined __NR_fstatat64 +# if STAT64_IS_KERNEL_STAT64 + /* 64-bit kABI outlier, e.g. alpha */ + return INLINE_SYSCALL_CALL (fstat64, fd, buf); +# else + /* 64-bit kABI outlier, e.g. sparc64. */ + struct kernel_stat64 kst64; + int r = INLINE_SYSCALL_CALL (fstat64, fd, &kst64); + if (r == 0) + __cp_stat64_kstat64 (buf, &kst64); + return r; +# endif /* STAT64_IS_KERNEL_STAT64 */ +# endif +# else /* XSTAT_IS_XSTAT64 */ + /* 64-bit kabi outlier, e.g. mips64 and mips64-n32. */ + struct kernel_stat kst; + int r = INLINE_SYSCALL_CALL (fstat, fd, &kst); + if (r == 0) + __cp_kstat_stat64_t64 (&kst, buf); + return r; +# endif +#else /* !FSTATAT_USE_STATX */ + /* All kABIs with non-LFS support and with old 32-bit time_t support + e.g. arm, csky, i386, hppa, m68k, microblaze, nios2, sh, powerpc32, + and sparc32. */ if (fd < 0) { __set_errno (EBADF); return -1; } return __fstatat64_time64 (fd, "", buf, AT_EMPTY_PATH); +#endif } #if __TIMESIZE != 64 hidden_def (__fstat64_time64) diff --git a/sysdeps/unix/sysv/linux/fstatat64.c b/sysdeps/unix/sysv/linux/fstatat64.c index 3509d3ca6d..127c6ff601 100644 --- a/sysdeps/unix/sysv/linux/fstatat64.c +++ b/sysdeps/unix/sysv/linux/fstatat64.c @@ -21,12 +21,10 @@ #include <sys/stat.h> #include <fcntl.h> #include <string.h> -#include <kernel_stat.h> #include <sysdep.h> #include <time.h> -#include <kstat_cp.h> -#include <stat_t64_cp.h> #include <sys/sysmacros.h> +#include <internal-stat.h> #if __TIMESIZE == 64 \ && (__WORDSIZE == 32 \ @@ -40,11 +38,7 @@ _Static_assert (sizeof (__blkcnt_t) == sizeof (__blkcnt64_t), "__blkcnt_t and __blkcnt64_t must match"); #endif -#if (__WORDSIZE == 32 \ - && (!defined __SYSCALL_WORDSIZE || __SYSCALL_WORDSIZE == 32)) \ - || defined STAT_HAS_TIME32 \ - || (!defined __NR_newfstatat && !defined __NR_fstatat64) -# define FSTATAT_USE_STATX 1 +#if FSTATAT_USE_STATX static inline int fstatat64_time64_statx (int fd, const char *file, struct __stat64_t64 *buf, @@ -79,8 +73,6 @@ fstatat64_time64_statx (int fd, const char *file, struct __stat64_t64 *buf, return r; } -#else -# define FSTATAT_USE_STATX 0 #endif /* Only statx supports 64-bit timestamps for 32-bit architectures with diff --git a/sysdeps/unix/sysv/linux/internal-stat.h b/sysdeps/unix/sysv/linux/internal-stat.h new file mode 100644 index 0000000000..e3b0569853 --- /dev/null +++ b/sysdeps/unix/sysv/linux/internal-stat.h @@ -0,0 +1,31 @@ +/* Internal stat definitions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include <stat_t64_cp.h> +#include <kernel_stat.h> +#include <kstat_cp.h> + +#if (__WORDSIZE == 32 \ + && (!defined __SYSCALL_WORDSIZE || __SYSCALL_WORDSIZE == 32)) \ + || defined STAT_HAS_TIME32 \ + || (!defined __NR_newfstatat && !defined __NR_fstatat64) +# define FSTATAT_USE_STATX 1 +#else +# define FSTATAT_USE_STATX 0 +#endif ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 19:16 ` Adhemerval Zanella Netto @ 2023-09-05 19:21 ` Linus Torvalds 0 siblings, 0 replies; 13+ messages in thread From: Linus Torvalds @ 2023-09-05 19:21 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: Mateusz Guzik, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha On Tue, 5 Sept 2023 at 12:17, Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote: > > Below it is the stat logic implemented directly on stat64.c code. I was looking at that exact thing and going "if this just had FSTATAT_USE_STATX it would all work easily". You seem to have fixed that by just moving the FSTATAT_USE_STATX logic into a common header file. IOW, this looks good to me. > It avoids > the "flag == AT_EMPTY_PATH ..." check, but it only handles LFS calls > where statx is not required (so 32 bit architectures will still continue to > use statx, but I don't think this would be a problem). I think that given the existing logic, that's exactly the right thing to do. Let's face it, 32-bit is getting to be pretty irrelevant. Thanks, Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 17:45 ` Linus Torvalds 2023-09-05 18:22 ` Adhemerval Zanella Netto @ 2023-09-05 21:42 ` Rich Felker 2023-09-05 21:46 ` Mateusz Guzik 1 sibling, 1 reply; 13+ messages in thread From: Rich Felker @ 2023-09-05 21:42 UTC (permalink / raw) To: Linus Torvalds Cc: Adhemerval Zanella Netto, Mateusz Guzik, Andreas Schwab, Mateusz Guzik via Libc-alpha On Tue, Sep 05, 2023 at 10:45:17AM -0700, Linus Torvalds wrote: > On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto > <adhemerval.zanella@linaro.org> wrote: > > > > I think we can still make it in a generic way, stat family would use more > > syscall (which some filters might complain) but it should be ok: > > Ugh. That patch of yours certainly seems to avoid the kernel overhead > (where that "fetch a byte from user space just to check it is '\0'" is > much more expensive than it is in user space), but why does glibc do > that whole "turn fstat() into fstatat(), and then turn it back again"? > > It seems just stupid. If the user asked for 'fstat()', just give the > user 'fstat()'. We do it in musl, but we also do the "turn it back again" in userspace. musl's __fstatat_kstat checks if AT_EMPTY_PATH is set, and if so, calls SYS_fstat. I don't see why glibc couldn't do the same. However, we don't do this, and this does not seem to be possible, in the case where statx has to be used (32-bit archs). Is there an analogous issue for statx? Rich ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 21:42 ` Rich Felker @ 2023-09-05 21:46 ` Mateusz Guzik 0 siblings, 0 replies; 13+ messages in thread From: Mateusz Guzik @ 2023-09-05 21:46 UTC (permalink / raw) To: Rich Felker Cc: Linus Torvalds, Adhemerval Zanella Netto, Andreas Schwab, Mateusz Guzik via Libc-alpha On 9/5/23, Rich Felker <dalias@libc.org> wrote: > On Tue, Sep 05, 2023 at 10:45:17AM -0700, Linus Torvalds wrote: >> On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto >> <adhemerval.zanella@linaro.org> wrote: >> > >> > I think we can still make it in a generic way, stat family would use >> > more >> > syscall (which some filters might complain) but it should be ok: >> >> Ugh. That patch of yours certainly seems to avoid the kernel overhead >> (where that "fetch a byte from user space just to check it is '\0'" is >> much more expensive than it is in user space), but why does glibc do >> that whole "turn fstat() into fstatat(), and then turn it back again"? >> >> It seems just stupid. If the user asked for 'fstat()', just give the >> user 'fstat()'. > > We do it in musl, but we also do the "turn it back again" in > userspace. musl's __fstatat_kstat checks if AT_EMPTY_PATH is set, and > if so, calls SYS_fstat. I don't see why glibc couldn't do the same. > > However, we don't do this, and this does not seem to be possible, in > the case where statx has to be used (32-bit archs). Is there an > analogous issue for statx? > Yes and *currently* statx does not have a way out, but I'm going to propose some patches to Linux about it in foreseeable future. -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) 2023-09-05 13:14 ` Mateusz Guzik 2023-09-05 17:28 ` Adhemerval Zanella Netto @ 2023-09-05 17:29 ` Linus Torvalds 1 sibling, 0 replies; 13+ messages in thread From: Linus Torvalds @ 2023-09-05 17:29 UTC (permalink / raw) To: Mateusz Guzik Cc: Adhemerval Zanella Netto, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha On Tue, 5 Sept 2023 at 06:14, Mateusz Guzik <mjguzik@gmail.com> wrote: > > I completely agree this is a problem going way past fstat. Realistically, fstatat() is likely the only case that matters from any performance angle because it's the only one likely to be used a lot in real loads. Sure, there are other "at" system calls, but they just aren't important from a performance angle. I don't think we've ever seen 'fchown/fchmod()' be a major performance issue, and if it was, the cost is elsewhere (ie the writeback of the changed inode), so if glibc were to translate it to 'fchownat/fchmodat()' with AT_EMPTY_PATH, it really wouldn't matter. In contrast, there are tons of loads where 'fstat()' is a noticeable part of the load, because the "open+fstat" pattern is simply fundamental Unix code. So converting it to 'fstatat()' is simply *bad*. Right now the kernel does even more than it needs to do (ie it does the whole pathname handling, because I certainly didn't expect AT_EMPTY_PATH to be a *hot* path), but as Mateusz says, even with that all short-circuited (we have a trivial patch to do just that), just the cost of checking "is it actually empty" is noticeable because of the security boundary issue. Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2023-09-05 21:46 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-09-04 9:55 fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) Mateusz Guzik 2023-09-04 10:08 ` Andreas Schwab 2023-09-04 10:11 ` Mateusz Guzik 2023-09-05 13:01 ` Adhemerval Zanella Netto 2023-09-05 13:14 ` Mateusz Guzik 2023-09-05 17:28 ` Adhemerval Zanella Netto 2023-09-05 17:45 ` Linus Torvalds 2023-09-05 18:22 ` Adhemerval Zanella Netto 2023-09-05 19:16 ` Adhemerval Zanella Netto 2023-09-05 19:21 ` Linus Torvalds 2023-09-05 21:42 ` Rich Felker 2023-09-05 21:46 ` Mateusz Guzik 2023-09-05 17:29 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).