public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
@ 2023-09-04  9:55 Mateusz Guzik
  2023-09-04 10:08 ` Andreas Schwab
  0 siblings, 1 reply; 13+ messages in thread
From: Mateusz Guzik @ 2023-09-04  9:55 UTC (permalink / raw)
  To: libc-alpha; +Cc: Linus Torvalds, Adhemerval Zanella

Commit 4d97cc8cf3da ("io: Remove xstat implementations") reimplemented
fstat entry point on top of newfstatat (as opposed to newfstat).

This comes with a significant performance penalty as it induces a lot
of work on kernel side to handle the path, which is additionally
slowed down on x86-64 due to SMAP handling.

Here are sample results from calling newfstatat vs newfstat on Sapphire Rapids:
newfstatat 5088199
newfstat   8540383 (+67%)

Are there any problems switching it back to newfstat at least for x86-64?

If you want to bench yourself you can grab will-it-scale +
https://github.com/antonblanchard/will-it-scale/pull/35/files + patch
up one of the testcases to call newfstat directly: int error =
syscall(5, fd, &sb);

Note if you bench yourself and have a CPU significantly impacted by
mitigations (e.g., meltdown) the difference may be very small in your
setup.

-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-04  9:55 fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) Mateusz Guzik
@ 2023-09-04 10:08 ` Andreas Schwab
  2023-09-04 10:11   ` Mateusz Guzik
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 2023-09-04 10:08 UTC (permalink / raw)
  To: Mateusz Guzik via Libc-alpha; +Cc: Mateusz Guzik, Linus Torvalds

On Sep 04 2023, Mateusz Guzik via Libc-alpha wrote:

> Commit 4d97cc8cf3da ("io: Remove xstat implementations") reimplemented
> fstat entry point on top of newfstatat (as opposed to newfstat).

You are looking at the wrong commit.  See 30f1c74394 instead.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-04 10:08 ` Andreas Schwab
@ 2023-09-04 10:11   ` Mateusz Guzik
  2023-09-05 13:01     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 13+ messages in thread
From: Mateusz Guzik @ 2023-09-04 10:11 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Mateusz Guzik via Libc-alpha, Linus Torvalds

On 9/4/23, Andreas Schwab <schwab@suse.de> wrote:
> On Sep 04 2023, Mateusz Guzik via Libc-alpha wrote:
>
>> Commit 4d97cc8cf3da ("io: Remove xstat implementations") reimplemented
>> fstat entry point on top of newfstatat (as opposed to newfstat).
>
> You are looking at the wrong commit.  See 30f1c74394 instead.
>

I still got the right person and the question stands. ;)

-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-04 10:11   ` Mateusz Guzik
@ 2023-09-05 13:01     ` Adhemerval Zanella Netto
  2023-09-05 13:14       ` Mateusz Guzik
  0 siblings, 1 reply; 13+ messages in thread
From: Adhemerval Zanella Netto @ 2023-09-05 13:01 UTC (permalink / raw)
  To: Mateusz Guzik, Andreas Schwab, Rich Felker
  Cc: Mateusz Guzik via Libc-alpha, Linus Torvalds



On 04/09/23 07:11, Mateusz Guzik via Libc-alpha wrote:
> On 9/4/23, Andreas Schwab <schwab@suse.de> wrote:
>> On Sep 04 2023, Mateusz Guzik via Libc-alpha wrote:
>>
>>> Commit 4d97cc8cf3da ("io: Remove xstat implementations") reimplemented
>>> fstat entry point on top of newfstatat (as opposed to newfstat).
>>
>> You are looking at the wrong commit.  See 30f1c74394 instead.
>>
> 
> I still got the right person and the question stands. ;)
> 

The glibc has started to implement some symbols over more generic syscalls
because it simplifies *a lot* the required code (just check how messy was the
stat family implementation back then).  So I would really like to avoid the
need to get back on arch-specific syscall to fix very specific corner cases.

If I understand correctly, the issue seems to be the usage of a empty string 
sentinel ("") to indicate the argument it not really used (which trigger all
the SMAP issues since kernel will always try to copy the argument from
userland).  This also means on x86_64 you will also have this performance
penalty on syscalls that use AT_EMPTY_PATH (i.e, execveat, name_to_handle_at,
open_tree, faccessat, etc.).  I really think it would better fixed in the
kernel instead of adding extra constraints for the userland.

CCing Rich, from musl, which seems to be also affected it and to have another
view from a libc implementation.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 13:01     ` Adhemerval Zanella Netto
@ 2023-09-05 13:14       ` Mateusz Guzik
  2023-09-05 17:28         ` Adhemerval Zanella Netto
  2023-09-05 17:29         ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread
From: Mateusz Guzik @ 2023-09-05 13:14 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha,
	Linus Torvalds

On 9/5/23, Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote:
> If I understand correctly, the issue seems to be the usage of a empty string
>
> sentinel ("") to indicate the argument it not really used (which trigger
> all
> the SMAP issues since kernel will always try to copy the argument from
> userland).  This also means on x86_64 you will also have this performance
> penalty on syscalls that use AT_EMPTY_PATH (i.e, execveat,
> name_to_handle_at,
> open_tree, faccessat, etc.).  I really think it would better fixed in the
> kernel instead of adding extra constraints for the userland.
>

I completely agree this is a problem going way past fstat.

One could be tempted to allow NULL with the flag, but that wont work
-- I know of code out there which checks for statx availability by
deliberately passing a NULL path. I would not be shocked if there was
more of the sort and passing the AT_EMPTY_PATH flag on top.

I am considering proposing a new flag which combined with NULL path
would indicate there is only a fd lookup to perform -- fuly backwards
compatible and avoiding the problem. Then syscalls could start
supporting it over time as people get around to it.

However, the fstab stub in glibc does not have to suffer it regardless
of what happens with the above.

-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 13:14       ` Mateusz Guzik
@ 2023-09-05 17:28         ` Adhemerval Zanella Netto
  2023-09-05 17:45           ` Linus Torvalds
  2023-09-05 17:29         ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread
From: Adhemerval Zanella Netto @ 2023-09-05 17:28 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha,
	Linus Torvalds



On 05/09/23 10:14, Mateusz Guzik wrote:
> On 9/5/23, Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote:
>> If I understand correctly, the issue seems to be the usage of a empty string
>>
>> sentinel ("") to indicate the argument it not really used (which trigger
>> all
>> the SMAP issues since kernel will always try to copy the argument from
>> userland).  This also means on x86_64 you will also have this performance
>> penalty on syscalls that use AT_EMPTY_PATH (i.e, execveat,
>> name_to_handle_at,
>> open_tree, faccessat, etc.).  I really think it would better fixed in the
>> kernel instead of adding extra constraints for the userland.
>>
> 
> I completely agree this is a problem going way past fstat.
> 
> One could be tempted to allow NULL with the flag, but that wont work
> -- I know of code out there which checks for statx availability by
> deliberately passing a NULL path. I would not be shocked if there was
> more of the sort and passing the AT_EMPTY_PATH flag on top.

I though about it, but besides being a clear kABI breakage it does not
help on older kernels (where fstat returns EFAULT for NULL).

I am not sure about how this statx availability would work, passing
NULL would returns EFAULT in both statx and old stat cases.

> 
> I am considering proposing a new flag which combined with NULL path
> would indicate there is only a fd lookup to perform -- fuly backwards
> compatible and avoiding the problem. Then syscalls could start
> supporting it over time as people get around to it> 
> However, the fstab stub in glibc does not have to suffer it regardless
> of what happens with the above.

I think we can still make it in a generic way, stat family would use more
syscall (which some filters might complain) but it should be ok:

diff --git a/sysdeps/unix/sysv/linux/fstatat64.c b/sysdeps/unix/sysv/linux/fstatat64.c
index 3509d3ca6d..9fc7f82db2 100644
--- a/sysdeps/unix/sysv/linux/fstatat64.c
+++ b/sysdeps/unix/sysv/linux/fstatat64.c
@@ -91,20 +91,30 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf,
                       int flag)
 {
   int r;
+  bool is_fstat = flag == AT_EMPTY_PATH && fd >= 0 && file[0] == '\0';

 #if XSTAT_IS_XSTAT64
 # ifdef __NR_newfstatat
   /* 64-bit kABI, e.g. aarch64, ia64, powerpc64*, s390x, riscv64, and
      x86_64.  */
-  r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, buf, flag);
+  if (is_fstat)
+    r = INTERNAL_SYSCALL_CALL (fstat, fd, buf);
+  else
+    r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, buf, flag);
 # elif defined __NR_fstatat64
 #  if STAT64_IS_KERNEL_STAT64
   /* 64-bit kABI outlier, e.g. alpha  */
-  r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, buf, flag);
+  if (is_fstat)
+    r = INTERNAL_SYSCALL_CALL (fstat64, fd, buf);
+  else
+    r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, buf, flag);
 #  else
   /* 64-bit kABI outlier, e.g. sparc64.  */
   struct kernel_stat64 kst64;
-  r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &kst64, flag);
+  if (is_fstat)
+    r = INTERNAL_SYSCALL_CALL (fstat64, fd, &kst64);
+  else
+    r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &kst64, flag);
   if (r == 0)
     __cp_stat64_kstat64 (buf, &kst64);
 #  endif
@@ -115,7 +125,10 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf,
      e.g. arm, csky, i386, hppa, m68k, microblaze, nios2, sh, powerpc32,
      and sparc32.  */
   struct stat64 st64;
-  r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &st64, flag);
+  if (is_fstat)
+    r = INTERNAL_SYSCALL_CALL (fstat64, fd, &st64);
+  else
+    r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &st64, flag);
   if (r == 0)
     {
       /* Clear both pad and reserved fields.  */
@@ -138,7 +151,10 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf,
 # else
   /* 64-bit kabi outlier, e.g. mips64 and mips64-n32.  */
   struct kernel_stat kst;
-  r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, &kst, flag);
+  if (is_fstat)
+    r = INTERNAL_SYSCALL_CALL (fstat, fd, &kst);
+  else
+    r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, &kst, flag);
   if (r == 0)
     __cp_kstat_stat64_t64 (&kst, buf);
 # endif

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 13:14       ` Mateusz Guzik
  2023-09-05 17:28         ` Adhemerval Zanella Netto
@ 2023-09-05 17:29         ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2023-09-05 17:29 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Adhemerval Zanella Netto, Andreas Schwab, Rich Felker,
	Mateusz Guzik via Libc-alpha

On Tue, 5 Sept 2023 at 06:14, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> I completely agree this is a problem going way past fstat.

Realistically, fstatat() is likely the only case that matters from any
performance angle because it's the only one likely to be used a lot in
real loads.

Sure, there are other "at" system calls, but they just aren't
important from a performance angle.

I don't think we've ever seen 'fchown/fchmod()' be a major performance
issue, and if it was, the cost is elsewhere (ie the writeback of the
changed inode), so if glibc were to translate it to
'fchownat/fchmodat()' with AT_EMPTY_PATH, it really wouldn't matter.

In contrast, there are tons of loads where 'fstat()' is a noticeable
part of the load, because the "open+fstat" pattern is simply
fundamental Unix code. So converting it to 'fstatat()' is simply
*bad*.

Right now the kernel does even more than it needs to do (ie it does
the whole pathname handling, because I certainly didn't expect
AT_EMPTY_PATH to be a *hot* path), but as Mateusz says, even with that
all short-circuited (we have a trivial patch to do just that), just
the cost of checking "is it actually empty" is noticeable because of
the security boundary issue.

              Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 17:28         ` Adhemerval Zanella Netto
@ 2023-09-05 17:45           ` Linus Torvalds
  2023-09-05 18:22             ` Adhemerval Zanella Netto
  2023-09-05 21:42             ` Rich Felker
  0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2023-09-05 17:45 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Mateusz Guzik, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha

On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
> I think we can still make it in a generic way, stat family would use more
> syscall (which some filters might complain) but it should be ok:

Ugh. That patch of yours certainly seems to avoid the kernel overhead
(where that "fetch a byte from user space just to check it is '\0'" is
much more expensive than it is in user space), but why does glibc do
that whole "turn fstat() into fstatat(), and then turn it back again"?

It seems just stupid. If the user asked for 'fstat()', just give the
user 'fstat()'.

Your patch literally adds *complexity*, rather than take it away.

Is this all just because glibc did its own 'struct stat'
implementation way back when, and you want to have just one place
where the stat conversion is done?

Can't you just special case *that* case instead? It's long been
irrelevant. Afaik, on 32-bit x86, you just want to use the 'stat64'
system call family, and on 64-bit x86 you just use the regular 'stat'
family (yes, called 'newfstatat' for that one system call for internal
historial reasons), and there is no conversion needed.

IOW, all those wrappers just for 'struct stat' conversion are pure
overhead and silly garbage. Why do they exist? And if there is still
some other reason for them to exist, can't you make that be a
'sysdeps' file of its own, and not penalize all the sane cases with
'copy stat structures around'?

            Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 17:45           ` Linus Torvalds
@ 2023-09-05 18:22             ` Adhemerval Zanella Netto
  2023-09-05 19:16               ` Adhemerval Zanella Netto
  2023-09-05 21:42             ` Rich Felker
  1 sibling, 1 reply; 13+ messages in thread
From: Adhemerval Zanella Netto @ 2023-09-05 18:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mateusz Guzik, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha



On 05/09/23 14:45, Linus Torvalds wrote:
> On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
>>
>> I think we can still make it in a generic way, stat family would use more
>> syscall (which some filters might complain) but it should be ok:
> 
> Ugh. That patch of yours certainly seems to avoid the kernel overhead
> (where that "fetch a byte from user space just to check it is '\0'" is
> much more expensive than it is in user space), but why does glibc do
> that whole "turn fstat() into fstatat(), and then turn it back again"?
> 
> It seems just stupid. If the user asked for 'fstat()', just give the
> user 'fstat()'.
> 
> Your patch literally adds *complexity*, rather than take it away.
> 
> Is this all just because glibc did its own 'struct stat'
> implementation way back when, and you want to have just one place
> where the stat conversion is done?
> 
> Can't you just special case *that* case instead? It's long been
> irrelevant. Afaik, on 32-bit x86, you just want to use the 'stat64'
> system call family, and on 64-bit x86 you just use the regular 'stat'
> family (yes, called 'newfstatat' for that one system call for internal
> historial reasons), and there is no conversion needed.
> 
> IOW, all those wrappers just for 'struct stat' conversion are pure
> overhead and silly garbage. Why do they exist? And if there is still
> some other reason for them to exist, can't you make that be a
> 'sysdeps' file of its own, and not penalize all the sane cases with
> 'copy stat structures around'?

The old glibc stat code was a complete mess that was added prior symbol
versioning, with multiple implementations, along with a static linkage
that defines the current version (there is no need to describe it
further). The consolidation also provide a much simpler way to support
y2038.

On x86_64 and some other 64 bits ABIs where glibc exported struct stat 
is  the same as the kernel, there is no conversion whatsoever.  It is 
only required on some ABI where due historical mistakes the exported 
user struct does not match kernel (sparc64, mips64) and on 32 bits 
where the no-LFS support is just routed to LFS interface to simplify
things.

So the fstat to fstatat approach is a way to simplify the code and put 
all the required syscall logic only at fstatat implementation.  By adding 
the 'special' case back on fstat.c it would require to replicate all this
logic plus when to use statx instead (FSTATAT_USE_STATX), which is far 
from ideal. It is doable, but not really simpler...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 18:22             ` Adhemerval Zanella Netto
@ 2023-09-05 19:16               ` Adhemerval Zanella Netto
  2023-09-05 19:21                 ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Adhemerval Zanella Netto @ 2023-09-05 19:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mateusz Guzik, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha



On 05/09/23 15:22, Adhemerval Zanella Netto wrote:
> 
> 
> On 05/09/23 14:45, Linus Torvalds wrote:
>> On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>> I think we can still make it in a generic way, stat family would use more
>>> syscall (which some filters might complain) but it should be ok:
>>
>> Ugh. That patch of yours certainly seems to avoid the kernel overhead
>> (where that "fetch a byte from user space just to check it is '\0'" is
>> much more expensive than it is in user space), but why does glibc do
>> that whole "turn fstat() into fstatat(), and then turn it back again"?
>>
>> It seems just stupid. If the user asked for 'fstat()', just give the
>> user 'fstat()'.
>>
>> Your patch literally adds *complexity*, rather than take it away.
>>
>> Is this all just because glibc did its own 'struct stat'
>> implementation way back when, and you want to have just one place
>> where the stat conversion is done?
>>
>> Can't you just special case *that* case instead? It's long been
>> irrelevant. Afaik, on 32-bit x86, you just want to use the 'stat64'
>> system call family, and on 64-bit x86 you just use the regular 'stat'
>> family (yes, called 'newfstatat' for that one system call for internal
>> historial reasons), and there is no conversion needed.
>>
>> IOW, all those wrappers just for 'struct stat' conversion are pure
>> overhead and silly garbage. Why do they exist? And if there is still
>> some other reason for them to exist, can't you make that be a
>> 'sysdeps' file of its own, and not penalize all the sane cases with
>> 'copy stat structures around'?
> 
> The old glibc stat code was a complete mess that was added prior symbol
> versioning, with multiple implementations, along with a static linkage
> that defines the current version (there is no need to describe it
> further). The consolidation also provide a much simpler way to support
> y2038.
> 
> On x86_64 and some other 64 bits ABIs where glibc exported struct stat 
> is  the same as the kernel, there is no conversion whatsoever.  It is 
> only required on some ABI where due historical mistakes the exported 
> user struct does not match kernel (sparc64, mips64) and on 32 bits 
> where the no-LFS support is just routed to LFS interface to simplify
> things.
> 
> So the fstat to fstatat approach is a way to simplify the code and put 
> all the required syscall logic only at fstatat implementation.  By adding 
> the 'special' case back on fstat.c it would require to replicate all this
> logic plus when to use statx instead (FSTATAT_USE_STATX), which is far 
> from ideal. It is doable, but not really simpler...

Below it is the stat logic implemented directly on stat64.c code.  It avoids
the "flag == AT_EMPTY_PATH ..." check, but it only handles LFS calls
where statx is not required (so 32 bit architectures will still continue to
use statx, but I don't think this would be a problem). It not really simpler 
than add the logic on fstatat64.c, but it should be ok.

diff --git a/sysdeps/unix/sysv/linux/fstat64.c b/sysdeps/unix/sysv/linux/fstat64.c
index 124384e57f..a7f4935a01 100644
--- a/sysdeps/unix/sysv/linux/fstat64.c
+++ b/sysdeps/unix/sysv/linux/fstat64.c
@@ -19,20 +19,53 @@
 #define __fstat __redirect___fstat
 #define fstat   __redirect_fstat
 #include <sys/stat.h>
+#undef __fstat
+#undef fstat
 #include <fcntl.h>
-#include <kernel_stat.h>
-#include <stat_t64_cp.h>
+#include <internal-stat.h>
 #include <errno.h>
 
 int
 __fstat64_time64 (int fd, struct __stat64_t64 *buf)
 {
+#if !FSTATAT_USE_STATX
+# if XSTAT_IS_XSTAT64
+#  ifdef __NR_newfstatat
+  /* 64-bit kABI, e.g. aarch64, ia64, powerpc64*, s390x, riscv64, and
+     x86_64.  */
+  return INLINE_SYSCALL_CALL (fstat, fd, buf);
+#  elif defined __NR_fstatat64
+#   if STAT64_IS_KERNEL_STAT64
+  /* 64-bit kABI outlier, e.g. alpha  */
+  return INLINE_SYSCALL_CALL (fstat64, fd, buf);
+#   else
+  /* 64-bit kABI outlier, e.g. sparc64.  */
+  struct kernel_stat64 kst64;
+  int r = INLINE_SYSCALL_CALL (fstat64, fd, &kst64);
+  if (r == 0)
+    __cp_stat64_kstat64 (buf, &kst64);
+  return r;
+#   endif /* STAT64_IS_KERNEL_STAT64 */
+#  endif
+# else /* XSTAT_IS_XSTAT64 */
+  /* 64-bit kabi outlier, e.g. mips64 and mips64-n32.  */
+  struct kernel_stat kst;
+  int r = INLINE_SYSCALL_CALL (fstat, fd, &kst);
+  if (r == 0)
+    __cp_kstat_stat64_t64 (&kst, buf);
+  return r;
+# endif
+#else /* !FSTATAT_USE_STATX  */
+  /* All kABIs with non-LFS support and with old 32-bit time_t support
+     e.g. arm, csky, i386, hppa, m68k, microblaze, nios2, sh, powerpc32,
+     and sparc32.  */
   if (fd < 0)
     {
       __set_errno (EBADF);
       return -1;
     }
   return __fstatat64_time64 (fd, "", buf, AT_EMPTY_PATH);
+#endif
 }
 #if __TIMESIZE != 64
 hidden_def (__fstat64_time64)
diff --git a/sysdeps/unix/sysv/linux/fstatat64.c b/sysdeps/unix/sysv/linux/fstatat64.c
index 3509d3ca6d..127c6ff601 100644
--- a/sysdeps/unix/sysv/linux/fstatat64.c
+++ b/sysdeps/unix/sysv/linux/fstatat64.c
@@ -21,12 +21,10 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <string.h>
-#include <kernel_stat.h>
 #include <sysdep.h>
 #include <time.h>
-#include <kstat_cp.h>
-#include <stat_t64_cp.h>
 #include <sys/sysmacros.h>
+#include <internal-stat.h>
 
 #if __TIMESIZE == 64 \
      && (__WORDSIZE == 32 \
@@ -40,11 +38,7 @@ _Static_assert (sizeof (__blkcnt_t) == sizeof (__blkcnt64_t),
                 "__blkcnt_t and __blkcnt64_t must match");
 #endif
 
-#if (__WORDSIZE == 32 \
-     && (!defined __SYSCALL_WORDSIZE || __SYSCALL_WORDSIZE == 32)) \
-     || defined STAT_HAS_TIME32 \
-     || (!defined __NR_newfstatat && !defined __NR_fstatat64)
-# define FSTATAT_USE_STATX 1
+#if FSTATAT_USE_STATX
 
 static inline int
 fstatat64_time64_statx (int fd, const char *file, struct __stat64_t64 *buf,
@@ -79,8 +73,6 @@ fstatat64_time64_statx (int fd, const char *file, struct __stat64_t64 *buf,
 
   return r;
 }
-#else
-# define FSTATAT_USE_STATX 0
 #endif
 
 /* Only statx supports 64-bit timestamps for 32-bit architectures with
diff --git a/sysdeps/unix/sysv/linux/internal-stat.h b/sysdeps/unix/sysv/linux/internal-stat.h
new file mode 100644
index 0000000000..e3b0569853
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/internal-stat.h
@@ -0,0 +1,31 @@
+/* Internal stat definitions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <stat_t64_cp.h>
+#include <kernel_stat.h>
+#include <kstat_cp.h>
+
+#if (__WORDSIZE == 32 \
+     && (!defined __SYSCALL_WORDSIZE || __SYSCALL_WORDSIZE == 32)) \
+     || defined STAT_HAS_TIME32 \
+     || (!defined __NR_newfstatat && !defined __NR_fstatat64)
+# define FSTATAT_USE_STATX 1
+#else
+# define FSTATAT_USE_STATX 0
+#endif

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 19:16               ` Adhemerval Zanella Netto
@ 2023-09-05 19:21                 ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2023-09-05 19:21 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Mateusz Guzik, Andreas Schwab, Rich Felker, Mateusz Guzik via Libc-alpha

On Tue, 5 Sept 2023 at 12:17, Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
> Below it is the stat logic implemented directly on stat64.c code.

I was looking at that exact thing and going "if this just had
FSTATAT_USE_STATX it would all work easily".

You seem to have fixed that by just moving the FSTATAT_USE_STATX logic
into a common header file.

IOW, this looks good to me.

>     It avoids
> the "flag == AT_EMPTY_PATH ..." check, but it only handles LFS calls
> where statx is not required (so 32 bit architectures will still continue to
> use statx, but I don't think this would be a problem).

I think that given the existing logic, that's exactly the right thing to do.

Let's face it, 32-bit is getting to be pretty irrelevant.

Thanks,

              Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 17:45           ` Linus Torvalds
  2023-09-05 18:22             ` Adhemerval Zanella Netto
@ 2023-09-05 21:42             ` Rich Felker
  2023-09-05 21:46               ` Mateusz Guzik
  1 sibling, 1 reply; 13+ messages in thread
From: Rich Felker @ 2023-09-05 21:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adhemerval Zanella Netto, Mateusz Guzik, Andreas Schwab,
	Mateusz Guzik via Libc-alpha

On Tue, Sep 05, 2023 at 10:45:17AM -0700, Linus Torvalds wrote:
> On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
> >
> > I think we can still make it in a generic way, stat family would use more
> > syscall (which some filters might complain) but it should be ok:
> 
> Ugh. That patch of yours certainly seems to avoid the kernel overhead
> (where that "fetch a byte from user space just to check it is '\0'" is
> much more expensive than it is in user space), but why does glibc do
> that whole "turn fstat() into fstatat(), and then turn it back again"?
> 
> It seems just stupid. If the user asked for 'fstat()', just give the
> user 'fstat()'.

We do it in musl, but we also do the "turn it back again" in
userspace. musl's __fstatat_kstat checks if AT_EMPTY_PATH is set, and
if so, calls SYS_fstat. I don't see why glibc couldn't do the same.

However, we don't do this, and this does not seem to be possible, in
the case where statx has to be used (32-bit archs). Is there an
analogous issue for statx?

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
  2023-09-05 21:42             ` Rich Felker
@ 2023-09-05 21:46               ` Mateusz Guzik
  0 siblings, 0 replies; 13+ messages in thread
From: Mateusz Guzik @ 2023-09-05 21:46 UTC (permalink / raw)
  To: Rich Felker
  Cc: Linus Torvalds, Adhemerval Zanella Netto, Andreas Schwab,
	Mateusz Guzik via Libc-alpha

On 9/5/23, Rich Felker <dalias@libc.org> wrote:
> On Tue, Sep 05, 2023 at 10:45:17AM -0700, Linus Torvalds wrote:
>> On Tue, 5 Sept 2023 at 10:28, Adhemerval Zanella Netto
>> <adhemerval.zanella@linaro.org> wrote:
>> >
>> > I think we can still make it in a generic way, stat family would use
>> > more
>> > syscall (which some filters might complain) but it should be ok:
>>
>> Ugh. That patch of yours certainly seems to avoid the kernel overhead
>> (where that "fetch a byte from user space just to check it is '\0'" is
>> much more expensive than it is in user space), but why does glibc do
>> that whole "turn fstat() into fstatat(), and then turn it back again"?
>>
>> It seems just stupid. If the user asked for 'fstat()', just give the
>> user 'fstat()'.
>
> We do it in musl, but we also do the "turn it back again" in
> userspace. musl's __fstatat_kstat checks if AT_EMPTY_PATH is set, and
> if so, calls SYS_fstat. I don't see why glibc couldn't do the same.
>
> However, we don't do this, and this does not seem to be possible, in
> the case where statx has to be used (32-bit archs). Is there an
> analogous issue for statx?
>

Yes and *currently* statx does not have a way out, but I'm going to
propose some patches to Linux about it in foreseeable future.

-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-09-05 21:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-04  9:55 fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH) Mateusz Guzik
2023-09-04 10:08 ` Andreas Schwab
2023-09-04 10:11   ` Mateusz Guzik
2023-09-05 13:01     ` Adhemerval Zanella Netto
2023-09-05 13:14       ` Mateusz Guzik
2023-09-05 17:28         ` Adhemerval Zanella Netto
2023-09-05 17:45           ` Linus Torvalds
2023-09-05 18:22             ` Adhemerval Zanella Netto
2023-09-05 19:16               ` Adhemerval Zanella Netto
2023-09-05 19:21                 ` Linus Torvalds
2023-09-05 21:42             ` Rich Felker
2023-09-05 21:46               ` Mateusz Guzik
2023-09-05 17:29         ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).