From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Andreas Schwab <schwab@suse.de>, Rich Felker <dalias@libc.org>,
Mateusz Guzik via Libc-alpha <libc-alpha@sourceware.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: fstat(2) penalized by using newfstatat(6, "", buf, AT_EMPTY_PATH)
Date: Tue, 5 Sep 2023 14:28:22 -0300
Message-ID: <6d0e4e9e-ab69-0c73-bb9d-ce344b4a043b@linaro.org>
In-Reply-To: <CAGudoHF1pLbO4+1ucqct2kEqNEkyqCPeX7uDsYRE82tVVX6cmQ@mail.gmail.com>
On 05/09/23 10:14, Mateusz Guzik wrote:
> On 9/5/23, Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote:
>> If I understand correctly, the issue seems to be the usage of an empty
>> string sentinel ("") to indicate the argument is not really used (which
>> triggers all the SMAP issues, since the kernel will always try to copy
>> the argument from userland). This also means that on x86_64 you will have
>> this performance penalty on other syscalls that use AT_EMPTY_PATH (e.g.,
>> execveat, name_to_handle_at, open_tree, faccessat). I really think it
>> would be better fixed in the kernel instead of adding extra constraints
>> for the userland.
>>
>
> I completely agree this is a problem going way past fstat.
>
> One could be tempted to allow NULL with the flag, but that won't work
> -- I know of code out there which checks for statx availability by
> deliberately passing a NULL path. I would not be shocked if there was
> more of the sort, passing the AT_EMPTY_PATH flag on top.

I thought about it, but besides being a clear kABI breakage it does not
help on older kernels (where fstat returns EFAULT for NULL).

I am not sure how this statx availability check would work; passing
NULL would return EFAULT in both the statx and old stat cases.

>
> I am considering proposing a new flag which, combined with a NULL path,
> would indicate there is only a fd lookup to perform -- fully backwards
> compatible and avoiding the problem. Then syscalls could start
> supporting it over time as people get around to it.
>
> However, the fstat stub in glibc does not have to suffer it regardless
> of what happens with the above.

I think we can still do it in a generic way. The stat family would use more
syscalls (which some syscall filters might complain about), but it should
be ok:

diff --git a/sysdeps/unix/sysv/linux/fstatat64.c b/sysdeps/unix/sysv/linux/fstatat64.c
index 3509d3ca6d..9fc7f82db2 100644
--- a/sysdeps/unix/sysv/linux/fstatat64.c
+++ b/sysdeps/unix/sysv/linux/fstatat64.c
@@ -91,20 +91,30 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf,
int flag)
{
int r;
+ bool is_fstat = flag == AT_EMPTY_PATH && fd >= 0 && file[0] == '\0';
#if XSTAT_IS_XSTAT64
# ifdef __NR_newfstatat
/* 64-bit kABI, e.g. aarch64, ia64, powerpc64*, s390x, riscv64, and
x86_64. */
- r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, buf, flag);
+ if (is_fstat)
+ r = INTERNAL_SYSCALL_CALL (fstat, fd, buf);
+ else
+ r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, buf, flag);
# elif defined __NR_fstatat64
# if STAT64_IS_KERNEL_STAT64
/* 64-bit kABI outlier, e.g. alpha */
- r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, buf, flag);
+ if (is_fstat)
+ r = INTERNAL_SYSCALL_CALL (fstat64, fd, buf);
+ else
+ r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, buf, flag);
# else
/* 64-bit kABI outlier, e.g. sparc64. */
struct kernel_stat64 kst64;
- r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &kst64, flag);
+ if (is_fstat)
+ r = INTERNAL_SYSCALL_CALL (fstat64, fd, &kst64);
+ else
+ r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &kst64, flag);
if (r == 0)
__cp_stat64_kstat64 (buf, &kst64);
# endif
@@ -115,7 +125,10 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf,
e.g. arm, csky, i386, hppa, m68k, microblaze, nios2, sh, powerpc32,
and sparc32. */
struct stat64 st64;
- r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &st64, flag);
+ if (is_fstat)
+ r = INTERNAL_SYSCALL_CALL (fstat64, fd, &st64);
+ else
+ r = INTERNAL_SYSCALL_CALL (fstatat64, fd, file, &st64, flag);
if (r == 0)
{
/* Clear both pad and reserved fields. */
@@ -138,7 +151,10 @@ fstatat64_time64_stat (int fd, const char *file, struct __stat64_t64 *buf,
# else
/* 64-bit kabi outlier, e.g. mips64 and mips64-n32. */
struct kernel_stat kst;
- r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, &kst, flag);
+ if (is_fstat)
+ r = INTERNAL_SYSCALL_CALL (fstat, fd, &kst);
+ else
+ r = INTERNAL_SYSCALL_CALL (newfstatat, fd, file, &kst, flag);
if (r == 0)
__cp_kstat_stat64_t64 (&kst, buf);
# endif
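
For readers who want to try the idea outside of the glibc tree, the dispatch
in the patch above can be sketched in plain userland C. This is only an
illustration under assumptions: stat_by_fd_or_path is a hypothetical helper
name, it goes through the public fstat and fstatat wrappers rather than
INTERNAL_SYSCALL_CALL, and the NULL check on file is an addition that is not
in the patch.

```c
#define _GNU_SOURCE          /* for AT_EMPTY_PATH */
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not a glibc interface.  It mirrors the is_fstat
   predicate from the patch: when the caller asks only for a lookup on
   the descriptor itself (empty path plus AT_EMPTY_PATH), use fstat so
   that no path string has to be handed to the kernel.  */
static int
stat_by_fd_or_path (int fd, const char *file, struct stat *buf, int flag)
{
  /* The file != NULL test is an extra guard added for this sketch.  */
  bool is_fstat = flag == AT_EMPTY_PATH && fd >= 0
                  && file != NULL && file[0] == '\0';

  if (is_fstat)
    return fstat (fd, buf);

  /* Everything else takes the regular path-based route.  */
  return fstatat (fd, file, buf, flag);
}
```

Both branches are expected to fill the buffer with the same data for the
same descriptor, so callers should not be able to tell which one was taken
other than by the syscall that shows up in a trace.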