public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] ftw.c: Check for "." and ".." branchlessly
@ 2023-09-24 17:16 Wilco Dijkstra
  2023-09-25  0:17 ` Paul Eggert
  0 siblings, 1 reply; 14+ messages in thread
From: Wilco Dijkstra @ 2023-09-24 17:16 UTC (permalink / raw)
  To: Xi Ruoyao, tirtajames45; +Cc: 'GNU C Library'

Hi,

> There are also other issues:
>
> 1. If "fname" is an empty string, there will be a buffer overread.  I
> don't know if fname can be empty string here though.

I think 'namlen' is the length of the string, so you could do something like:

if (namlen == 1 && (memcmp (fname, ".\0", 2) == 0)
  return 0;

That will generate a single 16-bit load on targets that support unaligned
access. But it would slow things down on targets that don't or don't expand
memcmp inline.

> 2. If "fname" is "..", then the bytes in "w4" can be:

Indeed, the expression:

(*(w4 *)fname | '\0' SH 24)

is incorrect since X | 0 << 24 is always the same as X.

And big-endian doesn't work like that either.

> 3. We should let the compiler decide to use the branchless operation or
> not.  If the compiler does not do the correct thing we should fix the
> compiler instead of writing some nasty code here because fixing the
> compiler will benefit both Glibc and other packages.  There is some
> related existing GCC bug report (https://gcc.gnu.org/PR109555), the
> problem is using a branchless operation here is not always a win.  See
> the comments in the bug report for reference.

Yes, branchless code is not a goal by itself. First determine whether this
function is performance critical, then determine the parts that are slower
than necessary. I don't think this check is a major bottleneck, however
assuming most filenames do not start with a '.', this would typically need
one load, compare and branch per call:

if (glibc_unlikely (fname[0] == '.'))
  ...

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-24 17:16 [PATCH] ftw.c: Check for "." and ".." branchlessly Wilco Dijkstra
@ 2023-09-25  0:17 ` Paul Eggert
  2023-09-25  6:26   ` Alexander Monakov
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2023-09-25  0:17 UTC (permalink / raw)
  To: Wilco Dijkstra, Xi Ruoyao, tirtajames45; +Cc: 'GNU C Library'

On 2023-09-24 10:16, Wilco Dijkstra wrote:
> I don't think this check is a major bottleneck

You're right it's not, but why worry when bikeshedding? :-)

The following is safe for any string alignment or length:

   (name[0] == '.' && !name[name[1] == '.'])

On today's processors this is surely a (tiny) win over ftw.c's current 
code (name[0] == '.' && (name[1] == '\0' || (name[1] == '.' && name[2] 
== '\0'))), as the ftw code is longer and has more conditional branches.

I doubt whether we really want the code to be branchless, though. I 
fooled around for a while and eventually came up with this:

   (name[0] == '.')
    & ~+!!name[(name[0] == '.') + (name[name[0] == '.'] == '.')]

Ouch, right? Among other things, this needs "~+!!" instead of "~!!" to 
pacify a false positive from GCC -Wbool-operation (see 
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111575>), and it needs 
"~!!" instead of plain "!" because otherwise GCC generates an 
unnecessary conditional branch in the machine code on x86-64 (see 
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79045>). I'm not sure it's 
worth the effort of writing and commenting such tricky code.

It's not too surprising that GCC generates poorly optimized code for the 
branchless code, as almost nobody writes code like that.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-25  0:17 ` Paul Eggert
@ 2023-09-25  6:26   ` Alexander Monakov
  2023-09-25  6:45     ` Alexander Monakov
  2023-09-25  7:10     ` Paul Eggert
  0 siblings, 2 replies; 14+ messages in thread
From: Alexander Monakov @ 2023-09-25  6:26 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Wilco Dijkstra, Xi Ruoyao, tirtajames45, 'GNU C Library'


On Sun, 24 Sep 2023, Paul Eggert wrote:

> On 2023-09-24 10:16, Wilco Dijkstra wrote:
> > I don't think this check is a major bottleneck
> 
> You're right it's not, but why worry when bikeshedding? :-)
> 
> The following is safe for any string alignment or length:
> 
>   (name[0] == '.' && !name[name[1] == '.'])
> 
> On today's processors this is surely a (tiny) win over ftw.c's current code
> (name[0] == '.' && (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'))),
> as the ftw code is longer and has more conditional branches.
> 
> I doubt whether we really want the code to be branchless, though. I fooled
> around for a while and eventually came up with this:
> 
>   (name[0] == '.')
>    & ~+!!name[(name[0] == '.') + (name[name[0] == '.'] == '.')]

If you look at the context, you'll see that 'namlen' gives the length, so

    if (namlen < 3 && name[0] == '.' && name[namlen-1] == '.')
      return 0;

works, and you can combine that with shift-ors to a single comparison
(either relying on namlen to be always non-zero, or testing 'namlen-1 < 2')
but three separate branchless (especially the length test) is cheaper.

Alexander

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-25  6:26   ` Alexander Monakov
@ 2023-09-25  6:45     ` Alexander Monakov
  2023-09-25  7:10     ` Paul Eggert
  1 sibling, 0 replies; 14+ messages in thread
From: Alexander Monakov @ 2023-09-25  6:45 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Wilco Dijkstra, Xi Ruoyao, tirtajames45, 'GNU C Library'


On Mon, 25 Sep 2023, Alexander Monakov wrote:

> If you look at the context, you'll see that 'namlen' gives the length, so
> 
>     if (namlen < 3 && name[0] == '.' && name[namlen-1] == '.')
>       return 0;
> 
> works, and you can combine that with shift-ors to a single comparison
> (either relying on namlen to be always non-zero, or testing 'namlen-1 < 2')

(together with replacing 'name[namlen-1]' with 'name[namlen/2]' for the latter)

Alexander

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-25  6:26   ` Alexander Monakov
  2023-09-25  6:45     ` Alexander Monakov
@ 2023-09-25  7:10     ` Paul Eggert
  2023-09-25  7:15       ` Andrew Pinski
  2023-09-25  7:32       ` Alexander Monakov
  1 sibling, 2 replies; 14+ messages in thread
From: Paul Eggert @ 2023-09-25  7:10 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Wilco Dijkstra, Xi Ruoyao, tirtajames45, 'GNU C Library'

On 2023-09-24 23:26, Alexander Monakov wrote:
>      if (namlen < 3 && name[0] == '.' && name[namlen-1] == '.')
>        return 0;
> 
> works, and you can combine that with shift-ors to a single comparison

How?

To make this concrete, with gcc -O2 on x86-64 the following:

   _Bool
   g (char const *name, unsigned long namlen)
   {
     return ((name[0] == '.')
             & ~+!!name[(name[0] == '.')
                        + (name[name[0] == '.'] == '.')]);
   }

compiles to branchless machine code that works for any string alignment 
or length. I don't see how to use namlen to do anything similar.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-25  7:10     ` Paul Eggert
@ 2023-09-25  7:15       ` Andrew Pinski
  2023-09-25  7:21         ` Andrew Pinski
  2023-09-25  7:32         ` Paul Eggert
  2023-09-25  7:32       ` Alexander Monakov
  1 sibling, 2 replies; 14+ messages in thread
From: Andrew Pinski @ 2023-09-25  7:15 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Alexander Monakov, Wilco Dijkstra, Xi Ruoyao, tirtajames45,
	GNU C Library

On Mon, Sep 25, 2023 at 12:11 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> On 2023-09-24 23:26, Alexander Monakov wrote:
> >      if (namlen < 3 && name[0] == '.' && name[namlen-1] == '.')
> >        return 0;
> >
> > works, and you can combine that with shift-ors to a single comparison
>
> How?
>
> To make this concrete, with gcc -O2 on x86-64 the following:
>
>    _Bool
>    g (char const *name, unsigned long namlen)
>    {
>      return ((name[0] == '.')
>              & ~+!!name[(name[0] == '.')
>                         + (name[name[0] == '.'] == '.')]);
>    }
>
> compiles to branchless machine code that works for any string alignment
> or length. I don't see how to use namlen to do anything similar.

```
_Bool
   g (char const *name, unsigned long namlen)
   {
        return (namlen < 3)
               & (namlen != 0)
               & (name[0] == '.')
               & (name[namlen-1] == '.');
   }
```

Thanks,
Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-25  7:15       ` Andrew Pinski
@ 2023-09-25  7:21         ` Andrew Pinski
  2023-09-25  7:32         ` Paul Eggert
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Pinski @ 2023-09-25  7:21 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Alexander Monakov, Wilco Dijkstra, Xi Ruoyao, tirtajames45,
	GNU C Library

On Mon, Sep 25, 2023 at 12:15 AM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Mon, Sep 25, 2023 at 12:11 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
> >
> > On 2023-09-24 23:26, Alexander Monakov wrote:
> > >      if (namlen < 3 && name[0] == '.' && name[namlen-1] == '.')
> > >        return 0;
> > >
> > > works, and you can combine that with shift-ors to a single comparison
> >
> > How?
> >
> > To make this concrete, with gcc -O2 on x86-64 the following:
> >
> >    _Bool
> >    g (char const *name, unsigned long namlen)
> >    {
> >      return ((name[0] == '.')
> >              & ~+!!name[(name[0] == '.')
> >                         + (name[name[0] == '.'] == '.')]);
> >    }
> >
> > compiles to branchless machine code that works for any string alignment
> > or length. I don't see how to use namlen to do anything similar.
>
> ```
> _Bool
>    g (char const *name, unsigned long namlen)
>    {
>         return (namlen < 3)
>                & (namlen != 0)
>                & (name[0] == '.')
>                & (name[namlen-1] == '.');
>    }
> ```

Which produces 2 loads and 3 compares. And produces the best code for aarch64:
```
        sub     x1, x1, #1
        ldrb    w3, [x0]
        mov     w2, 46
        cmp     w3, w2
        ldrb    w0, [x0, x1]
        ccmp    x1, 1, 2, eq
        ccmp    w0, w2, 0, ls
        cset    w0, eq
```

And better code for x86_64 than your example:
        leaq    -1(%rsi), %rdx
        cmpb    $46, (%rdi)
        sete    %al
        cmpq    $1, %rdx
        setbe   %dl
        andl    %edx, %eax
        cmpb    $46, -1(%rdi,%rsi)
        sete    %dl
        andl    %edx, %eax

Because there are only 2 loads.

Thanks,
Andrew Pinski

>
> Thanks,
> Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-25  7:15       ` Andrew Pinski
  2023-09-25  7:21         ` Andrew Pinski
@ 2023-09-25  7:32         ` Paul Eggert
  1 sibling, 0 replies; 14+ messages in thread
From: Paul Eggert @ 2023-09-25  7:32 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Alexander Monakov, Wilco Dijkstra, Xi Ruoyao, tirtajames45,
	GNU C Library

On 2023-09-25 00:15, Andrew Pinski wrote:
> _Bool
>     g (char const *name, unsigned long namlen)
>     {
>          return (namlen < 3)
>                 & (namlen != 0)
>                 & (name[0] == '.')
>                 & (name[namlen-1] == '.');
>     }

Nice, but this goes outside the buffer if namlen == 0. It would be OK if 
we know for some other reason that namlen cannot be zero, or if we know 
that the implementation won't care about the buffer overrun, though I'm 
not sure we know either of these things in general, and regardless this 
bit of trickiness would need a comment.

An efficiency nit (but hey! this thread is all about nits...): '(namlen 
!= 0)' can be omitted because it's implied by (name[0] == '.'). Omitting 
'(namlen != 0)' omits one machine insn on x86-64.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-25  7:10     ` Paul Eggert
  2023-09-25  7:15       ` Andrew Pinski
@ 2023-09-25  7:32       ` Alexander Monakov
  2023-09-25  8:10         ` Paul Eggert
  1 sibling, 1 reply; 14+ messages in thread
From: Alexander Monakov @ 2023-09-25  7:32 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Wilco Dijkstra, Xi Ruoyao, tirtajames45, 'GNU C Library'


On Mon, 25 Sep 2023, Paul Eggert wrote:

> On 2023-09-24 23:26, Alexander Monakov wrote:
> >      if (namlen < 3 && name[0] == '.' && name[namlen-1] == '.')
> >        return 0;
> > 
> > works, and you can combine that with shift-ors to a single comparison
> 
> How?

With 

  if ((-(namlen < 3) & (name[0] | name[namlen-1] << 8)) == '.' * 0x101)
    return 0;

or 

  if ((-(namlen-1 < 2) & (name[0] | name[namlen/2] << 8)) == '.' * 0x101)
    return 0;

if it needs to work for namlen == 0.

Alexander

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-25  7:32       ` Alexander Monakov
@ 2023-09-25  8:10         ` Paul Eggert
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Eggert @ 2023-09-25  8:10 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Wilco Dijkstra, Xi Ruoyao, tirtajames45, 'GNU C Library'

On 2023-09-25 00:32, Alexander Monakov wrote:
>    if ((-(namlen-1 < 2) & (name[0] | name[namlen/2] << 8)) == '.' * 0x101)

Oh, that's nice and should work. Thanks for explaining.

Somewhat clearer, and with three fewer instructions on gcc x86-64, is:

   if ((namlen <= 2) & (name[0] == '.') & (name[namlen / 2] == '.'))


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
       [not found] <CANDqPp0yVgx8hdeL9JUQgyMSww2MW7QDdPERC2Mxp5vdCOoOog@mail.gmail.com>
@ 2023-09-24 13:14 ` Xi Ruoyao
  0 siblings, 0 replies; 14+ messages in thread
From: Xi Ruoyao @ 2023-09-24 13:14 UTC (permalink / raw)
  To: James; +Cc: libc-alpha

(Resend to fix the bad CC line.)

On Sun, 2023-09-24 at 19:49 +0700, James wrote:
> > No.  The unaligned access will cause segmentation fault or bus error
> > on
> > some architectures.
> Won't fname be aligned since we're checking from the start of d_name?
> 
> > 2. If "fname" is "..", then the bytes in "w4" can be:
> > 
> > '.', '.', '\0', **anything**
> > 
> > The last byte is completely arbitrary (and it may be even out of a
> > mapped memory range and then reading it will immediately cause a
> > segfault).  Again I don't know if fname can be just ".." here
> > though.
> Doesn't the *(w4 *)fname | '\0' SH 24 overwrite the last arbitrary
> byte?

Alright, I misread it.  But if the buffer is just 3-byte this is still
overreading the buffer and the overread may cause a segfault.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-24 11:29 ` Xi Ruoyao
@ 2023-09-24 11:43   ` Xi Ruoyao
  0 siblings, 0 replies; 14+ messages in thread
From: Xi Ruoyao @ 2023-09-24 11:43 UTC (permalink / raw)
  To: James Tirta Halim, libc-alpha

On Sun, 2023-09-24 at 19:29 +0800, Xi Ruoyao wrote:
> On Sun, 2023-09-24 at 13:45 +0700, James Tirta Halim wrote:
> > +static int
> > +is_relative (const char *fname)
> > +{
> > +	typedef uint16_t w2 __attribute__ ((__may_alias__));
> > +	typedef uint32_t w4 __attribute__ ((__may_alias__));
> > +	const uint16_t n1 = '.' | '\0' SH 8;
> > +	const uint32_t n2 = '.' | '.' SH 8 | '\0' SH 16;
> > +	return (*(w2 *)fname == n1) | ((*(w4 *)fname | '\0' SH 24) ==
> > n2);
> > +}
> 
> No.  The unaligned access will cause segmentation fault or bus error on
> some architectures.

There are also other issues:

1. If "fname" is an empty string, there will be a buffer overread.  I
don't know if fname can be empty string here though.

2. If "fname" is "..", then the bytes in "w4" can be:

'.', '.', '\0', **anything**

The last byte is completely arbitrary (and it may be even out of a
mapped memory range and then reading it will immediately cause a
segfault).  Again I don't know if fname can be just ".." here though.

3. We should let the compiler decide to use the branchless operation or
not.  If the compiler does not do the correct thing we should fix the
compiler instead of writing some nasty code here because fixing the
compiler will benefit both Glibc and other packages.  There is some
related existing GCC bug report (https://gcc.gnu.org/PR109555), the
problem is using a branchless operation here is not always a win.  See
the comments in the bug report for reference.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ftw.c: Check for "." and ".." branchlessly
  2023-09-24  6:45 James Tirta Halim
@ 2023-09-24 11:29 ` Xi Ruoyao
  2023-09-24 11:43   ` Xi Ruoyao
  0 siblings, 1 reply; 14+ messages in thread
From: Xi Ruoyao @ 2023-09-24 11:29 UTC (permalink / raw)
  To: James Tirta Halim, libc-alpha

On Sun, 2023-09-24 at 13:45 +0700, James Tirta Halim wrote:
> +static int
> +is_relative (const char *fname)
> +{
> +	typedef uint16_t w2 __attribute__ ((__may_alias__));
> +	typedef uint32_t w4 __attribute__ ((__may_alias__));
> +	const uint16_t n1 = '.' | '\0' SH 8;
> +	const uint32_t n2 = '.' | '.' SH 8 | '\0' SH 16;
> +	return (*(w2 *)fname == n1) | ((*(w4 *)fname | '\0' SH 24) ==
> n2);
> +}

No.  The unaligned access will cause segmentation fault or bus error on
some architectures.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ftw.c: Check for "." and ".." branchlessly
@ 2023-09-24  6:45 James Tirta Halim
  2023-09-24 11:29 ` Xi Ruoyao
  0 siblings, 1 reply; 14+ messages in thread
From: James Tirta Halim @ 2023-09-24  6:45 UTC (permalink / raw)
  To: libc-alpha; +Cc: James Tirta Halim

---
 io/ftw.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/io/ftw.c b/io/ftw.c
index a72c7d5171..742369a2d2 100644
--- a/io/ftw.c
+++ b/io/ftw.c
@@ -66,6 +66,7 @@ char *alloca ();
 #include <unistd.h>
 #include <not-cancel.h>
 #include <sys/param.h>
+#include <endian.h>
 #ifdef _LIBC
 # include <include/sys/stat.h>
 #else
@@ -388,6 +389,25 @@ open_dir_stream (int *dfdp, struct ftw_data *data, struct dir_data *dirp)
 }
 
 
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#  define SH <<
+#else
+#  define SH >>
+#endif
+
+static int
+is_relative (const char *fname)
+{
+	typedef uint16_t w2 __attribute__ ((__may_alias__));
+	typedef uint32_t w4 __attribute__ ((__may_alias__));
+	const uint16_t n1 = '.' | '\0' SH 8;
+	const uint32_t n2 = '.' | '.' SH 8 | '\0' SH 16;
+	return (*(w2 *)fname == n1) | ((*(w4 *)fname | '\0' SH 24) == n2);
+}
+
+#undef SH
+
+
 static int
 process_entry (struct ftw_data *data, struct dir_data *dir, const char *name,
 	       size_t namlen, int d_type)
@@ -397,9 +417,8 @@ process_entry (struct ftw_data *data, struct dir_data *dir, const char *name,
   int flag = 0;
   size_t new_buflen;
 
-  if (name[0] == '.' && (name[1] == '\0'
-			 || (name[1] == '.' && name[2] == '\0')))
-    /* Don't process the "." and ".." entries.  */
+  /* Don't process the "." and ".." entries.  */
+  if (is_relative(name))
     return 0;
 
   new_buflen = data->ftw.base + namlen + 2;
-- 
2.42.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-09-25  8:10 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-24 17:16 [PATCH] ftw.c: Check for "." and ".." branchlessly Wilco Dijkstra
2023-09-25  0:17 ` Paul Eggert
2023-09-25  6:26   ` Alexander Monakov
2023-09-25  6:45     ` Alexander Monakov
2023-09-25  7:10     ` Paul Eggert
2023-09-25  7:15       ` Andrew Pinski
2023-09-25  7:21         ` Andrew Pinski
2023-09-25  7:32         ` Paul Eggert
2023-09-25  7:32       ` Alexander Monakov
2023-09-25  8:10         ` Paul Eggert
     [not found] <CANDqPp0yVgx8hdeL9JUQgyMSww2MW7QDdPERC2Mxp5vdCOoOog@mail.gmail.com>
2023-09-24 13:14 ` Xi Ruoyao
  -- strict thread matches above, loose matches on Subject: below --
2023-09-24  6:45 James Tirta Halim
2023-09-24 11:29 ` Xi Ruoyao
2023-09-24 11:43   ` Xi Ruoyao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).