public inbox for gcc@gcc.gnu.org
* Add new ABIs '__strcmpeq', '__strncmpeq', '__wcscmpeq' and '__wcsncmpeq' to libc
@ 2022-01-20 22:56 Noah Goldstein
  2022-01-21 18:51 ` [libc-coord] " Joerg Sonnenberger
  0 siblings, 1 reply; 3+ messages in thread
From: Noah Goldstein @ 2022-01-20 22:56 UTC (permalink / raw)
  To: libc-coord; +Cc: GNU C Library, Richard Biener via Gcc

Hi All,

This is a proposal for four new interfaces to be supported by libc.

This is essentially the same proposal as '__memcmpeq()':
https://sourceware.org/pipermail/libc-alpha/2021-September/131099.html

for the character string and wide-character string comparison
functions.

#### Interfaces ####

int __strcmpeq(const char * s1, const char * s2)
int __strncmpeq(const char * s1, const char * s2, size_t n)
int __wcscmpeq(const wchar_t * ws1, const wchar_t * ws2)
int __wcsncmpeq(const wchar_t * ws1, const wchar_t * ws2, size_t n)

#### Descriptions ####

- '__strcmpeq()'
    - The '__strcmpeq()' function shall compare the string pointed to
      by 's1' to the string pointed to by 's2'. If the two strings are
      equal the return value will be zero. Otherwise the return value
      will be some non-zero value. 'strcmp()' is a valid
      implementation of '__strcmpeq()'.

- '__strncmpeq()'
    - The '__strncmpeq()' function shall compare not more than 'n'
      bytes (bytes that follow a null byte are not compared) from the
      array pointed to by 's1' to the array pointed to by 's2'. If the
      two strings or first 'n' characters are equal the return value
      will be zero. Otherwise the return value will be some non-zero
      value. 'strncmp()' is a valid implementation of '__strncmpeq()'.

- '__wcscmpeq()'
    - The '__wcscmpeq()' function shall compare the wide-character
      string pointed to by 'ws1' to the wide-character string pointed
      to by 'ws2'. If the two wide-character strings are equal the
      return value will be zero. Otherwise the return value will be
      some non-zero value. 'wcscmp()' is a valid implementation of
      '__wcscmpeq()'.

- '__wcsncmpeq()'
    - The '__wcsncmpeq()' function shall compare not more than 'n'
      wide-character codes (wide-character codes that follow a null
      wide-character code are not compared) from the array pointed to
      by 'ws1' to the array pointed to by 'ws2'.  If the two
      wide-character strings or first 'n' characters are equal the
      return value will be zero. Otherwise the return value will be
      some non-zero value. 'wcsncmp()' is a valid implementation of
      '__wcsncmpeq()'.
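
As a rough illustration of the semantics described above, here is a
portable C sketch of the two byte-string variants. The names
'strcmpeq_sketch' and 'strncmpeq_sketch' are hypothetical stand-ins
(the double-underscore names are reserved for the implementation), and
the byte loops are only meant to show that no ordering information has
to be computed. This is not the proposed libc implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for __strcmpeq: returns 0 if the strings are
   equal and some nonzero value otherwise.  */
int
strcmpeq_sketch (const char *s1, const char *s2)
{
  for (;; s1++, s2++)
    {
      if (*s1 != *s2)
        return 1;   /* Mismatch: any nonzero value is allowed.  */
      if (*s1 == '\0')
        return 0;   /* Both strings ended at the same position.  */
    }
}

/* Hypothetical stand-in for __strncmpeq: same contract, but at most
   n bytes are compared.  */
int
strncmpeq_sketch (const char *s1, const char *s2, size_t n)
{
  for (; n > 0; n--, s1++, s2++)
    {
      if (*s1 != *s2)
        return 1;
      if (*s1 == '\0')
        return 0;
    }
  return 0;         /* The first n bytes were all equal.  */
}
```

Callers may only test the result against zero; the sign and magnitude
of a nonzero result carry no meaning.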


#### Use Case ####

The goal is that the new interfaces will be usable as an optimization
by compilers if a program uses the return value of the non "eq"
variant as a boolean. For example:


void
foo (const char *s1, const char *s2, const wchar_t *ws1, const wchar_t *ws2,
     size_t n)
{
  if (!strcmp (s1, s2))
    printf ("strcmp can be optimized to __strcmpeq in this use case\n");
  if (strncmp (s1, s2, n))
    printf ("strncmp can be optimized to __strncmpeq in this use case\n");
  if (wcscmp (ws1, ws2))
    printf ("wcscmp can be optimized to __wcscmpeq in this use case\n");
  if (!wcsncmp (ws1, ws2, n))
    printf ("wcsncmp can be optimized to __wcsncmpeq in this use case\n");
}
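
For contrast, a small sketch of a use that would qualify next to one
that would not. The function names are made up for illustration. Only
a call whose result is tested against zero could be redirected to the
"eq" variant; a call that inspects the sign still needs the ordering
and would have to stay a regular 'strcmp()' call.

```c
#include <stdbool.h>
#include <string.h>

/* The result is only tested against zero, so this call is a
   candidate for being rewritten to __strcmpeq.  */
bool
same_key (const char *a, const char *b)
{
  return strcmp (a, b) == 0;
}

/* The sign of the result matters here, so this call must remain a
   real strcmp call.  */
bool
sorts_before (const char *a, const char *b)
{
  return strcmp (a, b) < 0;
}
```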


#### Argument Specifications ####

- '__strcmpeq()' has the exact same argument specifications as 'strcmp()'
    - 's1' is a null terminated character string.
    - 's2' is a null terminated character string.

- '__strncmpeq()' has the exact same argument specifications as 'strncmp()'
    - 's1' is a character sequence terminated by a null byte or
      limited to 'n' bytes.
    - 's2' is a character sequence terminated by a null byte or
      limited to 'n' bytes.
    - 'n' is the maximum number of bytes to compare.

- '__wcscmpeq()' has the exact same argument specifications as 'wcscmp()'
    - 'ws1' is a null terminated wide-character string.
    - 'ws2' is a null terminated wide-character string.

- '__wcsncmpeq()' has the exact same argument specifications as 'wcsncmp()'
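    - 'ws1' is a wide-character sequence terminated by a null
      wide character or limited to 'n' wide characters.
    - 'ws2' is a wide-character sequence terminated by a null
      wide character or limited to 'n' wide characters.
    - 'n' is the maximum number of wide characters to compare.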

For each of these functions, if any of the input constraints are not
met, the result is undefined.

#### Return Value Specification ####

- '__strcmpeq()'
    - if 's1' and 's2' are equal, the return value is zero. Otherwise
      the return value is any non-zero value. 'strcmp()',
      '!!strcmp()', or '-strcmp()' are all valid implementations of
      '__strcmpeq()'.

- '__strncmpeq()'
    - if 's1' and 's2' are equal up to the first 'n' characters or up
      to and including the first null character, the return value is
      zero. Otherwise the return value is any non-zero
      value. 'strncmp()', '!!strncmp()', or '-strncmp()' are all valid
      implementations of '__strncmpeq()'.

- '__wcscmpeq()'
    - if 'ws1' and 'ws2' are equal, the return value is
      zero. Otherwise the return value is any non-zero
      value. 'wcscmp()', '!!wcscmp()', or '-wcscmp()' are all valid
      implementations of '__wcscmpeq()'.

- '__wcsncmpeq()'
    - if 'ws1' and 'ws2' are equal up to the first 'n' wide-characters
      or up to and including the first null wide-character, the return
      value is zero. Otherwise the return value is any non-zero
      value. 'wcsncmp()', '!!wcsncmp()', or '-wcsncmp()' are all valid
      implementations of '__wcsncmpeq()'.
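
To make the return value contract concrete, the sketch below wraps the
three forms listed above for the wide-character case and checks that
they agree once the result is reduced to "zero or not". All function
names are hypothetical and exist only for this illustration.

```c
#include <wchar.h>

/* Three candidate implementations of the __wcscmpeq contract.  */
int
wcscmpeq_plain (const wchar_t *a, const wchar_t *b)
{
  return wcscmp (a, b);
}

int
wcscmpeq_bool (const wchar_t *a, const wchar_t *b)
{
  return !!wcscmp (a, b);
}

int
wcscmpeq_neg (const wchar_t *a, const wchar_t *b)
{
  return -wcscmp (a, b);
}

/* Returns 1 if all three candidates agree on equality, which is the
   only property a caller of the "eq" variant may rely on.  */
int
candidates_agree (const wchar_t *a, const wchar_t *b)
{
  int p = wcscmpeq_plain (a, b) != 0;
  int q = wcscmpeq_bool (a, b) != 0;
  int r = wcscmpeq_neg (a, b) != 0;
  return p == q && q == r;
}
```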


#### Notes ####

These interfaces are designed intentionally so that the non "eq"
variant of each function will be a valid implementation of the
corresponding "eq" variant.


#### ABI vs API ####

This proposal is for '__strcmpeq()', '__strncmpeq()', '__wcscmpeq()',
and '__wcsncmpeq()' as new ABIs. As ABIs the interfaces will have
value as an optimization compilers can make for the idiomatic boolean
usage of the return value of the existing comparison functions.


Best,
Noah


* Re: [libc-coord] Add new ABIs '__strcmpeq', '__strncmpeq', '__wcscmpeq' and '__wcsncmpeq' to libc
  2022-01-20 22:56 Add new ABIs '__strcmpeq', '__strncmpeq', '__wcscmpeq' and '__wcsncmpeq' to libc Noah Goldstein
@ 2022-01-21 18:51 ` Joerg Sonnenberger
  2022-01-21 21:50   ` Noah Goldstein
  0 siblings, 1 reply; 3+ messages in thread
From: Joerg Sonnenberger @ 2022-01-21 18:51 UTC (permalink / raw)
  To: libc-coord; +Cc: GNU C Library, Richard Biener via Gcc

On Thu, Jan 20, 2022 at 04:56:59PM -0600, Noah Goldstein wrote:
> The goal is that the new interfaces will be usable as an optimization
> by compilers if a program uses the return value of the non "eq"
> variant as a boolean.

So I'm curious, but can you demonstrate that it can be implemented
noticeably faster than regular strcmp? Unlike for memcmp, I don't see an
obvious way to save any operations.

Joerg


* Re: [libc-coord] Add new ABIs '__strcmpeq', '__strncmpeq', '__wcscmpeq' and '__wcsncmpeq' to libc
  2022-01-21 18:51 ` [libc-coord] " Joerg Sonnenberger
@ 2022-01-21 21:50   ` Noah Goldstein
  0 siblings, 0 replies; 3+ messages in thread
From: Noah Goldstein @ 2022-01-21 21:50 UTC (permalink / raw)
  To: Joerg Sonnenberger; +Cc: libc-coord, Richard Biener via Gcc, GNU C Library

On Fri, Jan 21, 2022 at 12:51 PM Joerg Sonnenberger <joerg@bec.de> wrote:
>
> On Thu, Jan 20, 2022 at 04:56:59PM -0600, Noah Goldstein wrote:
> > The goal is that the new interfaces will be usable as an optimization
> > by compilers if a program uses the return value of the non "eq"
> > variant as a boolean.
>
> So I'm curious, but can you demonstrate that it can be implemented
> noticeably faster than regular strcmp? Unlike for memcmp, I don't see an
> obvious way to save any operations.

Strong point! I had been somewhat assuming we could make the same
optimizations as with `__memcmpeq`, but there still needs to be some
logic that tracks which comes first: the mismatch or the null terminator.

The savings are not quite as large as for `memcmp` vs `__memcmpeq`,
but we can still save some work.

Using the x86_64 AVX2 optimized implementation as reference:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/strcmp-avx2.S;h=9c73b5899d55a72b292f21b52593284cd513d2a3;hb=HEAD

We can convert the general return method of checking equals + strlen from:

```
VMOVU (%rdi), %ymm0
VPCMPEQ (%rsi), %ymm0, %ymm1
VPCMPEQ %ymm0, %ymmZERO, %ymm2
vpandn %ymm1, %ymm2, %ymm1
vpmovmskb %ymm1, %ecx
incl %ecx
jz L(keep_going)
tzcntl %ecx, %ecx
movzbl (%rdi, %rcx), %eax
movzbl (%rsi, %rcx), %ecx
subl %ecx, %eax
vzeroupper
ret
```

To

```
VMOVU (%rdi), %ymm0
VPCMPEQ (%rsi), %ymm0, %ymm1
VPCMPEQ %ymm0, %ymmZERO, %ymm2
vpandn %ymm1, %ymm2, %ymm2
vpmovmskb %ymm2, %ecx
incl %ecx
jz L(keep_going)
vpmovmskb %ymm1, %eax
blsi %ecx, %ecx
andn %eax, %ecx, %eax
vzeroupper
ret
```
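
For readers who do not want to trace the vector code, here is a scalar
C model of the mask arithmetic that the shorter sequence appears to be
aiming for: build the "continue" mask, isolate the lowest stop bit,
then test whether the bytes at that position differ. It is a sketch
under that reading, not a literal transcription of the instructions,
and the name `block32_cmpeq` is hypothetical.

```c
#include <stdint.h>

/* Models one 32-byte step.  Sets *keep_going to 1 if all 32 bytes
   are equal and non-null (the caller would continue with the next
   block).  Otherwise sets it to 0 and returns 0 if the strings are
   equal, nonzero if they differ.  */
int
block32_cmpeq (const unsigned char *a, const unsigned char *b,
               int *keep_going)
{
  uint32_t eq = 0;   /* Bit i set when a[i] == b[i].  */
  uint32_t nul = 0;  /* Bit i set when a[i] == 0.  */
  for (int i = 0; i < 32; i++)
    {
      eq |= (uint32_t) (a[i] == b[i]) << i;
      nul |= (uint32_t) (a[i] == 0) << i;
    }

  uint32_t cont = eq & ~nul;    /* Equal and not a terminator.  */
  uint32_t stop = cont + 1;     /* Zero only if all 32 bits were set.  */
  *keep_going = (stop == 0);
  if (stop == 0)
    return 0;

  /* Lowest set bit of stop marks the first mismatch or terminator.  */
  uint32_t first = stop & (0u - stop);
  /* Nonzero only if the bytes at that position differ.  */
  return (first & ~eq) != 0;
}
```

No byte is reloaded from memory to compute the result, which is the
difference from the `tzcntl` / `movzbl` / `subl` tail of the regular
`strcmp` path.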

Testing this with comparisons where the mismatch or the end of the
string falls within the first 32 bytes (the common case), throughput is
about the same but latency drops by ~20%.

Another benefit is that we can reuse this exact return logic throughout,
as the memory offset is no longer required. This simplifies the
page-cross logic a great deal and will net us some serious code size
reduction for the common usage of strcmp.

I think, though, that I was a bit overoptimistic about the performance
benefits, as I was using `memcmp` vs `__memcmpeq` as a reference. I'll
put together a patch for just `__strcmpeq` and post the results here. I
think the wide-character versions have more expensive return value
checks, so if the character versions show a benefit we can expect it to
translate.



>
> Joerg

