public inbox for gcc-bugs@sourceware.org
* [Bug c/111774] New: boringssl performance gap between clang and gcc for x25519 operations
@ 2023-10-11 19:33 daniel at binaryparadox dot net
  2023-10-11 19:51 ` [Bug target/111774] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: daniel at binaryparadox dot net @ 2023-10-11 19:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111774

            Bug ID: 111774
           Summary: boringssl performance gap between clang and gcc for
                    x25519 operations
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: daniel at binaryparadox dot net
  Target Milestone: ---

Hi folks,

I've been bisecting a performance regression for x25519 cryptographic
operations with BoringSSL (https://boringssl.googlesource.com/boringssl) that
causes builds with gcc (tested w/ 13.2.0) to perform significantly worse than
builds with clang (tested w/ clang 11.1.0).

I've bisected the regression to this commit:
https://github.com/google/boringssl/commit/d605df5b6f8462c1f3005da82d718ec067f46b70


Building the project with gcc prior to this commit (Linux 6.1.55, gcc 13.2.0,
12th Gen Intel Core i7-1280P) shows the following numbers in the boringssl
performance tests:

Did 90900 Ed25519 key generation operations in 1006408us (90321.2 ops/sec)
Did 94000 Ed25519 signing operations in 1002192us (93794.4 ops/sec)
Did 33000 Ed25519 verify operations in 1029750us (32046.6 ops/sec)
Did 103000 Curve25519 base-point multiplication operations in 1005442us (102442.5 ops/sec)
Did 39000 Curve25519 arbitrary point multiplication operations in 1010017us (38613.2 ops/sec)

Building the project with gcc at the identified regression commit produces
worse numbers for the same benchmarks:

Did 33744 Ed25519 key generation operations in 1006475us (33526.9 ops/sec)
Did 34000 Ed25519 signing operations in 1011973us (33597.7 ops/sec)
Did 32000 Ed25519 verify operations in 1032193us (31002.0 ops/sec)
Did 36000 Curve25519 base-point multiplication operations in 1021745us (35233.8 ops/sec)
Did 39000 Curve25519 arbitrary point multiplication operations in 1020887us (38202.1 ops/sec)

Running the same tests prior to the problematic commit but using clang 11.1.0
produces these numbers:

Did 80132 Ed25519 key generation operations in 1004593us (79765.6 ops/sec)
Did 81000 Ed25519 signing operations in 1003061us (80752.8 ops/sec)
Did 28000 Ed25519 verify operations in 1010878us (27698.7 ops/sec)
Did 87000 Curve25519 base-point multiplication operations in 1005378us (86534.6 ops/sec)
Did 38000 Curve25519 arbitrary point multiplication operations in 1004032us (37847.4 ops/sec)

And doing the same with the problematic commit and clang 11.1.0 shows:

Did 83739 Ed25519 key generation operations in 1007756us (83094.5 ops/sec)
Did 88000 Ed25519 signing operations in 1010131us (87117.4 ops/sec)
Did 31000 Ed25519 verify operations in 1013649us (30582.6 ops/sec)
Did 94000 Curve25519 base-point multiplication operations in 1008822us (93178.0 ops/sec)
Did 39000 Curve25519 arbitrary point multiplication operations in 1020461us (38218.0 ops/sec)

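The reported numbers show that the clang build is essentially unaffected by
the problematic commit (slightly faster, if anything), while the GCC build
drops to roughly a third of its previous throughput for key generation,
signing, and base-point multiplication, suggesting something specific to GCC
is causing the slowdown.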

I'm not confident in my ability to dissect the underlying cause, but suspect
that GCC's handling of the new precomputed table representation is not as
efficient as it could be relative to clang. I'm hopeful that with clear
reproduction steps someone more familiar would be able to make progress.

I've already opened a bug with the BoringSSL project:
https://bugs.chromium.org/p/boringssl/issues/detail?id=655 


Here are the reproduction steps:

1. Check out
https://github.com/google/boringssl/commit/d605df5b6f8462c1f3005da82d718ec067f46b70
2. Configure and build the project **with GCC**:
```
CFLAGS="-Wno-error=stringop-overflow" CC= CXX= cmake -DCMAKE_BUILD_TYPE=Release -B build-release-gcc
<snipped>
make -C build-release-gcc
<snipped>
```
3. Run the `bssl speed` tool, filtering for `25519`:
```
build-release-gcc/tool/bssl speed -filter 25519
```
4. Observe slower results.
5. Check out the parent commit of d605df5b6f8462c1f3005da82d718ec067f46b70:
https://github.com/google/boringssl/commit/4a0393fcf37d7dbd090a5bb2293601a9ec7605da
6. Repeat the process described above.
7. Observe faster results.

The same process can be undertaken with clang by substituting the `cmake` step
with:

CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Release -B build-release-clang
make -C build-release-clang

Thank you!


* [Bug target/111774] boringssl performance gap between clang and gcc for x25519 operations
From: pinskia at gcc dot gnu.org @ 2023-10-11 19:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111774

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
          Component|c                           |target
   Last reconfirmed|                            |2023-10-11
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm:
#if defined(__clang__) // materialize for vectorization, 6% speedup
  __asm__("" : "+m" (t_bytes) : /*no inputs*/);
#endif


What target is this for? What processor too?


* [Bug target/111774] boringssl performance gap between clang and gcc for x25519 operations
From: pinskia at gcc dot gnu.org @ 2023-10-11 19:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111774

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-linux-gnu

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> Hmm:
> #if defined(__clang__) // materialize for vectorization, 6% speedup
>   __asm__("" : "+m" (t_bytes) : /*no inputs*/);
> #endif
> 
> 
> What target is this for? What processor too?

What happens if you enable the above for GCC too?


* [Bug target/111774] boringssl performance gap between clang and gcc for x25519 operations
From: daniel at binaryparadox dot net @ 2023-10-11 20:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111774

--- Comment #3 from cpu <daniel at binaryparadox dot net> ---
> What happens if you enable the above for GCC too?

That appears to have helped, but not closed the gap:

```
Did 39600 Ed25519 key generation operations in 1001716us (39532.2 ops/sec)
Did 41000 Ed25519 signing operations in 1006641us (40729.5 ops/sec)
Did 32000 Ed25519 verify operations in 1020079us (31370.1 ops/sec)
Did 43000 Curve25519 base-point multiplication operations in 1023075us (42030.2 ops/sec)
Did 39000 Curve25519 arbitrary point multiplication operations in 1008147us (38684.8 ops/sec)
```
