From: "daniel at binaryparadox dot net"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/111774] New: boringssl performance gap between clang and gcc for x25519 operations
Date: Wed, 11 Oct 2023 19:33:23 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111774

Bug ID: 111774
Summary: boringssl performance gap between clang and gcc for x25519 operations
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: daniel at binaryparadox dot net
Target Milestone: ---

Hi folks,

I've been bisecting a performance regression for x25519 cryptographic operations with BoringSSL
(https://boringssl.googlesource.com/boringssl) that causes builds with gcc (tested with 13.2.0) to perform significantly worse than builds with clang (tested with clang 11.1.0).

I've identified the commit that introduced the regression:
https://github.com/google/boringssl/commit/d605df5b6f8462c1f3005da82d718ec067f46b70

Building the project with gcc prior to this commit (Linux 6.1.55, gcc 13.2.0, 12th Gen Intel Core i7-1280P) gives the following numbers in the BoringSSL performance tests:

Did 90900 Ed25519 key generation operations in 1006408us (90321.2 ops/sec)
Did 94000 Ed25519 signing operations in 1002192us (93794.4 ops/sec)
Did 33000 Ed25519 verify operations in 1029750us (32046.6 ops/sec)
Did 103000 Curve25519 base-point multiplication operations in 1005442us (102442.5 ops/sec)
Did 39000 Curve25519 arbitrary point multiplication operations in 1010017us (38613.2 ops/sec)

Building the project with gcc at the identified commit produces worse numbers for the same benchmarks:

Did 33744 Ed25519 key generation operations in 1006475us (33526.9 ops/sec)
Did 34000 Ed25519 signing operations in 1011973us (33597.7 ops/sec)
Did 32000 Ed25519 verify operations in 1032193us (31002.0 ops/sec)
Did 36000 Curve25519 base-point multiplication operations in 1021745us (35233.8 ops/sec)
Did 39000 Curve25519 arbitrary point multiplication operations in 1020887us (38202.1 ops/sec)

Running the same tests prior to the problematic commit but using clang 11.1.0 produces these numbers:

Did 80132 Ed25519 key generation operations in 1004593us (79765.6 ops/sec)
Did 81000 Ed25519 signing operations in 1003061us (80752.8 ops/sec)
Did 28000 Ed25519 verify operations in 1010878us (27698.7 ops/sec)
Did 87000 Curve25519 base-point multiplication operations in 1005378us (86534.6 ops/sec)
Did 38000 Curve25519 arbitrary point multiplication operations in 1004032us (37847.4 ops/sec)

And doing the same with the problematic commit and clang 11.1.0 shows:

Did 83739 Ed25519 key generation operations in 1007756us (83094.5 ops/sec)
Did 88000 Ed25519 signing operations in 1010131us (87117.4 ops/sec)
Did 31000 Ed25519 verify operations in 1013649us (30582.6 ops/sec)
Did 94000 Curve25519 base-point multiplication operations in 1008822us (93178.0 ops/sec)
Did 39000 Curve25519 arbitrary point multiplication operations in 1020461us (38218.0 ops/sec)

These numbers show that the clang build gets slightly faster after the problematic commit, while the gcc build gets much slower: Ed25519 key generation, Ed25519 signing and Curve25519 base-point multiplication drop by a factor of roughly 2.7 to 2.9, whereas Ed25519 verify and arbitrary point multiplication are essentially unchanged. This suggests something specific to GCC is causing the slowdown.

I'm not confident in my ability to dissect the underlying cause, but I suspect that GCC handles the new precomputed table representation less efficiently than clang does. I'm hopeful that with clear reproduction steps someone more familiar with this area will be able to make progress.

I've already opened a bug with the BoringSSL project: https://bugs.chromium.org/p/boringssl/issues/detail?id=655

Here are the reproduction steps:

1. Check out https://github.com/google/boringssl/commit/d605df5b6f8462c1f3005da82d718ec067f46b70
2. Configure and build the project **with GCC**:
```
CFLAGS="-Wno-error=stringop-overflow" CC= CXX= cmake -DCMAKE_BUILD_TYPE=Release -B build-release-gcc
make -C build-release-gcc
```
3. Run the `bssl speed` tool, filtering for `25519`:
```
build-release-gcc/tool/bssl speed -filter 25519
```
4. Observe slower results.
5. Check out https://github.com/google/boringssl/commit/4a0393fcf37d7dbd090a5bb2293601a9ec7605da, the parent commit of d605df5b6f8462c1f3005da82d718ec067f46b70.
6. Repeat the process described above.
7. Observe faster results.

The same process can be undertaken with clang by substituting the build commands in step 2 with the following, and running `build-release-clang/tool/bssl` in step 3:

    CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Release -B build-release-clang
    make -C build-release-clang

Thank you!
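For anyone re-running the comparison, the reproduction steps above can be scripted roughly as follows. This is only a sketch: `bench`, `ops_per_sec` and `ratio` are hypothetical helpers that are not part of BoringSSL, the script assumes it is run from the root of a BoringSSL git checkout, and the compiler names passed to `bench` (such as `gcc`/`g++` or `clang`/`clang++`) are placeholders for whichever toolchains are being compared.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of BoringSSL.
# Assumes the current directory is the root of a BoringSSL git checkout.
set -eu

GOOD=4a0393fcf37d7dbd090a5bb2293601a9ec7605da  # parent commit (faster with gcc)
BAD=d605df5b6f8462c1f3005da82d718ec067f46b70   # commit where gcc slows down

# Check out <commit>, build it in Release mode with the given C/C++
# compilers, and run the 25519 benchmarks.
# Usage: bench <commit> <cc> <cxx>
bench() {
  commit=$1 cc=$2 cxx=$3
  dir="build-$(basename "$cc")-$(printf '%.8s' "$commit")"
  git checkout --quiet "$commit"
  CFLAGS="-Wno-error=stringop-overflow" CC="$cc" CXX="$cxx" \
    cmake -DCMAKE_BUILD_TYPE=Release -B "$dir" >/dev/null
  make -C "$dir" >/dev/null
  "$dir/tool/bssl" speed -filter 25519
}

# Read `bssl speed` output on stdin and print "<operation><TAB><ops/sec>",
# recomputing ops/sec as count / (microseconds / 1e6).
ops_per_sec() {
  awk '/^Did [0-9]+ .* operations in [0-9]+us/ {
    op = $3
    # Collect the operation name up to the word "operations".
    for (i = 4; i <= NF && $i != "operations"; i++) op = op " " $i
    us = $(i + 2)          # field after "in", e.g. "1006408us"
    sub(/us$/, "", us)
    printf "%s\t%.1f\n", op, $2 / (us / 1e6)
  }'
}

# Print how many times higher <before> is than <after>, to two decimals.
ratio() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2f\n", b / a }'
}
```

For example, `bench "$BAD" gcc g++ | ops_per_sec` would print the gcc throughput for the suspect commit, and `ratio 90321.2 33526.9` prints `2.69`, the gcc Ed25519 key-generation slowdown from the numbers reported here.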