From: "munroesj at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/100085] Bad code for union transfer from __float128 to vector types
Date: Fri, 16 Apr 2021 20:30:08 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 10.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #4 from Steven Munroe ---
I am seeing a similar problem with union transfers from __float128 to
__int128.

static inline unsigned __int128
vec_xfer_bin128_2_int128t (__binary128 f128)
{
  __VF_128 vunion;

  vunion.vf1 = f128;

  return (vunion.ui1);
}

and

unsigned __int128
test_xfer_bin128_2_int128 (__binary128 f128)
{
  return vec_xfer_bin128_2_int128t (f128);
}

generates:

0000000000000030 :
  30:   57 12 42 f0     xxswapd vs34,vs34
  34:   20 00 20 39     li      r9,32
  38:   d0 ff 41 39     addi    r10,r1,-48
  3c:   99 4f 4a 7c     stxvd2x vs34,r10,r9
  40:   f0 ff 61 e8     ld      r3,-16(r1)
  44:   f8 ff 81 e8     ld      r4,-8(r1)
  48:   20 00 80 4e     blr

For POWER8 this should use mfvsrd/xxpermdi/mfvsrd instead of storing the
vector to the stack and loading it back into GPRs.

This looks like the root cause of the poor performance of __float128
soft-float on POWER8. I ran a simple benchmark using __float128 in C code,
calling libgcc for -mcpu=power8 and then using hardware instructions for
-mcpu=power9.

P8 target P8AT14, uses libgcc __addkf3_sw and __mulkf3_sw:
test_time_f128 f128 CC  tb delta = 52589, sec = 0.000102713

P9 target P8AT14, uses libgcc __addkf3_hw and __mulkf3_hw:
test_time_f128 f128 CC  tb delta = 18762, sec = 3.66445e-05

P9 target P9AT14, inline hardware binary128 float:
test_time_f128 f128 CC  tb delta = 3809, sec = 7.43945e-06

I analyzed this with Valgrind Itrace, Sim-ppc, and perfstat. Every call to
libgcc __add/sub/mul/divkf3 takes a load-hit-store flush. This explains why
__float128 is 13.8X slower on P8 than on P9 (tb delta 52589 vs. 3809).
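
For anyone who wants to try the transfer outside of my headers, here is a minimal self-contained sketch. The union name xfer128_t and the helper names are stand-ins I made up for this example; they are not the actual __VF_128 definition. It assumes a compiler and target that provide __float128 and unsigned __int128 in the IEEE binary128 format. The memcpy variant is included only as the other common way to write the same bit transfer. I have not checked which instruction sequence either form produces for a given -mcpu or GCC version.

```c
#include <string.h>

/* Hypothetical stand-in for the union used in the report: the same
   16 bytes viewed as a binary128 float or as a 128-bit integer. */
typedef union
{
  __float128 vf1;
  unsigned __int128 ui1;
} xfer128_t;

/* Bit-for-bit transfer through the union (GNU C allows reading a
   union member other than the one last written). */
static inline unsigned __int128
xfer_union (__float128 f128)
{
  xfer128_t vunion;

  vunion.vf1 = f128;
  return vunion.ui1;
}

/* The same transfer written with memcpy. */
static inline unsigned __int128
xfer_memcpy (__float128 f128)
{
  unsigned __int128 result;

  memcpy (&result, &f128, sizeof result);
  return result;
}

/* Upper 64 bits of the image: sign, 15-bit exponent, top of fraction. */
static inline unsigned long long
xfer_high64 (__float128 f128)
{
  return (unsigned long long) (xfer_union (f128) >> 64);
}

/* Lower 64 bits of the image: the rest of the fraction. */
static inline unsigned long long
xfer_low64 (__float128 f128)
{
  return (unsigned long long) xfer_union (f128);
}
```

Compiling a wrapper around xfer_union with -O3 -mcpu=power8 and disassembling the object is how the listing above was obtained for the original function, so the same check should apply to this sketch.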