From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22
Date: Thu, 27 Jan 2022 07:42:49 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178

Richard Biener changed:

           What    |Removed            |Added
----------------------------------------------------------------------------
          Component|tree-optimization  |rtl-optimization
                 CC|                   |vmakarov at gcc dot gnu.org
           Keywords|                   |ra

--- Comment #7 from Richard Biener ---
I see a lot more GPR <-> XMM moves in the 'after' case:

  1035 :   401c8b:       vaddsd %xmm1,%xmm0,%xmm0
  1953 :   401c8f:       vmovq  %rcx,%xmm1
   305 :   401c94:       vaddsd %xmm8,%xmm1,%xmm1
  3076 :   401c99:       vmovq  %xmm0,%r14
   590 :   401c9e:       vmovq  %r11,%xmm0
   267 :   401ca3:       vmovq  %xmm1,%r8
   136 :   401ca8:       vmovq  %rdx,%xmm1
   448 :   401cad:       vaddsd %xmm1,%xmm0,%xmm1
  1703 :   401cb1:       vmovq  %xmm1,%r9     (*)
   834 :   401cb6:       vmovq  %r8,%xmm1
  1719 :   401cbb:       vmovq  %r9,%xmm0     (*)
  2782 :   401cc0:       vaddsd %xmm0,%xmm1,%xmm1
 22135 :   401cc4:       vmovsd %xmm1,%xmm1,%xmm0
  1261 :   401cc8:       vmovq  %r14,%xmm1
   646 :   401ccd:       vaddsd %xmm0,%xmm1,%xmm0
 18136 :   401cd1:       vaddsd %xmm2,%xmm5,%xmm1
   629 :   401cd5:       vmovq  %xmm1,%r8
   142 :   401cda:       vaddsd %xmm6,%xmm3,%xmm1
   177 :   401cde:       vmovq  %xmm0,%r14
   288 :   401ce3:       vmovq  %xmm1,%r9
   177 :   401ce8:       vmovq  %r8,%xmm1
   174 :   401ced:       vmovq  %r9,%xmm0

Those look like RA / spilling artifacts; IIRC Hongtao posted patches in this
area (to regcprop, I think?).  The above is definitely bad, for example the
insns marked (*) seem to swap %xmm0 and %xmm1 via %r9.

The function is LBM_performStreamCollide.  The sinking pass does nothing
wrong; it moves the unconditionally executed

-  _948 = _861 + _867;
-  _957 = _944 + _948;
-  _912 = _861 + _873;
...
-  _981 = _853 + _865;
-  _989 = _977 + _981;
-  _916 = _853 + _857;
-  _924 = _912 + _916;

into a conditionally executed block.  But that increases register pressure
by 5 FP regs (if I counted correctly) in that area.  So this is the usual
issue of GIMPLE transforms not being register-pressure aware.
-fschedule-insns -fsched-pressure seems to mitigate this somewhat (though I
think EBB scheduling cannot undo such movement).
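
To make the shape of the problem concrete, here is a purely illustrative C
sketch (not taken from lbm.c; the function name 'sketch' and all variables
are made up): the sums a..d are used only inside the guarded block, so
sinking moves the four additions into it and the eight x* inputs stay live
across the branch instead of just the four sums.

/* Illustrative only: sinking the additions of a..d into the if-block keeps
   x0..x7 live across the branch, raising FP register pressure there.  */
void
sketch (double *restrict dst, const double *restrict src, int flag)
{
  double x0 = src[0], x1 = src[1], x2 = src[2], x3 = src[3];
  double x4 = src[4], x5 = src[5], x6 = src[6], x7 = src[7];

  /* The x* are all needed unconditionally ...  */
  dst[8] = x0 * x1 + x2 * x3 + x4 * x5 + x6 * x7;

  /* ... but these sums are used only in the guarded block, so sinking
     moves the additions down into it.  */
  double a = x0 + x1;
  double b = x2 + x3;
  double c = x4 + x5;
  double d = x6 + x7;

  if (flag)
    {
      dst[0] = a + b;
      dst[1] = c + d;
      dst[2] = a + d;
      dst[3] = b + c;
    }
}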
In postreload I see transforms like

-(insn 466 410 411 7 (set (reg:DF 0 ax [530])
-        (mem/u/c:DF (symbol_ref/u:DI ("*.LC10") [flags 0x2]) [0  S8 A64])) "lbm.c":241:5 141 {*movdf_internal}
-     (expr_list:REG_EQUAL (const_double:DF 9.939744999999999830464503247640095651149749755859375e-1 [0x0.fe751ce28ed5fp+0])
-        (nil)))
-(insn 411 466 467 7 (set (reg:DF 25 xmm5 [orig:123 prephitmp_643 ] [123])
+(insn 411 410 467 7 (set (reg:DF 25 xmm5 [orig:123 prephitmp_643 ] [123])
         (reg:DF 0 ax [530])) "lbm.c":241:5 141 {*movdf_internal}
      (nil))

which suggests we could have reloaded %xmm5 from .LC10 directly.  But the
spilling to GPRs is already present after LRA, and cprop_hardreg doesn't do
anything bad either.

The differences can be seen on trunk with -Ofast -march=znver2
[-fdisable-tree-sink2].

We have X86_TUNE_INTER_UNIT_MOVES_TO_VEC / X86_TUNE_INTER_UNIT_MOVES_FROM_VEC,
and the interesting thing is that when I disable them I do see some spilling
to the stack but also quite a few re-materialized constants (loads from .LC*,
as seen in the opportunity above).  It might be interesting to benchmark with
-mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec and to find a
way to set the costs so that IRA/LRA prefer re-materializing constants from
the constant pool over spilling to GPRs (if that's possible at all - Vlad?)
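
For reference, a tiny made-up example (again not from lbm.c; 'weighted_sum'
and 'K' are invented names) of the decision in question: a double constant
that lives in an .LC* slot can, under pressure, either be parked in a GPR
(the vmovq traffic above) or simply re-loaded from the constant pool at each
use, which is what reloading %xmm5 from .LC10 would have amounted to.  The
compile lines in the comment sketch the suggested tuning comparison.

/* Made-up example, not from lbm.c: K is a constant-pool value (an .LC*
   slot like .LC10 above).  If it ends up spilled, re-materializing it from
   the constant pool avoids the GPR round trip.

   Hypothetical compile lines for the comparison suggested above:
     gcc -Ofast -march=znver2 lbm.c
     gcc -Ofast -march=znver2 \
         -mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec lbm.c  */

#define K 0.9939745   /* stand-in for the .LC10 constant quoted above */

double
weighted_sum (const double *v, int n)
{
  double acc = 0.0;
  for (int i = 0; i < n; i++)
    acc += v[i] * K;
  return acc;
}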