From: "amonakov at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f
Date: Fri, 01 Dec 2023 15:37:03 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697

--- Comment #8 from Alexander Monakov ---
Thanks, I can reproduce it. It is pretty tricky though.
For instance, just swapping the mov and the compare is enough to make it fast:

--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.fast.s 2023-12-01 18:32:20.318668991 +0300
@@ -743,8 +743,8 @@ add_force_to_mom:
        .p2align 4,,10
        .p2align 3
 .L58:
-       cmpb    $1, -680(%r11,%r12)
        movapd  %xmm5, %xmm7
+       cmpb    $1, -680(%r11,%r12)
        jne     .L54
        xorpd   %xmm6, %xmm7
 .L54:
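For anyone who wants to repeat this kind of experiment on their own assembly output, here is a minimal sketch of scripting the swap. The helper name `swap_after_label` and the idea of doing it with a script are illustrative assumptions, not how the diff above was produced. It simply exchanges the two lines that follow a given label; the embedded snippet is the `.L58` block from the diff.

```python
def swap_after_label(asm: str, label: str) -> str:
    """Swap the two lines that immediately follow `label:` (hypothetical helper)."""
    lines = asm.splitlines()
    # Locate the label line, e.g. ".L58:"; raises StopIteration if it is absent.
    idx = next(i for i, line in enumerate(lines) if line.strip() == label + ":")
    # Exchange the two instructions directly after the label.
    lines[idx + 1], lines[idx + 2] = lines[idx + 2], lines[idx + 1]
    return "\n".join(lines) + "\n"


# The "slow" instruction order, copied from the diff above.
slow = """\
.L58:
\tcmpb\t$1, -680(%r11,%r12)
\tmovapd\t%xmm5, %xmm7
\tjne\t.L54
\txorpd\t%xmm6, %xmm7
.L54:
"""

fast = swap_after_label(slow, ".L58")
print(fast)
```

The swap leaves the program's behaviour unchanged: movapd does not modify the flags that jne reads, and cmpb does not write %xmm7. Only the instruction order differs between the two files.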