From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
Date: Fri, 18 Feb 2022 13:45:35 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #17 from Richard Biener ---
For

FAIL: gcc.target/i386/pr91446.c scan-assembler-times vmovdqa[^\\n\\r]*xmm[0-9] 2

we used to produce

0000000000000000 :
   0:   48 83 ec 28             sub    $0x28,%rsp
   4:   c4 e1 f9 6e d7          vmovq  %rdi,%xmm2
   9:   c4 e1 f9 6e da          vmovq  %rdx,%xmm3
   e:   c4 e3 e9 22 ce 01       vpinsrq $0x1,%rsi,%xmm2,%xmm1
  14:   c4 e3 e1 22 c1 01       vpinsrq $0x1,%rcx,%xmm3,%xmm0
  1a:   48 89 e7                mov    %rsp,%rdi
  1d:   c5 f9 7f 0c 24          vmovdqa %xmm1,(%rsp)
  22:   c5 f9 7f 44 24 10       vmovdqa %xmm0,0x10(%rsp)
  28:   e8 00 00 00 00          call   2d
  2d:   48 83 c4 28             add    $0x28,%rsp
  31:   c3                      ret

but now reject this on costing grounds.  The scalar code is

0000000000000000 :
   0:   48 83 ec 28             sub    $0x28,%rsp
   4:   48 89 3c 24             mov    %rdi,(%rsp)
   8:   48 89 e7                mov    %rsp,%rdi
   b:   48 89 74 24 08          mov    %rsi,0x8(%rsp)
  10:   48 89 54 24 10          mov    %rdx,0x10(%rsp)
  15:   48 89 4c 24 18          mov    %rcx,0x18(%rsp)
  1a:   e8 00 00 00 00          call   1f
  1f:   48 83 c4 28             add    $0x28,%rsp
  23:   c3                      ret

I think the scalar variant is 5 uops up to the call while the vector variant
is 9 uops.  The scalar variant can also execute 4 of the uops in parallel
(well, I guess only up to 3 with 3 store ports).

I think the scalar variant is better and so I'm inclined to adjust the
testcase.
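
For readers without the testsuite at hand, a function of roughly the following shape gives the store pattern shown in the two listings: four 8-byte register arguments written to adjacent stack slots, with the address then passed to a call. This is only an illustrative sketch and not the actual pr91446.c source; the names info, foo, bar and last_seen are hypothetical. bar is defined here only so the sketch is self-contained; to see the call in the generated code it would presumably have to be an external function.

```c
/* Illustrative sketch only, NOT the actual gcc.target/i386/pr91446.c.
   All names here (info, foo, bar, last_seen) are hypothetical.  */
typedef struct
{
  unsigned long long width, height;
  long long x, y;
} info;

/* Stand-in callee.  It records its argument so the effect of the stores
   in foo can be observed.  For a listing like the ones above it would
   have to be declared extern and defined in another translation unit.  */
static info last_seen;

void
bar (info *p)
{
  last_seen = *p;
}

/* Four adjacent 8-byte stores into a stack temporary whose address then
   escapes to a call.  SLP vectorization can pair these into two 16-byte
   stores; the scalar form keeps four plain 8-byte stores.  */
void
foo (unsigned long long width, unsigned long long height,
     long long x, long long y)
{
  info t;
  t.width = width;
  t.height = height;
  t.x = x;
  t.y = y;
  bar (&t);
}
```

Either code shape stores the same 32 bytes; the question in this comment is only which sequence is cheaper up to the call.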