From: "roger at nextmovesoftware dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/102840] [12 Regression] gcc.target/i386/pr22076.c by r12-4475
Date: Tue, 19 Oct 2021 14:48:40 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102840

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-10-19
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Roger Sayle ---
I believe this test case is poorly written and does not correctly test the
original issue in PR target/22076, which concerned suboptimal moving of
arguments via memory (fixed by prohibiting reload from using MMX registers).
Prior to my patch, with -m32 -O2 -fomit-frame-pointer -mmmx -mno-sse2, GCC
generated:

test:
        movq    .LC1, %mm0
        paddb   .LC0, %mm0
        movq    %mm0, x
        ret
.x:
        .zero   8
.LC0:
        .byte   1
        .byte   2
        .byte   3
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   8
.LC1:
        .byte   11
        .byte   22
        .byte   33
        .byte   44
        .byte   55
        .byte   66
        .byte   77
        .byte   88

which indeed doesn't use movl, and requires two movq.

After my patch, we now generate the much more efficient (dare I say optimal):

test:
        movl    $807671820, %eax
        movl    $1616136252, %edx
        movl    %eax, x
        movl    %edx, x+4
        ret

which has evaluated the _mm_add_pi8 at compile-time, and effectively memsets
x to the correct value in the minimum possible number of cycles. In fact,
failing to evaluate this at compile-time is a regression since v4.1 (according
to godbolt).

[p.s. I predict other platforms might also notice changes in their testsuites,
as the middle-end now generates more efficient instruction sequences].
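
For anyone wanting to check the two movl immediates by hand, the following is a minimal sketch in plain C (not the testcase itself, and not how the middle-end performs the folding). It adds the .LC0 and .LC1 byte vectors quoted above lane by lane, as paddb would, and packs each half as a little-endian 32-bit word. The helper names add_pi8, pack_le32 and folded_half are hypothetical, introduced only for this illustration.

```c
#include <stdint.h>

/* Bytewise add of two 8-byte vectors; each lane wraps modulo 256,
   which is the behaviour of paddb / _mm_add_pi8. */
static void add_pi8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
}

/* Pack four bytes into a 32-bit word, least significant byte first,
   matching how x86 lays the bytes out in memory. */
static uint32_t pack_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: folds the two constant vectors from the listing
   and returns the low (high == 0) or high (high != 0) 32-bit half. */
static uint32_t folded_half(int high)
{
    static const uint8_t lc0[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    static const uint8_t lc1[8] = { 11, 22, 33, 44, 55, 66, 77, 88 };
    uint8_t sum[8];

    add_pi8(lc1, lc0, sum);
    return pack_le32(sum + (high ? 4 : 0));
}
```

The summed bytes are 12, 24, 36, 48, 60, 72, 84, 96, so folded_half(0) is 0x3024180C (807671820) and folded_half(1) is 0x6054483C (1616136252), which agree with the values stored to x and x+4.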