From mboxrd@z Thu Jan 1 00:00:00 1970
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
Date: Fri, 18 Feb 2022 11:31:49 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.3
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
List-Id: Gcc-bugs mailing list

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #15 from Richard Biener ---
The patch will cause

FAIL: gcc.target/i386/pr91446.c scan-assembler-times vmovdqa[^\\n\\r]*xmm[0-9] 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
XPASS: gcc.target/i386/pr99881.c scan-assembler-not xmm[0-9]

I have to look into some of them.  The pr92658 ones seem to be cases like

void bar_u32_u64 (v2di * dst, v4si src)
{
  unsigned long long tem[2];
  tem[0] = src[0];
  tem[1] = src[1];
  dst[0] = *(v2di *) tem;
}

where we fail to recognize the BIT_FIELD_REF as accessing a pre-existing
vector (we only support a subset of cases during SLP discovery):

  _1 = BIT_FIELD_REF ;
  _2 = (long long unsigned int) _1;
  tem[0] = _2;
  _3 = BIT_FIELD_REF ;
  _4 = (long long unsigned int) _3;
  tem[1] = _4;

but when vectorizing just the store and the conversion as

  [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  _3 = BIT_FIELD_REF ;
  _13 = {_1, _3};
  vect__2.110_14 = (vector(2) long long unsigned int) _13;
  MEM [(long long unsigned int *)&tem] = vect__2.110_14;

we can recover things on the RTL side.  So we are just reminded that
costing is a difficult thing:

Cost model analysis:
_2 1 times scalar_store costs 12 in body
_4 1 times scalar_store costs 12 in body
(long long unsigned int) _1 1 times scalar_stmt costs 4 in body
(long long unsigned int) _3 1 times scalar_stmt costs 4 in body
(long long unsigned int) _1 1 times vector_stmt costs 4 in body
node 0x415e268 1 times vec_construct costs 20 in prologue
_2 1 times vector_store costs 16 in body
Cost model analysis for part in loop 0:
  Vector cost: 40
  Scalar cost: 32
not vectorized: vectorization is not profitable.

Note this uses icelake-server costs, which have an unusually high
sse_to_integer cost.

The best fix here would of course be to recognize the BIT_FIELD_REF
vector use.
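
For reference, the two totals in the cost dump above can be re-derived by hand from the per-statement entries. The following is only a sketch of that arithmetic: `cost_entry`, `sum_costs`, `scalar_cost`, `vector_cost` and `is_profitable` are hypothetical names made up for illustration and are not GCC internals, and the "strictly cheaper" comparison is a simplifying assumption about how the verdict is reached.

```c
#include <stddef.h>

/* One line of the cost dump: statement kind, how often it is counted,
   and the cost per occurrence.  Hypothetical type, not a GCC structure.  */
struct cost_entry
{
  const char *kind;
  unsigned count;
  unsigned cost;
};

/* Scalar entries copied from the dump quoted in the comment.  */
static const struct cost_entry scalar_entries[] = {
  { "scalar_store", 1, 12 },   /* _2 */
  { "scalar_store", 1, 12 },   /* _4 */
  { "scalar_stmt",  1, 4 },    /* (long long unsigned int) _1 */
  { "scalar_stmt",  1, 4 },    /* (long long unsigned int) _3 */
};

/* Vector entries copied from the dump (body and prologue together).  */
static const struct cost_entry vector_entries[] = {
  { "vector_stmt",   1, 4 },   /* the conversion */
  { "vec_construct", 1, 20 },  /* building {_1, _3}, in prologue */
  { "vector_store",  1, 16 },  /* the store to tem */
};

/* Sum count * cost over a table of entries.  */
static unsigned
sum_costs (const struct cost_entry *entries, size_t n)
{
  unsigned total = 0;
  for (size_t i = 0; i < n; i++)
    total += entries[i].count * entries[i].cost;
  return total;
}

unsigned
scalar_cost (void)
{
  return sum_costs (scalar_entries,
                    sizeof scalar_entries / sizeof scalar_entries[0]);
}

unsigned
vector_cost (void)
{
  return sum_costs (vector_entries,
                    sizeof vector_entries / sizeof vector_entries[0]);
}

/* Assumption: vectorization is treated as profitable only when the
   vector total is strictly smaller than the scalar total.  */
int
is_profitable (unsigned vec, unsigned scalar)
{
  return vec < scalar;
}
```

With these entries the scalar side sums to 12 + 12 + 4 + 4 and the vector side to 4 + 20 + 16, so the vec_construct entry alone accounts for half of the vector total, which is why the choice of cost table matters so much for this case.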