From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id C5D1E398300A; Tue, 10 Nov 2020 08:55:45 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C5D1E398300A
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/97770] [ICELAKE]Missing vectorization for vpopcnt
Date: Tue, 10 Nov 2020 08:55:45 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 10 Nov 2020 08:55:46 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Richard Biener from comment #4)
> > What's missing is middle-end folding support to narrow popcount to the
> > appropriate internal function call with byte/half-word width when target
> > support
> > is available.  But I'm quite sure there's no scalar popcount instruction
> > operating on half-word or byte pieces of a GPR?
> >
> > Alternatively the vectorizer can use patterns to do this.
>
> Yes, but for 64bit width, vectorizer generate suboptimal code.
>
> sse #c3
>
> vector(2) long long unsigned int vect__4.6;
> vector(2) long long unsigned int vect__4.5;
> vector(2) long long unsigned int _8;
> vector(2) long long unsigned int _26;
>
> ...
> ...
>
> _8 = .POPCOUNT (vect__4.5_16);
> _26 = .POPCOUNT (vect__4.6_9);
> vect__5.7_22 = VEC_PACK_TRUNC_EXPR <_8, _26>; --- Why do we do this?
> vector(4) int vect__5.7;
>
>
> It could generate directly
>
> v4di = .POPCOUNT (v4di);

I guess that the vectorized popcount IFN is defined to be VnDI -> VnDI
but we want to have VnSImode results.  This means the instruction is
wrongly modeled in vectorized form?

Note the vectorizer isn't very good in handling narrowing operations here.
If you can push the missing patterns I can have a look.  Bonus points
for a correctness testcase (from the above I think we're generating
wrong code).
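
A correctness check along the lines asked for above could look roughly
like the sketch below.  It is only an illustration, not the testcase from
this PR: the names popcount_loop, popcount_ref and check_popcount, the
array size N and the xorshift input generator are all assumptions made
for the example.  The loop reads 64-bit elements and stores 32-bit
results, which is the narrowing case discussed above, and the result is
compared against a bit-at-a-time scalar reference.

```c
#include <stddef.h>

#define N 1024  /* hypothetical array size, chosen for illustration */

/* Hypothetical kernel: 64-bit inputs, 32-bit results, so a vectorized
   form has to narrow the popcount result from DImode to SImode lanes.
   noinline keeps the loop from being folded into the checker.  */
__attribute__((noinline)) void
popcount_loop (int *restrict dst, const unsigned long long *restrict src,
               int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}

/* Bit-at-a-time reference that does not use the popcount builtin.  */
static int
popcount_ref (unsigned long long x)
{
  int count = 0;
  while (x)
    {
      count += (int) (x & 1);
      x >>= 1;
    }
  return count;
}

/* Fill the input with pseudo-random values plus a few edge cases, run
   the kernel, and return the number of elements that disagree with the
   reference.  Zero means the results match.  */
int
check_popcount (void)
{
  static unsigned long long src[N];
  static int dst[N];
  unsigned long long s = 0x9e3779b97f4a7c15ULL;

  for (int i = 0; i < N; i++)
    {
      /* xorshift64 step, used only to get varied bit patterns.  */
      s ^= s << 13;
      s ^= s >> 7;
      s ^= s << 17;
      src[i] = s;
    }
  src[0] = 0;                       /* no bits set */
  src[1] = ~0ULL;                   /* all 64 bits set */
  src[2] = 0xffffffff00000000ULL;   /* only the upper half set */

  popcount_loop (dst, src, N);

  int mismatches = 0;
  for (int i = 0; i < N; i++)
    if (dst[i] != popcount_ref (src[i]))
      mismatches++;
  return mismatches;
}
```

To use it, call check_popcount () from main and abort when it returns
non-zero.  Exercising the vectorized path presumably needs something like
-O3 -march=icelake-server and hardware with AVX512VPOPCNTDQ; without those
it only checks the scalar code.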