From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/97770] [ICELAKE]Missing vectorization for vpopcnt
Date: Thu, 03 Dec 2020 11:24:27 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #14 from Richard Biener ---
So we vectorize to

  _18 = .POPCOUNT (vect__5.7_22);
  _17 = .POPCOUNT (vect__5.7_21);
  vect__6.8_16 = VEC_PACK_TRUNC_EXPR <_18, _17>;
  _6 = 0;
  _7 = dest_13(D) + _2;
  vect__8.9_10 = [vec_unpack_lo_expr] vect__6.8_16;
  vect__8.9_9 = [vec_unpack_hi_expr] vect__6.8_16;
  _8 = (long long int) _6;

which is exactly the issue that in the scalar code we have an 'int'-producing popcount with a long long
argument, but the vector IFN produces a result of the same width as the argument. So the vectorizer compensates for that (VEC_PACK_TRUNC_EXPR) and then vectorizes the widening that's in the scalar code (vec_unpack_{lo,hi}_expr).

The fix for this, and for the missing byte and word variants, is to add a pattern to tree-vect-patterns.c for this case, matching it to the .POPCOUNT internal function. That possibly applies to other bitops, too, like parity, ctz, ffs, etc. There are quite a few _widen helpers in the pattern recog code, so I'm not sure how complicated it is to match (long)popcountl(long) and (short)popcount((int)short).

Richard may have a good idea since he did the last "big" surgery there.
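
For illustration, the two scalar shapes mentioned above can be sketched in C as follows. This is not the testcase attached to the bug; the function names popcount_ll and popcount_short are hypothetical, and the loops are only an assumed reduction of the kind of source that leads to the dump above. The unsigned element type in the second loop is also an assumption: it keeps the promotion to int a zero-extension, so the 32-bit count equals the 16-bit count.

```c
#include <stddef.h>

/* Hypothetical reduction: __builtin_popcountll takes a 64-bit argument
   but returns int, and the store widens that int back to long long.
   This int -> long long step is the widening the vectorizer currently
   handles with a pack followed by unpack_lo/unpack_hi. */
void popcount_ll(long long *dest, const long long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = __builtin_popcountll((unsigned long long) src[i]);
}

/* The (short)popcount((int)short) shape: the 16-bit element is promoted
   to int for the builtin and the result is truncated back to 16 bits.
   Unsigned elements are assumed so the promotion zero-extends and does
   not add set bits. */
void popcount_short(short *dest, const unsigned short *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = (short) __builtin_popcount(src[i]);
}
```

In both loops the source and destination elements have the same width, so a pattern that maps the whole conversion chain onto a single same-width .POPCOUNT would presumably make the pack/unpack pair unnecessary. Compiling with something like -O3 -march=icelake-server -fdump-tree-vect-details should show what the vectorizer does with each loop.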