From: "pinskia at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/113859] popcount HI can be vectorized for non-SVE
Date: Sat, 10 Feb 2024 03:41:37 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: UNCONFIRMED

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

--- Comment #1 from Andrew Pinski ---
SI (and DI) can be optimized too.
LLVM produces the following for int:
```
ldr     d0, [x0]
cnt     v0.8b, v0.8b
uaddlp  v0.4h, v0.8b
uaddlp  v0.2s, v0.4h
str     d0, [x1]
ret
```
And for long:
```
ldr     q0, [x0]
cnt     v0.16b, v0.16b
uaddlp  v0.8h, v0.16b
uaddlp  v0.4s, v0.8h
uaddlp  v0.2d, v0.4s
str     q0, [x1]
ret
```
That is for the SLP version:
```
void f(unsigned long * __restrict b, unsigned long * __restrict d)
{
  d[0] = __builtin_popcountll(b[0]);
  d[1] = __builtin_popcountll(b[1]);
}
```
Apply s/long/int/ to get the first (int) case.

Note that using SVE is better than the above when it is available, but that is part of PR 113860.
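To make the two LLVM sequences easier to follow, here is a minimal scalar C model of what they compute. `cnt` replaces every byte lane with its population count, and each `uaddlp` adds adjacent unsigned lanes pairwise into lanes twice as wide, so the per-byte counts are summed up to the element width (two `uaddlp` steps for 32-bit elements, three for 64-bit). This is only an illustrative sketch, not GCC's or LLVM's code: all function names (`cnt_bytes`, `uaddlp_8to16`, `popcount_2x64`, `popcount_2x32`, ...) are hypothetical, and the byte-lane order assumes a little-endian AArch64 target, where lane 0 is the least significant byte of the first element.

```c
#include <stdint.h>

/* Model of "cnt vN.<T>b": replace each byte lane with its popcount (0..8). */
static void cnt_bytes(const uint8_t *in, uint8_t *out, int lanes)
{
    for (int i = 0; i < lanes; i++) {
        uint8_t v = in[i];
        uint8_t c = 0;
        for (; v != 0; v >>= 1)
            c += v & 1u;
        out[i] = c;
    }
}

/* Models of "uaddlp": add each pair of adjacent unsigned lanes and store the
   sum in a lane twice as wide, halving the number of lanes. */
static void uaddlp_8to16(const uint8_t *in, uint16_t *out, int out_lanes)
{
    for (int i = 0; i < out_lanes; i++)
        out[i] = (uint16_t)(in[2 * i] + in[2 * i + 1]);
}

static void uaddlp_16to32(const uint16_t *in, uint32_t *out, int out_lanes)
{
    for (int i = 0; i < out_lanes; i++)
        out[i] = (uint32_t)in[2 * i] + in[2 * i + 1];
}

static void uaddlp_32to64(const uint32_t *in, uint64_t *out, int out_lanes)
{
    for (int i = 0; i < out_lanes; i++)
        out[i] = (uint64_t)in[2 * i] + in[2 * i + 1];
}

/* "long" case: ldr q0; cnt .16b; uaddlp .8h; uaddlp .4s; uaddlp .2d. */
static void popcount_2x64(const uint64_t b[2], uint64_t d[2])
{
    uint8_t bytes[16], cnt[16];
    uint16_t h[8];
    uint32_t s[4];

    /* Little-endian byte lanes: lane i is byte (i % 8) of element i / 8. */
    for (int i = 0; i < 16; i++)
        bytes[i] = (uint8_t)(b[i / 8] >> (8 * (i % 8)));
    cnt_bytes(bytes, cnt, 16);
    uaddlp_8to16(cnt, h, 8);
    uaddlp_16to32(h, s, 4);
    uaddlp_32to64(s, d, 2);
}

/* "int" case: ldr d0; cnt .8b; uaddlp .4h; uaddlp .2s. */
static void popcount_2x32(const uint32_t b[2], uint32_t d[2])
{
    uint8_t bytes[8], cnt[8];
    uint16_t h[4];

    for (int i = 0; i < 8; i++)
        bytes[i] = (uint8_t)(b[i / 4] >> (8 * (i % 4)));
    cnt_bytes(bytes, cnt, 8);
    uaddlp_8to16(cnt, h, 4);
    uaddlp_16to32(h, d, 2);
}
```

For example, calling `popcount_2x64` on `{0xFFFFFFFFFFFFFFFF, 0x8000000000000001}` yields `{64, 2}`, the same as `__builtin_popcountll` on each element. Because each byte count is at most 8, none of the widening pairwise adds can overflow, which is why the reduction needs no masking.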