From: "ktkachov at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/115667] New: Improve expansion for popcountti2
Date: Wed, 26 Jun 2024 14:36:35 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115667

            Bug ID: 115667
           Summary: Improve expansion for popcountti2
           Product: gcc
           Version: 13.3.1
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Maybe this is aarch64-specific, but for the testcase:

int cnt (unsigned __int128 a)
{
  return __builtin_popcountg (a);
}

GCC for aarch64 will generate:
cnt:
        fmov    d30, x0
        fmov    d31, x1
        cnt     v30.8b, v30.8b
        cnt     v31.8b, v31.8b
        addv    b30, v30.8b
        addv    b31, v31.8b
        fmov    x1, d30
        fmov    x0, d31
        add     w0, w1, w0
        ret

This is effectively two DImode popcount expansions followed by an add of the
results.

Clang generates the more efficient sequence:
cnt:                                    // @cnt
        fmov    d0, x0
        mov     v0.d[1], x1
        cnt     v0.16b, v0.16b
        uaddlv  h0, v0.16b
        fmov    w0, s0
        ret
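
For reference, the two strategies can be modelled in portable C. This is only an illustrative sketch, not what either compiler does internally: the function names (popcount128_split, popcount128_bytes, check_agree) are hypothetical, and the byte loop merely imitates the semantics of CNT on v0.16b followed by a single UADDLV reduction. It assumes a GCC or Clang target that supports unsigned __int128.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shape of the current GCC output: one 64-bit popcount per half,
   then a scalar add of the two results.  */
static int popcount128_split (unsigned __int128 a)
{
  uint64_t lo = (uint64_t) a;
  uint64_t hi = (uint64_t) (a >> 64);
  return __builtin_popcountll (lo) + __builtin_popcountll (hi);
}

/* Shape of the Clang output, modelled in scalar code: a per-byte bit
   count over all 16 bytes (like CNT on v0.16b), then one horizontal
   sum across the bytes (like UADDLV).  */
static int popcount128_bytes (unsigned __int128 a)
{
  unsigned char bytes[16];
  unsigned int sum = 0;

  memcpy (bytes, &a, sizeof bytes);
  for (int i = 0; i < 16; i++)
    sum += (unsigned int) __builtin_popcount (bytes[i]);
  return (int) sum;
}

/* Hypothetical helper: both formulations must agree for any input.  */
static void check_agree (unsigned __int128 a)
{
  assert (popcount128_split (a) == popcount128_bytes (a));
}
```

Both functions compute the same value; the difference in the generated code is only in how many vector operations and register transfers are needed, which is what the single 128-bit CNT/UADDLV sequence saves.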