From: "palmer at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114809] [RISC-V RVV] Counting elements might be simpler
Date: Mon, 22 Apr 2024 20:24:46 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

palmer at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
                 CC|                            |palmer at gcc dot gnu.org
   Last reconfirmed|                            |2024-04-22

--- Comment #1 from palmer at gcc dot gnu.org ---
Thanks.
Sounds like there are really two issues here: a missed peephole and a more
complex set of micro-architectural tradeoffs.

The peephole seems like a pretty straight-forward missed optimization; if
you've got a smaller reproducer it's probably worth filing another bug for it.
We're right at the end of the GCC-14 release process and ended up with some
last-minute breakages, so stuff is pretty chaotic right now. Having the bug
will make it easier to avoid forgetting about this.

The reduction looks way more complicated to me. Just thinking a bit as I'm
watching the regressions run, I think there are a few options for generating
the code here:

* Do we accumulate into a vector and then reduce, or reduce and then
  accumulate?
* Do we reduce via a sum-reduction or a popcnt?
* Do we reconfigure to a wider type or handle the overflow?

I think this will depend on the cost model for the hardware: we're essentially
trading off operations of one flavor of op for another, and that's going to
depend on how these ops perform. Your suggestion is essentially a
reconfiguration vs reduction trade-off, which is probably going to be
implementation-specific.

Do you have a system that this code performs poorly on? If there's something
concrete to target and we're not generating good code, that's pretty
actionable; otherwise I think this one is going to be hard to reason about for
a bit.
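
To make the options in the list above concrete, here is a rough scalar C model
of two of the strategies. It is only a sketch and is not the reporter's test
case or GCC's generated code: the function names, the LANES constant, and the
8-bit element type are all assumptions made for illustration, and the inner
loops over LANES stand in for single vector instructions. The first function
accumulates into narrow per-lane counters and handles overflow by flushing
them; the second reduces each chunk with a popcount (the GCC/Clang builtin
__builtin_popcount) and accumulates in a scalar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* hypothetical number of 8-bit lanes per vector */

/* Accumulate into a "vector" of narrow counters, then reduce.
   Overflow is handled by flushing the lanes every 255 chunks. */
size_t count_accumulate_then_reduce(const uint8_t *a, size_t n, uint8_t key)
{
    assert(a != NULL || n == 0);
    uint8_t lane[LANES] = {0};
    size_t total = 0, i = 0;
    unsigned rounds = 0;

    for (; i + LANES <= n; i += LANES) {
        for (size_t j = 0; j < LANES; j++)
            lane[j] += (a[i + j] == key);   /* models compare + masked add */
        if (++rounds == UINT8_MAX) {        /* a lane holds at most 255 */
            for (size_t j = 0; j < LANES; j++) {
                total += lane[j];           /* models a sum-reduction */
                lane[j] = 0;
            }
            rounds = 0;
        }
    }
    for (size_t j = 0; j < LANES; j++)
        total += lane[j];                   /* final reduction */
    for (; i < n; i++)
        total += (a[i] == key);             /* scalar tail */
    return total;
}

/* Reduce each chunk right away via popcount, accumulate in a scalar. */
size_t count_reduce_then_accumulate(const uint8_t *a, size_t n, uint8_t key)
{
    assert(a != NULL || n == 0);
    size_t total = 0, i = 0;

    for (; i + LANES <= n; i += LANES) {
        uint32_t mask = 0;
        for (size_t j = 0; j < LANES; j++)
            mask |= (uint32_t)(a[i + j] == key) << j;  /* models a compare */
        total += (size_t)__builtin_popcount(mask);     /* models popcnt */
    }
    for (; i < n; i++)
        total += (a[i] == key);             /* scalar tail */
    return total;
}
```

Both functions return the same count. Which shape is cheaper on real hardware
depends on the relative cost of the reduction, the popcount, and any
reconfiguration to a wider element type, which is the cost-model question
raised above.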