From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 2ED28384AB62; Mon, 22 Apr 2024 21:11:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2ED28384AB62
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1713820270;
	bh=Mih3sSQd8gTnn7r0N1aXaAa7E5mOt1wp/hhPecq1nD4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=AKWkebUT4b6H4Uq34NZaz2V+fd2dYXTuGB74QO4kyqpTsml3/nmWZL3XxAgWCZ1DN
	 8AxE+KneT22161/TkMyEXghROQRTysiw3IZh2Q4eyFxl1kDGh4R1byYOI8c2ARPMdk
	 BGDsz/36KoeIlPQNfNwlBR9mTWTT1WQGOD4cGKx8=
From: "andrew at sifive dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114809] [RISC-V RVV] Counting elements might be simpler
Date: Mon, 22 Apr 2024 21:11:09 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: andrew at sifive dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-114809-4-cn0ZCjglvW@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114809-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114809-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

Andrew Waterman <andrew at sifive dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrew at sifive dot com
--- Comment #2 from Andrew Waterman <andrew at sifive dot com> ---
To respond to some of Palmer's points:

In general, doing a single reduction at the end will perform better than doing
multiple reductions.  For the same total number of additions, sum reductions
tend to be slower (or at least no faster) than regular vector adds.
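
As a rough illustration of the two shapes being compared, here is a sketch in
plain C rather than RVV intrinsics.  The inner loops over LANES stand in for
vector operations; LANES, the function names, and the byte-compare workload are
assumptions made for this example, not code from the bug report.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* hypothetical vector width, for illustration only */

/* Shape 1: reduce inside the loop, i.e. one horizontal sum per strip. */
size_t count_reduce_each_strip(const uint8_t *a, size_t n, uint8_t key)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i += LANES) {
        size_t lanes[LANES] = {0};
        /* "vector compare": one 0/1 flag per lane */
        for (size_t j = 0; j < LANES && i + j < n; j++)
            lanes[j] = (a[i + j] == key);
        /* horizontal reduction, repeated every iteration */
        for (size_t j = 0; j < LANES; j++)
            total += lanes[j];
    }
    return total;
}

/* Shape 2: keep a per-lane accumulator and reduce once at the end. */
size_t count_reduce_once(const uint8_t *a, size_t n, uint8_t key)
{
    size_t acc[LANES] = {0};
    for (size_t i = 0; i < n; i += LANES) {
        /* "vector compare" plus an ordinary lane-wise add */
        for (size_t j = 0; j < LANES && i + j < n; j++)
            acc[j] += (a[i + j] == key);
    }
    /* single horizontal reduction after the loop */
    size_t total = 0;
    for (size_t j = 0; j < LANES; j++)
        total += acc[j];
    return total;
}
```

Both functions return the same count.  The difference is only where the
horizontal sum sits: inside the loop in the first, hoisted out of it in the
second.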

On some microarchitectures, vcpop.m results in a loss-of-decoupling event,
since it's consumed by the scalar unit.  To get reasonable performance on those
uarches, you need to use maximal LMUL to amortize the loss-of-decoupling event
over a greater amount of vector work.  (The alternative is to unroll the loop
such that each vcpop.m writes a different x-register, but that's far messier
than using large LMUL.)
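
To put a number on the amortization, the sketch below counts how many vcpop.m
instructions a strip-mined loop would issue, using VLMAX = LMUL * VLEN / SEW.
It assumes exactly one vcpop.m per strip and an integer LMUL (1, 2, 4 or 8).
The function names and the example sizes are hypothetical, and nothing here
models the actual cost of a loss-of-decoupling event.

```c
#include <stddef.h>

/* VLMAX = LMUL * VLEN / SEW (integer LMUL only in this sketch). */
size_t vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    return lmul * vlen_bits / sew_bits;
}

/* Number of strips, and so of vcpop.m instructions under the
 * one-per-strip assumption, needed to cover n elements. */
size_t vcpop_count(size_t n, size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    size_t step = vlmax(vlen_bits, sew_bits, lmul);
    return (n + step - 1) / step;  /* ceil(n / VLMAX) */
}
```

For example, with n = 4096 byte elements and VLEN = 128, vcpop_count gives 256
at LMUL = 1 and 32 at LMUL = 8, so each vcpop.m is spread over eight times as
much vector work.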