From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 9E4FA385840E; Tue, 11 Jul 2023 10:41:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9E4FA385840E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1689072104;
	bh=3s7pTypjU3hiLnTwDUhg9vFNDhejvzPV6ESqW7NFqw0=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=R/C940hpVhGfoevsQNkNgyuZH7GXFGoGoHSR2XCdZm5nWcOThV+o3ILjec3/hyufd
	 zUYHzJW97D3OCb6kBDG/Y7tD8VqSZF77FtbH0LVIq7xTIhU3X4yi4vCDsmeVlgMz6g
	 n2LOn5/iYnDV3sy5RD2YmlHhlbNZ7R0TO3qQxIBE=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as
 the reduction_latency calculated by new costs is too large
Date: Tue, 11 Jul 2023 10:41:44 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cf_gcctarget keywords
Message-ID: <bug-110625-4-AhYmgXI4sC@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-110625-4@http.gcc.gnu.org/bugzilla/>
References: <bug-110625-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
           Keywords|                            |missed-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, I think count is handled correctly even for SLP.  Given we accumulate
'short' to 'double' we likely perform 'count' adds to the m's here and those
are chained in a simple way.  Where possible, we specifically avoid creating
more reduction variables, both with and without SLP, because of register
pressure issues.  Note that when you have, for example, three scalar
reductions we will up the number of IVs to use with SLP, so using 'count'
isn't always 100% accurate, but in the case of the testcase it should be.
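
To illustrate the trade-off described above, here is a minimal sketch in C.
It is a hypothetical loop, not the testcase attached to this bug, and the
function names are made up for illustration.  The first function keeps one
'double' accumulator fed by 'short' inputs, so its adds form one serial
chain; the second splits the sum across two accumulators, which shortens
the chain at the cost of more live registers.

```c
#include <stddef.h>

/* Hypothetical example, not the testcase from the bug report.
   One 'double' accumulator fed by 'short' inputs: each add depends on
   the previous one, so n adds form a single serial dependency chain.  */
double sum_single_chain(const short *a, size_t n)
{
    double m = 0.0;
    for (size_t i = 0; i < n; i++)
        m += (double)a[i];
    return m;
}

/* Same sum with two independent accumulators (illustrative only).
   Each chain is about half as long, but two values stay live across
   the loop, which increases register pressure.  */
double sum_two_chains(const short *a, size_t n)
{
    double m0 = 0.0, m1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        m0 += (double)a[i];
        m1 += (double)a[i + 1];
    }
    if (i < n)              /* odd element count: fold in the last value */
        m0 += (double)a[i];
    return m0 + m1;
}
```

Both functions return the same result for inputs whose partial sums are
exactly representable in 'double'; in general, splitting a floating-point
reduction changes the order of the adds and can change rounding.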

But I'm not sure what "reduction-latency" tries to measure.