From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 548E13858C1F; Tue, 12 Sep 2023 07:43:41 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 548E13858C1F
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1694504621;
	bh=+S4EhUgksh2j0KOT5pdjB1tzSEL9UaT2AOVMx2poeKA=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=DpurQDDqMKaIi3hdWFmaqvCxSlTJS08l6ed39Yxu31FN63ukp9T4OS6MrSPaM+uw1
	 mQtiYNztK1rm5mguQxAfzIoiIpQ2x+rVQAY3gqFgsJDPVrmsPwmWKQ+IbchFmuhJMx
	 4WkiVUSTKx5rSF53EycOLexBN4EQPF75Thy85V7k=
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/110935] Missed BB reduction vectorization
 because of missed eliding of a permute
Date: Tue, 12 Sep 2023 07:43:40 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-110935-4-HS6z8Rih1Q@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-110935-4@http.gcc.gnu.org/bugzilla/>
References: <bug-110935-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110935
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 5 Sep 2023, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110935
>
> --- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> If we were going to do this in vect_optimize_slp_pass, I think
> we'd need a node for the reduction in the pass's internal graph.
> We could then record that all input layouts have zero cost.
>
> What's the reason for not having an SLP node for the reduction?
> Isn't it a similar kind of sink to a store or constructor?

The difference is that the reduction reduces the number of incoming
lanes (to one).  For a loop SLP reduction chain we also do not have an SLP
node for that part (because it's in the epilog).  For a loop SLP
reduction there isn't a reduction operation.  In both cases we manage
to elide permutes into them.  I wondered how we do that in the new code
and whether we can leverage that for the BB reduction case.
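
To make the point about permutes concrete, here is a minimal scalar
sketch of why a permute feeding a lane-reducing sum can be dropped.
It is a hypothetical illustration, not the testcase from this PR, and
the names reduce_add4, permute4 and reduce_add4_permuted are invented
for the example.  It assumes an unsigned integer add, where reordering
the lanes cannot change the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only, not the testcase from this PR.
   A four-lane sum: four incoming lanes, one outgoing scalar.  */
static uint32_t
reduce_add4 (const uint32_t v[4])
{
  return v[0] + v[1] + v[2] + v[3];
}

/* Apply a lane permutation: out[i] = in[perm[i]].  */
static void
permute4 (uint32_t out[4], const uint32_t in[4], const unsigned perm[4])
{
  for (size_t i = 0; i < 4; ++i)
    out[i] = in[perm[i]];
}

/* The same reduction with its input permuted first.  Unsigned addition
   is commutative and associative, so for any true permutation the
   result equals reduce_add4 (v) and the permute is redundant.  */
static uint32_t
reduce_add4_permuted (const uint32_t v[4], const unsigned perm[4])
{
  uint32_t tmp[4];
  permute4 (tmp, v, perm);
  return reduce_add4 (tmp);
}

/* Small self-check of the invariance claimed above.  */
static void
check_permute_invariance (void)
{
  const uint32_t v[4] = { 7, 11, 13, 17 };
  const unsigned perm[4] = { 3, 1, 0, 2 };
  assert (reduce_add4_permuted (v, perm) == reduce_add4 (v));
}
```

The sketch only covers the arithmetic side: any input layout is
equally good for such a sink.  It says nothing about how the SLP
layout optimization would represent that.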

I did think of representing the reduction op but wondered how to do
that in the most sensible way.  It's kind of a permute node with
an associated operation.  Or, if we use .REDUC_*_SCAL, a regular
node with a scalar vectype?  I'm not sure we want to overload
the VEC_PERM_EXPR SLP node further.  But for example with x86
we have a SAD operation with 4 incoming lanes in op0, 16 incoming
lanes in op1 and 4 outgoing lanes.
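
As a rough scalar model of an operation whose inputs and outputs have
different lane counts, the sketch below accumulates absolute
differences of 16 byte lanes into a 4-lane accumulator.  The operand
order, the name sad_model, and the choice that four adjacent narrow
lanes feed one wide lane are assumptions made for illustration; they
are not taken from this thread or from any particular instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { NARROW_LANES = 16, WIDE_LANES = 4 };

/* Hypothetical scalar model of a sum-of-absolute-differences step:
   out[i] = acc[i] + sum of |a[j] - b[j]| over the narrow lanes j
   assigned to wide lane i.  The grouping of four adjacent narrow
   lanes per wide lane is an assumption of this sketch.  */
static void
sad_model (uint32_t out[WIDE_LANES], const uint32_t acc[WIDE_LANES],
           const uint8_t a[NARROW_LANES], const uint8_t b[NARROW_LANES])
{
  for (int i = 0; i < WIDE_LANES; ++i)
    out[i] = acc[i];
  for (int j = 0; j < NARROW_LANES; ++j)
    out[j / (NARROW_LANES / WIDE_LANES)]
      += (uint32_t) abs ((int) a[j] - (int) b[j]);
}

/* Usage example: every narrow lane differs by 3, so each wide lane
   gains 4 * 3 = 12 on top of its accumulator value.  */
static void
sad_model_example (void)
{
  const uint32_t acc[WIDE_LANES] = { 1, 2, 3, 4 };
  uint8_t a[NARROW_LANES], b[NARROW_LANES];
  uint32_t out[WIDE_LANES];
  for (int j = 0; j < NARROW_LANES; ++j)
    {
      a[j] = 5;
      b[j] = 2;
    }
  sad_model (out, acc, a, b);
  for (int i = 0; i < WIDE_LANES; ++i)
    assert (out[i] == acc[i] + 12);
}
```

The lane counts are the point here: a node for such an operation has
to describe operands and a result that do not share a lane count,
which is the same difficulty as a reduction to a single lane.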

That said, currently the reduction node is implicit in the
instance root stmt and can be identified by the SLP instance kind only.