From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/110935] Missed BB reduction vectorization because of missed eliding of a permute
Date: Tue, 12 Sep 2023 07:43:40 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110935

--- Comment #3 from rguenther at suse dot de ---
On Tue, 5 Sep 2023, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110935
>
> --- Comment #2 from rsandifo at gcc dot gnu.org ---
> If we were going to do this in vect_optimize_slp_pass, I think
> we'd need a node for the reduction in the pass's internal graph.
> We could then record that all input layouts have zero cost.
>
> What's the reason for not having an SLP node for the reduction?
> Isn't it a similar kind of sink to a store or constructor?

The difference is that the reduction reduces the number of incoming
lanes (to one).  For a loop SLP reduction chain we also do not have
an SLP node for that part (because it's in the epilog).  For a loop
SLP reduction there isn't a reduction operation.  For both cases we
manage to elide permutes into them - I wondered how we do that in the
new code and if we can leverage that for the BB reduction case.

I did think of representing the reduction op but wondered how to do
that in the most sensible way.  It's kind of a permute node with
an associated operation.  Or, if we use .REDUC_*_SCAL, a regular
node with a scalar vectype?

I'm not sure we want to overload the VEC_PERM_EXPR SLP node further.
But for example with x86 we have a SAD operation with 4 incoming
lanes in op0, 16 incoming lanes in op1 and 4 outgoing lanes.

That said, currently the reduction node is implicit in the instance
root stmt and can be identified by the SLP instance kind only.
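
For illustration only, here is a minimal C sketch of why a reduction
can accept any input lane layout at zero cost.  It is a hypothetical
example and not the testcase attached to this PR; the function names
are made up, and unsigned arithmetic is assumed so that reassociation
is always valid.  The two sums differ only in the order in which the
lanes are consumed, and since all lanes collapse into a single scalar,
the permute {1, 0, 3, 2} feeding the second sum does not change the
result and could in principle be elided.

```c
#include <assert.h>

/* Hypothetical example: sum four lanes in memory order {0, 1, 2, 3}. */
unsigned sum_in_order(const unsigned *a)
{
    return a[0] + a[1] + a[2] + a[3];
}

/* The same four lanes, consumed in the permuted order {1, 0, 3, 2}.
   With unsigned (wrapping) addition the operation is associative and
   commutative, so the lane order cannot affect the scalar result.  */
unsigned sum_permuted(const unsigned *a)
{
    return a[1] + a[0] + a[3] + a[2];
}

/* Self-check: both lane layouts reduce to the same scalar.  */
void check_layout_is_free(void)
{
    const unsigned a[4] = {3u, 5u, 7u, 11u};
    assert(sum_in_order(a) == 26u);
    assert(sum_permuted(a) == sum_in_order(a));
}
```

Whether the vectorizer actually drops the permute for code like this
depends on the GCC version and target; the sketch only shows the
semantic property the discussion above relies on.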