From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id C806D3858415; Tue, 27 Jun 2023 08:13:37 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C806D3858415
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1687853617;
	bh=v/iZbPoSC0CrTUutZ5CnS0j3dVckYp6OyhdYVsAhZs0=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=T8NEg+XMJxGEZPZ1GJduhZ/Wd2lkFGtfL7MOaDt0LSJcTwBXA7HsTTd0n4Idu1+Qp
	 hGJipFOe1DhFNJXr6yTRRJNeGB+ygekf1TY53/di0bLrLziA72LOiuWgvDtqLjQNGi
	 ID0UPtcduYuvTi+E/4tm27miOK/7qRHP8CYaR0ck=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/106081] missed vectorization
Date: Tue, 27 Jun 2023 08:13:28 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-106081-4-8w84bUx2ue@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-106081-4@http.gcc.gnu.org/bugzilla/>
References: <bug-106081-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
What's interesting is that, as of r14-2117-gdd86a5a69cbda4, we now get the
dump below.  The odd thing is that we fail to eliminate the load permutation
{ 3 2 1 0 } even though this is a reduction group.
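To illustrate why that permutation looks redundant, here is a minimal
standalone sketch (not GCC code).  It assumes the loop accumulates
k * field into four independent accumulators, as the dump suggests; the
names Pixel, permute and reduce_group are hypothetical, and indexing k per
iteration is an assumption.  It checks that loading the lanes reversed gives
the same accumulators as loading them in memory order and reversing the
final results, so the reversal is only a relabeling of lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the struct behind _5->a .. _5->d in the dump.
struct Pixel { short a, b, c, d; };

using Lanes = std::array<double, 4>;
using Perm = std::array<std::size_t, 4>;

// Lane i of the result reads element perm[i] of src (a load permutation).
Lanes permute(const Lanes &src, const Perm &perm) {
  Lanes out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = src[perm[i]];
  return out;
}

// Four independent accumulators, one per lane, each summing k[i] * field.
// perm selects which field feeds which lane.
Lanes reduce_group(const Pixel *p, const double *k, std::size_t n,
                   const Perm &perm) {
  Lanes acc{};  // every lane starts at 0.0, like the PHIs in the dump
  for (std::size_t i = 0; i < n; ++i) {
    const Lanes in_memory_order = {
        static_cast<double>(p[i].a), static_cast<double>(p[i].b),
        static_cast<double>(p[i].c), static_cast<double>(p[i].d)};
    const Lanes v = permute(in_memory_order, perm);
    for (std::size_t lane = 0; lane < 4; ++lane)
      acc[lane] += k[i] * v[lane];
  }
  return acc;
}

// Reversed-lane reduction == identity-lane reduction with results reversed.
void check_reversal_is_relabeling() {
  const Pixel p[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {-1, 0, 2, 9}};
  const double k[] = {0.5, 2.0, -1.25};
  const Perm reversed = {3, 2, 1, 0};
  const Perm identity = {0, 1, 2, 3};
  const Lanes with_permute = reduce_group(p, k, 3, reversed);
  const Lanes without_permute = reduce_group(p, k, 3, identity);
  assert(with_permute == permute(without_permute, reversed));
}
```

Each lane performs the same sequence of floating-point operations in both
variants, so the comparison is exact rather than approximate.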

I _suppose_ the reason is the { 0 0 0 0 } load permutation (the "splat")
which we don't "support".  In vect_optimize_slp_pass::start_choosing_layouts
there's

      if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
        {
          /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node
             unpermuted, record a layout that reverses this permutation.

             We would need more work to cope with loads that are internally
             permuted and also have inputs (such as masks for
             IFN_MASK_LOADs).  */
          gcc_assert (partition.layout == 0 && !m_slpg->vertices[node_i].succ);
          if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
            continue;

which means we'll keep the permute there (well, that's OK - any permute
of the permute will retain it ...).  I suspect this prevents the optimization
here.  Massaging start_choosing_layouts to allow a splat on element zero
for a non-grouped access breaks things as we try to move that permute.
So I guess this needs a new kind of layout constraint?  The permute
can absorb any permute but we cannot "move" it.
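The "absorb" property can be sketched in isolation (again not GCC code;
Perm and compose are hypothetical names, and lane permutations are modelled
as plain index vectors).  Composing the { 0 0 0 0 } splat with any lane
permutation yields the same splat, whereas composing a real permutation such
as { 3 2 1 0 } with another one changes it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Perm = std::vector<unsigned>;

// Loading with `load` and then shuffling lanes with `lane` is the same as
// loading with the composed permutation: lane i reads load[lane[i]].
Perm compose(const Perm &load, const Perm &lane) {
  Perm out(lane.size());
  for (std::size_t i = 0; i < lane.size(); ++i)
    out[i] = load[lane[i]];
  return out;
}

void check_splat_absorbs_permutes() {
  const Perm splat = {0, 0, 0, 0};
  const Perm reversed = {3, 2, 1, 0};
  const Perm rotate = {1, 2, 3, 0};
  const Perm identity = {0, 1, 2, 3};

  // Any lane permutation applied to the splat leaves it unchanged.
  assert(compose(splat, reversed) == splat);
  assert(compose(splat, rotate) == splat);

  // A non-splat load permutation does change under a lane permutation.
  assert(compose(reversed, reversed) == identity);
  assert(compose(reversed, rotate) != reversed);
}
```

This only shows why the splat places no constraint on the layout chosen for
its users; it says nothing about how such a constraint would be represented
in vect_optimize_slp_pass.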

Richard?


t.c:14:18: note:   === scheduling SLP instances ===
t.c:14:18: note:   Vectorizing SLP tree:
t.c:14:18: note:   node 0x4304170 (max_nunits=16, refcnt=2) vector(4) double
t.c:14:18: note:   op template: _21 = _20 + results$d_60;
t.c:14:18: note:        stmt 0 _21 = _20 + results$d_60;
t.c:14:18: note:        stmt 1 _17 = _16 + results$c_58;
t.c:14:18: note:        stmt 2 _13 = _12 + results$b_56;
t.c:14:18: note:        stmt 3 _9 = _8 + results$a_54;
t.c:14:18: note:        children 0x43041f8 0x4304418
t.c:14:18: note:   node 0x43041f8 (max_nunits=16, refcnt=1) vector(4) double
t.c:14:18: note:   op template: _20 = _1 * _19;
t.c:14:18: note:        stmt 0 _20 = _1 * _19;
t.c:14:18: note:        stmt 1 _16 = _1 * _15;
t.c:14:18: note:        stmt 2 _12 = _1 * _11;
t.c:14:18: note:        stmt 3 _8 = _1 * _7;
t.c:14:18: note:        children 0x4304280 0x4304308
t.c:14:18: note:   node 0x4304280 (max_nunits=4, refcnt=1) vector(4) double
t.c:14:18: note:   op template: _1 = *k_50;
t.c:14:18: note:        stmt 0 _1 = *k_50;
t.c:14:18: note:        stmt 1 _1 = *k_50;
t.c:14:18: note:        stmt 2 _1 = *k_50;
t.c:14:18: note:        stmt 3 _1 = *k_50;
t.c:14:18: note:        load permutation { 0 0 0 0 }
t.c:14:18: note:   node 0x4304308 (max_nunits=16, refcnt=1) vector(4) double
t.c:14:18: note:   op template: _19 = (double) _18;
t.c:14:18: note:        stmt 0 _19 = (double) _18;
t.c:14:18: note:        stmt 1 _15 = (double) _14;
t.c:14:18: note:        stmt 2 _11 = (double) _10;
t.c:14:18: note:        stmt 3 _7 = (double) _6;
t.c:14:18: note:        children 0x4304390
t.c:14:18: note:   node 0x4304390 (max_nunits=16, refcnt=1) vector(16) short int
t.c:14:18: note:   op template: _18 = _5->d;
t.c:14:18: note:        stmt 0 _18 = _5->d;
t.c:14:18: note:        stmt 1 _14 = _5->c;
t.c:14:18: note:        stmt 2 _10 = _5->b;
t.c:14:18: note:        stmt 3 _6 = _5->a;
t.c:14:18: note:        load permutation { 3 2 1 0 }
t.c:14:18: note:   node 0x4304418 (max_nunits=4, refcnt=1) vector(4) double
t.c:14:18: note:   op template: results$d_60 = PHI <_21(5), 0.0(6)>
t.c:14:18: note:        stmt 0 results$d_60 = PHI <_21(5), 0.0(6)>
t.c:14:18: note:        stmt 1 results$c_58 = PHI <_17(5), 0.0(6)>
t.c:14:18: note:        stmt 2 results$b_56 = PHI <_13(5), 0.0(6)>
t.c:14:18: note:        stmt 3 results$a_54 = PHI <_9(5), 0.0(6)>
t.c:14:18: note:        children 0x4304170 (nil)