From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 53FF33858C33; Wed, 26 Jul 2023 10:01:11 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 53FF33858C33
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1690365671;
	bh=VsLUm3Zeqfcezif1/k6IlCB7tN+n82vzuM7/XJCLOFQ=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=xQYH2li4Wsa5vNGiAmKO/HTu+CwysYqN4wf5/9eK0odlCUWMDRDpVwVLTi5kWBzNP
	 RMRASD7UbrO9R/ARJT3bV/fT2499ghd6WB3OzpFKtge2EAsL78mNG5aXLIYgfQR/8V
	 3jGjOJrQjTs/02epcxsu6P1jORgk1e0655YWn8IM=
From: "rsandifo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/106081] missed vectorization
Date: Wed, 26 Jul 2023 10:01:10 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rsandifo at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-106081-4-ZUs1Ej3QuF@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-106081-4@http.gcc.gnu.org/bugzilla/>
References: <bug-106081-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
--- Comment #10 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> So I can adjust change_layout_cost in a bit awkward way, but it seems that
> internal_node_cost would already work out that a permute can be merged into
> an existing permute.
Right.

> It seems that existing permutes are not recorded in the "layout".
They should be if they're bijective, via:

      else if (SLP_TREE_CODE (node) == VEC_PERM_EXPR
               && SLP_TREE_CHILDREN (node).length () == 1
               && (child = SLP_TREE_CHILDREN (node)[0])
               && (TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (child))
                   .is_constant (&imin)))
        {
          /* If the child has the same vector size as this node,
             reversing the permutation can make the permutation a no-op.
             In other cases it can change a true permutation into a
             full-vector extract.  */
          tmp_perm.reserve (SLP_TREE_LANES (node));
          for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
            tmp_perm.quick_push (SLP_TREE_LANE_PERMUTATION (node)[j].second);
        }
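
To make the "reversing the permutation can make it a no-op" point concrete, here is a standalone sketch rather than GCC code. The names invert_perm and apply_perm are made up for illustration, and the sketch assumes the convention that perm[i] is the input lane read by output lane i:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of GCC.  PERM[i] is the input lane that
// output lane i reads from.  PERM must be bijective for the inverse to exist.
std::vector<unsigned>
invert_perm (const std::vector<unsigned> &perm)
{
  std::vector<unsigned> inv (perm.size ());
  for (std::size_t i = 0; i < perm.size (); ++i)
    {
      assert (perm[i] < perm.size ());
      inv[perm[i]] = static_cast<unsigned> (i);
    }
  return inv;
}

// Hypothetical helper: build the vector whose lane i is INPUT[PERM[i]].
std::vector<unsigned>
apply_perm (const std::vector<unsigned> &input,
            const std::vector<unsigned> &perm)
{
  std::vector<unsigned> out (perm.size ());
  for (std::size_t i = 0; i < perm.size (); ++i)
    {
      assert (perm[i] < input.size ());
      out[i] = input[perm[i]];
    }
  return out;
}
```

Applying a bijective permutation and then its inverse gives back the original lane order, which is why recording such a permutation as a candidate layout lets the pass cancel it.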

> Also vectorizable_slp_permutation_1 doesn't try to elide permutes that
> are noop based on knowledge of the layout of 'node', say a permute
> { 1 0 3 2 } of a { _1, _1, _2, _2 } node would be noop.
To do that in general, I think we'd need to value-number each
element of each node (which sounds doable).  But I guess doing
it at leaves would be better than nothing.
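
A minimal sketch of what such a check could look like once each lane has a value number. This is standalone C++ and not GCC code: perm_is_noop_p and the lane_values representation are assumptions made for illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of GCC.  LANE_VALUES holds one value
// number per lane (equal numbers mean the lanes are known to hold the
// same value) and PERM[i] is the input lane that output lane i reads.
// Return true if applying PERM cannot change the vector's contents.
bool
perm_is_noop_p (const std::vector<unsigned> &lane_values,
                const std::vector<unsigned> &perm)
{
  assert (lane_values.size () == perm.size ());
  for (std::size_t i = 0; i < perm.size (); ++i)
    {
      assert (perm[i] < lane_values.size ());
      if (lane_values[perm[i]] != lane_values[i])
        return false;
    }
  return true;
}
```

With value numbers { 1, 1, 2, 2 } standing in for { _1, _1, _2, _2 }, the permute { 1 0 3 2 } is reported as a no-op, whereas { 2 3 0 1 } is not.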

> But change_layout_cost does MAX (count, 1) on its result anyway.
At the moment, yes, because having from_layout_i != to_layout_i
for identical layouts would be a consistency failure.

> The following elides the unnecessary permutation for this special case
> (but not the general case):
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e4430248ab5..e9048a61891 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -4389,6 +4389,19 @@ vect_optimize_slp_pass::change_layout_cost (slp_tree
> node,
>    if (from_layout_i == to_layout_i)
>      return 0;
>
> +  /* When there's a uniform load permutation permutating that in any
> +     way is free.  */
> +  if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
> +    {
> +      unsigned l = SLP_TREE_LOAD_PERMUTATION (node)[0];
> +      unsigned i;
> +      for (i = 1; i < SLP_TREE_LOAD_PERMUTATION (node).length (); ++i)
> +       if (SLP_TREE_LOAD_PERMUTATION (node)[i] != l)
> +         break;
> +      if (i == SLP_TREE_LOAD_PERMUTATION (node).length ())
> +       return 0;
> +    }
> +
>    auto_vec<slp_tree, 1> children (1);
>    children.quick_push (node);
>    auto_lane_permutation_t perm (SLP_TREE_LANES (node));
>
> I'm not sure this is the correct place to factor in cost savings
> materialization would give.  Is it?
Yeah, I think so.  The patch LGTM.  I don't know if it's worth
caching the “all the same element” result, but probably not.
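
For reference, the check the patch performs can be written as a standalone predicate. This is only a sketch outside GCC: uniform_load_perm_p is a made-up name, a std::vector stands in for SLP_TREE_LOAD_PERMUTATION (node), and treating an empty permutation as non-uniform is my assumption:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop in the patch.  LOAD_PERM plays the
// role of SLP_TREE_LOAD_PERMUTATION (node).  Return true if every lane
// loads the same element, so that reordering the lanes changes nothing.
bool
uniform_load_perm_p (const std::vector<unsigned> &load_perm)
{
  // Assumption: an empty permutation is treated as "not uniform".
  if (load_perm.empty ())
    return false;
  for (std::size_t i = 1; i < load_perm.size (); ++i)
    if (load_perm[i] != load_perm[0])
      return false;
  return true;
}
```

The scan is linear in the number of lanes, which is part of why caching the result is probably not worth it.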

> Are explicit VEC_PERM nodes also still there in the optimization
> process or are they turned into sth implicit?
They're still there.  The current algorithm inherits the old
restriction that candidate layouts must be bijective, and not
all VEC_PERM_EXPRs are.  So some VEC_PERM_EXPRs would have to
be explicit whatever happens.  Same for non-bijective load
permutations.
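
As a rough illustration of the bijectivity restriction (standalone C++, not GCC code; bijective_perm_p is a hypothetical name), a permutation qualifies as a candidate layout only if every input lane is used exactly once:

```cpp
#include <vector>

// Hypothetical helper, not part of GCC.  Return true if PERM is a
// bijection on { 0, ..., PERM.size () - 1 }, i.e. each input lane is
// selected exactly once.
bool
bijective_perm_p (const std::vector<unsigned> &perm)
{
  std::vector<bool> seen (perm.size (), false);
  for (unsigned idx : perm)
    {
      // An out-of-range or repeated index means some lane is dropped
      // or duplicated, so the permutation cannot be inverted.
      if (idx >= perm.size () || seen[idx])
        return false;
      seen[idx] = true;
    }
  return true;
}
```

For example, { 1 0 3 2 } passes, while a splat-like { 0 0 1 1 } does not and so would have to stay explicit.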