From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20240604135317.3536415-1-manolis.tsamis@vrull.eu> <no66n5r4-4po7-37pn-3s54-sn1r031670q5@fhfr.qr>
In-Reply-To: <no66n5r4-4po7-37pn-3s54-sn1r031670q5@fhfr.qr>
From: Manolis Tsamis <manolis.tsamis@vrull.eu>
Date: Wed, 26 Jun 2024 15:42:13 +0300
Message-ID: <CAM3yNXpP=jdTcH8d=aUYSPov-QXYJSC4=GXQ-WvKhwGbN4tBrQ@mail.gmail.com>
Subject: Re: [PATCH] Rearrange SLP nodes with duplicate statements. [PR98138]
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, 
	Christoph Müllner <christoph.muellner@vrull.eu>, 
	"Kewen . Lin" <linkw@linux.ibm.com>, Philipp Tomsich <philipp.tomsich@vrull.eu>, 
	Tamar Christina <tamar.christina@arm.com>, Jiangning Liu <jiangning.liu@amperecomputing.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc-patches.gcc.gnu.org>

On Wed, Jun 5, 2024 at 11:07 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 4 Jun 2024, Manolis Tsamis wrote:
>
> > This change adds a function that checks for SLP nodes with multiple occurrences
> > of the same statement (e.g. {A, B, A, B, ...}) and tries to rearrange the node
> > so that there are no duplicates. A vec_perm is then introduced to recreate the
> > original ordering. These duplicates can appear due to how two_operators nodes
> > are handled, and they prevent vectorization in some cases.
>
> So the trick is that when we have two operands we elide duplicate lanes
> so we can do discovery for a single combined operand instead which we
> then decompose into the required two again.  That's a nice one.
>
> But as implemented this will fail SLP discovery if the combined operand
> fails discovery possibly because of divergence in downstream defs.  That
> is, it doesn't fall back to separate discovery.  I suspect the situation
> of duplicate lanes isn't common but then I would also suspect that
> divergence _is_ common.
>
> The discovery code is already quite complex with the way it possibly
> swaps operands of lanes, fitting in this as another variant to try (first)
> is likely going to be a bit awkward.  A way out might be to split the
> function or to make the re-try in the caller which could indicate whether
> to apply this pattern trick or not.  That said - can you try to get
> data on how often the trick applies and discovery succeeds and how
> often discovery fails but discovery would succeed without applying the
> pattern (say, on SPEC)?

Hi Richard,

I have found two other SPEC benchmarks in which the new version of
this optimization applies.
In both cases discovery "fails" anyway when deduplication is not applied.
It is not an immediate failure, though: discovery produces poor SLP
trees and then aborts due to cost of other checks (similar to x264).

>
> I also suppose instead of hardcoding three patterns for a fixed
> size it should be possible to see there's
> only (at most) half unique lanes in both operands (and one less in one
> operand if the number of lanes is odd) and compute the un-swizzling lane
> permutes during this discovery, removing the need of the explicit enum
> and open-coding each case?
>
> Another general note is that trying (and then undoing on fail) such tricks
> eats at the discovery limit we have in place to avoid exponential run-off
> in exactly these degenerate cases.
>

I have sent a new version that no longer hardcodes patterns and, among
other changes, only applies to two_operators nodes.
Please note that I have not yet addressed all of your other feedback, as
I am still iterating on the implementation.
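
To make the idea of pattern-free deduplication concrete, here is a small
standalone sketch. It is not the code from the new patch and does not use
any GCC internals: lane_id, perm_entry, combined_operand, add_operand and
try_combine_operands are all hypothetical names, and a lane is modelled as
a plain integer rather than a def stmt. It collects the unique lanes of
both operands into one combined operand and records, per original lane,
which combined lane it came from.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a lane is an integer id for its def stmt, and a
// permutation entry is (child index, lane index within that child).
using lane_id = int;
using perm_entry = std::pair<unsigned, unsigned>;

struct combined_operand
{
  std::vector<lane_id> lanes;        // unique lanes of both operands
  std::vector<perm_entry> perm_one;  // rebuilds operand 0 from LANES
  std::vector<perm_entry> perm_two;  // rebuilds operand 1 from LANES
};

// Append the lanes of OP to LANES, skipping ones already present, and
// record in PERM where each original lane can be found in LANES.
static void
add_operand (const std::vector<lane_id> &op, std::vector<lane_id> &lanes,
             std::vector<perm_entry> &perm)
{
  for (lane_id lane : op)
    {
      std::size_t idx = 0;
      while (idx < lanes.size () && lanes[idx] != lane)
        idx++;
      if (idx == lanes.size ())
        lanes.push_back (lane);
      perm.emplace_back (0u, static_cast<unsigned> (idx));
    }
}

// Return true if OP0 and OP1 together have no more unique lanes than one
// operand has lanes, so a single combined operand can stand in for both.
static bool
try_combine_operands (const std::vector<lane_id> &op0,
                      const std::vector<lane_id> &op1,
                      combined_operand &out)
{
  if (op0.size () != op1.size ())
    return false;
  out = combined_operand ();
  add_operand (op0, out.lanes, out.perm_one);
  add_operand (op1, out.lanes, out.perm_two);
  return out.lanes.size () <= op0.size ();
}
```

For operands {A, B, A, B} and {C, D, C, D} this yields the combined lanes
{A, B, C, D} with permutations {0, 1, 0, 1} and {2, 3, 2, 3}. The lane
order in the combined operand is simply first-seen order, which is one
arbitrary choice; padding when fewer unique lanes are found is left out.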

Thanks,
Manolis


> Thanks,
> Richard.
>
> > This targets the vectorization of the SPEC2017 x264 pixel_satd functions.
> > On some processors a larger than 10% improvement on x264 has been observed.
> >
> > See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
> >
> > gcc/ChangeLog:
> >
> >       * tree-vect-slp.cc (enum slp_oprnd_pattern): New enum for rearrangement
> >       patterns.
> >       (try_rearrange_oprnd_info): Detect if a node corresponds to one of the
> >       patterns.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/aarch64/vect-slp-two-operator.c: New test.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >
> >  .../aarch64/vect-slp-two-operator.c           |  42 ++++
> >  gcc/tree-vect-slp.cc                          | 234 ++++++++++++++++++
> >  2 files changed, 276 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> > new file mode 100644
> > index 00000000000..2db066a0b6e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vect-slp-two-operator.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect -fdump-tree-vect-details" } */
> > +
> > +typedef unsigned char uint8_t;
> > +typedef unsigned int uint32_t;
> > +
> > +#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
> > +    int t0 = s0 + s1;\
> > +    int t1 = s0 - s1;\
> > +    int t2 = s2 + s3;\
> > +    int t3 = s2 - s3;\
> > +    d0 = t0 + t2;\
> > +    d1 = t1 + t3;\
> > +    d2 = t0 - t2;\
> > +    d3 = t1 - t3;\
> > +}
> > +
> > +static uint32_t abs2( uint32_t a )
> > +{
> > +    uint32_t s = ((a>>15)&0x10001)*0xffff;
> > +    return (a+s)^s;
> > +}
> > +
> > +void sink(uint32_t tmp[4][4]);
> > +
> > +int x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
> > +{
> > +    uint32_t tmp[4][4];
> > +    int sum = 0;
> > +    for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
> > +    {
> > +        uint32_t a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
> > +        uint32_t a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
> > +        uint32_t a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
> > +        uint32_t a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
> > +        HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 );
> > +    }
> > +    sink(tmp);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index bf1f467f53f..e395db0e185 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-vectorizer.h"
> >  #include "langhooks.h"
> >  #include "gimple-walk.h"
> > +#include "gimple-pretty-print.h"
> >  #include "dbgcnt.h"
> >  #include "tree-vector-builder.h"
> >  #include "vec-perm-indices.h"
> > @@ -1829,6 +1830,141 @@ vect_slp_build_two_operator_nodes (slp_tree perm, tree vectype,
> >    SLP_TREE_CHILDREN (perm).quick_push (child2);
> >  }
> >
> > +enum slp_oprnd_pattern
> > +{
> > +  SLP_OPRND_PATTERN_NONE,
> > +  SLP_OPRND_PATTERN_ABAB,
> > +  SLP_OPRND_PATTERN_AABB,
> > +  SLP_OPRND_PATTERN_ABBA
> > +};
> > +
> > +/* Check if OPRNDS_INFO has duplicated nodes that correspond to a predefined
> > +   pattern described by SLP_OPRND_PATTERN and return it.  */
> > +
> > +static int
> > +try_rearrange_oprnd_info (vec<slp_oprnd_info> &oprnds_info, unsigned group_size)
> > +{
> > +  unsigned i;
> > +  slp_oprnd_info info;
> > +
> > +  if (oprnds_info.length () != 2 || group_size % 4 != 0)
> > +    return SLP_OPRND_PATTERN_NONE;
> > +
> > +  if (!oprnds_info[0]->def_stmts[0]
> > +      || !is_a<gassign *> (oprnds_info[0]->def_stmts[0]->stmt))
> > +    return SLP_OPRND_PATTERN_NONE;
> > +
> > +  enum tree_code code
> > +    = gimple_assign_rhs_code (oprnds_info[0]->def_stmts[0]->stmt);
> > +  FOR_EACH_VEC_ELT (oprnds_info, i, info)
> > +    for (unsigned int j = 0; j < group_size; j += 1)
> > +      {
> > +     if (!info->def_stmts[j]
> > +         || !is_a<gassign *> (info->def_stmts[j]->stmt)
> > +         || STMT_VINFO_DATA_REF (info->def_stmts[j]))
> > +       return SLP_OPRND_PATTERN_NONE;
> > +     /* Don't mix different operations.  */
> > +     if (gimple_assign_rhs_code (info->def_stmts[j]->stmt) != code)
> > +       return SLP_OPRND_PATTERN_NONE;
> > +      }
> > +
> > +  if (gimple_assign_rhs_code (oprnds_info[0]->def_stmts[0]->stmt)
> > +      != gimple_assign_rhs_code (oprnds_info[1]->def_stmts[0]->stmt))
> > +    return SLP_OPRND_PATTERN_NONE;
> > +
> > +  int pattern = SLP_OPRND_PATTERN_NONE;
> > +  FOR_EACH_VEC_ELT (oprnds_info, i, info)
> > +    for (unsigned int j = 0; j < group_size; j += 4)
> > +      {
> > +     int cur_pattern = SLP_OPRND_PATTERN_NONE;
> > +     /* Check for an ABAB... pattern.  */
> > +     if ((info->def_stmts[j] == info->def_stmts[j + 2])
> > +         && (info->def_stmts[j + 1] == info->def_stmts[j + 3])
> > +         && (info->def_stmts[j] != info->def_stmts[j + 1]))
> > +       cur_pattern = SLP_OPRND_PATTERN_ABAB;
> > +     /* Check for an AABB... pattern.  */
> > +     else if ((info->def_stmts[j] == info->def_stmts[j + 1])
> > +              && (info->def_stmts[j + 2] == info->def_stmts[j + 3])
> > +              && (info->def_stmts[j] != info->def_stmts[j + 2]))
> > +       cur_pattern = SLP_OPRND_PATTERN_AABB;
> > +     /* Check for an ABBA... pattern.  */
> > +     else if ((info->def_stmts[j] == info->def_stmts[j + 3])
> > +              && (info->def_stmts[j + 1] == info->def_stmts[j + 2])
> > +              && (info->def_stmts[j] != info->def_stmts[j + 1]))
> > +       cur_pattern = SLP_OPRND_PATTERN_ABBA;
> > +     /* Unrecognised pattern.  */
> > +     else
> > +       return SLP_OPRND_PATTERN_NONE;
> > +
> > +     if (pattern == SLP_OPRND_PATTERN_NONE)
> > +       pattern = cur_pattern;
> > +     /* Multiple patterns detected.  */
> > +     else if (cur_pattern != pattern)
> > +       return SLP_OPRND_PATTERN_NONE;
> > +      }
> > +
> > +  gcc_checking_assert (pattern != SLP_OPRND_PATTERN_NONE);
> > +
> > +  if (dump_enabled_p ())
> > +    {
> > +      if (pattern == SLP_OPRND_PATTERN_ABAB)
> > +     dump_printf (MSG_NOTE, "ABAB");
> > +      else if (pattern == SLP_OPRND_PATTERN_AABB)
> > +     dump_printf (MSG_NOTE, "AABB");
> > +      else if (pattern == SLP_OPRND_PATTERN_ABBA)
> > +     dump_printf (MSG_NOTE, "ABBA");
> > +      dump_printf (MSG_NOTE, " pattern detected.\n");
> > +    }
> > +
> > +  if (pattern == SLP_OPRND_PATTERN_ABAB || pattern == SLP_OPRND_PATTERN_ABBA)
> > +    for (unsigned int j = 0; j < group_size; j += 4)
> > +      {
> > +     /* Given oprnd[0] -> A1, B1, A1, B1, A2, B2, A2, B2, ...
> > +        Given oprnd[1] -> C1, D1, C1, D1, C2, D2, C2, D2, ...
> > +        Create a single node -> A1, B1, C1, D1, A2, B2, C2, D2, ...  */
> > +     oprnds_info[0]->def_stmts[j+2] = oprnds_info[1]->def_stmts[j];
> > +     oprnds_info[0]->ops[j+2] = oprnds_info[1]->ops[j];
> > +     oprnds_info[0]->def_stmts[j+3] = oprnds_info[1]->def_stmts[j+1];
> > +     oprnds_info[0]->ops[j+3] = oprnds_info[1]->ops[j+1];
> > +      }
> > +  else if (pattern == SLP_OPRND_PATTERN_AABB)
> > +    for (unsigned int j = 0; j < group_size; j += 4)
> > +      {
> > +     /* Given oprnd[0] -> A1, A1, B1, B1, A2, A2, B2, B2, ...
> > +        Given oprnd[1] -> C1, C1, D1, D1, C2, C2, D2, D2, ...
> > +        Create a single node -> A1, C1, B1, D1, A2, C2, B2, D2, ...  */
> > +
> > +     /* The ordering here is at least to some extent arbitrary.
> > +        A generalized version needs to use some explicit ordering.  */
> > +     oprnds_info[0]->def_stmts[j+1] = oprnds_info[1]->def_stmts[j];
> > +     oprnds_info[0]->ops[j+1] = oprnds_info[1]->ops[j];
> > +     oprnds_info[0]->def_stmts[j+2] = oprnds_info[0]->def_stmts[j+2];
> > +     oprnds_info[0]->ops[j+2] = oprnds_info[0]->ops[j+2];
> > +     oprnds_info[0]->def_stmts[j+3] = oprnds_info[1]->def_stmts[j+2];
> > +     oprnds_info[0]->ops[j+3] = oprnds_info[1]->ops[j+2];
> > +      }
> > +
> > +  if (dump_enabled_p ())
> > +    {
> > +      dump_printf (MSG_NOTE, "Recurse with:\n");
> > +      for (unsigned int j = 0; j < group_size; j++)
> > +     {
> > +       dump_printf (MSG_NOTE, "  ");
> > +       print_gimple_stmt (dump_file, oprnds_info[0]->def_stmts[j]->stmt, 0);
> > +     }
> > +    }
> > +
> > +  /* Since we've merged the two nodes in one, make the second one a copy of
> > +     the first.  */
> > +  for (unsigned int j = 0; j < group_size; j++)
> > +    {
> > +      oprnds_info[1]->def_stmts[j] = oprnds_info[0]->def_stmts[j];
> > +      oprnds_info[1]->ops[j] = oprnds_info[0]->ops[j];
> > +    }
> > +
> > +  return pattern;
> > +}
> > +
> >  /* Recursively build an SLP tree starting from NODE.
> >     Fail (and return a value not equal to zero) if def-stmts are not
> >     isomorphic, require data permutation or are of unsupported types of
> > @@ -2409,6 +2545,10 @@ out:
> >
> >    stmt_info = stmts[0];
> >
> > +  int rearrange_pattern = SLP_OPRND_PATTERN_NONE;
> > +  if (is_a<gassign *> (stmt_info->stmt))
> > +    rearrange_pattern = try_rearrange_oprnd_info (oprnds_info, group_size);
> > +
> >    /* Create SLP_TREE nodes for the definition node/s.  */
> >    FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
> >      {
> > @@ -2669,6 +2809,100 @@ fail:
> >    *tree_size += this_tree_size + 1;
> >    *max_nunits = this_max_nunits;
> >
> > +  /* If we applied any rearrangements then we need to reconstruct the original
> > +     elements with an additional permutation layer.  */
> > +  if (rearrange_pattern != SLP_OPRND_PATTERN_NONE)
> > +    {
> > +      slp_tree one = new _slp_tree;
> > +      slp_tree two = new _slp_tree;
> > +      SLP_TREE_DEF_TYPE (one) = vect_internal_def;
> > +      SLP_TREE_DEF_TYPE (two) = vect_internal_def;
> > +      SLP_TREE_VECTYPE (one) = vectype;
> > +      SLP_TREE_VECTYPE (two) = vectype;
> > +      SLP_TREE_CHILDREN (one).safe_splice (children);
> > +      SLP_TREE_CHILDREN (two).safe_splice (children);
> > +
> > +      SLP_TREE_CODE (one) = VEC_PERM_EXPR;
> > +      SLP_TREE_CODE (two) = VEC_PERM_EXPR;
> > +      SLP_TREE_REPRESENTATIVE (one) = stmts[0];
> > +      SLP_TREE_REPRESENTATIVE (two) = stmts[2];
> > +      lane_permutation_t &perm_one = SLP_TREE_LANE_PERMUTATION (one);
> > +      lane_permutation_t &perm_two = SLP_TREE_LANE_PERMUTATION (two);
> > +
> > +      if (rearrange_pattern == SLP_OPRND_PATTERN_ABAB)
> > +     {
> > +        /* Given a single node -> A1, B1, C1, D1, A2, B2, C2, D2, ...
> > +           Create node "one" -> A1, B1, A1, B1, A2, B2, A2, B2, ...
> > +           Create node "two" -> C1, D1, C1, D1, C2, D2, C2, D2, ...  */
> > +
> > +       for (unsigned int j = 0; j < group_size; j += 4)
> > +         {
> > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > +           perm_one.safe_push (std::make_pair (0, j + 1));
> > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > +           perm_one.safe_push (std::make_pair (0, j + 1));
> > +
> > +           perm_two.safe_push (std::make_pair (0, j + 2));
> > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > +           perm_two.safe_push (std::make_pair (0, j + 2));
> > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > +         }
> > +     }
> > +      else if (rearrange_pattern == SLP_OPRND_PATTERN_AABB)
> > +     {
> > +        /* Given a single node -> A1, C1, B1, D1, A2, C2, B2, D2, ...
> > +           Create node "one" -> A1, A1, B1, B1, A2, A2, B2, B2, ...
> > +           Create node "two" -> C1, C1, D1, D1, C2, C2, D2, D2, ...  */
> > +
> > +       for (unsigned int j = 0; j < group_size; j += 4)
> > +         {
> > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > +           perm_one.safe_push (std::make_pair (0, j + 2));
> > +           perm_one.safe_push (std::make_pair (0, j + 2));
> > +
> > +           perm_two.safe_push (std::make_pair (0, j + 1));
> > +           perm_two.safe_push (std::make_pair (0, j + 1));
> > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > +         }
> > +     }
> > +      else if (rearrange_pattern == SLP_OPRND_PATTERN_ABBA)
> > +     {
> > +        /* Given a single node -> A1, B1, C1, D1, A2, B2, C2, D2, ...
> > +           Create node "one" -> A1, B1, B1, A1, A2, B2, B2, A2, ...
> > +           Create node "two" -> C1, D1, D1, C1, C2, D2, D2, C2, ...  */
> > +
> > +       for (unsigned int j = 0; j < group_size; j += 4)
> > +         {
> > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > +           perm_one.safe_push (std::make_pair (0, j + 1));
> > +           perm_one.safe_push (std::make_pair (0, j + 1));
> > +           perm_one.safe_push (std::make_pair (0, j + 0));
> > +
> > +           perm_two.safe_push (std::make_pair (0, j + 2));
> > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > +           perm_two.safe_push (std::make_pair (0, j + 3));
> > +           perm_two.safe_push (std::make_pair (0, j + 2));
> > +         }
> > +     }
> > +
> > +      slp_tree child;
> > +      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (two), i, child)
> > +       SLP_TREE_REF_COUNT (child)++;
> > +
> > +      node = vect_create_new_slp_node (node, stmts, 2);
> > +      SLP_TREE_VECTYPE (node) = vectype;
> > +      SLP_TREE_CHILDREN (node).quick_push (one);
> > +      SLP_TREE_CHILDREN (node).quick_push (two);
> > +
> > +      SLP_TREE_LANES (one) = stmts.length ();
> > +      SLP_TREE_LANES (two) = stmts.length ();
> > +
> > +      children.truncate (0);
> > +      children.safe_splice (SLP_TREE_CHILDREN (node));
> > +    }
> > +
> >    if (two_operators)
> >      {
> >        /* ???  We'd likely want to either cache in bst_map sth like
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)