From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <Tamar.Christina@arm.com>
Received: from EUR05-AM6-obe.outbound.protection.outlook.com
 (mail-am6eur05on2070.outbound.protection.outlook.com [40.107.22.70])
 by sourceware.org (Postfix) with ESMTPS id 31328388703D
 for <gcc-patches@gcc.gnu.org>; Wed,  4 Nov 2020 13:38:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 31328388703D
Received: from AM6P192CA0105.EURP192.PROD.OUTLOOK.COM (2603:10a6:209:8d::46)
 by HE1PR0801MB1962.eurprd08.prod.outlook.com (2603:10a6:3:57::12) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.19; Wed, 4 Nov
 2020 13:38:05 +0000
Received: from VE1EUR03FT058.eop-EUR03.prod.protection.outlook.com
 (2603:10a6:209:8d:cafe::73) by AM6P192CA0105.outlook.office365.com
 (2603:10a6:209:8d::46) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.18 via Frontend
 Transport; Wed, 4 Nov 2020 13:38:05 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123)
 smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified)
 header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=pass action=none
 header.from=arm.com;
Received-SPF: Pass (protection.outlook.com: domain of arm.com designates
 63.35.35.123 as permitted sender) receiver=protection.outlook.com;
 client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com;
Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by
 VE1EUR03FT058.mail.protection.outlook.com (10.152.19.86) with
 Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.3520.15 via Frontend Transport; Wed, 4 Nov 2020 13:38:05 +0000
Received: ("Tessian outbound 68da730eaaba:v64");
 Wed, 04 Nov 2020 13:38:03 +0000
X-CR-MTA-TID: 64aa7808
Received: from 3c368ded5c4e.2
 by 64aa7808-outbound-1.mta.getcheckrecipient.com id
 AD8887C9-AA55-4798-BBDB-8D7873479174.1; 
 Wed, 04 Nov 2020 13:37:58 +0000
Received: from EUR04-HE1-obe.outbound.protection.outlook.com
 by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 3c368ded5c4e.2
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384);
 Wed, 04 Nov 2020 13:37:58 +0000
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=QpMXaVflGBPzs9mI/4QcDFtUoAqt1agf1694gfUNQp5hPtDLIF2a5lVTkLMPpYFVCa9Q3amozjdU40zmjg0OboXE0eyz0GbIqLxdELwm9mlW78lWCSV+vEZHlsLrhaZ3qpk66KohIN/D2YLM+rDaPMdFKiSnPaZY96PIQeABemO1Tb3jlSqeKPgwbueeczz0vvmQ9CnetEzdejtR3VXtyScOwK60sejWlJidbPDZkph5uqPL75oiN7kY+goKXkKV3ENRSzdoHFBuNTTaOdUcwIC+cCi0AP5M777WL4+vUFytDSLPynhxY+GW4qj7sXJ7Ezsd1CYi80OC3rf9NWlRpA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=kftPFiwyCbKSyd42XHcEoGKko/84hDzP7u4RfSmWv0E=;
 b=O6PfrlhzmC16E1h1VafQUkPCS4kkvxgQfVu514SONihh0PWuD86EO5IFVM+LsWpnqR5PuZIGTl0XoNjyjY2nuWyhR93FWVJ2GP5Y4tq4Wm6oH7diUnmoRvlxC1/zz1xUGRYZzQqg3rhYRalgGfPHkIKXQLCSN2pk3692GgxIoOV84M1sRsE4eGEyytvSrBeFUReDQ67/JAyb03EwALUWYIX2plMw0nPt/o+BOZxDI3Lq+6w5lySwI/BS4BM6LVo0hu1Om+x88putxFtvHllEGjA9NLOUkjyu/latyGZ1qgv0R3tYphU+9hjcxn0t6vM/7UQ3my/NSq4dc5CHs02mfg==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass
 header.d=arm.com; arc=none
Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17)
 by VI1PR08MB3054.eurprd08.prod.outlook.com (2603:10a6:803:4a::32)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.19; Wed, 4 Nov
 2020 13:37:55 +0000
Received: from VI1PR08MB5325.eurprd08.prod.outlook.com
 ([fe80::d0e7:49cd:4dae:a2a2]) by VI1PR08MB5325.eurprd08.prod.outlook.com
 ([fe80::d0e7:49cd:4dae:a2a2%6]) with mapi id 15.20.3499.032; Wed, 4 Nov 2020
 13:37:55 +0000
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
CC: Richard Sandiford <Richard.Sandiford@arm.com>, nd <nd@arm.com>,
 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
 scaffolding.
Thread-Topic: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
 scaffolding.
Thread-Index: AQHWk0gPfg3binthXU6SMGDs22R/u6l+AX0AgAAQ82CAAD71gIAA4CwAgAAs+/GAAAR+AIA3Te0AgAF394CAAAgZEA==
Date: Wed, 4 Nov 2020 13:37:54 +0000
Message-ID: <VI1PR08MB5325F3496EF0CB5218BD87D6FFEF0@VI1PR08MB5325.eurprd08.prod.outlook.com>
References: <20200925142753.GA13692@arm.com>
 <nycvar.YFH.7.76.2009281405360.10073@p653.nepu.fhfr.qr>
 <VI1PR08MB532554C7BAF0AD8C02B793EDFF350@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <VI1PR08MB5325F124EA9661D12C00AA57FF350@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2009290834170.10073@p653.nepu.fhfr.qr>
 <mptsgb153zc.fsf@arm.com>
 <nycvar.YFH.7.76.2009291134300.10073@p653.nepu.fhfr.qr>
 <VI1PR08MB53250748BBC032014BB94CF0FF110@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.76.2011041209570.10073@p653.nepu.fhfr.qr>
In-Reply-To: <nycvar.YFH.7.76.2011041209570.10073@p653.nepu.fhfr.qr>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-ts-tracking-id: 12DB1D30F72C9D4495B3150A6C6068DA.0
x-checkrecipientchecked: true
Authentication-Results-Original: suse.de; dkim=none (message not signed)
 header.d=none;suse.de; dmarc=none action=none header.from=arm.com;
x-originating-ip: [82.24.248.186]
x-ms-publictraffictype: Email
X-MS-Office365-Filtering-HT: Tenant
X-MS-Office365-Filtering-Correlation-Id: 6714c063-881b-4177-67c3-08d880c6dbdb
x-ms-traffictypediagnostic: VI1PR08MB3054:|HE1PR0801MB1962:
x-ms-exchange-transport-forked: True
X-Microsoft-Antispam-PRVS: <HE1PR0801MB1962DBBF7F0CBA3D704EFA4CFFEF0@HE1PR0801MB1962.eurprd08.prod.outlook.com>
x-checkrecipientrouted: true
nodisclaimer: true
x-ms-oob-tlc-oobclassifiers: OLM:10000;OLM:10000;
X-MS-Exchange-SenderADCheck: 1
X-Microsoft-Antispam-Untrusted: BCL:0;
X-Microsoft-Antispam-Message-Info-Original: iQx1klYh3z9r2OgfryLBeA+OC+unAMVN3Fs+xDRGt90vQb2HRlvr2XVYq87+sRZ0XUykb1ZS94TXXgI2dqTIFxETq9mzibtUF2NpMH+xV47MwltIo6CoA5mt22UfOOEJj78RZ4XVFfScwP9PbATBQCOJwCDwSBMv/rcdwZFoX+Rgl69DcQs0qN4iTElZxdhk6m8LueWFFOpE3wiuyHDk7eCw22j+BZtR7trkgT21AY3/ltIENvzojr9rE1E/INSXDliwC5nKghNl/X+t4bQnekAU+CKL5AHL44FKdVgTPCaSViU2/Jt6a0inzZcCxN98lO8QV0HVtZRqAUJdWZ3zJA==
X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en;
 SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com;
 PTR:; CAT:NONE;
 SFS:(4636009)(136003)(396003)(376002)(346002)(366004)(39860400002)(7696005)(8676002)(52536014)(5660300002)(4326008)(71200400001)(33656002)(8936002)(2906002)(54906003)(83380400001)(9686003)(6506007)(55016002)(26005)(53546011)(64756008)(66556008)(66946007)(66446008)(66476007)(76116006)(478600001)(30864003)(6916009)(86362001)(316002)(186003)(579004);
 DIR:OUT; SFP:1101; 
x-ms-exchange-antispam-messagedata: pbJde5COGaMRv/do5jaji/KmFdiM4r4/py5WpAQgOEBZty+D/9r6VbzjpeziZF4ZI0T4MrTPPGDeTboT9P4OID9Glcs27AlhAEOI1OFhw31zKNteDdF9ObaGvWpEyJZPJ+IxCiLoy4/OHZ3rpDxiVETMYpm5fpnHVZCTJh9zS0ss8nNcXozKIT9SvPKRff0lu4pYnYU6wwizUzuR1ECXpH5GaYD+Qum/2dRYdlV+GFuBak7xQoxWBk8xEiNhk/dvngoGxPkURkSdXOi8idCDqw4XxoEV/fLIkM0zsUk6k6MR2wUYiVz2Emd862U1fUoJsaUcqnV3lPSPsjlbYHC5GAsarak0JH7D5qef1OwzpMXoB/DDC4Wx0Nyh620dwXPpm8+5opo/qXJyDrvGcUvCu0PpQWnyhizeW4E0csPOOFi+jZvzKEbHdDmKSZhrgHGr0t73hXYBmyBWkM02/TJx6clWKN3xs3v8G+PRhte6NbyrnQr+wBwS4pB6odb6lRsan+kmUmRCYTfmYBFXK2QQs4vFI4NiZEV/HCQAy4bp18Xz0eseQ1HRjymS2TxaQsjsJZ8ftY9gwNR3yYDWK8OyyD6YH1SoJfrCpPaWuyliDkyumGXByhrd+cnoX7nHX9GbjTIK62w/iGbfa4W39ZWXoQ==
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB3054
Original-Authentication-Results: suse.de; dkim=none (message not signed)
 header.d=none;suse.de; dmarc=none action=none header.from=arm.com;
X-EOPAttributedMessage: 0
X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT058.eop-EUR03.prod.protection.outlook.com
X-MS-Office365-Filtering-Correlation-Id-Prvs: 6d31f046-fe52-43ae-a2bf-08d880c6d5ec
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info: uo0cxcqQ66PPnrBSyA3zPsqItuw3hZfA+hvv8V87yKPg6rDbSWqwJ91eb/qwyEJ8++lj3FlOx1NBumKwPUktRaZWgvpbuEL5vNspgZN+dPCxT2ATw9/SOGFuJX9L+YT2pS2/vA6iA/Pe0o0SBxfcFrkbkYb6Cqsy7DO5B43y55/LI4XQ8Siee7GKijB99rCMIELQC5+GA2SEo0bQV/0nvksJl+sY6IZWemNW9Nn4CGFc4KKH8CfJyX2/Q3DfpOsrJhRoM/tlOz6S+KZSQCKKrWU+yiOFdTh6Fm2iLLIlgnIr/eVa70FbRUNAMZep2WtIy7JkFYzhGFZJ3xD81qZJvgavAIpAs1oxLm/zsWzllYCGAURzS4jUPA/+0DwYDyBGlTiedfmtFaAH2yaXSdrKfg==
X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:;
 IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com;
 PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE;
 SFS:(4636009)(39860400002)(396003)(346002)(376002)(136003)(46966005)(478600001)(2906002)(8676002)(70206006)(336012)(6862004)(70586007)(52536014)(7696005)(83380400001)(5660300002)(26005)(9686003)(4326008)(47076004)(86362001)(55016002)(186003)(53546011)(82740400003)(81166007)(356005)(54906003)(36906005)(316002)(6506007)(30864003)(8936002)(33656002)(82310400003);
 DIR:OUT; SFP:1101; 
X-OriginatorOrg: arm.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 04 Nov 2020 13:38:05.1985 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 6714c063-881b-4177-67c3-08d880c6dbdb
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123];
 Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT058.eop-EUR03.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0801MB1962
X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, GIT_PATCH_0, INDUSTRIAL_BODY, INDUSTRIAL_SUBJECT,
 RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP,
 UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Nov 2020 13:38:11 -0000

> -----Original Message-----
> From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 12:41 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> scaffolding.
>=20
> On Tue, 3 Nov 2020, Tamar Christina wrote:
>=20
> > Hi Richi,
> >
> > This is a respin which includes the changes you requested.
>=20
> Comments randomly ordered, I'm pasting in pieces of the patch - sending i=
t
> inline would help to get pieces properly quoted and in-order.
>=20
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9
> aa42ffe0ce0dd
> 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> ...
>=20
> I miss comments in this file, see tree-vectorizer.h where we try to docum=
ent
> purpose of classes and fields.
>=20
> Things that sticks out to me:
>=20
> +    uint8_t m_arity;
> +    uint8_t m_num_args;
>=20
> why uint8_t and not simply unsigned int?  Not knowing what arity /
> num_args should be here ;)

I think I can remove arity, but num_args is how many operands the created
internal function call should take.  Since we can't vectorize calls with mo=
re than
4 arguments at the moment it seemed like 255 would be a safe limit :).

>=20
> +    vec_info *m_vinfo;
> ...
> +    vect_pattern (slp_tree *node, vec_info *vinfo)
>=20
> so this looks like something I freed stmt_vec_info of - back-pointers in =
the
> "wrong" direction of the logical hierarchy.  I suppose it's just to avoid=
 passing
> down vinfo where we need it?  Please do that instead - pass down vinfo as
> everything else does.
>=20
> The class seems to expose both very high-level (build () it!) and very lo=
w
> level details (get_ifn).  The high-level one suggests that a pattern _not=
_
> being represented by an ifn is possible but there's too much implementati=
on
> detail already in the vect_pattern class to make that impossible.  I gues=
s the
> IFN details could be pushed down to the simple matching class (and that b=
e
> called vect_ifn_pattern or so).
>=20
> +static bool
> +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) {
> +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> +  bool found_p =3D false;
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match
> --\n");
> +      vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> +      dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n");
> +    }
>=20
> we dumped all instances after their analysis.  Maybe just refer to the
> instance with its address (dump_print %p) so lookup in the (already large=
)
> dump file is easy.
>=20
> +  hash_set<slp_tree> *visited =3D new hash_set<slp_tree> ();  for
> + (unsigned x =3D 0; x < num__slp_patterns; x++)
> +    {
> +      visited->empty ();
> +      found_p |=3D vect_match_slp_patterns_2 (ref_node, vinfo,
> slp_patterns[x],
> +                                           visited);
> +    }
> +
> +  delete visited;
>=20
> no need to new / delete, just do
>=20
>   has_set<slp_tree> visited;
>=20
> like everyone else.  Btw, do you really want to scan pieces of the SLP gr=
aph
> (with instances being graph entries) multiple times?  If not then you sho=
uld
> move the visited set to the caller instead.
>=20
> +  /* TODO: Remove in final version, only here for generating debug dot
> graphs
> +          from SLP tree.  */
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n");
> +      vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> +      dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n");
> +    }
>=20
> now, if there was some pattern matched it is probably useful to dump the
> graph (entry) again.  But only conditional on that I think.  So can you i=
nstead
> make the dump conditional on found_p and remove the start dot/end dot
> markers as said in the comment?
>=20
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                            "transformation for %s not valid due to
> + post
> "
> +                            "condition\n",
>=20
> not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE.
> MSG_MISSED_OPTIMIZATION should be used for things (likely) making
> vectorization fail.
>=20
> +  /* Perform recursive matching, it's important to do this after
> + matching
> things
>=20
> before matching things?
>=20
> +     in the current node as the matches here may re-order the nodes
> + below
> it.
> +     As such the pattern that needs to be subsequently match may change.
>=20
> and this is no longer true?
>=20
> */
> +
> +  if (SLP_TREE_CHILDREN (node).exists ()) {
>=20
> elide this check, the loop will simply not run if empty
>=20
> +    slp_tree child;
> +    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
>=20
> I think you want to perform the recursion in the caller so you do it only=
 once
> and not once for each pattern kind now that you do post-order processing
> rather than pre-order.

That wouldn't work as that would make multiply match before FMA/FMS.
I had tried having FM(A|S) match MUL instead, but MUL does a lot of work to
create permute nodes, which would have to be undone.

The FMS case is somewhat easy as that still has a two operators node in bet=
ween,
But FMA is just a simple add on top.  I have to then remove the merging nod=
es that MUL
Created, and drop half the operands to create the new node.

Which is doable but effectively does the same pattern twice, whereas now th=
e MUL would
just skip all that work since FMA matched.=20

The post order is essentially really problematic for overlapping patterns. =
 In pre-order I could for
Instance detect all 3 patterns in one traversal and share the matching info=
rmation between all 3.

Right now I can only do it between mul and add and FMA has to be standalone=
.

>=20
> +  vect_pattern *pattern =3D patt_fn (ref_node, vinfo);
> +
> +  if (pattern->matches ())
>=20
> this suggests you get number of SLP nodes times number of pattern
> matchers allocations & inits of vect_pattern.  If you move
>=20
> +      if (pattern->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
> +       {
>=20
> into ->matches () then whether this is a IFN or multi-node pattern become=
s
> an implementation detail which would be IMHO better.
>=20
> +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> +    {
> +      /* Calculate the location of the statement in NODE to replace.  */
> +      stmt_info =3D SLP_TREE_REPRESENTATIVE (node);
> +      gimple* old_stmt =3D STMT_VINFO_STMT (stmt_info);
> +      tree type =3D gimple_expr_type (old_stmt);
> +
> +      /* Create the argument set for use by
> gimple_build_call_internal_vec.  */
> +      for (int i =3D 0; i < this->m_num_args; i++)
>=20
> the scalar argument type is the type of the definition so instead of
> gimple_expr_type you want simply TREE_TYPE (gimple_get_lhs (old_stmt))
>=20
> But then you seem to mix the result and the argument types?  I think you
> need to walk over SLP_TREE_CHILDREN and look at their representatives
> scalar def.  Which would then assume the pattern consumes all direct
> children.  Somehow this "generic"

Since I'm only consuming simple math ops would the argument and result type=
s
differ? Something like a widening sub I can see but not the normal sub.

> vect_simple_pattern_match::build () has quite some restrictions.
> I guess they are documented in the files toplevel comment which also has
> general parts applying to all possible pattern matchers (not deriving fro=
m
> vect_simple_pattern_match).  Can you split up the comment and more
> explicitely mark what applies to patterns in general (postorder traversal=
) and
> what applies to vect_simple_pattern_match only?
>=20
> +      /* We have to explicitly mark the old statement as unused because
> during
> +        statement analysis the original and new pattern statement may
> require
>=20
> comment needs updating.
>=20
> +        different level of unrolling.  As an example add/sub when
> vectorized
> +        without a pattern requires 4 copies, whereas with a COMPLEX_ADD
> pattern
> +        this only requires 2 copies and the two statement will be treate=
d
> as
> +        hand unrolled.  That means that the analysis won't happen as
> it'll find
> +        a mismatch.  So we don't analyze the old statement and if we end
> up
> +        needing it, e.g. SLP fails then we have to quickly re-analyze it=
.
> */
> +      STMT_VINFO_RELEVANT (call_stmt_info) =3D vect_used_in_scope;
>=20
> I guess relevance should be copied from the original stmt (like if
> it was used_by_reduction).  Can't that be done in
> vect_mark_pattern_stmts?  Likewise the ->add_pattern_stmt part
> could be done there, too.  So you'd be left with
>=20
> +      STMT_SLP_TYPE (call_stmt_info) =3D pure_slp;
>=20
> and
>=20
> +      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) =3D true;
>=20
> where the pure_slp setting should in theory be not needed (it
> should be automagically detected so) but setting it is not wrong
> (it would be an optimization).

I am using it as a marker for a node dissolve needs to look at when undoing=
 SLP
so it restores the relevancy, but given your comment below on that I may be=
 able
to get rid of it.

>=20
> --
>=20
> I'm now looking down the series to see how this is all used,
> so quoting different pieces from other patches.
>=20
> +/* Checks to see of the expression EXPR is a gimple assign with code COD=
E
> +   and if this is the case the two operands of EXPR is returned in OP1
> and
> +   OP2.
> +
> +   If the matching and extraction is successful TRUE is returned
> otherwise
> +   FALSE in which case the value of OP1 and OP2 will not have been
> touched.
> +*/
> +
> +static bool
> +vect_match_expression_p (slp_tree node, tree_code code)
> +{
> +  if (!SLP_TREE_REPRESENTATIVE (node))
>=20
> I see that many overall function comments do not actually match
> anymore, if at least for the parameters refered to.  Please go
> over the patch and update them.
>=20
> +  gimple* expr =3D STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
> +  if (!is_gimple_assign (expr)
> +      || gimple_expr_code (expr) !=3D code)
>=20
> gimple_assign_rhs_code (expr)
>=20
> +/* Check if the given lane permute in PERMUTES matches an alternating
> sequence
> +   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.   */
> +static bool
> +vect_check_lane_permute (lane_permutation_t &permutes,
> +                        unsigned p0, unsigned p1)
> +{
> +  unsigned val[2] =3D {p0, p1};
> +  for (unsigned i =3D 0; i < permutes.length (); i++)
> +    if (permutes[i].first !=3D val[i % 2])
> +      return false;
>=20
> so this doesn't put any constraint on permutes[].second.  So it
> matches an arbitrary interleaving of two vectors.
>=20
> +
> +   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}=
.
> +*/
> +
> +static complex_operation_t
> +vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t
> &lanes,
> +                    bool two_operands =3D true, vec<slp_tree> *ops =3D N=
ULL)
>=20
>  { {_37, _38}, {_12, _36} }
>=20
> there's a lot of vect_match_expression_p calls and I wonder if
> inlining them and CSEing the 'code' compute of node1 and node2
> would simplify things and elide the macro.  Also we then know
> this is all CSEd in the end ...
>=20
> I wonder what LANES specifies, TWO_OPERANDS specifies whether
> two-operand variants are allowed unless .. (guess LANES causes
> only same operands to match up in the end).
>=20

I'm guarding against that you could technically speaking have a +/- node
that does reverses the order of the elements below it, in which case that's
a different operation than what we want to guard against.

I hadn't been able to prove to myself that SLP build would never do this.

> +static void
> +vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree
> node)
> +{
>=20
> need to find a use for this still, but don't you set pure_slp
> "above"?

This is used to mark the elided nodes.  For instance in a complex multiply =
the
* nodes in the SLP tree become unused.  While not having them in the SLP tr=
ee
is simple, the detection of pure/hybrid is led from the BB statements so th=
ey would
mark the elided nodes as being hybrid.

There is of course the problem of is this node used in any other SLP instan=
ce, it may force
the analysis to think the SLP tree is pure, but if SLP fails it would try n=
ot SLP anyway as it
would have done eventually if Hybrid.

>=20
> +bool
> +complex_pattern::validate_p ()
> +{
> +      unsigned i;
> +      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
> +       {
> +         slp_tree vnode =3D NULL;
> +         if (vect_slp_make_linear (node, tmp, &vnode))
> +           nodes.quick_push (vnode);
> +         else
> +           {
> +             nodes.release();
> +             return false;
> +           }
>=20
> hum.  vect_slp_make_linear better not fail?  At least we've
> created half-way stuff when it does.  Can't spot the implementation
> right now (guess splitting the patch up doesn't really help much ...)
>=20

It can fail, so I see I need to free some nodes otherwise I'll leak... But
It's not a correctness issue. Nothing It changed on NODE until it all
succeeds.

> Noting elsewhere:
>=20
> +static void
> +vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo,
> +                                hash_set<slp_tree> *visited, slp_tree
> root)
> +{
> +  if (!root || visited->contains (root))
> +    return;
> +
> +  unsigned int i;
> +  slp_tree node;
> +  stmt_vec_info related_stmt_info;
> +  stmt_vec_info stmt_info =3D SLP_TREE_REPRESENTATIVE (root);
> +
> +  visited->add (root);
>=20
> visited->add () returns whether the key was already there so
> please combine the contains and the add calls here and elsewhere.
>=20
>     if (stmt_info && STMT_VINFO_SLP_VECT_ONLY (stmt_info)
> +        && (related_stmt_info =3D STMT_VINFO_RELATED_STMT (stmt_info)) !=
=3D
> NULL)
> +      {
> +       if (dump_enabled_p ())
>=20
> I think STMT_VINFO_SLP_VECT_ONLY is a thing already, so I wonder
> if the above is a sufficient test.  There's is_pattern_stmt_p ()
> (which obviously also applies to non-SLP only patterns).
>=20
> Note that I also see another issue, namely with hybrid SLP the
> original non-pattern stmt would need to stay relevant.  That
> probably asks for hybrid SLP discovery and relevance marking
> to be combined somehow.  You can probably create a simple
> testcase by storing a lane of a complex op via a non-grouped
> store.
>=20
> +       STMT_VINFO_RELEVANT (stmt_info) =3D vect_unused_in_scope;
> +       STMT_VINFO_RELEVANT (related_stmt_info) =3D vect_used_in_scope;
>=20
> as said previously the relevance should be copied, in this case
> back to the original stmt info from the pattern stmt info.
>=20
> +       STMT_VINFO_IN_PATTERN_P (related_stmt_info) =3D false;
> +       STMT_SLP_TYPE (related_stmt_info) =3D loop_vect;
>=20
> one option might be to delay pattern recognition (for the loop case)
> to after vect_detect_hybrid_slp and simply not make any hybrid
> stmt containing nodes part of a SLP pattern.  It would be a workaround
> of course, not the best solution.  Another option is to
> add a new field to stmt_info to put a "saved" relevance to and
> simply go over all stmts restoring relevance - we already restore
> SLP_TYPE to loop_vect that way at

Yes.. ideally I would want something like telling it "don't worry about thi=
s if
you end up being pure SLP".

I haven't tried with this new approach to instead add it to the PATTERN_SEQ
for the statement.   Last time I tried however it looked like in order to g=
et this
to work correctly I'd need a bigger hack..

Thanks,
Tamar

>=20
>   /* Reset SLP type to loop_vect on all stmts.  */
>   for (i =3D 0; i < LOOP_VINFO_LOOP (loop_vinfo)->num_nodes; ++i)
>     {
>=20
> so simply restore the saved relevance would work there (guess I
> like that more than what vect_dissolve_slp_only_patterns does,
> in any case do what vect_dissolve_slp_only_patterns does in the
> above loop please).
>=20
> Thanks,
> Richard.
>=20
>=20
>=20
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* Makefile.in (tree-vect-slp-patterns.o): New.
> > 	* doc/passes.texi: Update documentation.
> > 	* tree-vect-slp.c (vect_print_slp_tree): Add new state.
> > 	(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
> > 	(vect_analyze_slp): Call pattern matcher.
> > 	* tree-vectorizer.h (enum _complex_operation):
> > 	(class vect_pattern_match, class vect_pattern): New.
> > 	* tree-vect-slp-patterns.c: New file.
> >
> > > -----Original Message-----
> > > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > > Behalf Of Richard Biener
> > > Sent: Tuesday, September 29, 2020 10:42 AM
> > > To: Richard Sandiford <Richard.Sandiford@arm.com>
> > > Cc: Tamar Christina <Tamar.Christina@arm.com>; nd <nd@arm.com>;
> gcc-
> > > patches@gcc.gnu.org
> > > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> > > scaffolding.
> > >
> > > On Tue, 29 Sep 2020, Richard Sandiford wrote:
> > >
> > > > Richard Biener <rguenther@suse.de> writes:
> > > > >> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info
> > > *vinfo,
> > > > >> > >                               &tree_size, bst_map);
> > > > >> > >    if (node !=3D NULL)
> > > > >> > >      {
> > > > >> > > +      /* Temporarily allow add_stmt calls again.  */
> > > > >> > > +      vinfo->stmt_vec_info_ro =3D false;
> > > > >> > > +
> > > > >> > > +      /* See if any patterns can be found in the constructe=
d SLP
> tree
> > > > >> > > +        before we do any analysis on it.  */
> > > > >> > > +      vect_match_slp_patterns (node, vinfo, group_size,
> > > &max_nunits,
> > > > >> > > +                              matches, &npermutes, &tree_si=
ze,
> > > > >> > > + bst_map);
> > > > >> > > +
> > > > >> > > +      /* After this no more add_stmt calls are allowed.  */
> > > > >> > > +      vinfo->stmt_vec_info_ro =3D true;
> > > > >> > > +
> > > > >> > >
> > > > >> > > I think this is a bit early to match patterns - I'd defer it=
 to
> > > > >> > > the point where all entries into the same SLP subgraph are
> > > > >> > > analyzed, thus somewhere at the end of vect_analyze_slp loop
> > > > >> > > over all instances and match patterns?  That way phases are
> more
> > > clearly separated.
> > > > >> >
> > > > >> > That would probably work, my only worry is that the SLP analys=
is
> > > > >> > itself may fail and bail out at
> > > > >> >
> > > > >> > 	  /* If the loads and stores can be handled with load/store-
> lane
> > > > >> > 	     instructions do not generate this SLP instance.  */
> > > > >> > 	  if (is_a <loop_vec_info> (vinfo)
> > > > >> > 	      && loads_permuted
> > > > >> > 	      && dr && vect_store_lanes_supported (vectype,
> group_size,
> > > > >> > false))
> > > > >
> > > > > Ah, that piece of code.  Yeah, I'm repeatedly running into it as
> > > > > well - it's a bad hack that stands in the way all the time :/
> > > >
> > > > At one point I was wondering about trying to drop the above, vector=
ise
> > > > with and without SLP, and then compare their costs, like for
> > > VECT_COMPARE_COSTS.
> > > > But that seemed like a dead end with the move to doing everything o=
n
> > > > the SLP representation.
> > >
> > > Yeah ... though even moving everything to the SLP representation will
> retain
> > > the issue since there it will be N group-size 1 SLP instances vs. 1 g=
roup-
> size N
> > > SLP instance.
> > >
> > > > > I guess we should try moving this upward like to
> > > > > vect_analyze_loop_2 right before
> > > > >
> > > > >   /* Check the SLP opportunities in the loop, analyze and build S=
LP
> trees.
> > > > > */
> > > > >   ok =3D vect_analyze_slp (loop_vinfo, *n_stmts);
> > > > >   if (!ok)
> > > > >     return ok;
> > > > >
> > > > > and there check whether all grouped loads and stores can be handl=
ed
> > > > > with store-/load-lanes (and there are no SLP reduction chains?) i=
n
> > > > > which case do not try to attempt SLP at all.  Because the testcas=
es
> > > > > this check was supposed to change were all-load/store-lane or all
> > > > > SLP so the mixed case is probably not worth special casing.
> > > > >
> > > > > Since load-/store-lanes is an arm speciality I tried to only touc=
h
> > > > > this fragile part with a ten-foot pole ;)  CCing Richard, if he a=
cks
> > > > > the above I can produce a patch.
> > > >
> > > > Yeah, sounds good to me.  Probably also sorth checking whether the
> > > > likely_max iteration count is high enough to support group_size
> > > > vectors, if we have enough information to guess that.
> > > >
> > > > We could also get the gen* machinery to emit a macro that is true i=
f
> > > > at least one load/store-lane pattern is defined, so that we can ski=
p
> > > > the code for non-Arm targets.  I can do that as a follow-up.
> > >
> > > I've had a second look and one complication is that we only elide the=
 SLP
> > > node if any of the loads are permuted.  So if all loads/stores are
> unpermuted
> > > but load/store-lanes would work we'd keep the SLP node.
> > >
> > > Of course without actually building the SLP node we don't know whethe=
r
> the
> > > loads will be permuted or not ...
> > >
> > > But surely the current place for the check will cause some testcases =
to
> > > become hybrid vectorizations which is likely undesirable.
> > >
> > > So we could move the check after all SLP discovery is completed and
> throw it
> > > all away if we can and should use load/store-lanes?
> > > But that might then not solve Tamars issue.
> > >
> > > Richard.
> >
>=20
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg,
> Germany; GF: Felix Imend