From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR05-AM6-obe.outbound.protection.outlook.com (mail-am6eur05on2070.outbound.protection.outlook.com [40.107.22.70]) by sourceware.org (Postfix) with ESMTPS id 31328388703D for ; Wed, 4 Nov 2020 13:38:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 31328388703D Received: from AM6P192CA0105.EURP192.PROD.OUTLOOK.COM (2603:10a6:209:8d::46) by HE1PR0801MB1962.eurprd08.prod.outlook.com (2603:10a6:3:57::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.19; Wed, 4 Nov 2020 13:38:05 +0000 Received: from VE1EUR03FT058.eop-EUR03.prod.protection.outlook.com (2603:10a6:209:8d:cafe::73) by AM6P192CA0105.outlook.office365.com (2603:10a6:209:8d::46) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.18 via Frontend Transport; Wed, 4 Nov 2020 13:38:05 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by VE1EUR03FT058.mail.protection.outlook.com (10.152.19.86) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3520.15 via Frontend Transport; Wed, 4 Nov 2020 13:38:05 +0000 Received: ("Tessian outbound 68da730eaaba:v64"); Wed, 04 Nov 2020 13:38:03 +0000 X-CR-MTA-TID: 64aa7808 Received: from 3c368ded5c4e.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id AD8887C9-AA55-4798-BBDB-8D7873479174.1; Wed, 04 Nov 2020 13:37:58 +0000 Received: from EUR04-HE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 3c368ded5c4e.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Wed, 04 Nov 2020 13:37:58 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=QpMXaVflGBPzs9mI/4QcDFtUoAqt1agf1694gfUNQp5hPtDLIF2a5lVTkLMPpYFVCa9Q3amozjdU40zmjg0OboXE0eyz0GbIqLxdELwm9mlW78lWCSV+vEZHlsLrhaZ3qpk66KohIN/D2YLM+rDaPMdFKiSnPaZY96PIQeABemO1Tb3jlSqeKPgwbueeczz0vvmQ9CnetEzdejtR3VXtyScOwK60sejWlJidbPDZkph5uqPL75oiN7kY+goKXkKV3ENRSzdoHFBuNTTaOdUcwIC+cCi0AP5M777WL4+vUFytDSLPynhxY+GW4qj7sXJ7Ezsd1CYi80OC3rf9NWlRpA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=kftPFiwyCbKSyd42XHcEoGKko/84hDzP7u4RfSmWv0E=; b=O6PfrlhzmC16E1h1VafQUkPCS4kkvxgQfVu514SONihh0PWuD86EO5IFVM+LsWpnqR5PuZIGTl0XoNjyjY2nuWyhR93FWVJ2GP5Y4tq4Wm6oH7diUnmoRvlxC1/zz1xUGRYZzQqg3rhYRalgGfPHkIKXQLCSN2pk3692GgxIoOV84M1sRsE4eGEyytvSrBeFUReDQ67/JAyb03EwALUWYIX2plMw0nPt/o+BOZxDI3Lq+6w5lySwI/BS4BM6LVo0hu1Om+x88putxFtvHllEGjA9NLOUkjyu/latyGZ1qgv0R3tYphU+9hjcxn0t6vM/7UQ3my/NSq4dc5CHs02mfg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by VI1PR08MB3054.eurprd08.prod.outlook.com (2603:10a6:803:4a::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.19; Wed, 4 Nov 2020 13:37:55 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::d0e7:49cd:4dae:a2a2]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::d0e7:49cd:4dae:a2a2%6]) with mapi id 15.20.3499.032; Wed, 4 Nov 2020 13:37:55 +0000 From: Tamar Christina To: Richard Biener CC: Richard Sandiford , nd , "gcc-patches@gcc.gnu.org" Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding. Thread-Topic: [PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding. Thread-Index: AQHWk0gPfg3binthXU6SMGDs22R/u6l+AX0AgAAQ82CAAD71gIAA4CwAgAAs+/GAAAR+AIA3Te0AgAF394CAAAgZEA== Date: Wed, 4 Nov 2020 13:37:54 +0000 Message-ID: References: <20200925142753.GA13692@arm.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ts-tracking-id: 12DB1D30F72C9D4495B3150A6C6068DA.0 x-checkrecipientchecked: true Authentication-Results-Original: suse.de; dkim=none (message not signed) header.d=none;suse.de; dmarc=none action=none header.from=arm.com; x-originating-ip: [82.24.248.186] x-ms-publictraffictype: Email X-MS-Office365-Filtering-HT: Tenant X-MS-Office365-Filtering-Correlation-Id: 6714c063-881b-4177-67c3-08d880c6dbdb x-ms-traffictypediagnostic: VI1PR08MB3054:|HE1PR0801MB1962: x-ms-exchange-transport-forked: True X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true nodisclaimer: true x-ms-oob-tlc-oobclassifiers: OLM:10000;OLM:10000; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: iQx1klYh3z9r2OgfryLBeA+OC+unAMVN3Fs+xDRGt90vQb2HRlvr2XVYq87+sRZ0XUykb1ZS94TXXgI2dqTIFxETq9mzibtUF2NpMH+xV47MwltIo6CoA5mt22UfOOEJj78RZ4XVFfScwP9PbATBQCOJwCDwSBMv/rcdwZFoX+Rgl69DcQs0qN4iTElZxdhk6m8LueWFFOpE3wiuyHDk7eCw22j+BZtR7trkgT21AY3/ltIENvzojr9rE1E/INSXDliwC5nKghNl/X+t4bQnekAU+CKL5AHL44FKdVgTPCaSViU2/Jt6a0inzZcCxN98lO8QV0HVtZRqAUJdWZ3zJA== X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(136003)(396003)(376002)(346002)(366004)(39860400002)(7696005)(8676002)(52536014)(5660300002)(4326008)(71200400001)(33656002)(8936002)(2906002)(54906003)(83380400001)(9686003)(6506007)(55016002)(26005)(53546011)(64756008)(66556008)(66946007)(66446008)(66476007)(76116006)(478600001)(30864003)(6916009)(86362001)(316002)(186003)(579004); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata: pbJde5COGaMRv/do5jaji/KmFdiM4r4/py5WpAQgOEBZty+D/9r6VbzjpeziZF4ZI0T4MrTPPGDeTboT9P4OID9Glcs27AlhAEOI1OFhw31zKNteDdF9ObaGvWpEyJZPJ+IxCiLoy4/OHZ3rpDxiVETMYpm5fpnHVZCTJh9zS0ss8nNcXozKIT9SvPKRff0lu4pYnYU6wwizUzuR1ECXpH5GaYD+Qum/2dRYdlV+GFuBak7xQoxWBk8xEiNhk/dvngoGxPkURkSdXOi8idCDqw4XxoEV/fLIkM0zsUk6k6MR2wUYiVz2Emd862U1fUoJsaUcqnV3lPSPsjlbYHC5GAsarak0JH7D5qef1OwzpMXoB/DDC4Wx0Nyh620dwXPpm8+5opo/qXJyDrvGcUvCu0PpQWnyhizeW4E0csPOOFi+jZvzKEbHdDmKSZhrgHGr0t73hXYBmyBWkM02/TJx6clWKN3xs3v8G+PRhte6NbyrnQr+wBwS4pB6odb6lRsan+kmUmRCYTfmYBFXK2QQs4vFI4NiZEV/HCQAy4bp18Xz0eseQ1HRjymS2TxaQsjsJZ8ftY9gwNR3yYDWK8OyyD6YH1SoJfrCpPaWuyliDkyumGXByhrd+cnoX7nHX9GbjTIK62w/iGbfa4W39ZWXoQ== Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB3054 Original-Authentication-Results: suse.de; dkim=none (message not signed) header.d=none;suse.de; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT058.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 6d31f046-fe52-43ae-a2bf-08d880c6d5ec X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: uo0cxcqQ66PPnrBSyA3zPsqItuw3hZfA+hvv8V87yKPg6rDbSWqwJ91eb/qwyEJ8++lj3FlOx1NBumKwPUktRaZWgvpbuEL5vNspgZN+dPCxT2ATw9/SOGFuJX9L+YT2pS2/vA6iA/Pe0o0SBxfcFrkbkYb6Cqsy7DO5B43y55/LI4XQ8Siee7GKijB99rCMIELQC5+GA2SEo0bQV/0nvksJl+sY6IZWemNW9Nn4CGFc4KKH8CfJyX2/Q3DfpOsrJhRoM/tlOz6S+KZSQCKKrWU+yiOFdTh6Fm2iLLIlgnIr/eVa70FbRUNAMZep2WtIy7JkFYzhGFZJ3xD81qZJvgavAIpAs1oxLm/zsWzllYCGAURzS4jUPA/+0DwYDyBGlTiedfmtFaAH2yaXSdrKfg== X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(39860400002)(396003)(346002)(376002)(136003)(46966005)(478600001)(2906002)(8676002)(70206006)(336012)(6862004)(70586007)(52536014)(7696005)(83380400001)(5660300002)(26005)(9686003)(4326008)(47076004)(86362001)(55016002)(186003)(53546011)(82740400003)(81166007)(356005)(54906003)(36906005)(316002)(6506007)(30864003)(8936002)(33656002)(82310400003); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 04 Nov 2020 13:38:05.1985 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 6714c063-881b-4177-67c3-08d880c6dbdb X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT058.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0801MB1962 X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, INDUSTRIAL_BODY, INDUSTRIAL_SUBJECT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Nov 2020 13:38:11 -0000 > -----Original Message----- > From: rguenther@c653.arch.suse.de On > Behalf Of Richard Biener > Sent: Wednesday, November 4, 2020 12:41 PM > To: Tamar Christina > Cc: Richard Sandiford ; nd ; > gcc-patches@gcc.gnu.org > Subject: RE: [PATCH v2 3/16]middle-end Add basic SLP pattern matching > scaffolding. >=20 > On Tue, 3 Nov 2020, Tamar Christina wrote: >=20 > > Hi Richi, > > > > This is a respin which includes the changes you requested. >=20 > Comments randomly ordered, I'm pasting in pieces of the patch - sending i= t > inline would help to get pieces properly quoted and in-order. >=20 > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index > 4bd454cfb185d7036843fc7140b073f525b2ec6a..b813508d3ceaf4c54f612bc10f9 > aa42ffe0ce0dd > 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > ... >=20 > I miss comments in this file, see tree-vectorizer.h where we try to docum= ent > purpose of classes and fields. >=20 > Things that sticks out to me: >=20 > + uint8_t m_arity; > + uint8_t m_num_args; >=20 > why uint8_t and not simply unsigned int? Not knowing what arity / > num_args should be here ;) I think I can remove arity, but num_args is how many operands the created internal function call should take. Since we can't vectorize calls with mo= re than 4 arguments at the moment it seemed like 255 would be a safe limit :). >=20 > + vec_info *m_vinfo; > ... > + vect_pattern (slp_tree *node, vec_info *vinfo) >=20 > so this looks like something I freed stmt_vec_info of - back-pointers in = the > "wrong" direction of the logical hierarchy. I suppose it's just to avoid= passing > down vinfo where we need it? Please do that instead - pass down vinfo as > everything else does. >=20 > The class seems to expose both very high-level (build () it!) and very lo= w > level details (get_ifn). The high-level one suggests that a pattern _not= _ > being represented by an ifn is possible but there's too much implementati= on > detail already in the vect_pattern class to make that impossible. I gues= s the > IFN details could be pushed down to the simple matching class (and that b= e > called vect_ifn_pattern or so). >=20 > +static bool > +vect_match_slp_patterns (slp_tree *ref_node, vec_info *vinfo) { > + DUMP_VECT_SCOPE ("vect_match_slp_patterns"); > + bool found_p =3D false; > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, "-- before patt match > --\n"); > + vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node); > + dump_printf_loc (MSG_NOTE, vect_location, "-- end patt --\n"); > + } >=20 > we dumped all instances after their analysis. Maybe just refer to the > instance with its address (dump_print %p) so lookup in the (already large= ) > dump file is easy. >=20 > + hash_set *visited =3D new hash_set (); for > + (unsigned x =3D 0; x < num__slp_patterns; x++) > + { > + visited->empty (); > + found_p |=3D vect_match_slp_patterns_2 (ref_node, vinfo, > slp_patterns[x], > + visited); > + } > + > + delete visited; >=20 > no need to new / delete, just do >=20 > has_set visited; >=20 > like everyone else. Btw, do you really want to scan pieces of the SLP gr= aph > (with instances being graph entries) multiple times? If not then you sho= uld > move the visited set to the caller instead. >=20 > + /* TODO: Remove in final version, only here for generating debug dot > graphs > + from SLP tree. */ > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, "-- start dot --\n"); > + vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node); > + dump_printf_loc (MSG_NOTE, vect_location, "-- end dot --\n"); > + } >=20 > now, if there was some pattern matched it is probably useful to dump the > graph (entry) again. But only conditional on that I think. So can you i= nstead > make the dump conditional on found_p and remove the start dot/end dot > markers as said in the comment? >=20 > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "transformation for %s not valid due to > + post > " > + "condition\n", >=20 > not really a MSG_MISSED_OPTIMIZATION, use MSG_NOTE. > MSG_MISSED_OPTIMIZATION should be used for things (likely) making > vectorization fail. >=20 > + /* Perform recursive matching, it's important to do this after > + matching > things >=20 > before matching things? >=20 > + in the current node as the matches here may re-order the nodes > + below > it. > + As such the pattern that needs to be subsequently match may change. >=20 > and this is no longer true? >=20 > */ > + > + if (SLP_TREE_CHILDREN (node).exists ()) { >=20 > elide this check, the loop will simply not run if empty >=20 > + slp_tree child; > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) >=20 > I think you want to perform the recursion in the caller so you do it only= once > and not once for each pattern kind now that you do post-order processing > rather than pre-order. That wouldn't work as that would make multiply match before FMA/FMS. I had tried having FM(A|S) match MUL instead, but MUL does a lot of work to create permute nodes, which would have to be undone. The FMS case is somewhat easy as that still has a two operators node in bet= ween, But FMA is just a simple add on top. I have to then remove the merging nod= es that MUL Created, and drop half the operands to create the new node. Which is doable but effectively does the same pattern twice, whereas now th= e MUL would just skip all that work since FMA matched.=20 The post order is essentially really problematic for overlapping patterns. = In pre-order I could for Instance detect all 3 patterns in one traversal and share the matching info= rmation between all 3. Right now I can only do it between mul and add and FMA has to be standalone= . >=20 > + vect_pattern *pattern =3D patt_fn (ref_node, vinfo); > + > + if (pattern->matches ()) >=20 > this suggests you get number of SLP nodes times number of pattern > matchers allocations & inits of vect_pattern. If you move >=20 > + if (pattern->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED)) > + { >=20 > into ->matches () then whether this is a IFN or multi-node pattern become= s > an implementation detail which would be IMHO better. >=20 > + FOR_EACH_VEC_ELT (*this->m_nodes, ix, node) > + { > + /* Calculate the location of the statement in NODE to replace. */ > + stmt_info =3D SLP_TREE_REPRESENTATIVE (node); > + gimple* old_stmt =3D STMT_VINFO_STMT (stmt_info); > + tree type =3D gimple_expr_type (old_stmt); > + > + /* Create the argument set for use by > gimple_build_call_internal_vec. */ > + for (int i =3D 0; i < this->m_num_args; i++) >=20 > the scalar argument type is the type of the definition so instead of > gimple_expr_type you want simply TREE_TYPE (gimple_get_lhs (old_stmt)) >=20 > But then you seem to mix the result and the argument types? I think you > need to walk over SLP_TREE_CHILDREN and look at their representatives > scalar def. Which would then assume the pattern consumes all direct > children. Somehow this "generic" Since I'm only consuming simple math ops would the argument and result type= s differ? Something like a widening sub I can see but not the normal sub. > vect_simple_pattern_match::build () has quite some restrictions. > I guess they are documented in the files toplevel comment which also has > general parts applying to all possible pattern matchers (not deriving fro= m > vect_simple_pattern_match). Can you split up the comment and more > explicitely mark what applies to patterns in general (postorder traversal= ) and > what applies to vect_simple_pattern_match only? >=20 > + /* We have to explicitly mark the old statement as unused because > during > + statement analysis the original and new pattern statement may > require >=20 > comment needs updating. >=20 > + different level of unrolling. As an example add/sub when > vectorized > + without a pattern requires 4 copies, whereas with a COMPLEX_ADD > pattern > + this only requires 2 copies and the two statement will be treate= d > as > + hand unrolled. That means that the analysis won't happen as > it'll find > + a mismatch. So we don't analyze the old statement and if we end > up > + needing it, e.g. SLP fails then we have to quickly re-analyze it= . > */ > + STMT_VINFO_RELEVANT (call_stmt_info) =3D vect_used_in_scope; >=20 > I guess relevance should be copied from the original stmt (like if > it was used_by_reduction). Can't that be done in > vect_mark_pattern_stmts? Likewise the ->add_pattern_stmt part > could be done there, too. So you'd be left with >=20 > + STMT_SLP_TYPE (call_stmt_info) =3D pure_slp; >=20 > and >=20 > + STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) =3D true; >=20 > where the pure_slp setting should in theory be not needed (it > should be automagically detected so) but setting it is not wrong > (it would be an optimization). I am using it as a marker for a node dissolve needs to look at when undoing= SLP so it restores the relevancy, but given your comment below on that I may be= able to get rid of it. >=20 > -- >=20 > I'm now looking down the series to see how this is all used, > so quoting different pieces from other patches. >=20 > +/* Checks to see of the expression EXPR is a gimple assign with code COD= E > + and if this is the case the two operands of EXPR is returned in OP1 > and > + OP2. > + > + If the matching and extraction is successful TRUE is returned > otherwise > + FALSE in which case the value of OP1 and OP2 will not have been > touched. > +*/ > + > +static bool > +vect_match_expression_p (slp_tree node, tree_code code) > +{ > + if (!SLP_TREE_REPRESENTATIVE (node)) >=20 > I see that many overall function comments do not actually match > anymore, if at least for the parameters refered to. Please go > over the patch and update them. >=20 > + gimple* expr =3D STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node)); > + if (!is_gimple_assign (expr) > + || gimple_expr_code (expr) !=3D code) >=20 > gimple_assign_rhs_code (expr) >=20 > +/* Check if the given lane permute in PERMUTES matches an alternating > sequence > + of {P0 P1 P0 P1 ...}. This to account for unrolled loops. */ > +static bool > +vect_check_lane_permute (lane_permutation_t &permutes, > + unsigned p0, unsigned p1) > +{ > + unsigned val[2] =3D {p0, p1}; > + for (unsigned i =3D 0; i < permutes.length (); i++) > + if (permutes[i].first !=3D val[i % 2]) > + return false; >=20 > so this doesn't put any constraint on permutes[].second. So it > matches an arbitrary interleaving of two vectors. >=20 > + > + will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}= . > +*/ > + > +static complex_operation_t > +vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t > &lanes, > + bool two_operands =3D true, vec *ops =3D N= ULL) >=20 > { {_37, _38}, {_12, _36} } >=20 > there's a lot of vect_match_expression_p calls and I wonder if > inlining them and CSEing the 'code' compute of node1 and node2 > would simplify things and elide the macro. Also we then know > this is all CSEd in the end ... >=20 > I wonder what LANES specifies, TWO_OPERANDS specifies whether > two-operand variants are allowed unless .. (guess LANES causes > only same operands to match up in the end). >=20 I'm guarding against that you could technically speaking have a +/- node that does reverses the order of the elements below it, in which case that's a different operation than what we want to guard against. I hadn't been able to prove to myself that SLP build would never do this. > +static void > +vect_mark_stmts_as_in_pattern (hash_set *cache, slp_tree > node) > +{ >=20 > need to find a use for this still, but don't you set pure_slp > "above"? This is used to mark the elided nodes. For instance in a complex multiply = the * nodes in the SLP tree become unused. While not having them in the SLP tr= ee is simple, the detection of pure/hybrid is led from the BB statements so th= ey would mark the elided nodes as being hybrid. There is of course the problem of is this node used in any other SLP instan= ce, it may force the analysis to think the SLP tree is pure, but if SLP fails it would try n= ot SLP anyway as it would have done eventually if Hybrid. >=20 > +bool > +complex_pattern::validate_p () > +{ > + unsigned i; > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp) > + { > + slp_tree vnode =3D NULL; > + if (vect_slp_make_linear (node, tmp, &vnode)) > + nodes.quick_push (vnode); > + else > + { > + nodes.release(); > + return false; > + } >=20 > hum. vect_slp_make_linear better not fail? At least we've > created half-way stuff when it does. Can't spot the implementation > right now (guess splitting the patch up doesn't really help much ...) >=20 It can fail, so I see I need to free some nodes otherwise I'll leak... But It's not a correctness issue. Nothing It changed on NODE until it all succeeds. > Noting elsewhere: >=20 > +static void > +vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo, > + hash_set *visited, slp_tree > root) > +{ > + if (!root || visited->contains (root)) > + return; > + > + unsigned int i; > + slp_tree node; > + stmt_vec_info related_stmt_info; > + stmt_vec_info stmt_info =3D SLP_TREE_REPRESENTATIVE (root); > + > + visited->add (root); >=20 > visited->add () returns whether the key was already there so > please combine the contains and the add calls here and elsewhere. >=20 > if (stmt_info && STMT_VINFO_SLP_VECT_ONLY (stmt_info) > + && (related_stmt_info =3D STMT_VINFO_RELATED_STMT (stmt_info)) != =3D > NULL) > + { > + if (dump_enabled_p ()) >=20 > I think STMT_VINFO_SLP_VECT_ONLY is a thing already, so I wonder > if the above is a sufficient test. There's is_pattern_stmt_p () > (which obviously also applies to non-SLP only patterns). >=20 > Note that I also see another issue, namely with hybrid SLP the > original non-pattern stmt would need to stay relevant. That > probably asks for hybrid SLP discovery and relevance marking > to be combined somehow. You can probably create a simple > testcase by storing a lane of a complex op via a non-grouped > store. >=20 > + STMT_VINFO_RELEVANT (stmt_info) =3D vect_unused_in_scope; > + STMT_VINFO_RELEVANT (related_stmt_info) =3D vect_used_in_scope; >=20 > as said previously the relevance should be copied, in this case > back to the original stmt info from the pattern stmt info. >=20 > + STMT_VINFO_IN_PATTERN_P (related_stmt_info) =3D false; > + STMT_SLP_TYPE (related_stmt_info) =3D loop_vect; >=20 > one option might be to delay pattern recognition (for the loop case) > to after vect_detect_hybrid_slp and simply not make any hybrid > stmt containing nodes part of a SLP pattern. It would be a workaround > of course, not the best solution. Another option is to > add a new field to stmt_info to put a "saved" relevance to and > simply go over all stmts restoring relevance - we already restore > SLP_TYPE to loop_vect that way at Yes.. ideally I would want something like telling it "don't worry about thi= s if you end up being pure SLP". I haven't tried with this new approach to instead add it to the PATTERN_SEQ for the statement. Last time I tried however it looked like in order to g= et this to work correctly I'd need a bigger hack.. Thanks, Tamar >=20 > /* Reset SLP type to loop_vect on all stmts. */ > for (i =3D 0; i < LOOP_VINFO_LOOP (loop_vinfo)->num_nodes; ++i) > { >=20 > so simply restore the saved relevance would work there (guess I > like that more than what vect_dissolve_slp_only_patterns does, > in any case do what vect_dissolve_slp_only_patterns does in the > above loop please). >=20 > Thanks, > Richard. >=20 >=20 >=20 > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * Makefile.in (tree-vect-slp-patterns.o): New. > > * doc/passes.texi: Update documentation. > > * tree-vect-slp.c (vect_print_slp_tree): Add new state. > > (vect_match_slp_patterns_2, vect_match_slp_patterns): New. > > (vect_analyze_slp): Call pattern matcher. > > * tree-vectorizer.h (enum _complex_operation): > > (class vect_pattern_match, class vect_pattern): New. > > * tree-vect-slp-patterns.c: New file. > > > > > -----Original Message----- > > > From: rguenther@c653.arch.suse.de On > > > Behalf Of Richard Biener > > > Sent: Tuesday, September 29, 2020 10:42 AM > > > To: Richard Sandiford > > > Cc: Tamar Christina ; nd ; > gcc- > > > patches@gcc.gnu.org > > > Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern matching > > > scaffolding. > > > > > > On Tue, 29 Sep 2020, Richard Sandiford wrote: > > > > > > > Richard Biener writes: > > > > >> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info > > > *vinfo, > > > > >> > > &tree_size, bst_map); > > > > >> > > if (node !=3D NULL) > > > > >> > > { > > > > >> > > + /* Temporarily allow add_stmt calls again. */ > > > > >> > > + vinfo->stmt_vec_info_ro =3D false; > > > > >> > > + > > > > >> > > + /* See if any patterns can be found in the constructe= d SLP > tree > > > > >> > > + before we do any analysis on it. */ > > > > >> > > + vect_match_slp_patterns (node, vinfo, group_size, > > > &max_nunits, > > > > >> > > + matches, &npermutes, &tree_si= ze, > > > > >> > > + bst_map); > > > > >> > > + > > > > >> > > + /* After this no more add_stmt calls are allowed. */ > > > > >> > > + vinfo->stmt_vec_info_ro =3D true; > > > > >> > > + > > > > >> > > > > > > >> > > I think this is a bit early to match patterns - I'd defer it= to > > > > >> > > the point where all entries into the same SLP subgraph are > > > > >> > > analyzed, thus somewhere at the end of vect_analyze_slp loop > > > > >> > > over all instances and match patterns? That way phases are > more > > > clearly separated. > > > > >> > > > > > >> > That would probably work, my only worry is that the SLP analys= is > > > > >> > itself may fail and bail out at > > > > >> > > > > > >> > /* If the loads and stores can be handled with load/store- > lane > > > > >> > instructions do not generate this SLP instance. */ > > > > >> > if (is_a (vinfo) > > > > >> > && loads_permuted > > > > >> > && dr && vect_store_lanes_supported (vectype, > group_size, > > > > >> > false)) > > > > > > > > > > Ah, that piece of code. Yeah, I'm repeatedly running into it as > > > > > well - it's a bad hack that stands in the way all the time :/ > > > > > > > > At one point I was wondering about trying to drop the above, vector= ise > > > > with and without SLP, and then compare their costs, like for > > > VECT_COMPARE_COSTS. > > > > But that seemed like a dead end with the move to doing everything o= n > > > > the SLP representation. > > > > > > Yeah ... though even moving everything to the SLP representation will > retain > > > the issue since there it will be N group-size 1 SLP instances vs. 1 g= roup- > size N > > > SLP instance. > > > > > > > > I guess we should try moving this upward like to > > > > > vect_analyze_loop_2 right before > > > > > > > > > > /* Check the SLP opportunities in the loop, analyze and build S= LP > trees. > > > > > */ > > > > > ok =3D vect_analyze_slp (loop_vinfo, *n_stmts); > > > > > if (!ok) > > > > > return ok; > > > > > > > > > > and there check whether all grouped loads and stores can be handl= ed > > > > > with store-/load-lanes (and there are no SLP reduction chains?) i= n > > > > > which case do not try to attempt SLP at all. Because the testcas= es > > > > > this check was supposed to change were all-load/store-lane or all > > > > > SLP so the mixed case is probably not worth special casing. > > > > > > > > > > Since load-/store-lanes is an arm speciality I tried to only touc= h > > > > > this fragile part with a ten-foot pole ;) CCing Richard, if he a= cks > > > > > the above I can produce a patch. > > > > > > > > Yeah, sounds good to me. Probably also sorth checking whether the > > > > likely_max iteration count is high enough to support group_size > > > > vectors, if we have enough information to guess that. > > > > > > > > We could also get the gen* machinery to emit a macro that is true i= f > > > > at least one load/store-lane pattern is defined, so that we can ski= p > > > > the code for non-Arm targets. I can do that as a follow-up. > > > > > > I've had a second look and one complication is that we only elide the= SLP > > > node if any of the loads are permuted. So if all loads/stores are > unpermuted > > > but load/store-lanes would work we'd keep the SLP node. > > > > > > Of course without actually building the SLP node we don't know whethe= r > the > > > loads will be permuted or not ... > > > > > > But surely the current place for the check will cause some testcases = to > > > become hybrid vectorizations which is likely undesirable. > > > > > > So we could move the check after all SLP discovery is completed and > throw it > > > all away if we can and should use load/store-lanes? > > > But that might then not solve Tamars issue. > > > > > > Richard. > > >=20 > -- > Richard Biener > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > Nuernberg, > Germany; GF: Felix Imend