From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 28 Dec 2020 13:35:56 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, ook@ucw.cz
Subject: [PATCH 1/8 v9]middle-end slp: Support optimizing load distribution
Content-Type: multipart/mixed; boundary="VS++wcV0S1rZb1Fb"
Content-Disposition: inline
User-Agent: Mutt/1.9.4 (2018-02-28)
MIME-Version: 1.0
List-Id: Gcc-patches mailing list

--VS++wcV0S1rZb1Fb
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

Hi All,

This introduces a post-processing step for the pattern matcher to flatten
permutes introduced by the complex multiplication patterns.

This performs a blend early so that SLP is not cancelled by the LOAD_LANES
permute.  This is a temporary workaround for the fact that loads are not CSEd
during building, and it is required to produce efficient code.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-slp.c (optimize_load_redistribution_1): New.
	(optimize_load_redistribution): New.
	(vect_match_slp_patterns): Use it.

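For illustration only, the flattening step can be sketched outside of GCC as
follows.  This is a minimal standalone model, not code from the patch:
LoadNode, LanePerm and flatten_to_load_perm are hypothetical names, and
std::vector / std::optional stand in for GCC's vec and for "has no load
permutation".  It deliberately leaves out the perm.first != j restriction, the
BST_MAP lookup and all reference counting, and only shows how a lane
permutation over load children composes into a single load permutation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SLP load node: the lanes it loads, given as
// element indices into one interleaved group (its "load permutation").
// An empty optional models a child that has no load permutation at all.
struct LoadNode
{
  std::optional<std::vector<unsigned>> load_perm;
};

// One lane permutation entry: (index of the child node, lane in that child).
using LanePerm = std::vector<std::pair<unsigned, unsigned>>;

// Compose a lane permutation over load children into one load permutation.
// Returns std::nullopt if a selected child is not a load, which mirrors the
// early "return NULL" in the patch.
std::optional<std::vector<unsigned>>
flatten_to_load_perm (const std::vector<LoadNode> &children,
                      const LanePerm &lane_perm)
{
  std::vector<unsigned> result;
  result.reserve (lane_perm.size ());
  for (const auto &[child, lane] : lane_perm)
    {
      assert (child < children.size ());
      const auto &perm = children[child].load_perm;
      if (!perm)
        return std::nullopt;
      assert (lane < perm->size ());
      // Lane J of the blend reads whatever element the chosen child lane read.
      result.push_back ((*perm)[lane]);
    }
  return result;
}
```

With two children that load elements {0, 0} and {1, 1} of the same group, the
blend {(0, 0), (1, 1)} flattens to the load permutation {0, 1}, so the
VEC_PERM node can be replaced by a single load node.
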
--- inline copy of patch --
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 2a58e54fe51471df5f55ce4a524d0022744054b0..8360a59098f517498f3155f325cf8406466ac25c 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2228,6 +2228,115 @@ calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
   return exact_div (common_multiple (nunits, group_size), group_size);
 }
 
+/* Helper function of optimize_load_redistribution that performs the operation
+   recursively.  */
+
+static slp_tree
+optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
+                                hash_set<slp_tree> *visited, slp_tree root)
+{
+  if (visited->add (root))
+    return NULL;
+
+  slp_tree node;
+  unsigned i;
+
+  /* For now, we don't know anything about externals so do not do anything.  */
+  if (SLP_TREE_DEF_TYPE (root) == vect_external_def
+      || SLP_TREE_DEF_TYPE (root) == vect_constant_def)
+    return NULL;
+  else if (SLP_TREE_CODE (root) == VEC_PERM_EXPR
+           && SLP_TREE_LANE_PERMUTATION (root).exists ()
+           && !SLP_TREE_SCALAR_STMTS (root).exists ())
+    {
+      /* First convert this node into a load node and add it to the leaves
+         list and flatten the permute from a lane to a load one.  If it's
+         unneeded it will be elided later.  */
+      auto_vec<stmt_vec_info> stmts;
+      stmts.create (SLP_TREE_LANES (root));
+      load_permutation_t load_perm;
+      load_perm.create (SLP_TREE_LANES (root));
+      lane_permutation_t lane_perm = SLP_TREE_LANE_PERMUTATION (root);
+      for (unsigned j = 0; j < lane_perm.length (); j++)
+        {
+          std::pair<unsigned, unsigned> perm = lane_perm[j];
+          /* This isn't strictly needed, but this function is a temporary
+             one for specifically pattern matching, so don't want it to
+             optimize things the remainder of the pipeline will.  */
+          if (perm.first != j)
+            goto next;
+          node = SLP_TREE_CHILDREN (root)[perm.first];
+
+          if (!SLP_TREE_LOAD_PERMUTATION (node).exists ())
+            return NULL;
+
+          stmts.quick_push (SLP_TREE_SCALAR_STMTS (node)[perm.second]);
+          load_perm.safe_push (SLP_TREE_LOAD_PERMUTATION (node)[perm.second]);
+        }
+
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_NOTE, vect_location,
+                         "converting stmts on permute node %p\n", root);
+
+      slp_tree *value = bst_map->get (stmts);
+      if (value)
+        node = *value;
+      else
+        {
+          FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, node)
+            SLP_TREE_REF_COUNT (node)++;
+
+          vec<stmt_vec_info> stmts_cpy = stmts.copy ();
+          node = vect_create_new_slp_node (stmts_cpy.copy (), 0);
+          SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (root);
+          SLP_TREE_LOAD_PERMUTATION (node) = load_perm;
+          bst_map->put (stmts_cpy, node);
+        }
+      SLP_TREE_REF_COUNT (node)++;
+
+      return node;
+    }
+
+next:
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i , node)
+    {
+      slp_tree value = optimize_load_redistribution_1 (bst_map, visited, node);
+      if (value)
+        {
+          SLP_TREE_CHILDREN (root)[i] = value;
+          vect_free_slp_tree (node);
+        }
+    }
+
+  return NULL;
+}
+
+/* Temporary workaround for loads not being CSEd during SLP build.  This
+   function will traverse the SLP tree rooted in ROOT for INSTANCE and find
+   VEC_PERM nodes that blend vectors from multiple nodes that all read from the
+   same DR such that the final operation is equal to a permuted load.  Such
+   NODES are then directly converted into LOADS themselves.  The nodes are
+   CSEd using BST_MAP.  */
+
+static void
+optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
+                              slp_tree root)
+{
+  slp_tree node;
+  unsigned i;
+  hash_set<slp_tree> visited;
+
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i , node)
+    {
+      slp_tree value = optimize_load_redistribution_1 (bst_map, &visited, node);
+      if (value)
+        {
+          SLP_TREE_CHILDREN (root)[i] = value;
+          vect_free_slp_tree (node);
+        }
+    }
+}
+
 /* Helper function of vect_match_slp_patterns.
 
    Attempts to match patterns against the slp tree rooted in REF_NODE using
@@ -2276,7 +2385,7 @@ static bool
 vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
                          hash_set<slp_tree> *visited,
                          slp_tree_to_load_perm_map_t *perm_cache,
-                         scalar_stmts_to_slp_tree_map_t * /* bst_map */)
+                         scalar_stmts_to_slp_tree_map_t *bst_map)
 {
   DUMP_VECT_SCOPE ("vect_match_slp_patterns");
   slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
@@ -2291,6 +2400,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 
   if (found_p)
     {
+      optimize_load_redistribution (bst_map, *ref_node);
+
       if (dump_enabled_p ())
         {
           dump_printf_loc (MSG_NOTE, vect_location,

-- 

--VS++wcV0S1rZb1Fb--