From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR02-VI1-obe.outbound.protection.outlook.com (mail-vi1eur02on2041.outbound.protection.outlook.com [40.107.241.41]) by sourceware.org (Postfix) with ESMTPS id 680303858439; Tue, 16 May 2023 09:45:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 680303858439 Authentication-Results: sourceware.org; dmarc=pass (p=reject dis=none) header.from=siemens.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=siemens.com ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=nhjhqpgs+ndBa4VpnRGOIqgWxyTG8U4jd5yy8DUJKFIxxWImLF+9nReXRscg542M/UrXBBZiOTbajWLwOSVJVq6LUuUJb3wx8H5T2jVeL41+bnN64TH0iMj+FOMCfJzqx3kTpNsUiBYtoV25t4fizg2WcuytmsTww8eGfnk04EW73eW2QYP4rtHNaURNk71WnVbQfEKXb2xwj2H03yu2EuOiTVEQuzEgK2tRwhCV7OkwGcaV6avbfuFGOqrEfLvT3XjZRu7yR/rpUsG1GKIrZJjZvy+imF82DfXeklj21cbxEvwiieEPtvbnpRGrDAnMNwZ6Yh02NgB8D509S+jr1Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=XbnNoe7Sqx7DEdg0ArWv5ljOZGjw5aO7RvrLD9pAOG0=; b=dE+xMpIftjH+qxKRP7qR0BNDeFIPTs7+7z8Stbv84N79kHoSh62+p5/j1yVFk67wju4JB9G8rCR3EGfoieWh/nPy9+b0vb9AsjoaqAlsdTKVjH9DybRyTm6GhXoTENoWWdF+9aEMVAwzG3GLq28OIqUW+1gRavWqAkZB/CZgaQfkm0uqbDKmGhggptaRGAQ0MIRupcrl8mRlq/mOSVxIgEKnMHZJJitvzZGY3WmyOvd//E8UXS8QAQWkrQOIWUFXW4FEKBwoAPWdXW30prJDphYjXdKEkc6AtCGhGuRYDYccoOuPgR9k7g8iDTRUq66D085k1QtEphINRsLCYuvsmg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=siemens.com; dmarc=pass action=none header.from=siemens.com; dkim=pass header.d=siemens.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=siemens.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=XbnNoe7Sqx7DEdg0ArWv5ljOZGjw5aO7RvrLD9pAOG0=; b=aHCR6odEXhEztqgVZCx3lnca4iz4+gycJSs95bmhG8IYN2eo+yF/EfS8jsQvlDZrHXGb1dAQW46xE/jOIuH4SpDX1ahxkebjmRjBg/DeDWGXwuChnWSTtf+LM04gXwxIFQHcTqiVusMNwWCNpDUOjcaZvzU94DOzdbZ+HX8AOddgjo2wZhVCkKNn3tq9PR7+XA35NazFdxkk1hGAwb1p4anrvRgNAxDAiB4QF4+pcKBqkKrU50daW595yWtw9cqGsRnWu3SxI6MDfM//pZAIvSSE0ERM3Zh+WIhWVdwwOUFO3DwXH20vCq7AT4npvZp75yBKVMB4RL9EwcXIJrBn2A== Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=siemens.com; Received: from AM9PR10MB4824.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:20b:413::23) by DB3PR10MB6860.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:10:43d::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6387.30; Tue, 16 May 2023 09:45:24 +0000 Received: from AM9PR10MB4824.EURPRD10.PROD.OUTLOOK.COM ([fe80::bfea:f7bc:cdf9:550]) by AM9PR10MB4824.EURPRD10.PROD.OUTLOOK.COM ([fe80::bfea:f7bc:cdf9:550%6]) with mapi id 15.20.6387.032; Tue, 16 May 2023 09:45:24 +0000 Message-ID: Date: Tue, 16 May 2023 11:45:16 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.1 From: Frederik Harwath Subject: Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives To: Jakub Jelinek , Frederik Harwath Cc: gcc-patches@gcc.gnu.org, fortran@gcc.gnu.org, tobias@codesourcery.com, joseph@codesourcery.com, jason@redhat.com References: <20230324153046.3996092-1-frederik@codesourcery.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-ClientProxiedBy: FR2P281CA0142.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:98::8) To AM9PR10MB4824.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:20b:413::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: AM9PR10MB4824:EE_|DB3PR10MB6860:EE_ X-MS-Office365-Filtering-Correlation-Id: d842f08a-5ecd-41b9-5d3e-08db55f24552 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 20jIfPlTT63TY1gET0AKunMZpWMA0ioCPRFu+rp+rano0q9J6zmNvIb0h9+Y5AzyxgZoNo3Db/SIoRSAkATXS0+kImSIKA0nagsKQjOODFcoqPjdZNnofX/7Mr20+BsI4/YxCyz+GnMQsnsjVGqrdxoHYo5cmmUR2aH++3TN3FDOuGqfLLWVpI3Oe8ogSwZPrlF9dJu28gNs6Yp7gcB07nqM8puIbLRGzsY+zFYPsQ6n4zSYLY6AkgCQBYsUrEffrL8QGzlm3yUgOHcsBgpmfQpezbaGcAyCACxGa0lgesaizA7qKPrI4ZZ1VRUqbIXZ51QR/xGezS2ywmCrVJBkgENY7IAdI8q64cOEq6y27F/ADBMSiavtVZeTbapM7igBWCsCdCOihEU7dZOYWYOca28bmTsYBt+qCTY02jYMzJLEtbJVSeM0SDqSRiZvEles2prxTMutkkB+xys36JBwJvnWh8y1zhpg9Wf5fROYRUa9zaQ0pCmOGA1TYWNCynl2HuXMRayMtojYp7mUnTQizr7+OF9RnK9Tf4ByKzHZX7ckU1XwXdDDegorXAT07dpgtUd0bFdZEOe+ZYmbBgZMjM9xnkTnmXmEFjoPYPV/yoaFJYgYhnQ1EHQfdSuqW0lt/VEucLv3zXBiwJb8lTaj8Q== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:AM9PR10MB4824.EURPRD10.PROD.OUTLOOK.COM;PTR:;CAT:NONE;SFS:(13230028)(4636009)(396003)(39860400002)(366004)(136003)(346002)(376002)(451199021)(36756003)(110136005)(478600001)(66556008)(66476007)(66946007)(4326008)(316002)(86362001)(6486002)(83380400001)(2616005)(5660300002)(8676002)(8936002)(6666004)(82960400001)(2906002)(44832011)(31696002)(41300700001)(38100700002)(6512007)(186003)(53546011)(6506007)(31686004)(43740500002)(45980500001);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?QkUwcUVSRko3ZXNWQXR1bytLWDF6cUNwREc2ODZ5SGU4RFhOc0lvYWRCRG9u?= =?utf-8?B?MmpId3ZIYmZwSkdnRUF1TVE0blRCeWdGSjdaL0xlemRpTjdmYS9GK2IzY3NB?= =?utf-8?B?d2tvNHFFRTB0ZlJXMURXMnpRd1NnM0xwYWtXZDJCVXJGUGg5dWlqQ3E3VkpX?= =?utf-8?B?eWpYSGl3V1czb3pxRHoxNWhuWXpwbTFMOXVobVZGeFJlU2NVUUZQTFpZb0ZO?= =?utf-8?B?S0RCVW80SWZMUEVpVlVUeGV0NkFZd21jYk5IRVc4U3JNNXdGcXA5S20zU2Uy?= =?utf-8?B?STM4SkQ2M3J2ZW5oVW40TXRLWSsvZUt4VThSZDBtMFBSODZHYTl5NVpLUXJ6?= =?utf-8?B?dUNNWHg0UldrYzJMWEoxYXNhWWVoRjdhVXVrYitocFVFRVVlUXVaNHlzY0dp?= =?utf-8?B?M1djWmlxODBTOTNYRVdEK2hnUEh5cnUxbzVjakk0YVVDUUROUmdjNWUvdHlL?= =?utf-8?B?eUFuOHFzQ1VJM2FEYlEzdkUwWjJPanhxeG9LZytiZ2szSWpTUW15V2d2Mmxy?= =?utf-8?B?WFZrRXVIZklMMlF6NXQxaFpjQUVNL1Q4RWZLY2NOSWh6ckRTOWZpaGJjMjF2?= =?utf-8?B?Z090VUVsa0xSYXkxK2JsaEkzc25wTnNxNFl6Vm1FUmJMd250VStGeTRjZ0dj?= =?utf-8?B?Y3hqVVl6WkcxVjZoMUxnbXRTVUJCdzN2eHBMS3lRL0NXY0hibmxnSk9kcFhQ?= =?utf-8?B?NlFtT0hhZlpiUnFuTnJpZ1RsTDlUMVcyR0tuQWhMMjRFZ01WdCs3aDR6eTJl?= =?utf-8?B?a3ZyUGc1bi9GS2Qvam5yNXNpQysrc0NReU1sVGF2QW1jNkFXRTRVTm1mWUsr?= =?utf-8?B?a2hxemdDdzRrQkpCYmhRY20zVGNndENSUGtqN1JZWFJUcXQwcWNBQThpbmZ2?= =?utf-8?B?YmI5eTJkTTBxa1NrWjBGTFB4QTU4KzVRalQrMFJzblpESFVtbFM1Q25uRkJN?= =?utf-8?B?cjhPcDlYRTQ0NFVsN25sdFpYUnhyNUpnQnZOTzdzVHI0T1owWlp6d0tETHB4?= =?utf-8?B?V0ppRFNjVGJKME5ONlE2WTVDY1Y5eC9NWHlRdi9aWUVsaWFRZFBVM1V3RlMr?= =?utf-8?B?alVNNjBUTFVsbHgvVjNzTzFNWDBhQjJTZlRCQzg1RGFSQXpqNmFsN1B1WTNq?= =?utf-8?B?SDc5YnFDTlg2alp1dVhKRExRaGp5aXBNTFdERDlyc2pJcklpbXB4R0RLcE5o?= =?utf-8?B?MExXbUIyTHRyQmF6TmQ0eCtQZHZHMUhCMEJYNExwejZtcjhxWU90QVlKVjAy?= =?utf-8?B?SGRUcUt4OXJkckNJZWpmd1RSeXFLZkZaOXcrRzgrdk9WUFc4OFVtZm5zMVVE?= =?utf-8?B?dllDd0pEamM5THFRaUY0TXpzcnNIUnMrUERrME9zQ3Q5enM3UExlaUEvbmZE?= =?utf-8?B?N1lwU0MrbDhuQU43aHpmclZraTV4WHZQVnRqVkNjZW9zSjRZRWVlWkZRa3Qr?= =?utf-8?B?TWdVSTVLNG9PdTZPLzUwUnpXdXhHVU1lKzZBaGhaVEJTYUlNdDJTT1RWK0pR?= =?utf-8?B?Sysyc1NkeGJtZXRqcko1b2w0dUtieHR5eTBBWktZbG9YOHdrV09xZTRMNjBs?= =?utf-8?B?TVlMQmdVQ3pUSjhtOUtyTEZKcVFVUmxRSGtDemFGbEJFUk5lT1JGdWkzNVJP?= =?utf-8?B?VnlXR2VyODJodURmbW01bTVNNnhqdGpFV21TYks5OEZsenVMc2VJLzh5WTZB?= =?utf-8?B?dWJrSXZWRWxabU8weXY5L1lHQ0UxZG9tM040RkQ1SzQxMC8zM3UvQnNXcjlT?= =?utf-8?B?REFTZzZDa2o1STB2ampQTmZ4bXZzSTIvVHEyTDdEeStKMUNaUTdBbndrWjM3?= =?utf-8?B?RDFzbG56ZFhMc0szYjNicDJNVklOaFVDZ0hCaWFTdDZhanVUNjRwN3hubTBH?= =?utf-8?B?Q2VGSXFGbGRDK1dMN3ErUEN3YXA1NU01ajVPeWN5VzR4VXZ5a1BaQXhReWVo?= =?utf-8?B?L0ZRa3A2WlNJYmRzRDJzOHg1ZHk0L1RoU3pBL21wY3ZQbGZONjVXTkFFY2dN?= =?utf-8?B?dlkxMDVSZzh4NDJON1k5RFpFSmNCLzZNUWJpSVNzdy82Q053dStXeFdTdjJU?= =?utf-8?B?L1VWdHlEZE11dlVVUFhzaUxwR3F4MXphZDNZaHlEWGhRbkh1SXJ0eDJTd2Ra?= =?utf-8?B?aWZrb0JwVWhKMy92QnJUTDdxTWJuUHIzUVlHQXdIajVvejE0dnZ6MVEvOGQ0?= =?utf-8?B?Q0phMmQ1dUhPVDgwRzlJQnJSaTJzVnc5THRzdVRoYjFHVlFmcGJ2cmZvMkxZ?= =?utf-8?B?cXVxY01sakJkNE50bTB0WmNMNTBnPT0=?= X-OriginatorOrg: siemens.com X-MS-Exchange-CrossTenant-Network-Message-Id: d842f08a-5ecd-41b9-5d3e-08db55f24552 X-MS-Exchange-CrossTenant-AuthSource: AM9PR10MB4824.EURPRD10.PROD.OUTLOOK.COM X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 16 May 2023 09:45:23.9388 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 38ae3bcd-9579-4fd4-adda-b42e1495d55a X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: DLqzgY5yfXfTlJZCcQxtOHcshZZfNqTcKpJlOwoDwP3OYo4lCkP1mlNYK9V/ae2XJDwvYm3gVrUhmejpOmFQRt9pitxJi3K1sxIuEQR7Ja8= X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR10MB6860 X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Jakub, On 15.05.23 12:19, Jakub Jelinek wrote: > On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote: >> this patch series implements the OpenMP 5.1 "unroll" and "tile" >> constructs. It includes changes to the C,C++, and Fortran front end >> for parsing the new constructs and a new middle-end >> "omp_transform_loops" pass which implements the transformations in a >> source language agnostic way. > I'm afraid we can't do it this way, at least not completely. > > The OpenMP requirements and what is being discussed for further loop > transformations pretty much requires parts of it to be done as soon as possible. > My understanding is that that is where other implementations implement that > too and would also prefer GCC not to be the only implementation that takes > significantly different decision in that case from other implementations The place where different compilers implement the loop transformations was discussed in an OpenMP loop transformation meeting last year. Two compilers (another one and GCC with this patch series) transformed the loops in the middle end after the handling of data sharing, one planned to do so. Yet another vendor had not yet decided where it will be implemented. Clang currently does everything in the front end, but it was mentioned that this might change in the future e.g. for code sharing with Flang. Implementing the loop transformations late could potentially complicate the implementation of transformations which require adjustments of the data sharing clauses, but this is known and consequentially, no such transformations are planned for OpenMP 6.0. In particular, the "apply" clause therefore only permits loop-transforming constructs to be applied to the loops generated from other loop transformations in TR11. > The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP) > already need to know given their collapse/ordered how many loops they are > actually associated with and the loop transformation constructs can change > that. > So, I think we need to do the loop transformations in the FEs, that doesn't > mean we need to write everything 3 times, once for each frontend. > Already now, e.g. various stuff is shared between C and C++ FEs in c-family, > though how much can be shared between c-family and Fortran is to be > discovered. > Or at least partially, to the extent that we compute how many canonical > loops the loop transformations result in, what artificial iterators they > will use etc., so that during gimplification we can take all that into > account and then can do the actual transformations later. The patches in this patch series already do compute how many canonical loop nests result from the loop transformations in the front end. This is necessary to represent the loop nest that is affected by the loop transformations by a single OMP_FOR to meet the expectations of all later OpenMP code transformations. This is also the major reason why the loop transformations are represented by clauses instead of representing them asĀ  "OMP_UNROLL/OMP_TILE as GENERIC constructs like OMP_FOR" as you suggest below. Since the loop transformations may also appear on inner loops of a collapsed loop nest (i.e. within the collapsed depth), representing the transformation by OMP_FOR-like constructs would imply that a collapsed loop nest would have to be broken apart into single loops. Perhaps this could be handled somehow, but the collapsed loop nest would have to be re-assembled to meet the expectations of e.g. gimplification. The clause representation is also much better suited for the upcoming OpenMP "apply" clause where the transformations will not appear as directives in front of actual loops but inside of other clauses. In fact, the loop transformation clauses in the implementation already specify the level of a loop nest to which they apply and it could be possible to re-use this handling for "apply". My initial reaction also was to implement the loop transformations as OMP_FOR-like constructs and the patch actually introduces an OMP_LOOP_TRANS construct which is used to represent loops that are not going to be associated with another OpenMP directive after the transformation, e.g. void foo () { #pragma omp tile sizes (4, 8, 16) for (int i = 0; i < 64; ++i) { ... } } You suggest to implement the loop transformations during gimplification. I am not sure if gimplification is actually well-suited to implement the depth-first evaluation of the loop transformations. I also believe that gimplification already handles too many things which conceptually are not related to the translation to GIMPLE. Having a separate pass seems to be the right move to achieve a better separation of concerns. I think this will be even more important in the future as the size of the loop transformation implementation keeps growing. As you mention below, several new constructs are already planned. > For C, I think the lowering of loop transformation constructs or at least > determining what it means can be done right after we actually parse it and > before we finalize the OMP_FOR eetc. that wraps it if any. As discussed last > week at F2F, I think we want to remember in OMP_FOR_ORIG_DECLS the user > iterators on the loop transformation constructs and take it into account > for data sharing purposes. > > For C++ in templates we obviously need to defer that until instantiations, > the constants in the clauses etc. could be template parameters etc. > > For Fortran during resolving. > >> The "unroll" and "tile" directives are >> internally implemented as clauses. This fits the representation of > So perhaps just use OMP_UNROLL/OMP_TILE as GENERIC constructs like > OMP_FOR etc. but with some argument where from the early loop > transformation analysis you can remember the important stuff, whether > does the loop transformation result in a canonical loop nest or not > and in the former case with how many nested loops. > > And then handle the actual transformation IMHO best at gimplification > time, find them in the OMP_FOR etc. body if they are nested in there, > let the transformation happen on GENERIC before the containing OMP_FOR > etc. if any is actually finalized and from the transformation remember > the original user decls and what should happen with them for data sharing > (e.g. lastprivate/lastprivate conditional). > From the slides I saw last week, a lot of other transformations are in the > planning, like loop reversal etc. Correct, more transformations are planned. I have been following the design discussions and I do not think that any of those constructs will cause problems to the implementation contained in this patch series. Or do you see one that would be problematic? > And, I think even in OpenMP 5.1 nothing prevents e.g. > #pragma omp for collapse(3) // etc. > #pragma omp tile sizes (4, 2, 2) > #pragma omp tile sizes (4, 8, 16) > for (int i = 0; i < 64; ++i) > for (int j = 0; j < 64; ++j) > for (int k = 0; k < 64; ++k) > body; > where the inner tile takes the i and j loops and makes > for (int i1 = 0; i1 < 64; i1 += 4) > for (int j1 = 0; j1 < 64; j1 += 8) > for (int k1 = 0; k1 < 64; k1 += 16) > for (int i2 = 0; i2 < 4; i2++) > { > int i = i1 + i2; > for (int j2 = 0; j2 < 8; j2++) > { > int j = j1 + j2; > for (int k2 = 0; k2 < 16; k2++) > { > int k = k1 + k2; > body; > } > } > } > out of it with 3 outer loops which have canonical loop form (the rest > doesn't). And then the second tile takes the outermost 3 of those generated > loops and tiles them again, making it into again 3 canonical loop form > loops plus stuff inside of it. > Or one can replace the > #pragma omp for collapse(3) // etc. > with > #pragma omp for > #pragma omp unroll partial(2) > which furthermore unrolls the outermost generated loop from the outer tile > turning it into 1 canonical loop form loop plus stuff in it. You are right, this is valid and the implementation in this patch series is able to cope with this. The patches contain corresponding tests. The most extensive testing for different combinations of loop transformations, collapse values, and other directives is done in the libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-* tests. > Or of course as you have in your testcases, some loop transformation > constructs could be used on more nested loops, not necessarily before > the outermost one. But still, in all cases you need to know quite early > how many canonical loop form nested loops you get from each loop > transformation, so that it can be e.g. checked against the collapse/ordered > clauses. You are right and this happens in the implementation. The front ends extend the "collapse" (i.e. the loop nest depth) of the OMP_FOR-like constructs as necessary for the transformations (and rejects invalid combinations, e.g. "collapse" larger than the tiled depth, early) and the transformation pass reduces the "collapse" of the gomp_for to the value requested by the user code after the transformations have been executed. > Feel free to disagree if you think your approach is able to handle all of > this, just put details in why do you think so. I think it is able to handle everything, as I described above. Thanks, Frederik