Date: Mon, 15 May 2023 12:19:00 +0200
From: Jakub Jelinek
To: Frederik Harwath
Cc: gcc-patches@gcc.gnu.org, fortran@gcc.gnu.org, tobias@codesourcery.com, joseph@codesourcery.com, jason@redhat.com
Subject: Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
In-Reply-To: <20230324153046.3996092-1-frederik@codesourcery.com>

On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:
> this patch series implements the OpenMP 5.1 "unroll" and "tile"
> constructs. It includes changes to the C,C++, and Fortran front end
> for parsing the new constructs and a new middle-end
> "omp_transform_loops" pass which implements the transformations in a
> source language agnostic way.

I'm afraid we can't do it this way, at least not completely.

The OpenMP requirements, and what is being discussed for further loop
transformations, pretty much require parts of it to be done as early
as possible. My understanding is that this is where other
implementations do it too, and I would prefer GCC not to be the only
implementation that takes a significantly different decision here, as
happened e.g. in the offloading case (where all other implementations
preprocess/parse etc.
source multiple times, compared to GCC splitting stuff only at IPA
time; this affects what can be done with metadirectives, declare
variant etc.).

Now, e.g. data sharing is done almost exclusively during
gimplification, and the proposed pass runs later than that; the loop
transformations need to be done before the data sharing. Ditto
doacross handling. The normal loop constructs (OMP_FOR, OMP_SIMD,
OMP_DISTRIBUTE, OMP_LOOP) already need to know, given their
collapse/ordered clauses, how many loops they are actually associated
with, and the loop transformation constructs can change that.

So, I think we need to do the loop transformations in the FEs. That
doesn't mean we need to write everything 3 times, once for each
frontend. Already now, various stuff is shared between the C and C++
FEs in c-family, though how much can be shared between c-family and
Fortran is to be discovered.

Or we do it in the FEs at least partially, to the extent that we
compute how many canonical loops the loop transformations result in,
what artificial iterators they will use etc., so that during
gimplification we can take all that into account and can then do the
actual transformations later.

For C, I think the lowering of loop transformation constructs, or at
least determining what it means, can be done right after we actually
parse it and before we finalize the OMP_FOR etc. that wraps it, if any.
As discussed last week at F2F, I think we want to remember in
OMP_FOR_ORIG_DECLS the user iterators on the loop transformation
constructs and take them into account for data sharing purposes.

For C++ in templates we obviously need to defer that until
instantiation, as the constants in the clauses etc. could be template
parameters etc.

For Fortran, this would happen during resolving.

> The "unroll" and "tile" directives are
> internally implemented as clauses. This fits the representation of

So perhaps just use OMP_UNROLL/OMP_TILE as GENERIC constructs like
OMP_FOR etc.
but with some argument in which the early loop transformation analysis
can remember the important stuff: whether the loop transformation
results in a canonical loop nest or not, and in the former case with
how many nested loops.

And then handle the actual transformation IMHO best at gimplification
time: find the constructs in the OMP_FOR etc. body if they are nested
in there, let the transformation happen on GENERIC before the
containing OMP_FOR etc. (if any) is actually finalized, and from the
transformation remember the original user decls and what should happen
with them for data sharing (e.g. lastprivate/lastprivate conditional).

From the slides I saw last week, a lot of other transformations are in
the planning, like loop reversal etc.

And, I think even in OpenMP 5.1 nothing prevents e.g.

  #pragma omp for collapse(3) // etc.
  #pragma omp tile sizes (4, 2, 2)
  #pragma omp tile sizes (4, 8, 16)
  for (int i = 0; i < 64; ++i)
    for (int j = 0; j < 64; ++j)
      for (int k = 0; k < 64; ++k)
        body;

where the inner tile takes the i, j and k loops and makes

  for (int i1 = 0; i1 < 64; i1 += 4)
    for (int j1 = 0; j1 < 64; j1 += 8)
      for (int k1 = 0; k1 < 64; k1 += 16)
        for (int i2 = 0; i2 < 4; i2++)
          {
            int i = i1 + i2;
            for (int j2 = 0; j2 < 8; j2++)
              {
                int j = j1 + j2;
                for (int k2 = 0; k2 < 16; k2++)
                  {
                    int k = k1 + k2;
                    body;
                  }
              }
          }

out of it, with 3 outer loops which have canonical loop form (the rest
don't). And then the second (outer) tile takes the outermost 3 of
those generated loops and tiles them again, again yielding 3 loops in
canonical loop form plus stuff inside of them.

Or one can replace the

  #pragma omp for collapse(3) // etc.

with

  #pragma omp for
  #pragma omp unroll partial(2)

which furthermore unrolls the outermost generated loop from the outer
tile, turning it into 1 canonical loop form loop plus stuff in it.

Or of course, as you have in your testcases, some loop transformation
constructs could be used on more nested loops, not necessarily before
the outermost one.
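
As a sanity check on the hand-expanded nest above, here is a minimal
sketch in plain C (no OpenMP needed) that runs the tiled form of the
64x64x64 nest and confirms that every (i, j, k) of the original nest
is visited exactly once. The names run_tiled, all_visited_once and
visits are made up for this illustration and are not part of GCC or of
the patch series; the tile sizes are the ones from the inner
"tile sizes (4, 8, 16)".

```c
#include <assert.h>
#include <string.h>

/* Loop bound and tile sizes taken from the example in the mail.  */
enum { N = 64, TI = 4, TJ = 8, TK = 16 };

/* visits[i][j][k] counts how often the tiled nest ran "body" for
   the original iteration (i, j, k).  Hypothetical helper state.  */
static unsigned char visits[N][N][N];

/* Hand-written equivalent of "tile sizes (4, 8, 16)" applied to the
   i/j/k nest.  Returns the total number of body executions.  */
static long
run_tiled (void)
{
  long count = 0;
  memset (visits, 0, sizeof visits);
  for (int i1 = 0; i1 < N; i1 += TI)
    for (int j1 = 0; j1 < N; j1 += TJ)
      for (int k1 = 0; k1 < N; k1 += TK)
        for (int i2 = 0; i2 < TI; i2++)
          for (int j2 = 0; j2 < TJ; j2++)
            for (int k2 = 0; k2 < TK; k2++)
              {
                int i = i1 + i2, j = j1 + j2, k = k1 + k2;
                /* The tile sizes divide N, so no bounds clamping
                   is needed in this particular example.  */
                assert (i < N && j < N && k < N);
                visits[i][j][k]++;   /* stands in for "body" */
                count++;
              }
  return count;
}

/* Returns 1 if the last run_tiled call visited every iteration of
   the original nest exactly once, 0 otherwise.  */
static int
all_visited_once (void)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        if (visits[i][j][k] != 1)
          return 0;
  return 1;
}
```

Calling run_tiled () from a test driver and then all_visited_once ()
checks only that the iteration space is preserved; the tiled nest
visits the iterations in a different order than the original, which is
the point of the transformation.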
But still, in all cases you need to know quite early how many loops in
canonical loop form you get from each loop transformation, so that it
can be e.g. checked against the collapse/ordered clauses.

Feel free to disagree if you think your approach is able to handle all
of this; just explain in detail why you think so.

	Jakub
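
The early bookkeeping argued for above could look roughly like the
following minimal sketch in C. It is an illustration only, not GCC
code: struct xform, canonical_loops_after and collapse_ok are
hypothetical names, and the model covers just the two cases from the
examples in this mail, namely that a tile with n sizes leaves n loops
in canonical form and that a partial unroll leaves 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one loop transformation directive.  */
enum xform_kind { XFORM_TILE, XFORM_UNROLL_PARTIAL };

struct xform
{
  enum xform_kind kind;
  int nsizes;	/* number of sizes for XFORM_TILE, ignored otherwise */
};

/* Apply XFORMS[0..N-1], innermost directive first, to a nest of
   DEPTH canonical loops.  Returns how many canonical loops the
   outermost directive exposes, or -1 if some directive is associated
   with more loops than are available to it.  */
int
canonical_loops_after (int depth, const struct xform *xforms, size_t n)
{
  int avail = depth;
  for (size_t i = 0; i < n; i++)
    {
      /* Assumption of this sketch: tile consumes and regenerates
         NSIZES canonical loops, partial unroll consumes and
         regenerates exactly one.  */
      int needed = xforms[i].kind == XFORM_TILE ? xforms[i].nsizes : 1;
      if (needed < 1 || avail < needed)
        return -1;
      avail = needed;
    }
  return avail;
}

/* Returns 1 if a construct with collapse(COLLAPSE) can be associated
   with the loops that remain after the transformations.  */
int
collapse_ok (int collapse, int depth, const struct xform *xforms,
             size_t n)
{
  int avail = canonical_loops_after (depth, xforms, n);
  return collapse >= 1 && avail >= collapse;
}
```

For the two stacked tiles on the 3-deep nest this reports 3 canonical
loops, so collapse(3) is fine; with the partial unroll on top it
reports 1, so only a plain "omp for" (collapse 1) can be associated
with the result.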