From: Thomas Schwinge
To: Chung-Lin Tang
CC: Jakub Jelinek, gcc-patches, Tom de Vries
Subject: Re: [PATCH, gomp4] Propagate independent clause for OpenACC kernels pass
In-Reply-To: <874ml73wtj.fsf@schwinge.name>
References: <55A4A21C.1070004@codesourcery.com> <874ml73wtj.fsf@schwinge.name>
Date: Tue, 14 Jul 2015 11:05:00 -0000
Message-ID: <87zj2z2e4x.fsf@schwinge.name>

Hi!

On Tue, 14 Jul 2015 11:36:24 +0200, I wrote:
> On Tue, 14 Jul 2015 13:46:04 +0800, Chung-Lin Tang wrote:
> > this patch provides a 'bool independent' field in struct loop, which
> > will be switched on by an "independent" clause in a #pragma acc loop directive.
>
> Thanks!
>
>
> This patch has been developed in context of OpenACC kernels constructs,
> but, is there anything still to be done regarding OpenACC parallel
> constructs?  That is, are we currently *using* the "independent yes/no"
> information appropriately for these?

Tom mentioned:

| openacc spec:
| ...
| 2.7.9 independent clause
| In a kernels construct, the independent clause tells the implementation
| that the iterations of this loop are data-independent with respect to
| each other. This allows the implementation to generate code to execute
| the iterations in parallel with no synchronization.
| In a parallel construct, the independent clause is implied on all loop
| directives without a seq clause.
| ...
|
| I think you're sort of asking if the seq clause has been implemented.
|
| openacc spec:
| ...
| 2.7.5 seq clause
| The seq clause specifies that the associated loop or loops are to be
| executed sequentially by the accelerator. This clause will override any
| automatic parallelization or vectorization.
| ...

Thanks, and right, I also realized that.  ;-)

Yet, my request is still to properly "document" this.  That is, if the
idea is that gcc/omp-low.c:scan_omp_for makes sure to emit a diagnostic
for the invalid combination of a gang/worker/vector clause together with
a seq clause, then in combination with the code newly added to
gcc/omp-low.c:find_omp_for_region_data to set region->independent = true
inside OpenACC parallel constructs, can't we then
assert(region->independent == true) in expand_omp_for_static_chunk and
expand_omp_for_static_nochunk?

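The producer/consumer relationship asked about here could be sketched as
follows.  This is a standalone illustration, not GCC code: struct
region_sketch, set_independent_sketch and expand_parallel_loop_sketch are
hypothetical names standing in for the region data and the static-schedule
expanders, and letting seq take precedence over an explicit independent
clause is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-loop region data; not GCC's omp_region.  */
struct region_sketch
{
  bool in_oacc_parallel;  /* Loop is nested in an OpenACC parallel construct.  */
  bool has_seq;           /* Loop carries a seq clause.  */
  bool has_independent;   /* Loop carries an explicit independent clause.  */
  bool independent;       /* Filled in by the producer below.  */
};

/* Producer: inside a parallel construct, independent is implied on loops
   without a seq clause (OpenACC 2.7.9).  Letting seq win over an explicit
   independent clause is an assumption of this sketch.  */
void
set_independent_sketch (struct region_sketch *r)
{
  r->independent = !r->has_seq
                   && (r->has_independent || r->in_oacc_parallel);
}

/* Consumer: models an expander that is only meant to be reached for loops
   that get parallelized.  The assertion documents that expectation.
   Returns 1 to signal that the loop was handed to the parallel path.  */
int
expand_parallel_loop_sketch (const struct region_sketch *r)
{
  assert (r->independent);
  return 1;
}
```

With in_oacc_parallel set and no seq clause, the producer sets the flag and
the consumer's assertion holds.  A seq loop that reached the consumer anyway
would trip the assertion, which is the kind of mistake such a check is meant
to expose.
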
So far it looks to me as if for OpenACC parallel constructs, we only have
a producer of region->independent = true, but no consumer.  Even if the
latter one is "implicit", we should "document" (using an assertion) this
in some way, for clarity.

While looking at that code, are we convinced that the diagnostic
machinery and subsequent handling will do the right thing in such cases:

    [...]
    #pragma acc loop [gang/worker/vector]
    for [...]
      #pragma acc loop seq
      for [...]

To me it looks (but I have not verified) that in such cases, the inner
region's ctx->gwv_this will have been initialized from the outer_ctx, so
to some combination of gang/worker/vector.  Will that then be used to
parallelize the inner loop, which shouldn't be done, as I understand
this?

It seems to me as if this gang/worker/vector/seq/independent clause
handling code that is currently distributed over
gcc/omp-low.c:scan_omp_for and gcc/omp-low.c:find_omp_for_region_data
(and, worse, also the front ends; see below) should be merged into one
place.

gcc/fortran/openmp.c:resolve_oacc_loop_blocks emits a diagnostic for seq
with independent; gcc/omp-low.c:scan_omp_for doesn't.  Should it?  Then,
why do we need this Fortran-specific checking code?  Should the
additional checking being done there get moved to OMP lowering?

In the C and C++ front ends, there's related checking code for those
clauses attached to OpenACC routine constructs; not sure if that checking
should also be handled during OMP lowering, in one place?  I couldn't
find similar code in the Fortran front end (but didn't look very hard).

This is reminiscent of the discussion started in and following messages,
about using gcc/omp-low.c:check_omp_nesting_restrictions to do such
checking (which Cesar has not yet followed up on, as far as I know).

A few more points:

Does OMP_CLAUSE_INDEPENDENT need to be handled in
gcc/c-family/c-omp.c:c_oacc_split_loop_clauses?

While looking at that, ;-) again a few more things...

Are other clauses missing to be handled there: tile, device_type
(probably not; has all been processed before?)?

Why is the firstprivate clause being duplicated for
loop_clauses/non_loop_clauses?  In OpenACC, there is no firstprivate
clause for loop constructs.

Why is the private clause being duplicated for
loop_clauses/non_loop_clauses?  My understanding is that the private
clause in a combined OpenACC parallel loop construct is to be handled as
if it were attached to the loop construct (... which is in contrast to
firstprivate -- yes, OpenACC 2.0a is a bit asymmetric in that regard).

What about the corresponding Fortran code,
gcc/fortran/trans-openmp.c:gfc_trans_oacc_combined_directive?

Do we have test cases to cover all this?


Grüße,
 Thomas