From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 92878 invoked by alias); 10 Feb 2016 11:51:18 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 92864 invoked by uid 89); 10 Feb 2016 11:51:17 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_20,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=checkpoint, chose, 9,7, 97 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 10 Feb 2016 11:51:10 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-02.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1aTTIX-0002Bc-AA from Thomas_Schwinge@mentor.com ; Wed, 10 Feb 2016 03:51:06 -0800 Received: from hertz.schwinge.homeip.net (137.202.0.76) by SVR-IES-FEM-02.mgc.mentorg.com (137.202.0.106) with Microsoft SMTP Server id 14.3.224.2; Wed, 10 Feb 2016 11:51:03 +0000 From: Thomas Schwinge To: Bernd Schmidt , Jakub Jelinek CC: , Tom de Vries Subject: Re: Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading" In-Reply-To: <87zivg8rcy.fsf@hertz.schwinge.homeip.net> References: <87r3hac1w9.fsf@hertz.schwinge.homeip.net> <569D2059.4010105@mentor.com> <87d1subnu5.fsf@hertz.schwinge.homeip.net> <87a8nyawph.fsf@hertz.schwinge.homeip.net> <20160122083625.GL3017@tucnak.redhat.com> <56A22C2E.6000408@redhat.com> <20160122132538.GT3017@tucnak.redhat.com> <56A22F37.5010505@redhat.com> <87zivg8rcy.fsf@hertz.schwinge.homeip.net> User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/24.4.1 (x86_64-pc-linux-gnu) Date: Wed, 10 Feb 2016 11:51:00 -0000 Message-ID: <87h9hg9450.fsf@hertz.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-SW-Source: 2016-02/txt/msg00668.txt.bz2 Hi! Ping. On Thu, 04 Feb 2016 15:47:25 +0100, I wrote: > Here is the patch re-worked for trunk. Instead of passing > -foffload-force in the affected libgomp test cases, I instead chose to > have them expect the warning. This way, we're testing more in line to > what users will be doing, and we'll notice how the OpenACC kernels > handling improves, when parloops gets able to parallelize more offloaded > code (and the "avoid offloading" handling will no longer trigger). OK to > commit? >=20 > commit acd66946777671486a0f69706b25a3ec5f877306 > Author: Thomas Schwinge > Date: Tue Feb 2 20:41:42 2016 +0100 >=20 > Un-parallelized OpenACC kernels constructs with nvptx offloading: "av= oid offloading" >=20=20=20=20=20 > gcc/ > * common.opt: Add -foffload-force. > * lto-wrapper.c (merge_and_complain, append_compiler_options): > Handle it. > * doc/invoke.texi: Document it. > * config/nvptx/mkoffload.c (struct id_map): Add "flags" member. > (record_id): Parse, and set it. > (process): Use it. > * config/nvptx/nvptx.c (nvptx_attribute_table): Add "omp avoid > offloading". > (nvptx_record_offload_symbol): Use it. > (nvptx_goacc_validate_dims): Set it. > libgomp/ > * libgomp.h (gomp_offload_target_available_p): New function > declaration. > * target.c (gomp_offload_target_available_p): New function > definition. > (GOMP_offload_register_ver, GOMP_offload_unregister_ver) > (gomp_init_device, gomp_unload_device): Handle and document "avoid > offloading" flag ("host_table =3D=3D NULL"). > (resolve_device): Document "avoid offloading". > * oacc-init.c (resolve_device): Likewise. > * libgomp.texi (Enabling OpenACC): Likewise. > * testsuite/lib/libgomp.exp > (check_effective_target_nvptx_offloading_configured): New proc > definition. > * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: New > file. > * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c: > Likewise. > * testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise. > * testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise. > * testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise. > * testsuite/libgomp.oacc-c-c++-common/abort-3.c: Expect warning. > * testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c: > Likewise. > * testsuite/libgomp.oacc-fortran/combined-directives-1.f90: > Likewise. > * testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise. >=20=20=20=20=20 > libgomp/ > * testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c: Set > "-ftree-parallelize-loops=3D32". > * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise. > --- > gcc/common.opt | 4 + > gcc/config/nvptx/mkoffload.c | 73 +++++++++++- > gcc/config/nvptx/nvptx.c | 42 ++++++- > gcc/doc/invoke.texi | 12 +- > gcc/lto-wrapper.c | 2 + > libgomp/libgomp.h | 1 + > libgomp/libgomp.texi | 8 ++ > libgomp/oacc-init.c | 19 ++- > libgomp/target.c | 122 ++++++++++++++= ++---- > libgomp/testsuite/lib/libgomp.exp | 10 ++ > .../testsuite/libgomp.oacc-c-c++-common/abort-3.c | 4 +- > .../testsuite/libgomp.oacc-c-c++-common/abort-4.c | 4 +- > .../libgomp.oacc-c-c++-common/avoid-offloading-1.c | 28 +++++ > .../libgomp.oacc-c-c++-common/avoid-offloading-2.c | 38 ++++++ > .../libgomp.oacc-c-c++-common/avoid-offloading-3.c | 29 +++++ > .../combined-directives-1.c | 4 +- > .../libgomp.oacc-c-c++-common/default-1.c | 4 +- > .../libgomp.oacc-c-c++-common/deviceptr-1.c | 4 +- > .../libgomp.oacc-c-c++-common/host_data-1.c | 1 + > .../libgomp.oacc-c-c++-common/kernels-1.c | 10 +- > .../kernels-alias-ipa-pta-2.c | 4 +- > .../kernels-alias-ipa-pta-3.c | 4 +- > .../kernels-alias-ipa-pta.c | 4 +- > .../libgomp.oacc-c-c++-common/kernels-empty.c | 2 +- > .../kernels-loop-and-seq-2.c | 3 +- > .../kernels-loop-and-seq-3.c | 4 +- > .../kernels-loop-and-seq-4.c | 3 +- > .../kernels-loop-and-seq-5.c | 3 +- > .../kernels-loop-and-seq-6.c | 3 +- > .../kernels-loop-and-seq.c | 4 +- > .../kernels-loop-collapse.c | 3 +- > .../testsuite/libgomp.oacc-c-c++-common/nested-2.c | 2 +- > .../libgomp.oacc-fortran/avoid-offloading-1.f | 32 +++++ > .../libgomp.oacc-fortran/avoid-offloading-2.f | 41 +++++++ > .../libgomp.oacc-fortran/avoid-offloading-3.f | 31 +++++ > .../libgomp.oacc-fortran/combined-directives-1.f90 | 5 +- > .../libgomp.oacc-fortran/non-scalar-data.f90 | 5 +- > 37 files changed, 494 insertions(+), 78 deletions(-) >=20 > diff --git gcc/common.opt gcc/common.opt > index 520fa9c..2cf798d 100644 > --- gcc/common.opt > +++ gcc/common.opt > @@ -1779,6 +1779,10 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_= ILP32) > EnumValue > Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64) >=20=20 > +foffload-force > +Common Var(flag_offload_force) > +Force offloading if the compiler wanted to avoid it. > + > fomit-frame-pointer > Common Report Var(flag_omit_frame_pointer) Optimization > When possible do not generate stack frames. > diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c > index c8eed45..586ee8b 100644 > --- gcc/config/nvptx/mkoffload.c > +++ gcc/config/nvptx/mkoffload.c > @@ -41,9 +41,19 @@ const char tool_name[] =3D "nvptx mkoffload"; >=20=20 > #define COMMENT_PREFIX "#" >=20=20 > +enum id_map_flag > + { > + /* All clear. */ > + ID_MAP_FLAG_NONE =3D 0, > + /* Avoid offloading. For example, because there is no sufficient > + parallelism. */ > + ID_MAP_FLAG_AVOID_OFFLOADING =3D 1 > + }; > + > struct id_map > { > id_map *next; > + int flags; > char *ptx_name; > }; >=20=20 > @@ -107,6 +117,38 @@ record_id (const char *p1, id_map ***where) > fatal_error (input_location, "malformed ptx file"); >=20=20 > id_map *v =3D XNEW (id_map); > + > + /* Do we have any flags? */ > + v->flags =3D ID_MAP_FLAG_NONE; > + if (p1[0] =3D=3D '(') > + { > + /* Current flag. */ > + const char *cur =3D p1 + 1; > + > + /* Seek to the beginning of ") ". */ > + p1 =3D strchr (cur, ')'); > + if (!p1 || p1 > end || p1[1] !=3D ' ') > + fatal_error (input_location, "malformed ptx file: " > + "expected \") \" at \"%s\"", cur); > + > + while (cur < p1) > + { > + const char *next =3D strchr (cur, ','); > + if (!next || next > p1) > + next =3D p1; > + > + if (strncmp (cur, "avoid offloading", next - cur - 1) =3D=3D 0) > + v->flags |=3D ID_MAP_FLAG_AVOID_OFFLOADING; > + else > + fatal_error (input_location, "malformed ptx file: " > + "unknown flag at \"%s\"", cur); > + > + cur =3D next; > + } > + > + /* Skip past ") ". */ > + p1 +=3D 2; > + } > size_t len =3D end - p1; > v->ptx_name =3D XNEWVEC (char, len + 1); > memcpy (v->ptx_name, p1, len); > @@ -296,12 +338,17 @@ process (FILE *in, FILE *out) > fprintf (out, "\n};\n\n"); >=20=20 > /* Dump out function idents. */ > + bool avoid_offloading_p =3D false; > fprintf (out, "static const struct nvptx_fn {\n" > " const char *name;\n" > " unsigned short dim[%d];\n" > "} func_mappings[] =3D {\n", GOMP_DIM_MAX); > for (comma =3D "", id =3D func_ids; id; comma =3D ",", id =3D id->next) > - fprintf (out, "%s\n\t{%s}", comma, id->ptx_name); > + { > + if (id->flags & ID_MAP_FLAG_AVOID_OFFLOADING) > + avoid_offloading_p =3D true; > + fprintf (out, "%s\n\t{%s}", comma, id->ptx_name); > + } > fprintf (out, "\n};\n\n"); >=20=20 > fprintf (out, > @@ -318,7 +365,11 @@ process (FILE *in, FILE *out) > " sizeof (var_mappings) / sizeof (var_mappings[0]),\n" > " func_mappings," > " sizeof (func_mappings) / sizeof (func_mappings[0])\n" > - "};\n\n"); > + "};\n"); > + if (avoid_offloading_p) > + /* Need a unique handle for target_data. */ > + fprintf (out, "static int target_data_avoid_offloading;\n"); > + fprintf (out, "\n"); >=20=20 > fprintf (out, "#ifdef __cplusplus\n" > "extern \"C\" {\n" > @@ -338,18 +389,28 @@ process (FILE *in, FILE *out) > fprintf (out, "static __attribute__((constructor)) void init (void)\n" > "{\n" > " GOMP_offload_register_ver (%#x, __OFFLOAD_TABLE__," > - "%d/*NVIDIA_PTX*/, &target_data);\n" > - "};\n", > + "%d/*NVIDIA_PTX*/, &target_data);\n", > GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX), > GOMP_DEVICE_NVIDIA_PTX); > + if (avoid_offloading_p) > + fprintf (out, " GOMP_offload_register_ver (%#x, (void *) 0," > + "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n", > + GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX), > + GOMP_DEVICE_NVIDIA_PTX); > + fprintf (out, "};\n"); >=20=20 > fprintf (out, "static __attribute__((destructor)) void fini (void)\n" > "{\n" > " GOMP_offload_unregister_ver (%#x, __OFFLOAD_TABLE__," > - "%d/*NVIDIA_PTX*/, &target_data);\n" > - "};\n", > + "%d/*NVIDIA_PTX*/, &target_data);\n", > GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX), > GOMP_DEVICE_NVIDIA_PTX); > + if (avoid_offloading_p) > + fprintf (out, " GOMP_offload_unregister_ver (%#x, (void *) 0," > + "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n", > + GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX), > + GOMP_DEVICE_NVIDIA_PTX); > + fprintf (out, "};\n"); > } >=20=20 > static void > diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c > index 78614f8..fe28154 100644 > --- gcc/config/nvptx/nvptx.c > +++ gcc/config/nvptx/nvptx.c > @@ -3803,6 +3803,9 @@ static const struct attribute_spec nvptx_attribute_= table[] =3D > /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, > affects_type_identity } */ > { "kernel", 0, 0, true, false, false, nvptx_handle_kernel_attribute, = false }, > + /* Avoid offloading. For example, because there is no sufficient > + parallelism. */ > + { "omp avoid offloading", 0, 0, true, false, false, NULL, false }, > { NULL, 0, 0, false, false, false, NULL, false } > }; > =0C > @@ -3867,7 +3870,10 @@ nvptx_record_offload_symbol (tree decl) > tree dims =3D TREE_VALUE (attr); > unsigned ix; >=20=20 > - fprintf (asm_out_file, "//:FUNC_MAP \"%s\"", > + fprintf (asm_out_file, "//:FUNC_MAP %s\"%s\"", > + (lookup_attribute ("omp avoid offloading", > + DECL_ATTRIBUTES (decl)) > + ? "(avoid offloading) " : ""), > IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl))); >=20=20 > for (ix =3D 0; ix !=3D GOMP_DIM_MAX; ix++, dims =3D TREE_CHAIN (dims)) > @@ -4124,6 +4130,40 @@ nvptx_expand_builtin (tree exp, rtx target, rtx AR= G_UNUSED (subtarget), > static bool > nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level) > { > + /* Detect if a function is unsuitable for offloading. */ > + if (!flag_offload_force && decl) > + { > + tree oacc_function_attr =3D get_oacc_fn_attrib (decl); > + if (oacc_function_attr > + && oacc_fn_attrib_kernels_p (oacc_function_attr)) > + { > + bool avoid_offloading_p =3D true; > + for (unsigned ix =3D 0; ix !=3D GOMP_DIM_MAX; ix++) > + { > + if (dims[ix] > 1) > + { > + avoid_offloading_p =3D false; > + break; > + } > + } > + if (avoid_offloading_p) > + { > + /* OpenACC kernels constructs will never be parallelized for > + optimization levels smaller than -O2; avoid the diagnostic in > + this case. */ > + if (optimize >=3D 2) > + warning_at (DECL_SOURCE_LOCATION (decl), 0, > + "OpenACC kernels construct will be executed " > + "sequentially; will by default avoid offloading " > + "to prevent data copy penalty"); > + DECL_ATTRIBUTES (decl) > + =3D tree_cons (get_identifier ("omp avoid offloading"), > + NULL_TREE, DECL_ATTRIBUTES (decl)); > + > + } > + } > + } > + > bool changed =3D false; >=20=20 > /* The vector size must be 32, unless this is a SEQ routine. */ > diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi > index fcc404e..c09fbc5 100644 > --- gcc/doc/invoke.texi > +++ gcc/doc/invoke.texi > @@ -180,7 +180,8 @@ in the following sections. > @gccoptlist{-ansi -std=3D@var{standard} -fgnu89-inline @gol > -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol > -fno-asm -fno-builtin -fno-builtin-@var{function} @gol > --fhosted -ffreestanding -fopenacc -fopenmp -fopenmp-simd @gol > +-fhosted -ffreestanding @gol > +-foffload-force -fopenacc -fopenacc-dim=3D@var{geom} -fopenmp -fopenmp-s= imd @gol > -fms-extensions -fplan9-extensions -fsso-struct=3D@var{endianness} > -fallow-single-precision -fcond-mismatch -flax-vector-conversions @gol > -fsigned-bitfields -fsigned-char @gol > @@ -1953,6 +1954,15 @@ This is equivalent to @option{-fno-hosted}. > @xref{Standards,,Language Standards Supported by GCC}, for details of > freestanding and hosted environments. >=20=20 > +@item -foffload-force > +@opindex -foffload-force > +The option @option{-foffload-force} forces offloading if the compiler > +wanted to avoid it. For example, when there isn't sufficient > +parallelism in certain offloading constructs, the compiler may come to > +the conclusion that offloading incurs too much overhead (for data > +transfers, for example), and unless overridden with this flag, it then > +suggests to the runtime (libgomp) to avoid offloading. > + > @item -fopenacc > @opindex fopenacc > @cindex OpenACC accelerator programming > diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c > index ced6f2f..702ae47 100644 > --- gcc/lto-wrapper.c > +++ gcc/lto-wrapper.c > @@ -275,6 +275,7 @@ merge_and_complain (struct cl_decoded_option **decode= d_options, > case OPT_fsigned_zeros: > case OPT_ftrapping_math: > case OPT_fwrapv: > + case OPT_foffload_force: > case OPT_fopenmp: > case OPT_fopenacc: > case OPT_fcilkplus: > @@ -517,6 +518,7 @@ append_compiler_options (obstack *argv_obstack, struc= t cl_decoded_option *opts, > case OPT_fsigned_zeros: > case OPT_ftrapping_math: > case OPT_fwrapv: > + case OPT_foffload_force: > case OPT_fopenmp: > case OPT_fopenacc: > case OPT_fopenacc_dim_: > diff --git libgomp/libgomp.h libgomp/libgomp.h > index 7108a6d..8747b72 100644 > --- libgomp/libgomp.h > +++ libgomp/libgomp.h > @@ -984,6 +984,7 @@ extern void gomp_unmap_vars (struct target_mem_desc *= , bool); > extern void gomp_init_device (struct gomp_device_descr *); > extern void gomp_free_memmap (struct splay_tree_s *); > extern void gomp_unload_device (struct gomp_device_descr *); > +extern bool gomp_offload_target_available_p (int); >=20=20 > /* work.c */ >=20=20 > diff --git libgomp/libgomp.texi libgomp/libgomp.texi > index 987ee5f..5795c00 100644 > --- libgomp/libgomp.texi > +++ libgomp/libgomp.texi > @@ -1815,6 +1815,14 @@ flag @option{-fopenacc} must be specified. This e= nables the OpenACC directive > arranges for automatic linking of the OpenACC runtime library=20 > (@ref{OpenACC Runtime Library Routines}). >=20=20 > +Offloading is enabled by default. In some cases, the compiler may > +come to the conclusion that offloading incurs too much overhead, and > +suggest to the runtime to avoid it. To counteract that, you can use > +the option @option{-foffload-force} to force offloading in such cases. > +Alternatively, offloading is also enabled if a specific device type is > +requested, in a call to @code{acc_init} or by setting the > +@env{ACC_DEVICE_TYPE} environment variable, for example. > + > A complete description of all OpenACC directives accepted may be found i= n=20 > the @uref{http://www.openacc.org/, OpenACC} Application Programming > Interface manual, version 2.0. > diff --git libgomp/oacc-init.c libgomp/oacc-init.c > index 42d005d..2f053f3 100644 > --- libgomp/oacc-init.c > +++ libgomp/oacc-init.c > @@ -122,7 +122,10 @@ resolve_device (acc_device_t d, bool fail_is_error) > { > if (goacc_device_type) > { > - /* Lookup the named device. */ > + /* Lookup the device that has been explicitly named, so do not pay > + attention to gomp_offload_target_available_p. (That is, > + enforced usage even with an "avoid offloading" flag set, and > + hard error if not actually available.) */ > while (++d !=3D _ACC_device_hwm) > if (dispatchers[d] > && !strcasecmp (goacc_device_type, > @@ -148,8 +151,15 @@ resolve_device (acc_device_t d, bool fail_is_error) > case acc_device_not_host: > /* Find the first available device after acc_device_not_host. */ > while (++d !=3D _ACC_device_hwm) > - if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0) > + if (dispatchers[d] > + && dispatchers[d]->get_num_devices_func () > 0 > + /* No device has been explicitly named, so pay attention to > + gomp_offload_target_available_p, to not decide on an offload > + target that we don't have offload data available for, or have an > + "avoid offloading" flag set for. */ > + && gomp_offload_target_available_p (dispatchers[d]->type)) > goto found; > + /* No non-host device found. */ > if (d_arg =3D=3D acc_device_default) > { > d =3D acc_device_host; > @@ -168,7 +178,7 @@ resolve_device (acc_device_t d, bool fail_is_error) > break; >=20=20 > default: > - if (d > _ACC_device_hwm) > + if (d >=3D _ACC_device_hwm) > { > if (fail_is_error) > goto unsupported_device; > @@ -181,7 +191,8 @@ resolve_device (acc_device_t d, bool fail_is_error) >=20=20 > assert (d !=3D acc_device_none > && d !=3D acc_device_default > - && d !=3D acc_device_not_host); > + && d !=3D acc_device_not_host > + && d < _ACC_device_hwm); >=20=20 > if (dispatchers[d] =3D=3D NULL && fail_is_error) > { > diff --git libgomp/target.c libgomp/target.c > index 96fe3d5..afcbedb 100644 > --- libgomp/target.c > +++ libgomp/target.c > @@ -1165,12 +1165,19 @@ gomp_unload_image_from_device (struct gomp_device= _descr *devicep, >=20=20 > /* This function should be called from every offload image while loading. > It gets the descriptor of the host func and var tables HOST_TABLE, TY= PE of > - the target, and TARGET_DATA needed by target plugin. */ > + the target, and TARGET_DATA needed by target plugin. > + > + If HOST_TABLE is NULL, this image (TARGET_DATA) is stored as an "avoid > + offloading" flag, and the TARGET_TYPE will not be considered by defau= lt > + until this image gets unregistered. */ >=20=20 > void > GOMP_offload_register_ver (unsigned version, const void *host_table, > int target_type, const void *target_data) > { > + gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__, > + version, host_table, target_type, target_data); > + > int i; >=20=20 > if (GOMP_VERSION_LIB (version) > GOMP_VERSION) > @@ -1179,16 +1186,19 @@ GOMP_offload_register_ver (unsigned version, cons= t void *host_table, >=20=20=20=20 > gomp_mutex_lock (®ister_lock); >=20=20 > - /* Load image to all initialized devices. */ > - for (i =3D 0; i < num_devices; i++) > + if (host_table !=3D NULL) > { > - struct gomp_device_descr *devicep =3D &devices[i]; > - gomp_mutex_lock (&devicep->lock); > - if (devicep->type =3D=3D target_type > - && devicep->state =3D=3D GOMP_DEVICE_INITIALIZED) > - gomp_load_image_to_device (devicep, version, > - host_table, target_data, true); > - gomp_mutex_unlock (&devicep->lock); > + /* Load image to all initialized devices. */ > + for (i =3D 0; i < num_devices; i++) > + { > + struct gomp_device_descr *devicep =3D &devices[i]; > + gomp_mutex_lock (&devicep->lock); > + if (devicep->type =3D=3D target_type > + && devicep->state =3D=3D GOMP_DEVICE_INITIALIZED) > + gomp_load_image_to_device (devicep, version, > + host_table, target_data, true); > + gomp_mutex_unlock (&devicep->lock); > + } > } >=20=20 > /* Insert image to array of pending images. */ > @@ -1214,26 +1224,36 @@ GOMP_offload_register (const void *host_table, in= t target_type, >=20=20 > /* This function should be called from every offload image while unloadi= ng. > It gets the descriptor of the host func and var tables HOST_TABLE, TY= PE of > - the target, and TARGET_DATA needed by target plugin. */ > + the target, and TARGET_DATA needed by target plugin. > + > + If HOST_TABLE is NULL, the "avoid offloading" flag gets cleared for t= his > + image (TARGET_DATA), and this TARGET_TYPE may again be considered by > + default. */ >=20=20 > void > GOMP_offload_unregister_ver (unsigned version, const void *host_table, > int target_type, const void *target_data) > { > + gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__, > + version, host_table, target_type, target_data); > + > int i; >=20=20 > gomp_mutex_lock (®ister_lock); >=20=20 > - /* Unload image from all initialized devices. */ > - for (i =3D 0; i < num_devices; i++) > + if (host_table !=3D NULL) > { > - struct gomp_device_descr *devicep =3D &devices[i]; > - gomp_mutex_lock (&devicep->lock); > - if (devicep->type =3D=3D target_type > - && devicep->state =3D=3D GOMP_DEVICE_INITIALIZED) > - gomp_unload_image_from_device (devicep, version, > - host_table, target_data); > - gomp_mutex_unlock (&devicep->lock); > + /* Unload image from all initialized devices. */ > + for (i =3D 0; i < num_devices; i++) > + { > + struct gomp_device_descr *devicep =3D &devices[i]; > + gomp_mutex_lock (&devicep->lock); > + if (devicep->type =3D=3D target_type > + && devicep->state =3D=3D GOMP_DEVICE_INITIALIZED) > + gomp_unload_image_from_device (devicep, version, > + host_table, target_data); > + gomp_mutex_unlock (&devicep->lock); > + } > } >=20=20 > /* Remove image from array of pending images. */ > @@ -1267,7 +1287,8 @@ gomp_init_device (struct gomp_device_descr *devicep) > for (i =3D 0; i < num_offload_images; i++) > { > struct offload_image_descr *image =3D &offload_images[i]; > - if (image->type =3D=3D devicep->type) > + if (image->type =3D=3D devicep->type > + && image->host_table !=3D NULL) > gomp_load_image_to_device (devicep, image->version, > image->host_table, image->target_data, > false); > @@ -1287,7 +1308,8 @@ gomp_unload_device (struct gomp_device_descr *devic= ep) > for (i =3D 0; i < num_offload_images; i++) > { > struct offload_image_descr *image =3D &offload_images[i]; > - if (image->type =3D=3D devicep->type) > + if (image->type =3D=3D devicep->type > + && image->host_table !=3D NULL) > gomp_unload_image_from_device (devicep, image->version, > image->host_table, > image->target_data); > @@ -1311,6 +1333,62 @@ gomp_free_memmap (struct splay_tree_s *mem_map) > } > } >=20=20 > +/* Do we have offload data available for the given offload target type? > + Instead of verifying that *all* offload data is available that could > + possibly be required, we instead just look for *any*. If we later fi= nd any > + offload data missing, that's user error. If any offload data of this= target > + type is tagged with an "avoid offloading" flag, do not consider this = target > + type available unless it has been initialized already. */ > + > +attribute_hidden bool > +gomp_offload_target_available_p (int type) > +{ > + bool available =3D false; > + > + /* Has the offload target type already been initialized? */ > + for (int i =3D 0; !available && i < num_devices; i++) > + { > + struct gomp_device_descr *devicep =3D &devices[i]; > + gomp_mutex_lock (&devicep->lock); > + if (devicep->type =3D=3D type > + && devicep->state =3D=3D GOMP_DEVICE_INITIALIZED) > + available =3D true; > + gomp_mutex_unlock (&devicep->lock); > + } > + > + /* If the offload target type has been initialized already, we ignore = "avoid > + offloading" flags. This is important, because data/state may be pr= esent > + on the device, that we must continue to use. */ > + if (!available) > + { > + gomp_mutex_lock (®ister_lock); > + if (num_offload_images =3D=3D 0) > + /* If there is no offload data available at all, there is no way to > + later fail to find any of it for a specific offload target type. > + This is the case where there are no offloaded code regions in user > + code, but the target type can be initialized successfully, and > + executable directqives be used, or runtime library calls be > + made. */ > + available =3D true; > + else > + { > + /* Can the offload target be initialized? */ > + for (int i =3D 0; !available && i < num_offload_images; i++) > + if (offload_images[i].type =3D=3D type > + && offload_images[i].host_table !=3D NULL) > + available =3D true; > + /* If yes, is an "avoid offloading" flag set? */ > + for (int i =3D 0; available && i < num_offload_images; i++) > + if (offload_images[i].type =3D=3D type > + && offload_images[i].host_table =3D=3D NULL) > + available =3D false; > + } > + gomp_mutex_unlock (®ister_lock); > + } > + > + return available; > +} > + > /* Host fallback for GOMP_target{,_ext} routines. */ >=20=20 > static void > diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgom= p.exp > index a4c9d83..8d2be80 100644 > --- libgomp/testsuite/lib/libgomp.exp > +++ libgomp/testsuite/lib/libgomp.exp > @@ -344,6 +344,16 @@ proc check_effective_target_offload_device_nonshared= _as { } { > } ] > } >=20=20 > +# Return 1 if the compiler has been configured for nvptx offloading. > + > +proc check_effective_target_nvptx_offloading_configured { } { > + # PR libgomp/65099: Currently, we only support offloading in 64-bit > + # configurations. > + global offload_targets > + return [expr [string match "*,nvptx,*" ",$offload_targets,"] \ > + && [is-effective-target lp64] ] > +} > + > # Return 1 if at least one nvidia board is present. >=20=20 > proc check_effective_target_openacc_nvidia_accel_present { } { > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c libgomp/= testsuite/libgomp.oacc-c-c++-common/abort-3.c > index bca425e..23156d8 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c > @@ -1,5 +1,3 @@ > -/* { dg-do run } */ > - > #include > #include >=20=20 > @@ -7,7 +5,7 @@ int > main (void) > { > fprintf (stderr, "CheCKpOInT\n"); > -#pragma acc kernels > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be e= xecuted sequentially; will by default avoid offloading to prevent data copy= penalty" "" { target nvptx_offloading_configured } } */ > { > abort (); > } > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c libgomp/= testsuite/libgomp.oacc-c-c++-common/abort-4.c > index c29ca3f..f4d6a07 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c > @@ -1,12 +1,10 @@ > -/* { dg-do run } */ > - > #include >=20=20 > int > main (int argc, char **argv) > { >=20=20 > -#pragma acc kernels > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be e= xecuted sequentially; will by default avoid offloading to prevent data copy= penalty" "" { target nvptx_offloading_configured } } */ > { > if (argc !=3D 1) > abort (); > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1= .c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c > new file mode 100644 > index 0000000..08745fc > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c > @@ -0,0 +1,28 @@ > +/* Test that the compiler decides to "avoid offloading". */ > + > +/* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ > +/* The ACC_DEVICE_TYPE environment variable gets set in the testing > + framework, and that overrides the "avoid offloading" flag at run time. > + { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } */ > + > +#include > + > +int main(void) > +{ > + int x, y; > + > +#pragma acc data copyout(x, y) > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be e= xecuted sequentially; will by default avoid offloading to prevent data copy= penalty" "" { target nvptx_offloading_configured } } */ > + *((volatile int *) &x) =3D 33, y =3D acc_on_device (acc_device_host); > + > + if (x !=3D 33) > + __builtin_abort(); > +#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia > + if (y !=3D 1) > + __builtin_abort(); > +#else > +# error Not ported to this ACC_DEVICE_TYPE > +#endif > + > + return 0; > +} > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2= .c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c > new file mode 100644 > index 0000000..724228a > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c > @@ -0,0 +1,38 @@ > +/* Test that a user can override the compiler's "avoid offloading" > + decision at run time. */ > + > +/* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ > + > +#include > + > +int main(void) > +{ > + /* Override the compiler's "avoid offloading" decision. */ > + acc_device_t d; > +#if defined ACC_DEVICE_TYPE_nvidia > + d =3D acc_device_nvidia; > +#elif defined ACC_DEVICE_TYPE_host > + d =3D acc_device_host; > +#else > +# error Not ported to this ACC_DEVICE_TYPE > +#endif > + acc_init (d); > + > + int x, y; > + > +#pragma acc data copyout(x, y) > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be e= xecuted sequentially; will by default avoid offloading to prevent data copy= penalty" "" { target nvptx_offloading_configured } } */ > + *((volatile int *) &x) =3D 33, y =3D acc_on_device (acc_device_host); > + > + if (x !=3D 33) > + __builtin_abort(); > +#if defined ACC_DEVICE_TYPE_nvidia > + if (y !=3D 0) > + __builtin_abort(); > +#else > + if (y !=3D 1) > + __builtin_abort(); > +#endif > + > + return 0; > +} > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3= .c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c > new file mode 100644 > index 0000000..2fb5196 > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c > @@ -0,0 +1,29 @@ > +/* Test that a user can override the compiler's "avoid offloading" > + decision at compile time. */ > + > +/* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ > +/* Override the compiler's "avoid offloading" decision. > + { dg-additional-options "-foffload-force" } */ > + > +#include > + > +int main(void) > +{ > + int x, y; > + > +#pragma acc data copyout(x, y) > +#pragma acc kernels > + *((volatile int *) &x) =3D 33, y =3D acc_on_device (acc_device_host); > + > + if (x !=3D 33) > + __builtin_abort(); > +#if defined ACC_DEVICE_TYPE_nvidia > + if (y !=3D 0) > + __builtin_abort(); > +#else > + if (y !=3D 1) > + __builtin_abort(); > +#endif > + > + return 0; > +} > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directive= s-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c > index dad6d13..87ca378 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c > @@ -1,6 +1,6 @@ > /* This test exercises combined directives. */ >=20=20 > -/* { dg-do run } */ > +/* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include >=20=20 > @@ -33,7 +33,7 @@ main (int argc, char **argv) > abort (); > } >=20=20 > -#pragma acc kernels loop copy (a[0:N]) copy (b[0:N]) > +#pragma acc kernels loop copy (a[0:N]) copy (b[0:N]) /* { dg-warning "Op= enACC kernels construct will be executed sequentially; will by default avoi= d offloading to prevent data copy penalty" "" { target nvptx_offloading_con= figured } } */ > for (i =3D 0; i < N; i++) > { > b[i] =3D 3.0; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c libgom= p/testsuite/libgomp.oacc-c-c++-common/default-1.c > index 1ac0b95..8f0144c 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c > @@ -1,4 +1,4 @@ > -/* { dg-do run } */ > +/* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include >=20=20 > @@ -51,7 +51,7 @@ int test_kernels () > ary[i] =3D ~0; >=20=20 > /* val defaults to copy, ary defaults to copy. */ > -#pragma acc kernels copy(ondev) > +#pragma acc kernels copy(ondev) /* { dg-warning "OpenACC kernels constru= ct will be executed sequentially; will by default avoid offloading to preve= nt data copy penalty" "" { target nvptx_offloading_configured } } */ > { > ondev =3D acc_on_device (acc_device_not_host); > #pragma acc loop=20 > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c libg= omp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c > index e271a37..9a5f7b1 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c > @@ -1,5 +1,3 @@ > -/* { dg-do run } */ > - > #include >=20=20 > int main (void) > @@ -10,7 +8,7 @@ int main (void) > a =3D A; >=20=20 > #pragma acc data copyout (a_1, a_2) > -#pragma acc kernels deviceptr (a) > +#pragma acc kernels deviceptr (a) /* { dg-warning "OpenACC kernels const= ruct will be executed sequentially; will by default avoid offloading to pre= vent data copy penalty" "" { target nvptx_offloading_configured } } */ > { > a_1 =3D a; > a_2 =3D &a; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c libg= omp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c > index 51745ba..3ef6f9b 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c > @@ -1,4 +1,5 @@ > /* { dg-do run { target openacc_nvidia_accel_selected } } */ > +/* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ > /* { dg-additional-options "-lcuda -lcublas -lcudart" } */ >=20=20 > #include > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgom= p/testsuite/libgomp.oacc-c-c++-common/kernels-1.c > index 3acfdf5..614ad33 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c > @@ -1,4 +1,4 @@ > -/* { dg-do run } */ > +/* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include >=20=20 > @@ -73,7 +73,7 @@ int main (void) > i =3D -1; > j =3D -2; > v =3D 0; > -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy= in (i, j) > +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy= in (i, j) /* { dg-warning "OpenACC kernels construct will be executed seque= ntially; will by default avoid offloading to prevent data copy penalty" "" = { target nvptx_offloading_configured } } */ > { > if (i !=3D -1 || j !=3D -2) > abort (); > @@ -96,7 +96,7 @@ int main (void) > i =3D -1; > j =3D -2; > v =3D 0; > -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy= out (i, j) > +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy= out (i, j) /* { dg-warning "OpenACC kernels construct will be executed sequ= entially; will by default avoid offloading to prevent data copy penalty" ""= { target nvptx_offloading_configured } } */ > { > i =3D 2; > j =3D 1; > @@ -110,7 +110,7 @@ int main (void) > i =3D -1; > j =3D -2; > v =3D 0; > -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy= (i, j) > +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy= (i, j) /* { dg-warning "OpenACC kernels construct will be executed sequent= ially; will by default avoid offloading to prevent data copy penalty" "" { = target nvptx_offloading_configured } } */ > { > if (i !=3D -1 || j !=3D -2) > abort (); > @@ -126,7 +126,7 @@ int main (void) > i =3D -1; > j =3D -2; > v =3D 0; > -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_crea= te (i, j) > +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_crea= te (i, j) /* { dg-warning "OpenACC kernels construct will be executed seque= ntially; will by default avoid offloading to prevent data copy penalty" "" = { target nvptx_offloading_configured } } */ > { > i =3D 2; > j =3D 1; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-= pta-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2= .c > index 0f323c8..8d5101d 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c > @@ -1,4 +1,4 @@ > -/* { dg-additional-options "-O2 -fipa-pta" } */ > +/* { dg-additional-options "-fipa-pta" } */ >=20=20 > #include >=20=20 > @@ -11,7 +11,7 @@ main (void) > unsigned int *b =3D (unsigned int *)malloc (N * sizeof (unsigned int)); > unsigned int *c =3D (unsigned int *)malloc (N * sizeof (unsigned int)); >=20=20 > -#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) > +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) /* { dg-warning "O= penACC kernels construct will be executed sequentially; will by default avo= id offloading to prevent data copy penalty" "" { target nvptx_offloading_co= nfigured } } */ > { > a[0] =3D 0; > b[0] =3D 1; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-= pta-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3= .c > index 654e750..3726b0c 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c > @@ -1,4 +1,4 @@ > -/* { dg-additional-options "-O2 -fipa-pta" } */ > +/* { dg-additional-options "-fipa-pta" } */ >=20=20 > #include >=20=20 > @@ -11,7 +11,7 @@ main (void) > unsigned int *b =3D a; > unsigned int *c =3D (unsigned int *)malloc (N * sizeof (unsigned int)); >=20=20 > -#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) > +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) /* { dg-warning "O= penACC kernels construct will be executed sequentially; will by default avo= id offloading to prevent data copy penalty" "" { target nvptx_offloading_co= nfigured } } */ > { > a[0] =3D 0; > b[0] =3D 1; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-= pta.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c > index 44d4fd2..eea4f76 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c > @@ -1,4 +1,4 @@ > -/* { dg-additional-options "-O2 -fipa-pta" } */ > +/* { dg-additional-options "-fipa-pta" } */ >=20=20 > #include >=20=20 > @@ -11,7 +11,7 @@ main (void) > unsigned int b[N]; > unsigned int c[N]; >=20=20 > -#pragma acc kernels pcopyout (a, b, c) > +#pragma acc kernels pcopyout (a, b, c) /* { dg-warning "OpenACC kernels = construct will be executed sequentially; will by default avoid offloading t= o prevent data copy penalty" "" { target nvptx_offloading_configured } } */ > { > a[0] =3D 0; > b[0] =3D 1; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c li= bgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c > index a68a7cd..860b6da 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c > @@ -1,6 +1,6 @@ > int > main (void) > { > -#pragma acc kernels > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be e= xecuted sequentially; will by default avoid offloading to prevent data copy= penalty" "" { target nvptx_offloading_configured } } */ > ; > } > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-s= eq-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c > index 2e4100f..5cdc200 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include > @@ -8,7 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels const= ruct will be executed sequentially; will by default avoid offloading to pre= vent data copy penalty" "" { target nvptx_offloading_configured } } */ > { > a[0] =3D a[0] + 1; >=20=20 > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-s= eq-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c > index b3e736b..2e4d4d2 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include > @@ -8,8 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > - > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels const= ruct will be executed sequentially; will by default avoid offloading to pre= vent data copy penalty" "" { target nvptx_offloading_configured } } */ > { > for (int i =3D 0; i < n; i++) > a[i] =3D 1; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-s= eq-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c > index 8b9affa..5bf00db 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include > @@ -8,7 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels const= ruct will be executed sequentially; will by default avoid offloading to pre= vent data copy penalty" "" { target nvptx_offloading_configured } } */ > { > a[0] =3D 2; >=20=20 > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-s= eq-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c > index 83d4e7f..d39b667 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include > @@ -9,7 +8,7 @@ unsigned int > foo (int n, unsigned int *a) > { > int r; > -#pragma acc kernels copyout(r) copy (a[0:N]) > +#pragma acc kernels copyout(r) copy (a[0:N]) /* { dg-warning "OpenACC ke= rnels construct will be executed sequentially; will by default avoid offloa= ding to prevent data copy penalty" "" { target nvptx_offloading_configured = } } */ > { > r =3D a[0]; >=20=20 > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-s= eq-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c > index 01d5e5e..bb2e85b 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include > @@ -8,7 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels const= ruct will be executed sequentially; will by default avoid offloading to pre= vent data copy penalty" "" { target nvptx_offloading_configured } } */ > { > int r =3D a[0]; >=20=20 > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-s= eq.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c > index 61d1283..e513827 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include > @@ -8,8 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > - > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels const= ruct will be executed sequentially; will by default avoid offloading to pre= vent data copy penalty" "" { target nvptx_offloading_configured } } */ > { > for (int i =3D 0; i < n; i++) > a[i] =3D 1; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-colla= pse.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c > index f7f04cb..c4791a4 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include > @@ -11,7 +10,7 @@ void __attribute__((noinline, noclone)) > foo (int m, int n) > { > int i, j; > - #pragma acc kernels > + #pragma acc kernels /* { dg-warning "OpenACC kernels construct will be= executed sequentially; will by default avoid offloading to prevent data co= py penalty" "" { target nvptx_offloading_configured } } */ > { > #pragma acc loop collapse(2) > for (i =3D 0; i < m; i++) > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c libgomp= /testsuite/libgomp.oacc-c-c++-common/nested-2.c > index c164598..94a5ae2 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c > @@ -1,4 +1,4 @@ > -/* { dg-do run } */ > +/* { dg-additional-options "-ftree-parallelize-loops=3D32" } */ >=20=20 > #include >=20=20 > diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f li= bgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f > new file mode 100644 > index 0000000..5f18b94 > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f > @@ -0,0 +1,32 @@ > +! Test that the compiler decides to "avoid offloading". > + > +! { dg-do run } > +! { dg-additional-options "-cpp" } > +! { dg-additional-options "-ftree-parallelize-loops=3D32" } > +! The "avoid offloading" warning is only triggered for -O2 and higher. > +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { = "" } } > +! The ACC_DEVICE_TYPE environment variable gets set in the testing > +! framework, and that overrides the "avoid offloading" flag at run time. > +! { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } > + > + IMPLICIT NONE > + INCLUDE "openacc_lib.h" > + > + INTEGER, VOLATILE :: X > + LOGICAL :: Y > + > +!$ACC DATA COPYOUT(X, Y) > +!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed= sequentially; will by default avoid offloading to prevent data copy penalt= y" "" { target nvptx_offloading_configured } } > + X =3D 33 > + Y =3D ACC_ON_DEVICE (ACC_DEVICE_HOST); > +!$ACC END KERNELS > +!$ACC END DATA > + > + IF (X .NE. 33) CALL ABORT > +#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia > + IF (.NOT. Y) CALL ABORT > +#else > +# error Not ported to this ACC_DEVICE_TYPE > +#endif > + > + END > diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f li= bgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f > new file mode 100644 > index 0000000..51801ad > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f > @@ -0,0 +1,41 @@ > +! Test that a user can override the compiler's "avoid offloading" > +! decision at run time. > + > +! { dg-do run } > +! { dg-additional-options "-cpp" } > +! { dg-additional-options "-ftree-parallelize-loops=3D32" } > +! The "avoid offloading" warning is only triggered for -O2 and higher. > +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { = "" } } > + > + IMPLICIT NONE > + INCLUDE "openacc_lib.h" > + > + INTEGER :: D > + INTEGER, VOLATILE :: X > + LOGICAL :: Y > + > +! Override the compiler's "avoid offloading" decision. > +#if defined ACC_DEVICE_TYPE_nvidia > + D =3D ACC_DEVICE_NVIDIA > +#elif defined ACC_DEVICE_TYPE_host > + D =3D ACC_DEVICE_HOST > +#else > +# error Not ported to this ACC_DEVICE_TYPE > +#endif > + CALL ACC_INIT (D) > + > +!$ACC DATA COPYOUT(X, Y) > +!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed= sequentially; will by default avoid offloading to prevent data copy penalt= y" "" { target nvptx_offloading_configured } } > + X =3D 33 > + Y =3D ACC_ON_DEVICE (ACC_DEVICE_HOST) > +!$ACC END KERNELS > +!$ACC END DATA > + > + IF (X .NE. 33) CALL ABORT > +#if defined ACC_DEVICE_TYPE_nvidia > + IF (Y) CALL ABORT > +#else > + IF (.NOT. Y) CALL ABORT > +#endif > + > + END > diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f li= bgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f > new file mode 100644 > index 0000000..bea6ab8 > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f > @@ -0,0 +1,31 @@ > +! Test that a user can override the compiler's "avoid offloading" > +! decision at compile time. > + > +! { dg-do run } > +! { dg-additional-options "-cpp" } > +! { dg-additional-options "-ftree-parallelize-loops=3D32" } > +! Override the compiler's "avoid offloading" decision. > +! { dg-additional-options "-foffload-force" } > + > + IMPLICIT NONE > + INCLUDE "openacc_lib.h" > + > + INTEGER :: D > + INTEGER, VOLATILE :: X > + LOGICAL :: Y > + > +!$ACC DATA COPYOUT(X, Y) > +!$ACC KERNELS > + X =3D 33 > + Y =3D ACC_ON_DEVICE (ACC_DEVICE_HOST) > +!$ACC END KERNELS > +!$ACC END DATA > + > + IF (X .NE. 33) CALL ABORT > +#if defined ACC_DEVICE_TYPE_nvidia > + IF (Y) CALL ABORT > +#else > + IF (.NOT. Y) CALL ABORT > +#endif > + > + END > diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f= 90 libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 > index 94100b2..4b52579 100644 > --- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 > +++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 > @@ -1,6 +1,9 @@ > ! This test exercises combined directives. >=20=20 > ! { dg-do run } > +! { dg-additional-options "-ftree-parallelize-loops=3D32" } > +! The "avoid offloading" warning is only triggered for -O2 and higher. > +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { = "" } } >=20=20 > program main > integer, parameter :: n =3D 32 > @@ -27,7 +30,7 @@ program main > !$acc kernels loop copy (a(1:n)) copy (b(1:n)) > do i =3D 1, n > b(i) =3D 3.0; > - a(i) =3D a(i) + b(i) > + a(i) =3D a(i) + b(i) ! { dg-warning "OpenACC kernels construct will = be executed sequentially; will by default avoid offloading to prevent data = copy penalty" "" { target nvptx_offloading_configured } } > end do >=20=20 > do i =3D 1, n > diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 lib= gomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > index 4afb562..b9298c7 100644 > --- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > +++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > @@ -2,6 +2,9 @@ > ! offloaded regions are properly mapped using present_or_copy. >=20=20 > ! { dg-do run } > +! { dg-additional-options "-ftree-parallelize-loops=3D32" } > +! The "avoid offloading" warning is only triggered for -O2 and higher. > +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { = "" } } >=20=20 > program main > implicit none > @@ -30,7 +33,7 @@ subroutine kernels (array, n) > integer, dimension (n) :: array > integer :: n, i >=20=20 > - !$acc kernels > + !$acc kernels ! { dg-warning "OpenACC kernels construct will be execut= ed sequentially; will by default avoid offloading to prevent data copy pena= lty" "" { target nvptx_offloading_configured } } > do i =3D 1, n > array(i) =3D i > end do Gr=C3=BC=C3=9Fe Thomas