From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 799 invoked by alias); 19 Oct 2015 16:24:54 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 734 invoked by uid 89); 19 Oct 2015 16:24:51 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 19 Oct 2015 16:24:48 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1ZoDEq-0001qO-4Y from Thomas_Schwinge@mentor.com ; Mon, 19 Oct 2015 09:24:44 -0700 Received: from feldtkeller.schwinge.homeip.net (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Mon, 19 Oct 2015 17:24:42 +0100 From: Thomas Schwinge To: Chung-Lin Tang CC: , Kirill Yukhin , Jakub Jelinek , Ilya Verbin Subject: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data) In-Reply-To: <20150623115139.GN10247@tucnak.redhat.com> References: <20150615122037.GA45068@msticlxl57.ims.intel.com> <20150615130609.GR10247@tucnak.redhat.com> <20150615161827.GB45068@msticlxl57.ims.intel.com> <20150615162528.GU10247@tucnak.redhat.com> <20150615194850.GC45068@msticlxl57.ims.intel.com> <20150615195840.GZ10247@tucnak.redhat.com> <20150619213514.GA23723@msticlxl57.ims.intel.com> <20150623114043.GC18789@msticlxl57.ims.intel.com> <20150623115139.GN10247@tucnak.redhat.com> User-Agent: Notmuch/0.9-125-g4686d11 (http://notmuchmail.org) Emacs/24.5.1 (i586-pc-linux-gnu) Date: Mon, 19 Oct 2015 16:33:00 -0000 Message-ID: <87pp0aaksc.fsf@kepler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" X-SW-Source: 2015-10/txt/msg01768.txt.bz2 --==-=-= Content-Type: multipart/mixed; boundary="=-=-=" --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-length: 5202 Hi! Chung-Lin, would you please have a look at the following (on gomp-4_0-branch)? Also, anyone else got any ideas off-hand? On Tue, 23 Jun 2015 13:51:39 +0200, Jakub Jelinek wrote: > On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote: > > On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote: > > > Given that a mapped variable in 4.1 can have different kinds across n= ested data > > > regions, we need to store map-type not only for each var, but also fo= r each > > > structured mapping. Here is my WIP patch, is it sane? :) > > > Attached testcase works OK on the device with non-shared memory. > >=20 > > A bit updated version with a fix for GOMP_MAP_TO_PSET. > > make check-target-libgomp passed. >=20 > Ok, thanks. >=20 > > include/gcc/ > > * gomp-constants.h (GOMP_MAP_ALWAYS_TO_P, > > GOMP_MAP_ALWAYS_FROM_P): Define. > > libgomp/ > > * libgomp.h (struct target_var_desc): New. > > (struct target_mem_desc): Replace array of splay_tree_key with array of > > target_var_desc. > > (struct splay_tree_key_s): Move copy_from to target_var_desc. > > * oacc-mem.c (gomp_acc_remove_pointer): Use copy_from from > > target_var_desc. > > * oacc-parallel.c (GOACC_parallel): Use copy_from from target_var_desc. > > * target.c (gomp_map_vars_existing): Copy data to device if map-type is > > 'always to' or 'always tofrom'. > > (gomp_map_vars): Use key from target_var_desc. Set copy_from and > > always_copy_from. > > (gomp_copy_from_async): Use key and copy_from from target_var_desc. > > (gomp_unmap_vars): Copy data from device if always_copy_from is set. > > (gomp_offload_image_to_device): Do not use copy_from. > > * testsuite/libgomp.c/target-11.c: New test. (That's gomp-4_1-branch r224838. The attached gomp-4_1-branch-r224838.patch is a variant that applies on top of gomp-4_0-branch r228972.) This change introduces regressions in OpenACC async clause handling. Testing on gomp-4_1-branch r224838: PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c -DACC_DEVICE= _TYPE_nvidia=3D1 -DACC_MEM_SHARED=3D0 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c= -DACC_DEVICE_TYPE_nvidia=3D1 -DACC_MEM_SHARED=3D0 execution test PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c -DACC_DEVICE= _TYPE_nvidia=3D1 -DACC_MEM_SHARED=3D0 (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c= -DACC_DEVICE_TYPE_nvidia=3D1 -DACC_MEM_SHARED=3D0 execution test Same for C++. Testing on gomp-4_0-branch r228972 plus the attached gomp-4_1-branch-r224838.patch: PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c -DACC_D= EVICE_TYPE_nvidia=3D1 -DACC_MEM_SHARED=3D0 -foffload=3Dnvptx-none (test for= excess errors) [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwai= t-1.c -DACC_DEVICE_TYPE_nvidia=3D1 -DACC_MEM_SHARED=3D0 -foffload=3Dnvptx-n= one execution test Same for C++. As I mentioned in , all three regressions are visible when testing on trunk r228777. I have not analyzed why the three different branches show different sets of regressions -- I'm hoping they're all manifestations of the same underlying problem: they're all using the OpenACC async clause. Looking at gomp-4_0-branch r228972 plus the attached gomp-4_1-branch-r224838.patch, clearly there is "some kind of data corruption": $ gdb -q a.out=20 Reading symbols from a.out...done. (gdb) start [...] 25 a =3D (float *) malloc (nbytes); (gdb) n 26 b =3D (float *) malloc (nbytes); (gdb) print a $1 =3D (float *) 0xab12c0 (gdb) c Continuing. =20=20=20=20 Program received signal SIGSEGV, Segmentation fault. 0x00000000004015d2 in main (argc=3D1, argv=3D0x7fffffffd408) at source-= gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c:133 133 if (a[i] !=3D 3.0) (gdb) print a $2 =3D (float *) 0x500680620 0x500680620 looks like a nvptx device pointer to me, which is a) wrong (after the "malloc", "a" shouldn't change its value throughout program execution), and b) that "explains" the segmentation fault (device pointer dereferenced in host code). So, maybe data is erroneously being copied back to the host from device, or from libgomp internal data structures. Maybe some copy_from flag handling needs to be adjusted or added in the OpenACC code in libgomp? I have no idea whether that's related, but I noticed that currently we're not in any way handling async_refcount in libgomp/oacc-*.c -- do we have to? (Its name certainly makes me believe it's related to asynchronous data (un-)mapping.) Should we be able to drop some of the OpenACC-specific async implementation in libgomp, and use new/generic target.c code instead? Please note that there will be further libgomp changes (target.c, and other files) coming in later merges from gomp-4_1-branch, so please for now just work on identifying/resolving the regression, and let any code refactoring wait for later. Gr=C3=BC=C3=9Fe Thomas --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=gomp-4_1-branch-r224838.patch Content-Transfer-Encoding: quoted-printable Content-length: 11791 diff --git include/gomp-constants.h include/gomp-constants.h index b55f68b..540a31e 100644 --- include/gomp-constants.h +++ include/gomp-constants.h @@ -111,6 +111,12 @@ enum gomp_map_kind #define GOMP_MAP_POINTER_P(X) \ ((X) =3D=3D GOMP_MAP_POINTER) =20 +#define GOMP_MAP_ALWAYS_TO_P(X) \ + (((X) =3D=3D GOMP_MAP_ALWAYS_TO) || ((X) =3D=3D GOMP_MAP_ALWAYS_TOFROM)) + +#define GOMP_MAP_ALWAYS_FROM_P(X) \ + (((X) =3D=3D GOMP_MAP_ALWAYS_FROM) || ((X) =3D=3D GOMP_MAP_ALWAYS_TOFROM= )) + =20 /* Asynchronous behavior. Keep in sync with libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t. */ diff --git libgomp/libgomp.h libgomp/libgomp.h index d86da7d..8fd7d08 100644 --- libgomp/libgomp.h +++ libgomp/libgomp.h @@ -641,6 +641,15 @@ typedef struct splay_tree_node_s *splay_tree_node; typedef struct splay_tree_s *splay_tree; typedef struct splay_tree_key_s *splay_tree_key; =20 +struct target_var_desc { + /* Splay key. */ + splay_tree_key key; + /* True if data should be copied from device to host at the end. */ + bool copy_from; + /* True if data always should be copied from device to host at the end. = */ + bool always_copy_from; +}; + struct target_mem_desc { /* Reference count. */ uintptr_t refcount; @@ -660,9 +669,9 @@ struct target_mem_desc { /* Corresponding target device descriptor. */ struct gomp_device_descr *device_descr; =20 - /* List of splay keys to remove (or decrease refcount) + /* List of target items to remove (or decrease refcount) at the end of region. */ - splay_tree_key list[]; + struct target_var_desc list[]; }; =20 struct splay_tree_key_s { @@ -678,8 +687,6 @@ struct splay_tree_key_s { uintptr_t refcount; /* Asynchronous reference count. */ uintptr_t async_refcount; - /* True if data should be copied from device to host at the end. */ - bool copy_from; }; =20 #include "splay-tree.h" diff --git libgomp/oacc-mem.c libgomp/oacc-mem.c index 7fcf199..a90c912 100644 --- libgomp/oacc-mem.c +++ libgomp/oacc-mem.c @@ -685,7 +685,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, = int async, int mapnum) } } =20 - t->list[0]->copy_from =3D force_copyfrom ? 1 : 0; + t->list[0].copy_from =3D force_copyfrom ? 1 : 0; =20 gomp_mutex_unlock (&acc_dev->lock); =20 diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c index 2b90c9f..e4ecc87 100644 --- libgomp/oacc-parallel.c +++ libgomp/oacc-parallel.c @@ -262,9 +262,9 @@ GOACC_parallel_keyed (int device, void (*fn) (void *), devaddrs =3D gomp_alloca (sizeof (void *) * mapnum); for (i =3D 0; i < mapnum; i++) { - if (tgt->list[i] !=3D NULL) - devaddrs[i] =3D (void *) (tgt->list[i]->tgt->tgt_start - + tgt->list[i]->tgt_offset); + if (tgt->list[i].key !=3D NULL) + devaddrs[i] =3D (void *) (tgt->list[i].key->tgt->tgt_start + + tgt->list[i].key->tgt_offset); else devaddrs[i] =3D NULL; } diff --git libgomp/target.c libgomp/target.c index 4587361..c2e1996 100644 --- libgomp/target.c +++ libgomp/target.c @@ -168,6 +168,12 @@ gomp_map_vars_existing (struct gomp_device_descr *devi= cep, splay_tree_key oldn, (void *) newn->host_start, (void *) newn->host_end, (void *) oldn->host_start, (void *) oldn->host_end); } + + if (GOMP_MAP_ALWAYS_TO_P (kind)) + devicep->host2dev_func (devicep->target_id, + (void *) (oldn->tgt->tgt_start + oldn->tgt_offset), + (void *) newn->host_start, + newn->host_end - newn->host_start); oldn->refcount++; } =20 @@ -267,7 +273,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_= t mapnum, int kind =3D get_kind (short_mapkind, kinds, i); if (hostaddrs[i] =3D=3D NULL) { - tgt->list[i] =3D NULL; + tgt->list[i].key =3D NULL; continue; } cur_node.host_start =3D (uintptr_t) hostaddrs[i]; @@ -278,12 +284,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, siz= e_t mapnum, splay_tree_key n =3D splay_tree_lookup (mem_map, &cur_node); if (n) { - tgt->list[i] =3D n; + tgt->list[i].key =3D n; + tgt->list[i].copy_from =3D GOMP_MAP_COPY_FROM_P (kind & typemask); + tgt->list[i].always_copy_from + =3D GOMP_MAP_ALWAYS_FROM_P (kind & typemask); gomp_map_vars_existing (devicep, n, &cur_node, kind & typemask); } else { - tgt->list[i] =3D NULL; + tgt->list[i].key =3D NULL; =20 size_t align =3D (size_t) 1 << (kind >> rshift); not_found_cnt++; @@ -304,7 +313,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_= t mapnum, break; else { - tgt->list[j] =3D NULL; + tgt->list[j].key =3D NULL; i++; } } @@ -352,7 +361,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_= t mapnum, size_t j; =20 for (i =3D 0; i < mapnum; i++) - if (tgt->list[i] =3D=3D NULL) + if (tgt->list[i].key =3D=3D NULL) { int kind =3D get_kind (short_mapkind, kinds, i); if (hostaddrs[i] =3D=3D NULL) @@ -366,18 +375,23 @@ gomp_map_vars (struct gomp_device_descr *devicep, siz= e_t mapnum, splay_tree_key n =3D splay_tree_lookup (mem_map, k); if (n) { - tgt->list[i] =3D n; + tgt->list[i].key =3D n; + tgt->list[i].copy_from =3D GOMP_MAP_COPY_FROM_P (kind & typemask); + tgt->list[i].always_copy_from + =3D GOMP_MAP_ALWAYS_FROM_P (kind & typemask); gomp_map_vars_existing (devicep, n, k, kind & typemask); } else { size_t align =3D (size_t) 1 << (kind >> rshift); - tgt->list[i] =3D k; + tgt->list[i].key =3D k; tgt_size =3D (tgt_size + align - 1) & ~(align - 1); k->tgt =3D tgt; k->tgt_offset =3D tgt_size; tgt_size +=3D k->host_end - k->host_start; - k->copy_from =3D GOMP_MAP_COPY_FROM_P (kind & typemask); + tgt->list[i].copy_from =3D GOMP_MAP_COPY_FROM_P (kind & typemask); + tgt->list[i].always_copy_from + =3D GOMP_MAP_ALWAYS_FROM_P (kind & typemask); k->refcount =3D 1; k->async_refcount =3D 0; tgt->refcount++; @@ -395,6 +409,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_= t mapnum, case GOMP_MAP_TOFROM: case GOMP_MAP_FORCE_TO: case GOMP_MAP_FORCE_TOFROM: + case GOMP_MAP_ALWAYS_TO: + case GOMP_MAP_ALWAYS_TOFROM: /* FIXME: Perhaps add some smarts, like if copying several adjacent fields from host to target, use some host buffer to avoid sending each var individually. */ @@ -427,7 +443,9 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_= t mapnum, break; else { - tgt->list[j] =3D k; + tgt->list[j].key =3D k; + tgt->list[j].copy_from =3D false; + tgt->list[j].always_copy_from =3D false; k->refcount++; gomp_map_pointer (tgt, (uintptr_t) *(void **) hostaddrs[j], @@ -479,11 +497,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, siz= e_t mapnum, { for (i =3D 0; i < mapnum; i++) { - if (tgt->list[i] =3D=3D NULL) + if (tgt->list[i].key =3D=3D NULL) cur_node.tgt_offset =3D (uintptr_t) NULL; else - cur_node.tgt_offset =3D tgt->list[i]->tgt->tgt_start - + tgt->list[i]->tgt_offset; + cur_node.tgt_offset =3D tgt->list[i].key->tgt->tgt_start + + tgt->list[i].key->tgt_offset; /* FIXME: see above FIXME comment. */ devicep->host2dev_func (devicep->target_id, (void *) (tgt->tgt_start @@ -523,17 +541,17 @@ gomp_copy_from_async (struct target_mem_desc *tgt) gomp_mutex_lock (&devicep->lock); =20 for (i =3D 0; i < tgt->list_count; i++) - if (tgt->list[i] =3D=3D NULL) + if (tgt->list[i].key =3D=3D NULL) ; - else if (tgt->list[i]->refcount > 1) + else if (tgt->list[i].key->refcount > 1) { - tgt->list[i]->refcount--; - tgt->list[i]->async_refcount++; + tgt->list[i].key->refcount--; + tgt->list[i].key->async_refcount++; } else { - splay_tree_key k =3D tgt->list[i]; - if (k->copy_from) + splay_tree_key k =3D tgt->list[i].key; + if (tgt->list[i].copy_from) devicep->dev2host_func (devicep->target_id, (void *) k->host_start, (void *) (k->tgt->tgt_start + k->tgt_offset), k->host_end - k->host_start); @@ -561,25 +579,33 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do= _copyfrom) =20 size_t i; for (i =3D 0; i < tgt->list_count; i++) - if (tgt->list[i] =3D=3D NULL) - ; - else if (tgt->list[i]->refcount > 1) - tgt->list[i]->refcount--; - else if (tgt->list[i]->async_refcount > 0) - tgt->list[i]->async_refcount--; - else - { - splay_tree_key k =3D tgt->list[i]; - if (k->copy_from && do_copyfrom) - devicep->dev2host_func (devicep->target_id, (void *) k->host_start, - (void *) (k->tgt->tgt_start + k->tgt_offset), - k->host_end - k->host_start); - splay_tree_remove (&devicep->mem_map, k); - if (k->tgt->refcount > 1) - k->tgt->refcount--; - else - gomp_unmap_tgt (k->tgt); - } + { + splay_tree_key k =3D tgt->list[i].key; + if (k =3D=3D NULL) + continue; + + bool do_unmap =3D false; + if (k->refcount > 1) + k->refcount--; + else if (k->async_refcount > 0) + k->async_refcount--; + else + do_unmap =3D true; + + if ((do_unmap && do_copyfrom && tgt->list[i].copy_from) + || tgt->list[i].always_copy_from) + devicep->dev2host_func (devicep->target_id, (void *) k->host_start, + (void *) (k->tgt->tgt_start + k->tgt_offset), + k->host_end - k->host_start); + if (do_unmap) + { + splay_tree_remove (&devicep->mem_map, k); + if (k->tgt->refcount > 1) + k->tgt->refcount--; + else + gomp_unmap_tgt (k->tgt); + } + } =20 if (tgt->refcount > 1) tgt->refcount--; @@ -714,8 +740,7 @@ gomp_load_image_to_device (struct gomp_device_descr *de= vicep, unsigned version, k->tgt_offset =3D target_table[i].start; k->refcount =3D 1; k->async_refcount =3D 0; - k->copy_from =3D false; - tgt->list[i] =3D k; + tgt->list[i].key =3D k; tgt->refcount++; array->left =3D NULL; array->right =3D NULL; @@ -742,8 +767,7 @@ gomp_load_image_to_device (struct gomp_device_descr *de= vicep, unsigned version, k->tgt_offset =3D target_var->start; k->refcount =3D 1; k->async_refcount =3D 0; - k->copy_from =3D false; - tgt->list[i] =3D k; + tgt->list[i].key =3D k; tgt->refcount++; array->left =3D NULL; array->right =3D NULL; diff --git libgomp/testsuite/libgomp.c/target-11.c libgomp/testsuite/libgom= p.c/target-11.c new file mode 100644 index 0000000..4562d88 --- /dev/null +++ libgomp/testsuite/libgomp.c/target-11.c @@ -0,0 +1,51 @@ +/* { dg-require-effective-target offload_device } */ + +#include + +int main () +{ + int aa =3D 0, bb =3D 0, cc =3D 0, dd =3D 0; + + #pragma omp target data map(tofrom: aa) map(to: bb) map(from: cc, dd) + { + int ok; + aa =3D bb =3D cc =3D 1; + + /* Set dd on target to 0 for the further check. */ + #pragma omp target map(always to: dd) + { dd; } + + dd =3D 1; + #pragma omp target map(tofrom: aa) map(always to: bb) \ + map(always from: cc) map(to: dd) map(from: ok) + { + /* bb is always to, aa and dd are not. */ + ok =3D (aa =3D=3D 0) && (bb =3D=3D 1) && (dd =3D=3D 0); + aa =3D bb =3D cc =3D dd =3D 2; + } + + assert (ok); + assert (aa =3D=3D 1); + assert (bb =3D=3D 1); + assert (cc =3D=3D 2); /* cc is always from. */ + assert (dd =3D=3D 1); + + dd =3D 3; + #pragma omp target map(from: cc) map(always to: dd) map(from: ok) + { + ok =3D (dd =3D=3D 3); /* dd is always to. */ + cc =3D dd =3D 4; + } + + assert (ok); + assert (cc =3D=3D 2); + assert (dd =3D=3D 3); + } + + assert (aa =3D=3D 2); + assert (bb =3D=3D 1); + assert (cc =3D=3D 4); + assert (dd =3D=3D 4); + + return 0; +} --=-=-=-- --==-=-= Content-Type: application/pgp-signature; name="signature.asc" Content-length: 472 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJWJRlDAAoJEK3/DN1sMFFtMMUH/Rn0mJf+ZZIImH1gaahoAVxe LQmYD54YXHfzOi13x1kfjRW0G+iEu41DhCPWi/fVlFt49etpQ2OQRYoANZEtHr52 wn/PGNplc2Zt0hvPWyIwfCTA8hl+nY3ZDhMg+nxWFxrRagqpOcG5e4EoJHmYcA46 +MMH2tAMPMtIXNTOylawGgpBT9WL0mKIEdV+2MCM7V0oOjM9JhRwO8qGfWqrxWry C285fGYsSJWWx9019J56XQvOAD/9KclAzCEIDt3NWr7P0jLxCut+Wbumh0DTc7qX EOurtnVBb56TZe3C7L49zSfSxNe+eu/FXUw6FMU7HhRgpSW5IG6BAg2vs52EkD8= =oOPb -----END PGP SIGNATURE----- --==-=-=--