From: Thomas Schwinge
To: Julian Brown
CC: Tom de Vries
Subject: Re: [PATCH 1/5] OpenMP, NVPTX: memcpy[23]D bias correction
Date: Wed, 27 Sep 2023 00:57:58 +0200
Message-ID: <87sf704k5l.fsf@euler.schwinge.homeip.net>

Hi Julian!

On 2023-09-06T02:34:30-0700, Julian Brown wrote:
> This patch works around behaviour of the 2D and 3D memcpy operations in
> the CUDA driver runtime.  Particularly in Fortran, the "base pointer"
> of an array (used for either source or destination of a host/device copy)
> may lie outside of data that is actually stored on the device.  The fix
> is to make sure that we use the first element of data to be transferred
> instead, and adjust parameters accordingly.

Do you (a) have a stand-alone test case for this (that is, not depending
on your other pending patches, so that this could go in directly --
together with the before-FAIL test case)?

Do you (b) know whether this is a bug in our use of the CUDA Driver API or
rather in CUDA itself?  If the latter, have you reported this to Nvidia?

(I didn't quickly understand "cuMemcpy2D" etc.)


Regards
 Thomas


> 2023-09-05  Julian Brown
>
> libgomp/
>       * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to
>       avoid out-of-bounds array checks in CUDA runtime.
>       (GOMP_OFFLOAD_memcpy3d): Likewise.
> ---
>  libgomp/plugin/plugin-nvptx.c | 67 +++++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
>
> diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
> index 00d4241ae02b..cefe288a8aab 100644
> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -1827,6 +1827,35 @@ GOMP_OFFLOAD_memcpy2d (int dst_ord, int src_ord, size_t dim1_size,
>    data.srcXInBytes = src_offset1_size;
>    data.srcY = src_offset0_len;
>
> +  if (data.srcXInBytes != 0 || data.srcY != 0)
> +    {
> +      /* Adjust origin to the actual array data, else the CUDA 2D memory
> +         copy API calls below may fail to validate source/dest pointers
> +         correctly (especially for Fortran where the "virtual origin" of an
> +         array is often outside the stored data).  */
> +      if (src_ord == -1)
> +        data.srcHost = (const void *) ((const char *) data.srcHost
> +                                       + data.srcY * data.srcPitch
> +                                       + data.srcXInBytes);
> +      else
> +        data.srcDevice += data.srcY * data.srcPitch + data.srcXInBytes;
> +      data.srcXInBytes = 0;
> +      data.srcY = 0;
> +    }
> +
> +  if (data.dstXInBytes != 0 || data.dstY != 0)
> +    {
> +      /* As above.  */
> +      if (dst_ord == -1)
> +        data.dstHost = (void *) ((char *) data.dstHost
> +                                 + data.dstY * data.dstPitch
> +                                 + data.dstXInBytes);
> +      else
> +        data.dstDevice += data.dstY * data.dstPitch + data.dstXInBytes;
> +      data.dstXInBytes = 0;
> +      data.dstY = 0;
> +    }
> +
>    CUresult res = CUDA_CALL_NOCHECK (cuMemcpy2D, &data);
>    if (res == CUDA_ERROR_INVALID_VALUE)
>      /* If pitch > CU_DEVICE_ATTRIBUTE_MAX_PITCH or for device-to-device
> @@ -1895,6 +1924,44 @@ GOMP_OFFLOAD_memcpy3d (int dst_ord, int src_ord, size_t dim2_size,
>    data.srcY = src_offset1_len;
>    data.srcZ = src_offset0_len;
>
> +  if (data.srcXInBytes != 0 || data.srcY != 0 || data.srcZ != 0)
> +    {
> +      /* Adjust origin to the actual array data, else the CUDA 3D memory
> +         copy API call below may fail to validate source/dest pointers
> +         correctly (especially for Fortran where the "virtual origin" of an
> +         array is often outside the stored data).  */
> +      if (src_ord == -1)
> +        data.srcHost
> +          = (const void *) ((const char *) data.srcHost
> +                            + (data.srcZ * data.srcHeight + data.srcY)
> +                              * data.srcPitch
> +                            + data.srcXInBytes);
> +      else
> +        data.srcDevice
> +          += (data.srcZ * data.srcHeight + data.srcY) * data.srcPitch
> +             + data.srcXInBytes;
> +      data.srcXInBytes = 0;
> +      data.srcY = 0;
> +      data.srcZ = 0;
> +    }
> +
> +  if (data.dstXInBytes != 0 || data.dstY != 0 || data.dstZ != 0)
> +    {
> +      /* As above.  */
> +      if (dst_ord == -1)
> +        data.dstHost = (void *) ((char *) data.dstHost
> +                                 + (data.dstZ * data.dstHeight + data.dstY)
> +                                   * data.dstPitch
> +                                 + data.dstXInBytes);
> +      else
> +        data.dstDevice
> +          += (data.dstZ * data.dstHeight + data.dstY) * data.dstPitch
> +             + data.dstXInBytes;
> +      data.dstXInBytes = 0;
> +      data.dstY = 0;
> +      data.dstZ = 0;
> +    }
> +
>    CUDA_CALL (cuMemcpy3D, &data);
>    return true;
>  }
> --
> 2.41.0
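
For reference, the offset folding that the quoted patch performs can be
sketched in plain C without the CUDA headers.  This is only an illustration
under stated assumptions: 'struct copy_desc', 'first_byte', 'fold_offsets'
and 'fold_preserves_first_byte' are hypothetical names that mimic the
host-side 'srcHost'/'srcXInBytes'/'srcY'/'srcZ'/'srcPitch'/'srcHeight'
fields used in the patch; they are not the real CUDA types and not code from
libgomp.  The point it tries to show is that moving the X/Y/Z offsets into
the base pointer leaves the first transferred byte unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the host-side fields the patch touches.
   This is NOT the real CUDA_MEMCPY2D/CUDA_MEMCPY3D structure.  */
struct copy_desc
{
  const char *base;   /* Plays the role of srcHost.  */
  size_t x_in_bytes;  /* Plays the role of srcXInBytes.  */
  size_t y;           /* Plays the role of srcY.  */
  size_t z;           /* Plays the role of srcZ; keep 0 for a 2D copy.  */
  size_t pitch;       /* Bytes per row, like srcPitch.  */
  size_t height;      /* Rows per slice, like srcHeight; unused when z == 0.  */
};

/* Address of the first byte the copy would read.  With z == 0 this is the
   2D formula from the patch: base + y * pitch + x_in_bytes.  */
static const char *
first_byte (const struct copy_desc *d)
{
  return d->base + (d->z * d->height + d->y) * d->pitch + d->x_in_bytes;
}

/* Fold the offsets into the base pointer and zero them, mirroring what the
   patch does before calling cuMemcpy2D/cuMemcpy3D.  */
static void
fold_offsets (struct copy_desc *d)
{
  d->base = first_byte (d);
  d->x_in_bytes = 0;
  d->y = 0;
  d->z = 0;
}

/* Self-check on a 2-slice x 3-row x 4-byte buffer: returns nonzero if the
   first transferred byte is the same before and after folding.  */
int
fold_preserves_first_byte (void)
{
  char buf[2 * 3 * 4];
  for (size_t i = 0; i < sizeof buf; i++)
    buf[i] = (char) i;

  /* Offset of 3 bytes, 2 rows, 1 slice: (1 * 3 + 2) * 4 + 3 = byte 23.  */
  struct copy_desc d = { buf, 3, 2, 1, 4, 3 };
  const char *before = first_byte (&d);
  fold_offsets (&d);
  return before == first_byte (&d) && d.base == buf + 23 && *d.base == 23;
}
```

The sketch keeps the original base pointer inside the buffer so that the
arithmetic stays well-defined in ISO C; the Fortran "virtual origin" case
from the patch description, where the base lies outside the stored data, is
deliberately not reproduced here.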