Message-ID: <0632c1f8-e869-45a0-96d7-5375895d53cd@codesourcery.com>
Date: Tue, 19 Dec 2023 21:45:39 +0100
Subject: Re: [PATCH 1/5] OpenMP, NVPTX: memcpy[23]D bias correction
To: Julian Brown, Thomas Schwinge
CC: Tom de Vries
References: <87sf704k5l.fsf@euler.schwinge.homeip.net> <20231002155359.3a44a582@squid.athome>
From: Tobias Burnus
In-Reply-To: <20231002155359.3a44a582@squid.athome>

Hi Julian & Thomas,

the patch LGTM, Thomas seems to be reasonably fine with it as well, and it
now includes the stand-alone test case.

* * *

I guess you don't know the answer to Thomas's question, i.e. whether this is
a bug in CUDA or in our use of the CUDA API?

CUDA's own documentation,
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html
has for cuMemcpy2D

  void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);

and for cuMemcpy3D

  void* Start = (void*)((char*)srcHost+(srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);

Thus, I assume we use it "properly", except that the CUDA authors probably
assumed that one allocates a big chunk of memory and works with that memory,
rather than mapping just a subset of it.

This might or might not be what the manual means by the following:
"Memory regions spanning over allocations that are both registered and not
registered with CUDA are not supported and will return
CUDA_ERROR_INVALID_VALUE." - where the question is whether everything up to
'start' really counts as "spanning".

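To make the "bias" concrete, here is a minimal sketch in plain C of the start-address formula quoted above and of the kind of rebasing the patch description talks about. It is not the code from the patch and it calls no CUDA functions: `struct copy2d_src`, `start_address` and `rebase` are hypothetical names of mine that only mimic the source-side fields of the 2D copy descriptor (srcHost, srcXInBytes, srcY, srcPitch), and I assume the destination side would be handled the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the source-side fields of a 2D copy
   descriptor; not the real CUDA_MEMCPY2D structure.  */
struct copy2d_src
{
  const char *host;   /* plays the role of srcHost */
  size_t x_in_bytes;  /* plays the role of srcXInBytes */
  size_t y;           /* plays the role of srcY */
  size_t pitch;       /* plays the role of srcPitch */
};

/* Start address, following the formula quoted for cuMemcpy2D.  */
static const char *
start_address (const struct copy2d_src *s)
{
  return s->host + s->y * s->pitch + s->x_in_bytes;
}

/* Fold the X/Y offsets into the base pointer, so that the base is the
   first byte to be transferred and both offsets become zero.  */
static struct copy2d_src
rebase (struct copy2d_src s)
{
  struct copy2d_src r = s;
  r.host = start_address (&s);
  r.x_in_bytes = 0;
  r.y = 0;
  /* The rebased descriptor must address the very same first byte.  */
  assert (start_address (&r) == start_address (&s));
  return r;
}
```

With a 16-byte pitch, srcY = 3 and srcXInBytes = 4, both descriptors start at byte 52 of the array, but only the rebased one has a base pointer that is itself part of the transferred data. The pitch is left untouched, as the row stride does not change.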
Tobias

On 02.10.23 16:53, Julian Brown wrote:
> On Wed, 27 Sep 2023 00:57:58 +0200
> Thomas Schwinge wrote:
>
>> On 2023-09-06T02:34:30-0700, Julian Brown
>> wrote:
>>> This patch works around behaviour of the 2D and 3D memcpy
>>> operations in the CUDA driver runtime. Particularly in Fortran,
>>> the "base pointer" of an array (used for either source or
>>> destination of a host/device copy) may lie outside of data that is
>>> actually stored on the device. The fix is to make sure that we use
>>> the first element of data to be transferred instead, and adjust
>>> parameters accordingly.
>> Do you (a) have a stand-alone test case for this (that is, not
>> depending on your other pending patches, so that this could go in
>> directly -- together with the before-FAIL test case)?
> Thanks for the reply! Here's a version with a stand-alone test case.
>
>> Do you (b)
>> know if this is a bug in our use of the CUDA Driver API or rather in
>> CUDA itself? If the latter, have you reported this to Nvidia?
> I don't think the CUDA behaviour is *wrong*, as such -- at least to the
> C/C++ way of thinking (or indeed a graphics-oriented way of thinking),
> one would normally think of an array as having a zero-based origin, and
> these 2D/3D memory copies would be intended as a way of updating just a
> part of an array (or texture) that has full duplicate copies on both
> the host and device. Our use-case just happens to be a bit different,
> both because Fortran (internally) represents an array by a zero-based
> origin but may use 1-based (or whatever-based) indices, and because we
> support partial mappings of host arrays on the device in all three
> supported languages -- which amounts to much the same thing, actually.
>
> That said, it *could* be fixed in CUDA, though probably not in all the
> versions currently deployed out there in the world. So I guess we'd
> still need a patch like this anyway.
>
> Julian