From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <Tobias_Burnus@mentor.com>
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 0EA98384B122
 for <gcc-patches@gcc.gnu.org>; Thu,  9 Jun 2022 10:09:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0EA98384B122
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.91,287,1647331200"; d="scan'208";a="77014862"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa2.mentor.iphmx.com with ESMTP; 09 Jun 2022 02:09:58 -0800
IronPort-SDR: q2EVLQZgP7vBfxcmxy/8+iiOLin1VrgGxilxiJ5lrXYNIQMU71FHFWHJl8uIDeuGELgIAE30ro
 BSevn1SHZ/OC+QGKdiZWMsKbDYZfjNrMDQLIhBjoxvdL/GfAKQ+rNeCkttpfFkuKi9G83e9jwM
 4Li3bANl8nSFA8e45nRUCc7b/6ZXLolZNOa432FKvlwYCFKzuKJIPo/HV09z6JCCe1IPkXDcjv
 5Crh4A31PglEA8JlzKCUS1v6fLEpdhRBOAKwpJfNOVuKXSmlZ8w0NZVgOPul6usjGClgMN0UNY
 ACo=
Message-ID: <8c95bbcf-7a74-738d-ffc2-4cae606aac62@codesourcery.com>
Date: Thu, 9 Jun 2022 12:09:52 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.10.0
Subject: Re: [PATCH] libgomp, openmp: pinned memory
Content-Language: en-US
To: Thomas Schwinge <thomas@codesourcery.com>, Andrew Stubbs
 <ams@codesourcery.com>, Jakub Jelinek <jakub@redhat.com>
CC: <gcc-patches@gcc.gnu.org>
References: <f5260c95-6c71-99a7-3bf2-774380444082@codesourcery.com>
 <20220104155558.GG2646553@tucnak>
 <48ee767a-0d90-53b4-ea54-9deba9edd805@codesourcery.com>
 <20220104182829.GK2646553@tucnak> <20220104184740.GL2646553@tucnak>
 <b59981ce-9e47-8b00-03b8-1a9a5d555bb7@codesourcery.com>
 <a79567df-f061-8248-4281-63c74e724cb7@codesourcery.com>
 <dadaaf64-360f-bffb-8616-1ab9493cb358@codesourcery.com>
 <Yp9AMrhxak8lOh4t@tucnak>
 <e8fc4b30-768a-2a02-1fc9-208ab9bf8a5d@codesourcery.com>
 <87edzy5g8h.fsf@euler.schwinge.homeip.net>
From: Tobias Burnus <tobias@codesourcery.com>
In-Reply-To: <87edzy5g8h.fsf@euler.schwinge.homeip.net>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: quoted-printable
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To
 svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12)
X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, NICE_REPLY_A,
 RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Jun 2022 10:10:02 -0000

On 09.06.22 11:38, Thomas Schwinge wrote:
> On 2022-06-07T13:28:33+0100, Andrew Stubbs <ams@codesourcery.com> wrote:
>> On 07/06/2022 13:10, Jakub Jelinek wrote:
>>> On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
>>>> The memory pinned via the mlock call does not give the expected performance
>>>> boost. I had not expected that it would do much in my test setup, given that
>>>> the machine has a lot of RAM and my benchmarks are small, but others have
>>>> tried more and on varying machines and architectures.
>>> I don't understand why there should be any expected performance boost (at
>>> least not unless the machine starts swapping out pages),
>>> { omp_atk_pinned, true } is solely about the requirement that the memory
>>> can't be swapped out.
>> It seems like it takes a faster path through the NVidia drivers. [...]

I think this conflates two parts:

* User-defined allocators in general: there, using CUDA does not make much
sense, and without unified-shared memory the allocated memory will always
be inaccessible on the device (without explicit/implicit mapping).

* Memory which is supposed to be accessible both on the host and on the
device. That's most obvious when explicitly allocating memory to be
accessible on both. It is less clear-cut when just creating an allocator
with unified-shared memory, as it is not clear when the memory is only
used on the host (e.g. with host-based thread parallelization) and when it
is also relevant for the device.

Currently, the user has no means to express the intent that memory should
be accessible on both the host and one or several devices, except for 'omp
requires unified_shared_memory'.

The next OpenMP version will likely provide a means to create an
allocator which permits this; see
https://github.com/OpenMP/spec/issues/1843 (not publicly available; the
slides in the last comment are slightly outdated).

  * * *

The question is only what to do with 'requires unified_shared_memory'
combined with a non-multi-device allocator.

Probably: unified_shared_memory or no nvptx device: just use mlock.
Otherwise (i.e. both nvptx device and (unified_shared_memory or a
multi-device-allocator)), use the CUDA one.

For the latter, I think Thomas' remarks are helpful.
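
The selection rule above could be sketched roughly as follows. This is only
an illustration: the enum, the function and its parameters are hypothetical
names, not libgomp identifiers. It takes the "Otherwise" clause literally
(nvptx device present and either unified_shared_memory or a multi-device
allocator) and assumes every other case is the mlock case.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not libgomp identifiers.  */
enum pin_method { PIN_WITH_MLOCK, PIN_WITH_CUDA };

/* Pick how to pin memory for an allocator with the pinned trait.
   Assumption: the CUDA allocation is used only when an nvptx device is
   present AND the memory is meant to be device-accessible, i.e. with
   unified_shared_memory or a multi-device allocator.  All other cases
   fall back to mlock.  */
enum pin_method
choose_pin_method (bool have_nvptx_device, bool unified_shared_memory,
                   bool multi_device_allocator)
{
  if (have_nvptx_device
      && (unified_shared_memory || multi_device_allocator))
    return PIN_WITH_CUDA;
  return PIN_WITH_MLOCK;
}
```

For example, choose_pin_method (true, false, false) would select mlock,
while choose_pin_method (true, true, false) would select the CUDA path.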

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201,
80634 München; limited liability company (GmbH); Managing Directors:
Thomas Heurung, Frank Thürauf; Registered office: München; Commercial
register: München, HRB 106955