From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Stubbs, Andrew"
To: Thomas Schwinge, Andrew Stubbs, Jakub Jelinek, Tobias Burnus, "gcc-patches@gcc.gnu.org"
Subject: RE: Attempt to register OpenMP pinned memory using a device instead of 'mlock' (was: [PATCH] libgomp, openmp: pinned memory)
Date: Thu, 16 Feb 2023 16:17:32 +0000
In-Reply-To: <87cz69tyla.fsf@dem-tschwing-1.ger.mentorg.com>
References: <20220104155558.GG2646553@tucnak> <48ee767a-0d90-53b4-ea54-9deba9edd805@codesourcery.com> <20220104182829.GK2646553@tucnak> <20220104184740.GL2646553@tucnak> <87edzy5g8h.fsf@euler.schwinge.homeip.net> <87cz69tyla.fsf@dem-tschwing-1.ger.mentorg.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0

> -----Original Message-----
> From: Thomas Schwinge
> Sent: 16 February 2023 15:33
> To: Andrew Stubbs; Jakub Jelinek; Tobias Burnus; gcc-patches@gcc.gnu.org
> Subject:
>   Attempt to register OpenMP pinned memory using a device instead of
>   'mlock' (was: [PATCH] libgomp, openmp: pinned memory)
>
> Hi!
>
> On 2022-06-09T11:38:22+0200, I wrote:
> > On 2022-06-07T13:28:33+0100, Andrew Stubbs wrote:
> >> On 07/06/2022 13:10, Jakub Jelinek wrote:
> >>> On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
> >>>> Following some feedback from users of the OG11 branch I think I need to
> >>>> withdraw this patch, for now.
> >>>>
> >>>> The memory pinned via the mlock call does not give the expected performance
> >>>> boost. I had not expected that it would do much in my test setup, given that
> >>>> the machine has a lot of RAM and my benchmarks are small, but others have
> >>>> tried more and on varying machines and architectures.
> >>>
> >>> I don't understand why there should be any expected performance boost (at
> >>> least not unless the machine starts swapping out pages),
> >>> { omp_atk_pinned, true } is solely about the requirement that the memory
> >>> can't be swapped out.
> >>
> >> It seems like it takes a faster path through the NVidia drivers. This is
> >> a black box, for me, but that seems like a plausible explanation. The
> >> results are different on x86_64 and powerpc hosts (such as the Summit
> >> supercomputer).
> >
> > For example, it's documented that 'cuMemHostAlloc',
> > <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g572ca4011bfcb25034888a14d4e035b9>,
> > "Allocates page-locked host memory".
> > The crucial thing, though, what makes this different from 'malloc'
> > plus 'mlock' is, that "The driver tracks the virtual memory ranges
> > allocated with this function and automatically accelerates calls to
> > functions such as cuMemcpyHtoD(). Since the memory can be accessed
> > directly by the device, it can be read or written with much higher
> > bandwidth than pageable memory obtained with functions such as
> > malloc()".
> >
> > Similar, for example, for 'cuMemAllocHost',
> > <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gdd8311286d2c2691605362c689bc64e0>.
> >
> > This, to me, would explain why "the mlock call does not give the expected
> > performance boost", in comparison with 'cuMemAllocHost'/'cuMemHostAlloc';
> > with 'mlock' you're missing the "tracks the virtual memory ranges"
> > aspect.
> >
> > Also, by means of the Nvidia Driver allocating the memory, I suppose
> > using this interface likely circumvents any "annoying" 'ulimit'
> > limitations? I get this impression, because documentation continues
> > stating that "Allocating excessive amounts of memory with
> > cuMemAllocHost() may degrade system performance, since it reduces the
> > amount of memory available to the system for paging. As a result, this
> > function is best used sparingly to allocate staging areas for data
> > exchange between host and device".
> >
> >>>> It seems that it isn't enough for the memory to be pinned, it has to be
> >>>> pinned using the Cuda API to get the performance boost.
> >>>
> >>> For performance boost of what kind of code?
> >>> I don't understand how Cuda API could be useful (or can be used at all) if
> >>> offloading to NVPTX isn't involved. The fact that somebody asks for host
> >>> memory allocation with omp_atk_pinned set to true doesn't mean it will be
> >>> in any way related to NVPTX offloading (unless it is in NVPTX target region
> >>> obviously, but then mlock isn't available, so sure, if there is something
> >>> CUDA can provide for that case, nice).
> >>
> >> This is specifically for NVPTX offload, of course, but then that's what
> >> our customer is paying for.
> >>
> >> The expectation, from users, is that memory pinning will give the
> >> benefits specific to the active device. We can certainly make that
> >> happen when there is only one (flavour of) offload device present. I had
> >> hoped it could be one way for all, but it looks like not.
> >
> > Aren't there CUDA Driver interfaces for that? That is:
> >
> >>>> I had not done this because it was difficult to resolve the code
> >>>> abstraction difficulties and anyway the implementation was supposed
> >>>> to be device independent, but it seems we need a specific pinning
> >>>> mechanism for each device.
> >
> > If not directly *allocating and registering* such memory via
> > 'cuMemAllocHost'/'cuMemHostAlloc', you should still be able to only
> > *register* your standard 'malloc'ed etc. memory via 'cuMemHostRegister',
> > <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf0a9fe11544326dabd743b7aa6b54223>:
> > "Page-locks the memory range specified [...] and maps it for the
> > device(s) [...].
This memory range also is added to the same tracking > > mechanism as cuMemHostAlloc to automatically accelerate [...]"? (No > > manual 'mlock'ing involved in that case, too; presumably again using th= is > > interface likely circumvents any "annoying" 'ulimit' limitations?) > > > > Such a *register* abstraction can then be implemented by all the libgom= p > > offloading plugins: they just call the respective > > CUDA/HSA/etc. functions to register such (existing, 'malloc'ed, etc.) > > memory. > > > > ..., but maybe I'm missing some crucial "detail" here? >=20 > Indeed this does appear to work; see attached > "[WIP] Attempt to register OpenMP pinned memory using a device instead of > 'mlock'". > Any comments (aside from the TODOs that I'm still working on)? The mmap implementation was not optimized for a lot of small allocations, a= nd I can't see that issue changing here, so I don't know if this can be use= d for mlockall replacement. I had assumed that using the Cuda allocator would fix that limitation. Andrew