From: Andrew Stubbs
To: Jakub Jelinek
CC: "gcc-patches@gcc.gnu.org"
Date: Tue, 7 Jun 2022 13:28:33 +0100
Subject: Re: [PATCH] libgomp, openmp: pinned memory

On 07/06/2022 13:10, Jakub Jelinek wrote:
> On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
>> Following some feedback from users of the OG11 branch I think I need to
>> withdraw this patch, for now.
>>
>> The memory pinned via the mlock call does not give the expected
>> performance boost.  I had not expected that it would do much in my test
>> setup, given that the machine has a lot of RAM and my benchmarks are
>> small, but others have tried more and on varying machines and
>> architectures.
>
> I don't understand why there should be any expected performance boost (at
> least not unless the machine starts swapping out pages);
> { omp_atk_pinned, true } is solely about the requirement that the memory
> can't be swapped out.

It seems to take a faster path through the NVIDIA drivers.  This is a black
box to me, but it is a plausible explanation.  The results differ between
x86_64 and PowerPC hosts (such as the Summit supercomputer).
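
To make the distinction concrete, here is a rough sketch of the two kinds of
pinning being compared (this is not the patch; it uses the CUDA runtime API
for brevity, and error handling is mostly elided):

  #include <stddef.h>
  #include <sys/mman.h>      /* mmap, mlock */
  #include <cuda_runtime.h>  /* cudaHostRegister */

  /* Pin with mlock only: the kernel keeps the pages resident, but the
     CUDA driver does not know they are pinned, so it still treats the
     memory as pageable and stages transfers through its own buffers.  */
  static void *
  pin_with_mlock (size_t size)
  {
    void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED || mlock (p, size) != 0)
      return NULL;
    return p;
  }

  /* Pin by registering the pages with the CUDA driver: they are
     page-locked *and* the driver can DMA directly to and from them,
     which appears to be the faster path mentioned above.  */
  static void *
  pin_with_cuda (size_t size)
  {
    void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED
        || cudaHostRegister (p, size, cudaHostRegisterDefault) != cudaSuccess)
      return NULL;
    return p;
  }

A real implementation in the nvptx plugin would presumably use the equivalent
driver-API entry point (cuMemHostRegister) rather than the runtime API.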
>> It seems that it isn't enough for the memory to be pinned; it has to be
>> pinned using the CUDA API to get the performance boost.  I had not done
>> this
>
> For a performance boost of what kind of code?
> I don't understand how the CUDA API could be useful (or can be used at
> all) if offloading to NVPTX isn't involved.  The fact that somebody asks
> for host memory allocation with omp_atk_pinned set to true doesn't mean
> it will be in any way related to NVPTX offloading (unless it is in an
> NVPTX target region, obviously, but then mlock isn't available, so sure,
> if there is something CUDA can provide for that case, nice).

This is specifically for NVPTX offload, of course, but then that's what our
customer is paying for.  The expectation, from users, is that memory pinning
will give the benefits specific to the active device.  We can certainly make
that happen when there is only one (flavour of) offload device present.  I
had hoped it could be one way for all, but it looks like not.

>> I don't think libmemkind will resolve this performance issue, although
>> certainly it can be used for host implementations of low-latency
>> memories, etc.
>
> The reason for libmemkind is primarily its support of HBW memory (but
> admittedly I need to find out what kind of such memory it does support),
> or the various interleaving etc. the library has.
> Plus, when we have such support, as it has its own customizable allocator,
> it could be used to allocate larger chunks of memory that can be mlocked
> and then just allocate from that pinned memory if the user asks for small
> allocations from that memory.

It should be straightforward to switch the no-offload implementation to
libmemkind when the time comes (the changes would be contained within
config/linux/allocator.c), but I have no plans to do so myself (and no
hardware to test it with).  I'd prefer that it didn't impede the offload
solution in the meantime.

Andrew
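
P.S. For anyone skimming the thread, the user-side code in question is just
the standard OpenMP 5.x allocator-traits interface; the whole discussion
above is about what the implementation does underneath (mlock, a CUDA
registration, or something else) when the pinned trait is requested.
Illustrative only:

  #include <omp.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* Request an allocator whose memory must stay resident (pinned).  */
    omp_alloctrait_t traits[] = { { omp_atk_pinned, omp_atv_true } };
    omp_allocator_handle_t pinned
      = omp_init_allocator (omp_default_mem_space, 1, traits);

    double *buf = omp_alloc (64 << 20, pinned);   /* 64 MiB */
    if (buf == NULL)
      {
        fprintf (stderr, "pinned allocation failed\n");
        return 1;
      }

    /* ... use buf, e.g. as the source/target of offload transfers ... */

    omp_free (buf, pinned);
    omp_destroy_allocator (pinned);
    return 0;
  }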
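
On the idea of carving small requests out of a larger mlocked chunk (whether
that is eventually done via libmemkind's customizable allocator or by hand),
a toy sketch of the shape of it, with no free list, no growth and no thread
safety; one benefit is that the mlock syscall and the RLIMIT_MEMLOCK
accounting are paid once for the whole arena:

  #include <stddef.h>
  #include <sys/mman.h>

  static char *arena;
  static size_t arena_size, arena_used;

  /* Reserve and mlock one large region up front.  */
  static int
  arena_init (size_t size)
  {
    arena = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED || mlock (arena, size) != 0)
      return -1;
    arena_size = size;
    return 0;
  }

  /* Small "pinned" allocations then become simple pointer bumps within
     the already-pinned region.  */
  static void *
  arena_alloc (size_t size)
  {
    size = (size + 15) & ~(size_t) 15;   /* keep 16-byte alignment */
    if (size > arena_size - arena_used)
      return NULL;
    void *p = arena + arena_used;
    arena_used += size;
    return p;
  }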