Date: Tue, 7 Jun 2022 12:05:40 +0100
From: Andrew Stubbs
To: Jakub Jelinek, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libgomp, openmp: pinned memory

Following some feedback from users of the OG11 branch, I think I need to
withdraw this patch, for now.

The memory pinned via the mlock call does not give the expected
performance boost. I had not expected it to do much in my test setup,
given that the machine has a lot of RAM and my benchmarks are small, but
others have tried larger workloads on a variety of machines and
architectures.

It seems that it isn't enough for the memory to be pinned: it has to be
pinned using the CUDA API to get the performance boost. I had not done
this because it was awkward to fit through the code abstractions, and in
any case the implementation was supposed to be device-independent, but
it seems we need a specific pinning mechanism for each device.

I will resubmit this patch with some kind of CUDA/plugin hook soonish,
keeping the existing implementation for other device types. I don't know
how that will handle heterogeneous systems, but those ought to be rare.

I don't think libmemkind will resolve this performance issue, although
it can certainly be used for host implementations of low-latency
memories, etc.
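To illustrate, here is a rough sketch of the kind of per-device hook I
have in mind. The names are invented and this is not the actual plugin
interface, but for NVPTX it would boil down to cuMemHostRegister, which
both locks the pages and makes them known to the CUDA driver, unlike a
plain mlock:

/* Hypothetical plugin hook: pin an existing page-aligned host
   allocation so the CUDA driver can DMA to/from it directly.
   Assumes a CUDA context is already current on this thread.  */

#include <stddef.h>
#include <stdbool.h>
#include <cuda.h>

static bool
nvptx_page_lock (void *ptr, size_t size)
{
  return cuMemHostRegister (ptr, size, 0) == CUDA_SUCCESS;
}

static bool
nvptx_page_unlock (void *ptr)
{
  return cuMemHostUnregister (ptr) == CUDA_SUCCESS;
}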
Andrew

On 13/01/2022 13:53, Andrew Stubbs wrote:
> On 05/01/2022 17:07, Andrew Stubbs wrote:
>> I don't believe 64KB will be anything like enough for any real HPC
>> application. Is it really worth optimizing for this case?
>>
>> Anyway, I'm working on an implementation using mmap instead of malloc
>> for pinned allocations. I figure that will simplify the unpin
>> algorithm (because it'll be munmap) and optimize for the large
>> allocations I imagine HPC applications will use. It won't fix the
>> ulimit issue.
>
> Here's my new patch.
>
> This version is intended to apply on top of the latest version of my
> low-latency allocator patch, although the dependency is mostly textual.
>
> Pinned memory is allocated via mmap + mlock, and allocation fails
> (returns NULL) if the lock fails and there's no fallback configured.
>
> This means that large allocations will now be page-aligned, and
> therefore pin the smallest number of pages for the size requested, and
> that the memory will be unpinned automatically when freed via munmap
> or moved via mremap.
>
> Obviously this is not ideal for allocations much smaller than one
> page. If that turns out to be a problem in the real world then we can
> add a special case fairly straightforwardly, and incur the extra page
> tracking expense in those cases only, or maybe implement our own
> pinned-memory heap (something like what was already proposed for
> low-latency memory, perhaps).
>
> Also new is a realloc implementation that behaves better when
> reallocation fails. This is confirmed by the new testcases.
>
> OK for stage 1?
>
> Thanks
>
> Andrew
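For reference, the scheme the quoted message describes amounts to
something like the following sketch. The names are hypothetical and the
error handling is simplified relative to the actual patch:

/* Pinned allocation via mmap + mlock, as described above.
   Requires _GNU_SOURCE for mremap/MREMAP_MAYMOVE.  */

#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate SIZE bytes of page-locked memory; return NULL on failure
   so the caller can apply any configured fallback.  */
static void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)     /* E.g. RLIMIT_MEMLOCK exceeded.  */
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Resize a pinned allocation.  On failure the old mapping is left
   intact and still pinned, so the caller can recover cleanly.  */
static void *
pinned_realloc (void *p, size_t oldsize, size_t newsize)
{
  void *q = mremap (p, oldsize, newsize, MREMAP_MAYMOVE);
  return q == MAP_FAILED ? NULL : q;
}

/* Freeing via munmap implicitly unpins the pages.  */
static void
pinned_free (void *p, size_t size)
{
  munmap (p, size);
}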