From: Andrew Stubbs
To: Jakub Jelinek
CC: "gcc-patches@gcc.gnu.org"
Date: Tue, 7 Jun 2022 13:28:33 +0100
Subject: Re: [PATCH] libgomp, openmp: pinned memory

On 07/06/2022 13:10, Jakub Jelinek wrote:
> On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
>> Following some feedback from users of the OG11 branch I think I need to
>> withdraw this patch, for now.
>>
>> The memory pinned via the mlock call does not give the expected
>> performance boost.  I had not expected that it would do much in my test
>> setup, given that the machine has a lot of RAM and my benchmarks are
>> small, but others have tried more and on varying machines and
>> architectures.
>
> I don't understand why there should be any expected performance boost (at
> least not unless the machine starts swapping out pages);
> { omp_atk_pinned, true } is solely about the requirement that the memory
> can't be swapped out.

It seems to take a faster path through the NVIDIA drivers.  This is a black
box to me, but it is a plausible explanation.  The results differ between
x86_64 and PowerPC hosts (such as the Summit supercomputer).
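
To make the distinction concrete, here is a rough sketch of the two kinds of
pinning being compared (this is not the patch; it uses the CUDA runtime API
for brevity, and error handling is mostly elided):

  #include <stddef.h>
  #include <sys/mman.h>      /* mmap, mlock */
  #include <cuda_runtime.h>  /* cudaHostRegister */

  /* Pin with mlock only: the kernel keeps the pages resident, but the
     CUDA driver does not know they are pinned, so it still treats the
     memory as pageable and stages transfers through its own buffers.  */
  static void *
  pin_with_mlock (size_t size)
  {
    void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED || mlock (p, size) != 0)
      return NULL;
    return p;
  }

  /* Pin by registering the pages with the CUDA driver: they are
     page-locked *and* the driver can DMA directly to and from them,
     which appears to be the faster path mentioned above.  */
  static void *
  pin_with_cuda (size_t size)
  {
    void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED
        || cudaHostRegister (p, size, cudaHostRegisterDefault) != cudaSuccess)
      return NULL;
    return p;
  }

A real implementation in the nvptx plugin would presumably use the equivalent
driver-API entry point (cuMemHostRegister) rather than the runtime API.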
>> It seems that it isn't enough for the memory to be pinned; it has to be
>> pinned using the CUDA API to get the performance boost.  I had not done
>> this
>
> For a performance boost of what kind of code?
> I don't understand how the CUDA API could be useful (or can be used at
> all) if offloading to NVPTX isn't involved.  The fact that somebody asks
> for host memory allocation with omp_atk_pinned set to true doesn't mean
> it will be in any way related to NVPTX offloading (unless it is in an
> NVPTX target region, obviously, but then mlock isn't available, so sure,
> if there is something CUDA can provide for that case, nice).

This is specifically for NVPTX offload, of course, but then that's what our
customer is paying for.  The expectation, from users, is that memory pinning
will give the benefits specific to the active device.  We can certainly make
that happen when there is only one (flavour of) offload device present.  I
had hoped it could be one way for all, but it looks like not.

>> I don't think libmemkind will resolve this performance issue, although
>> certainly it can be used for host implementations of low-latency
>> memories, etc.
>
> The reason for libmemkind is primarily its support of HBW memory (but
> admittedly I need to find out what kind of such memory it does support),
> or the various interleaving etc. the library has.
> Plus, when we have such support, as it has its own customizable allocator,
> it could be used to allocate larger chunks of memory that can be mlocked
> and then just allocate from that pinned memory if the user asks for small
> allocations from that memory.

It should be straightforward to switch the no-offload implementation to
libmemkind when the time comes (the changes would be contained within
config/linux/allocator.c), but I have no plans to do so myself (and no
hardware to test it with).  I'd prefer that it didn't impede the offload
solution in the meantime.

Andrew
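
P.S. For anyone skimming the thread, the user-side code in question is just
the standard OpenMP 5.x allocator-traits interface; the whole discussion
above is about what the implementation does underneath (mlock, a CUDA
registration, or something else) when the pinned trait is requested.
Illustrative only:

  #include <omp.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* Request an allocator whose memory must stay resident (pinned).  */
    omp_alloctrait_t traits[] = { { omp_atk_pinned, omp_atv_true } };
    omp_allocator_handle_t pinned
      = omp_init_allocator (omp_default_mem_space, 1, traits);

    double *buf = omp_alloc (64 << 20, pinned);   /* 64 MiB */
    if (buf == NULL)
      {
        fprintf (stderr, "pinned allocation failed\n");
        return 1;
      }

    /* ... use buf, e.g. as the source/target of offload transfers ... */

    omp_free (buf, pinned);
    omp_destroy_allocator (pinned);
    return 0;
  }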
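
On the idea of carving small requests out of a larger mlocked chunk (whether
that is eventually done via libmemkind's customizable allocator or by hand),
a toy sketch of the shape of it, with no free list, no growth and no thread
safety; one benefit is that the mlock syscall and the RLIMIT_MEMLOCK
accounting are paid once for the whole arena:

  #include <stddef.h>
  #include <sys/mman.h>

  static char *arena;
  static size_t arena_size, arena_used;

  /* Reserve and mlock one large region up front.  */
  static int
  arena_init (size_t size)
  {
    arena = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED || mlock (arena, size) != 0)
      return -1;
    arena_size = size;
    return 0;
  }

  /* Small "pinned" allocations then become simple pointer bumps within
     the already-pinned region.  */
  static void *
  arena_alloc (size_t size)
  {
    size = (size + 15) & ~(size_t) 15;   /* keep 16-byte alignment */
    if (size > arena_size - arena_used)
      return NULL;
    void *p = arena + arena_used;
    arena_used += size;
    return p;
  }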