From: Andrew Stubbs
Date: Tue, 4 Jan 2022 16:58:19 +0000
Subject: Re: [PATCH] libgomp, openmp: pinned memory
To: Jakub Jelinek
CC: "gcc-patches@gcc.gnu.org"
Message-ID: <48ee767a-0d90-53b4-ea54-9deba9edd805@codesourcery.com>
In-Reply-To: <20220104155558.GG2646553@tucnak>

On 04/01/2022 15:55, Jakub Jelinek wrote:
> The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
> instead add libgomp/config/linux/allocator.c that includes some headers,
> defines some macros and then includes the generic allocator.c.

OK, good point, I can do that.

> I think perror is the wrong thing to do, omp_alloc etc. has a well defined
> interface what to do in such cases - the allocation should just fail (not be
> allocated) and depending on user's choice that can be fatal, or return NULL,
> or chain to some other allocator with other properties etc.

I did it this way because pinning feels more like an optimization, and
falling back to "just works" seemed like what users would want to happen.
The perror was added because it turns out the default ulimit is tiny and
I wanted to hint at the solution.
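Putting those two points together, I imagine the Linux file ends up
looking something like this rough sketch (the hook macro and function
names here are invented for illustration, they're not from my patch):

/* libgomp/config/linux/allocator.c (sketch only).  */
#include <stdlib.h>
#include <sys/mman.h>

/* Try to allocate SIZE bytes of pinned memory.  On failure return
   NULL so the caller can apply the allocator's fallback trait
   (abort_fb, null_fb or allocator_fb) instead of calling perror.  */
static void *
linux_pinned_alloc (size_t size)
{
  void *p = malloc (size);
  if (p == NULL)
    return NULL;
  if (mlock (p, size) != 0)
    {
      /* Typically RLIMIT_MEMLOCK is too low; pinning failed.  */
      free (p);
      return NULL;
    }
  return p;
}

/* ... then define the hook macros and pull in the generic code,
   along the lines of:
     #define MEMSPACE_PINNED_ALLOC(SIZE) linux_pinned_alloc (SIZE)
     #include "../../allocator.c"  */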
I guess you're right that the consistent behaviour would be to silently
switch to the fallback allocator, but it still feels like users will be
left in the dark about why it failed.

> Other issues in the patch are that it doesn't munlock on deallocation and
> that because of that deallocation we need to figure out what to do on page
> boundaries. As documented, mlock can be passed address and/or address +
> size that aren't at page boundaries and pinning happens even just for
> partially touched pages. But munlock unpins also even the partially
> overlapping pages and we don't know at that point whether some other pinned
> allocations don't appear in those pages.

Right, it doesn't munlock because of these issues. I don't know of any
way to solve this that wouldn't involve building tables of locked ranges
(and knowing what the page size is).

I considered using mmap with the lock flag instead, but the failure mode
looked unhelpful. I guess we could mmap with the regular flags, then
mlock after (see the sketch at the end of this mail). That should bypass
the regular heap and ensure each allocation has its own page. I'm not
sure what the unintended side-effects of that might be.

> Some bad options are only pin pages wholly contained within the allocation
> and don't pin partial pages around it, force at least page alignment and
> size so that everything can be pinned, somehow ensure that we never allocate
> more than one pinned allocation in such partial pages (but can allocate
> there non-pinned allocations), or e.g. use some internal data structure to
> track how many pinned allocations are on the partial pages (say a hash map
> from page start address to a counter how many pinned allocations are there,
> if it goes to 0 munlock even that page, otherwise munlock just the wholly
> contained pages), or perhaps use page size aligned allocation and size and
> just remember in some data structure that the partial pages could be used
> for other pinned (small) allocations.

Bad options indeed. If any part of the memory block is not pinned I
expect no performance gains whatsoever, and all this other business adds
complexity and runtime overhead.

For version 1.0 it feels reasonable to omit the unlock step and hope
that a) pinned data will be long-lived, or b) short-lived pinned data
will be replaced with more data that -- most likely -- occupies the same
pages. Similarly, it seems likely that serious HPC applications will run
on devices with plenty of RAM, and if not, any page swapping will
destroy the performance gains of using OpenMP anyway.

For now I'll just fix the architectural issues.

Andrew
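P.S. For concreteness, here's what I mean by the mmap-then-mlock idea;
this is an untested sketch with made-up names, not code from the patch:

#include <stddef.h>
#include <sys/mman.h>

/* Every pinned allocation gets its own anonymous mapping, so it never
   shares a page with any other allocation.  */
static void *
pinned_mmap_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      /* Pinning failed; unmap and let the fallback trait decide.  */
      munmap (p, size);
      return NULL;
    }
  return p;
}

static void
pinned_mmap_free (void *p, size_t size)
{
  /* munmap both unpins and unmaps the pages, so no partial-page
     bookkeeping is needed on deallocation.  */
  munmap (p, size);
}

The obvious downside is that every allocation then costs at least a
whole page, but for pinned buffers intended for device transfers that's
probably acceptable.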