Date: Tue, 4 Jan 2022 16:55:58 +0100
From: Jakub Jelinek
To: Andrew Stubbs
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libgomp, openmp: pinned memory
Message-ID: <20220104155558.GG2646553@tucnak>

On Tue, Jan 04, 2022 at 03:32:17PM +0000, Andrew Stubbs wrote:
> This patch implements the OpenMP pinned memory trait for Linux hosts.  On
> other hosts and on devices the trait becomes a no-op (instead of being
> rejected).
>
> The memory is locked via the mlock syscall, which is both the "correct" way
> to do it on Linux, and a problem because the default ulimit for pinned
> memory is very small (and most users don't have permission to increase it
> (much?)).  Therefore the code emits a non-fatal warning message if locking
> fails.
>
> Another approach might be to use cudaHostAlloc to allocate the memory in
> the first place, which bypasses the ulimit somehow, but this would not help
> non-NVidia users.
>
> The tests work on Linux and will xfail on other hosts; neither libgomp nor
> the test knows how to allocate or query pinned memory elsewhere.
>
> The patch applies on top of the text of my previously submitted patches,
> but does not actually depend on the functionality of those patches.
>
> OK for stage 1?
>
> I'll commit a backport to OG11 shortly.
>
> Andrew
>
> libgomp: pinned memory
>
> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
> syscall.
>
> libgomp/ChangeLog:
>
> 	* allocator.c (MEMSPACE_PIN): New macro.
> 	(xmlock): New function.
> 	(omp_init_allocator): Don't disallow the pinned trait.
> 	(omp_aligned_alloc): Add pinning via MEMSPACE_PIN.
> 	(omp_aligned_calloc): Likewise.
> 	(omp_realloc): Likewise.
> 	* testsuite/libgomp.c/alloc-pinned-1.c: New test.
> 	* testsuite/libgomp.c/alloc-pinned-2.c: New test.
>
> diff --git a/libgomp/allocator.c b/libgomp/allocator.c
> index b1f5fe0a5e2..671b91e7ff8 100644
> --- a/libgomp/allocator.c
> +++ b/libgomp/allocator.c
> @@ -51,6 +51,25 @@
>  #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
>    ((void)MEMSPACE, (void)SIZE, free (ADDR))
>  #endif
> +#ifndef MEMSPACE_PIN
> +/* Only define this on supported host platforms.  */
> +#ifdef __linux__
> +#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
> +  ((void)MEMSPACE, xmlock (ADDR, SIZE))
> +
> +#include <sys/mman.h>
> +#include <stdio.h>
> +void
> +xmlock (void *addr, size_t size)
> +{
> +  if (mlock (addr, size))
> +    perror ("libgomp: failed to pin memory (ulimit too low?)");
> +}
> +#else
> +#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
> +  ((void)MEMSPACE, (void)ADDR, (void)SIZE)
> +#endif
> +#endif

The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
instead to add a libgomp/config/linux/allocator.c that includes some headers,
defines some macros and then includes the generic allocator.c.

I think perror is the wrong thing to do here; omp_alloc etc. have a
well-defined interface for what to do in such cases - the allocation should
simply fail (not be allocated), and depending on the user's choice of
fallback that can be fatal, return NULL, or chain to some other allocator
with other properties, etc.

Other issues in the patch are that it doesn't munlock on deallocation, and
because of that deallocation we need to figure out what to do about page
boundaries.  As documented, mlock can be passed an address and/or
address + size that aren't at page boundaries, and pinning happens even for
partially touched pages.  But munlock unpins even partially overlapping
pages, and at that point we don't know whether other pinned allocations
still live in those pages.  The options, none of them great, are:
- only pin the pages wholly contained within the allocation and don't pin
  the partial pages around it;
- force at least page alignment and a page-multiple size so that everything
  can be pinned;
- somehow ensure that we never place more than one pinned allocation in such
  partial pages (while still allowing non-pinned allocations there);
- use some internal data structure to track how many pinned allocations are
  on the partial pages (say a hash map from page start address to a counter
  of pinned allocations there; when it drops to 0, munlock even that page,
  otherwise munlock just the wholly contained pages);
- or use page-size-aligned allocation and size and just remember in some
  data structure that the partial pages could be reused for other small
  pinned allocations.

	Jakub
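To make the failure-propagation point concrete, here is a minimal sketch,
not the proposed patch: the helper name linux_memspace_pin, the boolean
return and the exact macro shape are assumptions.  Following the convention
Jakub describes, something like it would live in a
libgomp/config/linux/allocator.c that defines the macro and then includes
the generic allocator.c:

/* Sketch only: report pinning failure to the caller instead of printing
   an error, so the normal omp_alloc fallback handling (abort, return
   NULL, or chain to another allocator) can take over.  */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

static bool
linux_memspace_pin (void *addr, size_t size)
{
  /* mlock returns 0 on success; it fails e.g. when RLIMIT_MEMLOCK is
     too low.  */
  return mlock (addr, size) == 0;
}

#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
  ((void) (MEMSPACE), linux_memspace_pin (ADDR, SIZE))

In omp_aligned_alloc the boolean result would then feed the same path as a
failed malloc: free the block and abort, return NULL, or retry with the
fallback allocator, depending on the fallback trait.

The per-page bookkeeping option could look roughly like the sketch below;
the names and the data structure are illustrative, and a real version would
use a hash table keyed by page address and take the allocator's lock.
Interior pages are wholly owned by a single allocation, so only the first
and last page of each pinned block can be shared with other allocations and
need a reference count:

/* Sketch only: count pinned allocations per boundary page so munlock is
   deferred until no pinned allocation touches the page anymore.
   A toy linked list stands in for the hash map; error handling omitted.  */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct pin_ref
{
  uintptr_t page;        /* Page start address.  */
  unsigned count;        /* Pinned allocations touching this page.  */
  struct pin_ref *next;
};

static struct pin_ref *pin_refs;

/* Record one more pinned allocation touching PAGE.  */
static void
pin_ref_add (uintptr_t page)
{
  for (struct pin_ref *p = pin_refs; p; p = p->next)
    if (p->page == page)
      {
        p->count++;
        return;
      }
  struct pin_ref *p = calloc (1, sizeof *p);
  p->page = page;
  p->count = 1;
  p->next = pin_refs;
  pin_refs = p;
}

/* Drop one reference to PAGE; return true iff no other pinned allocation
   still touches it and it may therefore be munlocked.  */
static bool
pin_ref_drop (uintptr_t page)
{
  for (struct pin_ref **pp = &pin_refs; *pp; pp = &(*pp)->next)
    if ((*pp)->page == page)
      {
        if (--(*pp)->count > 0)
          return false;
        struct pin_ref *dead = *pp;
        *pp = dead->next;
        free (dead);
        return true;
      }
  return true;  /* Not tracked: nothing else pinned there.  */
}

On allocation, the pin hook would call pin_ref_add for the first and last
page of the block (once if they coincide) before mlock; on deallocation, it
would munlock the interior pages unconditionally and each boundary page only
when pin_ref_drop returns true for it.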