Date: Tue, 7 Jun 2022 14:10:26 +0200
From: Jakub Jelinek
To: Andrew Stubbs
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libgomp, openmp: pinned memory
On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
> Following some feedback from users of the OG11 branch I think I need to
> withdraw this patch, for now.
>
> The memory pinned via the mlock call does not give the expected performance
> boost. I had not expected that it would do much in my test setup, given that
> the machine has a lot of RAM and my benchmarks are small, but others have
> tried more and on varying machines and architectures.

I don't understand why there should be any expected performance boost (at
least not unless the machine starts swapping out pages);
{ omp_atk_pinned, true } is solely about the requirement that the memory
can't be swapped out.

> It seems that it isn't enough for the memory to be pinned, it has to be
> pinned using the Cuda API to get the performance boost. I had not done this

A performance boost for what kind of code?
I don't understand how the CUDA API could be useful (or could be used at
all) if offloading to NVPTX isn't involved. The fact that somebody asks for
a host memory allocation with omp_atk_pinned set to true doesn't mean the
allocation has anything to do with NVPTX offloading (unless, obviously, it
happens inside an NVPTX target region; but there mlock isn't available, so
if CUDA can provide something for that case, nice).
> I don't think libmemkind will resolve this performance issue, although
> certainly it can be used for host implementations of low-latency memories,
> etc.

The reason for libmemkind is primarily its support of HBW (high-bandwidth)
memory (but admittedly I need to find out what kinds of such memory it
supports), or the various interleaving etc. the library has.
Plus, when we have such support, as it has its own customizable allocator,
it could be used to allocate larger chunks of memory that can be mlocked,
and then just allocate from that pinned memory if the user asks for small
allocations from that memory.

	Jakub