Date: Tue, 7 Jun 2022 14:40:28 +0200
From: Jakub Jelinek
To: Andrew Stubbs
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libgomp, openmp: pinned memory
References: <20220104155558.GG2646553@tucnak>
 <48ee767a-0d90-53b4-ea54-9deba9edd805@codesourcery.com>
 <20220104182829.GK2646553@tucnak>
 <20220104184740.GL2646553@tucnak>

On Tue, Jun 07, 2022 at 01:28:33PM +0100, Andrew Stubbs wrote:
> > For performance boost of what kind of code?
> > I don't understand how the CUDA API could be useful (or could be used
> > at all) if offloading to NVPTX isn't involved.  The fact that somebody
> > asks for host memory allocation with omp_atk_pinned set to true doesn't
> > mean it will be in any way related to NVPTX offloading (unless it is in
> > an NVPTX target region, obviously, but then mlock isn't available, so
> > sure, if there is something CUDA can provide for that case, nice).
>
> This is specifically for NVPTX offload, of course, but then that's what
> our customer is paying for.
>
> The expectation, from users, is that memory pinning will give the
> benefits specific to the active device.  We can certainly make that
> happen when there is only one (flavour of) offload device present.  I
> had hoped it could be one way for all, but it looks like not.
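For reference, what is being asked for is something like the following
(just a minimal sketch, using nothing beyond the standard OpenMP 5.x
allocator API; whether the pinning underneath should then happen through
mlock or through something like cudaHostAlloc is exactly the open
question):

  #include <omp.h>

  int
  main (void)
  {
    /* Ask for a host allocator whose allocations are pinned
       (non-pageable) memory, via the standard omp_atk_pinned trait.  */
    omp_alloctrait_t traits[] = { { omp_atk_pinned, omp_atv_true } };
    omp_allocator_handle_t al
      = omp_init_allocator (omp_default_mem_space, 1, traits);
    if (al == omp_null_allocator)
      return 1;

    /* 1024 doubles of pinned host memory, e.g. to stage transfers.  */
    double *p = (double *) omp_alloc (1024 * sizeof (double), al);
    omp_free (p, al);
    omp_destroy_allocator (al);
    return 0;
  }
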
I think that is just an expectation that isn't backed by anything in the
standard.  If users need something like that, it would be good to describe
first what it actually is: memory that will be primarily used for
interfacing with offloading device 0 (or some specific device given by
some number)?  Memory that can be used without remapping on some
offloading device?  Something else?  Once we know what exactly that is
(e.g. what the CUDA or GCN APIs etc. can provide), we can discuss on
omp-lang whether there shouldn't be some standard way to ask for such an
allocator.

Or there is always the possibility of extensions.  I'm not sure whether
one can just define ompx_atv_whatever, pick some large value for it (the
spec doesn't reserve a vendor range which would be safe to use), and
support it that way (see the hypothetical sketch in the P.S. below).

A different thing is allocators in the offloading regions themselves.  I
think we should translate some omp_alloc etc. calls in such regions, when
they use constant-expression standard allocators, into doing the
allocation through other means; alternatively, allocators.c can be
overridden or amended for the needs or possibilities of the offloading
targets.

	Jakub
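
P.S. Purely for illustration, a hypothetical sketch of the ompx_atv_
extension idea above.  Both the ompx_atv_pinned_device name and its
numeric value are invented here, which is precisely the problem: the spec
reserves no vendor range, so any value we pick could clash with future
omp_atv_* values:

  #include <omp.h>

  /* Invented vendor-specific trait value meaning "pin this memory in
     whatever way benefits the offloading device", e.g. through the CUDA
     driver for NVPTX.  NOT part of any spec.  */
  #define ompx_atv_pinned_device ((omp_uintptr_t) 0xdead0001)

  omp_allocator_handle_t
  make_device_pinned_allocator (void)
  {
    omp_alloctrait_t traits[]
      = { { omp_atk_pinned, ompx_atv_pinned_device } };
    return omp_init_allocator (omp_default_mem_space, 1, traits);
  }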