From: Tobias Burnus
Date: Thu, 9 Jun 2022 12:09:52 +0200
Subject: Re: [PATCH] libgomp, openmp: pinned memory
To: Thomas Schwinge, Andrew Stubbs, Jakub Jelinek
Message-ID: <8c95bbcf-7a74-738d-ffc2-4cae606aac62@codesourcery.com>
In-Reply-To: <87edzy5g8h.fsf@euler.schwinge.homeip.net>
List-Id: Gcc-patches mailing list

On 09.06.22 11:38, Thomas Schwinge wrote:
> On 2022-06-07T13:28:33+0100, Andrew Stubbs wrote:
>> On 07/06/2022 13:10, Jakub Jelinek wrote:
>>> On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
>>>> The memory pinned via the mlock call does not give the expected performance
>>>> boost. I had not expected that it would do much in my test setup, given that
>>>> the machine has a lot of RAM and my benchmarks are small, but others have
>>>> tried more and on varying machines and architectures.
>>> I don't understand why there should be any expected performance boost (at
>>> least not unless the machine starts swapping out pages),
>>> { omp_atk_pinned, true } is solely about the requirement that the memory
>>> can't be swapped out.
>> It seems like it takes a faster path through the NVidia drivers.
[...]

I think this conflates two parts:

* User-defined allocators in general: there, CUDA does not make much
sense, and without unified-shared memory the memory will always be
inaccessible on the device (w/o explicit/implicit mapping).

* Memory which is supposed to be accessible both on the host and on the
device. That's most obvious when explicitly allocating memory to be
accessible on both. It is less clear-cut when just creating an allocator
with unified-shared memory, as it is not clear when the memory is only
used on the host (e.g. with host-based thread parallelization) and when
it is also relevant for the device.
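As a rough illustration of the host-only case from the quoted discussion
(pinned meaning only "cannot be swapped out"), an mlock-based allocation
could look like the C sketch below. The names pinned_alloc and pinned_free
are hypothetical, and this is not the libgomp code from the patch; it only
assumes the Linux/POSIX mmap, mlock and munmap calls.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: return SIZE bytes of zero-filled memory that the
   kernel may not swap out, or NULL if mapping or locking fails.  */
static void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;

  /* Lock the pages into RAM; this is the whole "pinning" step here.  */
  if (mlock (p, size) != 0)
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Hypothetical helper: release memory obtained from pinned_alloc.
   Unmapping the range also removes the lock.  */
static void
pinned_free (void *p, size_t size)
{
  if (p != NULL)
    munmap (p, size);
}
```

mlock can fail when the request exceeds RLIMIT_MEMLOCK, so a caller has
to be prepared for a NULL result. Nothing in this sketch involves the
device, which matches the point that such memory is only guaranteed to
stay resident on the host.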

Currently, the user has no means to express the intent that memory should
be accessible on both the host and one or several devices, except for
'omp requires unified_shared_memory'. The next OpenMP version will likely
provide a means to create an allocator which permits this
→ https://github.com/OpenMP/spec/issues/1843 (not publicly available; the
slides in the last comment are slightly outdated).

* * *

The question is only what to do with 'requires unified_shared_memory'
and a non-multi-device allocator.

Probably: unified_shared_memory or no nvptx device: just use mlock.
Otherwise (i.e. both nvptx device and (unified_shared_memory or a
multi-device-allocator)), use the CUDA one.

For the latter, I think Thomas' remarks are helpful.

Tobias