From: Chung-Lin Tang
Date: Mon, 12 Dec 2022 19:13:01 +0800
Subject: [Ping x6] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
To: Chung-Lin Tang, Chung-Lin Tang, gcc-patches, Tom de Vries, Catherine Moore
Message-ID: <48af261e-a81e-5fd5-c0ce-8c8a2e979e9c@siemens.com>
In-Reply-To: <50529ebd-9d54-5581-6719-d968ab5e4ed3@siemens.com>
References: <8b974d21-e288-4596-7500-277a43c92771@gmail.com> <32ba851f-ad70-155e-c321-b9bfb610f353@codesourcery.com> <0d3cfa3f-8d63-e2bb-ea31-2f39753d6dd1@siemens.com> <50529ebd-9d54-5581-6719-d968ab5e4ed3@siemens.com>

Ping x6

On 2022/12/6 12:21 AM, Chung-Lin Tang wrote:
> Ping x5
>
> On 2022/11/22 12:24 AM, Chung-Lin Tang wrote:
>> Ping x4
>>
>> On 2022/11/8 12:34 AM, Chung-Lin Tang wrote:
>>> Ping x3.
>>>
>>> On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
>>>> Ping x2.
>>>>
>>>> On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
>>>>> Ping.
>>>>>
>>>>> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>>>>>> Hi Tom,
>>>>>> I had a patch submitted earlier, where I reported that the current way of implementing
>>>>>> barriers in libgomp on nvptx caused a quite significant performance drop on some SPEChpc2021
>>>>>> benchmarks:
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>>>>>
>>>>>> That previous patch wasn't well received (admittedly, it was kind of a hack).
>>>>>> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
>>>>>>
>>>>>> Basically, instead of trying to have the GPU do CPU-with-OS-like things that it isn't suited for,
>>>>>> barriers are implemented simply, with bar.* synchronization instructions.
>>>>>> Tasks are processed after threads have joined, and only if team->task_count != 0.
>>>>>>
>>>>>> (Arguably, there might be a little bit of performance forfeited where earlier-arriving threads
>>>>>> could've been used to process tasks ahead of other threads. But that again falls into requiring
>>>>>> complex futex-wait/wake-like behavior. Really, that kind of tasking is not what target
>>>>>> offloading is usually used for.)
>>>>>>
>>>>>> Implementation highlight notes:
>>>>>> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in the usual manner)
>>>>>> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
>>>>>> 3. gomp_barrier_wait_last() is now implemented using "bar.arrive"
>>>>>>
>>>>>> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>>>>>>     The main synchronization is done using a 'bar.red' instruction. This reduces across all threads
>>>>>>     the condition (team->task_count != 0), to enable the task processing down below if any thread
>>>>>>     created a task.
>>>>>>     (This bar.red usage requires the second GCC patch in this series.)
>>>>>>
>>>>>> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using the libgomp, ovo, omptests,
>>>>>> and sollve_vv testsuites, all without regressions. I also verified that SPEChpc 2021 521.miniswp_t
>>>>>> and 534.hpgmgfv_t performance, which regressed in the GCC 12 cycle, has been restored to
>>>>>> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>>>>>>
>>>>>> (I also suggest backporting to the GCC 12 branch, if a performance regression can be considered a defect.)
>>>>>>
>>>>>> Thanks,
>>>>>> Chung-Lin
>>>>>>
>>>>>> libgomp/ChangeLog:
>>>>>>
>>>>>> 2022-09-21  Chung-Lin Tang
>>>>>>
>>>>>>     * config/nvptx/bar.c (generation_to_barrier): Remove.
>>>>>>     (futex_wait,futex_wake,do_spin,do_wait): Remove.
>>>>>>     (GOMP_WAIT_H): Remove.
>>>>>>     (#include "../linux/bar.c"): Remove.
>>>>>>     (gomp_barrier_wait_end): New function.
>>>>>>     (gomp_barrier_wait): Likewise.
>>>>>>     (gomp_barrier_wait_last): Likewise.
>>>>>>     (gomp_team_barrier_wait_end): Likewise.
>>>>>>     (gomp_team_barrier_wait): Likewise.
>>>>>>     (gomp_team_barrier_wait_final): Likewise.
>>>>>>     (gomp_team_barrier_wait_cancel_end): Likewise.
>>>>>>     (gomp_team_barrier_wait_cancel): Likewise.
>>>>>>     (gomp_team_barrier_cancel): Likewise.
>>>>>>     * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>>>>>>     prototype, add new static inline function.
>>>
>>
>
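
For readers unfamiliar with the idea in item 4 of the quoted message, the following is a rough host-side analogy in C with POSIX threads. It is not the code from the patch. It assumes the reduction is a logical OR of each thread's predicate (the message only says 'bar.red'; the .or variant is an assumption here), and every name in it (red_barrier, red_barrier_wait_or, run_round) is hypothetical. Each thread contributes its own view of "task_count != 0", and all threads leave the barrier with the same combined answer, so either every thread goes on to process tasks or none does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical host-side stand-in for a barrier with an OR-reduction:
   every thread contributes a predicate, and all threads leave with the
   OR of all contributions.  Not taken from libgomp. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t cond;
  unsigned total;       /* number of participating threads */
  unsigned arrived;     /* threads that reached the current round */
  unsigned generation;  /* incremented each time the barrier releases */
  bool acc;             /* OR accumulated during the current round */
  bool result;          /* OR published by the last completed round */
} red_barrier;

static void red_barrier_init(red_barrier *b, unsigned total) {
  assert(total > 0);
  pthread_mutex_init(&b->lock, NULL);
  pthread_cond_init(&b->cond, NULL);
  b->total = total;
  b->arrived = 0;
  b->generation = 0;
  b->acc = false;
  b->result = false;
}

/* Blocks until all threads have arrived; returns the OR of every pred. */
static bool red_barrier_wait_or(red_barrier *b, bool pred) {
  pthread_mutex_lock(&b->lock);
  unsigned gen = b->generation;
  b->acc = b->acc || pred;
  if (++b->arrived == b->total) {
    /* Last arriver publishes the result and releases everyone. */
    b->result = b->acc;
    b->acc = false;
    b->arrived = 0;
    b->generation++;
    pthread_cond_broadcast(&b->cond);
  } else {
    while (gen == b->generation)
      pthread_cond_wait(&b->cond, &b->lock);
  }
  bool r = b->result;
  pthread_mutex_unlock(&b->lock);
  return r;
}

#define MAX_THREADS 16

typedef struct {
  red_barrier *bar;
  bool created_task;  /* this thread's local "task_count != 0" view */
  bool saw_tasks;     /* what the barrier reported back */
} worker_arg;

static void *worker(void *p) {
  worker_arg *w = p;
  w->saw_tasks = red_barrier_wait_or(w->bar, w->created_task);
  /* A real implementation would process pending tasks here
     only when saw_tasks is true. */
  return NULL;
}

/* Runs one barrier round.  The thread with index `creator` (none if
   creator < 0) reports a pending task.  Returns how many threads were
   told that tasks need processing. */
static int run_round(int nthreads, int creator) {
  red_barrier bar;
  pthread_t tid[MAX_THREADS];
  worker_arg args[MAX_THREADS];
  assert(nthreads > 0 && nthreads <= MAX_THREADS);
  red_barrier_init(&bar, (unsigned) nthreads);
  for (int i = 0; i < nthreads; i++) {
    args[i] = (worker_arg){ &bar, i == creator, false };
    pthread_create(&tid[i], NULL, worker, &args[i]);
  }
  int told = 0;
  for (int i = 0; i < nthreads; i++) {
    pthread_join(tid[i], NULL);
    told += args[i].saw_tasks ? 1 : 0;
  }
  pthread_cond_destroy(&bar.cond);
  pthread_mutex_destroy(&bar.lock);
  return told;
}
```

With this sketch, run_round(4, 2) reports that all four threads were told to process tasks, while run_round(4, -1) reports none. That uniform answer is the point of combining the barrier with a reduction: no thread needs to be woken separately when a task appears, which is what the futex-style implementation had to arrange.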