From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 22 Nov 2022 00:24:14 +0800
Subject: [Ping x4] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
To: Chung-Lin Tang, Chung-Lin Tang, gcc-patches, Tom de Vries, Catherine Moore
References: <8b974d21-e288-4596-7500-277a43c92771@gmail.com> <32ba851f-ad70-155e-c321-b9bfb610f353@codesourcery.com> <0d3cfa3f-8d63-e2bb-ea31-2f39753d6dd1@siemens.com>
From: Chung-Lin Tang
In-Reply-To: <0d3cfa3f-8d63-e2bb-ea31-2f39753d6dd1@siemens.com>

Ping x4

On 2022/11/8 12:34 AM, Chung-Lin Tang wrote:
> Ping x3.
>
> On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
>> Ping x2.
>>
>> On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
>>> Ping.
>>>
>>> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>>>> Hi Tom,
>>>> I had a patch submitted earlier, where I reported that the current way of implementing
>>>> barriers in libgomp on nvptx created a quite significant performance drop on some SPEChpc2021
>>>> benchmarks:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>>>
>>>> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
>>>> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
>>>>
>>>> Basically, instead of trying to have the GPU do CPU-with-OS-like things that it isn't suited for,
>>>> barriers are implemented simplistically with bar.* synchronization instructions.
>>>> Tasks are processed after threads have joined, and only if team->task_count != 0
>>>>
>>>> (arguably, there might be a little bit of performance forfeited where earlier arriving threads
>>>> could've been used to process tasks ahead of other threads. But that again falls into requiring
>>>> implementing complex futex-wait/wake like behavior. Really, that kind of tasking is not what target
>>>> offloading is usually used for)
>>>>
>>>> Implementation highlight notes:
>>>> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in the usual manner)
>>>> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
>>>> 3. gomp_barrier_wait_last() is now implemented using "bar.arrive"
>>>>
>>>> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>>>> The main synchronization is done using a 'bar.red' instruction. This reduces across all threads
>>>> the condition (team->task_count != 0), to enable the task processing down below if any thread
>>>> created a task. (this bar.red usage requires the second GCC patch in this series)
>>>>
>>>> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using libgomp, ovo, omptests,
>>>> and sollve_vv testsuites, all without regressions. Also verified that the SPEChpc 2021 521.miniswp_t
>>>> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle are gone, with performance
>>>> restored to devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>>>>
>>>> (also suggest backporting to GCC12 branch, if performance regression can be considered a defect)
>>>>
>>>> Thanks,
>>>> Chung-Lin
>>>>
>>>> libgomp/ChangeLog:
>>>>
>>>> 2022-09-21  Chung-Lin Tang
>>>>
>>>>     * config/nvptx/bar.c (generation_to_barrier): Remove.
>>>>     (futex_wait,futex_wake,do_spin,do_wait): Remove.
>>>>     (GOMP_WAIT_H): Remove.
>>>>     (#include "../linux/bar.c"): Remove.
>>>>     (gomp_barrier_wait_end): New function.
>>>>     (gomp_barrier_wait): Likewise.
>>>>     (gomp_barrier_wait_last): Likewise.
>>>>     (gomp_team_barrier_wait_end): Likewise.
>>>>     (gomp_team_barrier_wait): Likewise.
>>>>     (gomp_team_barrier_wait_final): Likewise.
>>>>     (gomp_team_barrier_wait_cancel_end): Likewise.
>>>>     (gomp_team_barrier_wait_cancel): Likewise.
>>>>     (gomp_team_barrier_cancel): Likewise.
>>>>     * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>>>>     prototype, add new static inline function.
>
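
To illustrate note 4 in the quoted message: the idea of a barrier that also
OR-reduces a per-thread condition can be sketched on the host in plain C. This
is only an illustration under stated assumptions, not the libgomp code. POSIX
threads stand in for GPU threads, and the names or_barrier, or_barrier_wait,
worker_arg and run_team are hypothetical. On nvptx the same effect would come
from a single 'bar.red' instruction rather than a mutex and condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_TEAM 64

/* Hypothetical host-side stand-in for a reducing barrier: every thread
   contributes a predicate and every thread receives the OR of all of them.  */
struct or_barrier
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  unsigned total;      /* threads expected at the barrier */
  unsigned arrived;    /* threads that have arrived in this round */
  unsigned generation; /* bumped each time the barrier releases */
  bool accum;          /* OR of the predicates seen so far this round */
  bool result;         /* reduced value of the last completed round */
};

static void
or_barrier_init (struct or_barrier *b, unsigned total)
{
  pthread_mutex_init (&b->lock, NULL);
  pthread_cond_init (&b->cond, NULL);
  b->total = total;
  b->arrived = 0;
  b->generation = 0;
  b->accum = false;
  b->result = false;
}

/* Block until all threads have arrived; return the OR of every PRED.  */
static bool
or_barrier_wait (struct or_barrier *b, bool pred)
{
  pthread_mutex_lock (&b->lock);
  unsigned gen = b->generation;
  b->accum = b->accum || pred;
  if (++b->arrived == b->total)
    {
      /* Last arrival publishes the result and releases the others.  */
      b->result = b->accum;
      b->accum = false;
      b->arrived = 0;
      b->generation++;
      pthread_cond_broadcast (&b->cond);
    }
  else
    while (gen == b->generation)
      pthread_cond_wait (&b->cond, &b->lock);
  bool r = b->result;
  pthread_mutex_unlock (&b->lock);
  return r;
}

struct worker_arg
{
  struct or_barrier *bar;
  bool created_task; /* stand-in for this thread seeing task_count != 0 */
  bool saw_tasks;    /* reduced value this thread got back */
};

static void *
worker (void *p)
{
  struct worker_arg *a = p;
  /* After this call, a thread would process tasks only if saw_tasks is set.  */
  a->saw_tasks = or_barrier_wait (a->bar, a->created_task);
  return NULL;
}

/* Run NTHREADS workers; only thread 0 contributes FIRST_PRED, the rest
   contribute false.  Returns how many threads saw a true reduction.  */
static unsigned
run_team (unsigned nthreads, bool first_pred)
{
  assert (nthreads >= 1 && nthreads <= MAX_TEAM);
  pthread_t tid[MAX_TEAM];
  struct worker_arg args[MAX_TEAM];
  struct or_barrier bar;
  or_barrier_init (&bar, nthreads);

  for (unsigned i = 0; i < nthreads; i++)
    {
      args[i].bar = &bar;
      args[i].created_task = (i == 0) && first_pred;
      args[i].saw_tasks = false;
      if (pthread_create (&tid[i], NULL, worker, &args[i]) != 0)
        abort ();
    }

  unsigned seen = 0;
  for (unsigned i = 0; i < nthreads; i++)
    {
      pthread_join (tid[i], NULL);
      seen += args[i].saw_tasks ? 1 : 0;
    }

  pthread_cond_destroy (&bar.cond);
  pthread_mutex_destroy (&bar.lock);
  return seen;
}
```

Calling run_team (4, true) models one thread having created a task: all four
threads get a true result and would enter task processing. With
run_team (4, false) no thread does, so the task-processing path is skipped
entirely, which is the cheap common case the quoted description aims for.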