From: Chung-Lin Tang
To: Chung-Lin Tang, Chung-Lin Tang, gcc-patches, Tom de Vries, Catherine Moore
Subject: [Ping x3] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Date: Tue, 8 Nov 2022 00:34:38 +0800
Message-ID: <0d3cfa3f-8d63-e2bb-ea31-2f39753d6dd1@siemens.com>
In-Reply-To: <32ba851f-ad70-155e-c321-b9bfb610f353@codesourcery.com>
References: <8b974d21-e288-4596-7500-277a43c92771@gmail.com> <32ba851f-ad70-155e-c321-b9bfb610f353@codesourcery.com>

Ping x3.

On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
> Ping x2.
>
> On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
>> Ping.
>>
>> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>>> Hi Tom,
>>> I had a patch submitted earlier, where I reported that the current way of implementing
>>> barriers in libgomp on nvptx created a quite significant performance drop on some SPEChpc2021
>>> benchmarks:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>>
>>> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
>>> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
>>>
>>> Basically, instead of trying to have the GPU do CPU-with-OS-like things that it isn't suited for,
>>> barriers are implemented simplistically with bar.* synchronization instructions.
>>> Tasks are processed after threads have joined, and only if team->task_count != 0.
>>>
>>> (Arguably, there might be a little bit of performance forfeited where earlier-arriving threads
>>> could've been used to process tasks ahead of other threads. But that again would require
>>> implementing complex futex-wait/wake-like behavior. Really, that kind of tasking is not what target
>>> offloading is usually used for.)
>>>
>>> Implementation highlight notes:
>>> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in the usual manner).
>>> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
>>> 3. gomp_barrier_wait_last() is now implemented using "bar.arrive".
>>>
>>> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>>>    The main synchronization is done using a 'bar.red' instruction. This reduces across all threads
>>>    the condition (team->task_count != 0), to enable the task processing down below if any thread
>>>    created a task. (This bar.red usage requires the second GCC patch in this series.)
>>>
>>> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using the libgomp, ovo, omptests,
>>> and sollve_vv testsuites, all without regressions. I also verified that the SPEChpc 2021 521.miniswp_t
>>> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle are gone, with performance
>>> restored to devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>>>
>>> (I also suggest backporting to the GCC12 branch, if a performance regression can be considered a defect.)
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> libgomp/ChangeLog:
>>>
>>> 2022-09-21  Chung-Lin Tang
>>>
>>>         * config/nvptx/bar.c (generation_to_barrier): Remove.
>>>         (futex_wait,futex_wake,do_spin,do_wait): Remove.
>>>         (GOMP_WAIT_H): Remove.
>>>         (#include "../linux/bar.c"): Remove.
>>>         (gomp_barrier_wait_end): New function.
>>>         (gomp_barrier_wait): Likewise.
>>>         (gomp_barrier_wait_last): Likewise.
>>>         (gomp_team_barrier_wait_end): Likewise.
>>>         (gomp_team_barrier_wait): Likewise.
>>>         (gomp_team_barrier_wait_final): Likewise.
>>>         (gomp_team_barrier_wait_cancel_end): Likewise.
>>>         (gomp_team_barrier_wait_cancel): Likewise.
>>>         (gomp_team_barrier_cancel): Likewise.
>>>         * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>>>         prototype, add new static inline function.
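
For readers who want to see the idea in item 4 of the quoted message in concrete form, here is a rough host-side model in C. It is not taken from the patch. Every name in it (red_barrier, red_barrier_wait_or, team_model, run_team_model) is hypothetical, and it uses a pthread mutex and condition variable only because a CPU has no equivalent of the PTX bar.red instruction. What it tries to show is the control flow described above: each thread contributes the predicate (task_count != 0), all threads receive the OR of those predicates once everyone has arrived, and tasks are processed only when that result is true.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical host-side stand-in for a barrier with OR-reduction.  */
struct red_barrier
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  unsigned total, arrived, generation;
  bool acc;     /* OR of predicates seen in the current round.  */
  bool result;  /* OR published by the last completed round.  */
};

static void
red_barrier_init (struct red_barrier *b, unsigned total)
{
  pthread_mutex_init (&b->lock, NULL);
  pthread_cond_init (&b->cond, NULL);
  b->total = total;
  b->arrived = b->generation = 0;
  b->acc = b->result = false;
}

/* Block until all threads arrive; return the OR of every thread's PRED.  */
static bool
red_barrier_wait_or (struct red_barrier *b, bool pred)
{
  pthread_mutex_lock (&b->lock);
  unsigned gen = b->generation;
  b->acc |= pred;
  if (++b->arrived == b->total)
    {
      /* Last arrival publishes the result and releases the others.  */
      b->result = b->acc;
      b->acc = false;
      b->arrived = 0;
      b->generation++;
      pthread_cond_broadcast (&b->cond);
    }
  else
    while (gen == b->generation)
      pthread_cond_wait (&b->cond, &b->lock);
  bool any = b->result;
  pthread_mutex_unlock (&b->lock);
  return any;
}

struct team_model
{
  struct red_barrier bar;
  atomic_int task_count;  /* Stands in for team->task_count.  */
  atomic_int tasks_run;
  atomic_int saw_tasks;   /* Threads whose reduced predicate was true.  */
};

struct worker_arg { struct team_model *team; bool creates_task; };

static void *
worker (void *p)
{
  struct worker_arg *w = p;
  struct team_model *t = w->team;
  if (w->creates_task)
    atomic_fetch_add (&t->task_count, 1);
  if (red_barrier_wait_or (&t->bar, atomic_load (&t->task_count) != 0))
    {
      atomic_fetch_add (&t->saw_tasks, 1);
      /* Claim pending tasks one at a time until none remain.  */
      int n = atomic_load (&t->task_count);
      while (n > 0)
        if (atomic_compare_exchange_weak (&t->task_count, &n, n - 1))
          {
            atomic_fetch_add (&t->tasks_run, 1);
            n--;
          }
    }
  return NULL;
}

/* Run NTHREADS workers; thread CREATOR (none if negative) creates one task.
   Returns how many threads saw the reduced predicate as true.  */
static int
run_team_model (int nthreads, int creator, int *tasks_run)
{
  assert (nthreads > 0 && nthreads <= MAX_THREADS);
  struct team_model team;
  pthread_t tid[MAX_THREADS];
  struct worker_arg args[MAX_THREADS];
  red_barrier_init (&team.bar, (unsigned) nthreads);
  atomic_init (&team.task_count, 0);
  atomic_init (&team.tasks_run, 0);
  atomic_init (&team.saw_tasks, 0);
  for (int i = 0; i < nthreads; i++)
    {
      args[i] = (struct worker_arg) { &team, i == creator };
      int rc = pthread_create (&tid[i], NULL, worker, &args[i]);
      assert (rc == 0);
    }
  for (int i = 0; i < nthreads; i++)
    pthread_join (tid[i], NULL);
  *tasks_run = atomic_load (&team.tasks_run);
  return atomic_load (&team.saw_tasks);
}
```

Calling run_team_model (4, 2, &n) has one of four threads create a task. A thread that reaches the barrier before the creator may evaluate its own predicate as false, yet the reduction still tells every thread to enter the task-processing path. With no creator, every thread skips task processing entirely, which is the cheap common case the quoted message is aiming for.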