From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50529ebd-9d54-5581-6719-d968ab5e4ed3@siemens.com>
Date: Tue, 6 Dec 2022 00:21:03 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.5.1
Subject: [Ping x5] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Content-Language: en-US
To: Chung-Lin Tang, Chung-Lin Tang, gcc-patches, Tom de Vries, Catherine Moore
References: <8b974d21-e288-4596-7500-277a43c92771@gmail.com> <32ba851f-ad70-155e-c321-b9bfb610f353@codesourcery.com> <0d3cfa3f-8d63-e2bb-ea31-2f39753d6dd1@siemens.com>
From: Chung-Lin Tang
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0

Ping x5

On 2022/11/22 12:24 AM, Chung-Lin Tang wrote:
> Ping x4
>
> On 2022/11/8 12:34 AM, Chung-Lin Tang wrote:
>> Ping x3.
>>
>> On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
>>> Ping x2.
>>>
>>> On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
>>>> Ping.
>>>>
>>>> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>>>>> Hi Tom,
>>>>> I submitted a patch earlier, where I reported that the current way of implementing
>>>>> barriers in libgomp on nvptx causes a quite significant performance drop on some SPEChpc 2021
>>>>> benchmarks:
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>>>>
>>>>> That previous patch wasn't well received (admittedly, it was kind of a hack).
>>>>> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
>>>>>
>>>>> Basically, instead of trying to have the GPU do CPU-with-OS-like things that it isn't suited for,
>>>>> barriers are implemented simplistically with bar.* synchronization instructions.
>>>>> Tasks are processed after threads have joined, and only if team->task_count != 0.
>>>>>
>>>>> (arguably, a little bit of performance might be forfeited where earlier arriving threads
>>>>> could've been used to process tasks ahead of other threads. But that again would require
>>>>> implementing complex futex-wait/wake-like behavior. Really, that kind of tasking is not what target
>>>>> offloading is usually used for)
>>>>>
>>>>> Implementation highlights:
>>>>> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in the usual manner)
>>>>> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
>>>>> 3. gomp_barrier_wait_last() is now implemented using "bar.arrive"
>>>>>
>>>>> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>>>>> The main synchronization is done using a 'bar.red' instruction. This reduces across all threads
>>>>> the condition (team->task_count != 0), to enable the task processing down below if any thread
>>>>> created a task.
>>>>> (this bar.red usage requires the second GCC patch in this series)
>>>>>
>>>>> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using the libgomp, ovo, omptests,
>>>>> and sollve_vv testsuites, all without regressions. Also verified that the SPEChpc 2021 521.miniswp_t
>>>>> and 534.hpgmgfv_t performance, which regressed in the GCC12 cycle, has been restored to
>>>>> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>>>>>
>>>>> (also suggest backporting to the GCC12 branch, if a performance regression can be considered a defect)
>>>>>
>>>>> Thanks,
>>>>> Chung-Lin
>>>>>
>>>>> libgomp/ChangeLog:
>>>>>
>>>>> 2022-09-21 Chung-Lin Tang
>>>>>
>>>>>     * config/nvptx/bar.c (generation_to_barrier): Remove.
>>>>>     (futex_wait, futex_wake, do_spin, do_wait): Remove.
>>>>>     (GOMP_WAIT_H): Remove.
>>>>>     (#include "../linux/bar.c"): Remove.
>>>>>     (gomp_barrier_wait_end): New function.
>>>>>     (gomp_barrier_wait): Likewise.
>>>>>     (gomp_barrier_wait_last): Likewise.
>>>>>     (gomp_team_barrier_wait_end): Likewise.
>>>>>     (gomp_team_barrier_wait): Likewise.
>>>>>     (gomp_team_barrier_wait_final): Likewise.
>>>>>     (gomp_team_barrier_wait_cancel_end): Likewise.
>>>>>     (gomp_team_barrier_wait_cancel): Likewise.
>>>>>     (gomp_team_barrier_cancel): Likewise.
>>>>>     * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>>>>>     prototype, add new static inline function.
>>
>
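
For readers less familiar with reducing barriers, the idea in implementation note 4 of the original submission can be illustrated with a small host-side sketch. This is not the nvptx code from the patch: it emulates the effect of a reducing barrier with POSIX threads, and it assumes the reduction is a logical OR of (task_count != 0) across threads. The names struct team, barrier_reduce_or, worker, and run_team are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for the team state; not libgomp's struct gomp_team. */
struct team {
  pthread_barrier_t bar;
  atomic_int task_count;  /* number of pending tasks */
  atomic_int any_pred;    /* OR accumulator for the reduction */
  atomic_int saw_tasks;   /* threads that got a true reduction result */
};

struct worker_arg { struct team *team; int id; int creator; };

/* Host emulation of an OR-reducing barrier: every thread contributes PRED,
   all threads block, and all of them receive the OR of the contributions. */
static bool barrier_reduce_or(struct team *t, bool pred)
{
  if (pred)
    atomic_store(&t->any_pred, 1);
  pthread_barrier_wait(&t->bar);          /* all contributions are in */
  bool result = atomic_load(&t->any_pred) != 0;
  /* Wait until every thread has read the result, then let one thread reset. */
  if (pthread_barrier_wait(&t->bar) == PTHREAD_BARRIER_SERIAL_THREAD)
    atomic_store(&t->any_pred, 0);
  pthread_barrier_wait(&t->bar);          /* reset is visible before reuse */
  return result;
}

static void *worker(void *p)
{
  struct worker_arg *a = p;
  struct team *t = a->team;
  if (a->id == a->creator)
    atomic_fetch_add(&t->task_count, 1);  /* this thread "creates a task" */
  /* A thread may sample task_count before the creator has updated it;
     the reduction still makes every thread agree on the outcome. */
  bool pred = atomic_load(&t->task_count) != 0;
  if (barrier_reduce_or(t, pred))
    atomic_fetch_add(&t->saw_tasks, 1);   /* task processing would go here */
  return NULL;
}

/* Runs NTHREADS workers; thread CREATOR (or none if out of range) creates a
   task.  Returns how many threads saw the reduced condition as true. */
int run_team(int nthreads, int creator)
{
  struct team t;
  pthread_t tid[MAX_THREADS];
  struct worker_arg args[MAX_THREADS];
  assert(nthreads >= 1 && nthreads <= MAX_THREADS);
  pthread_barrier_init(&t.bar, NULL, (unsigned) nthreads);
  atomic_init(&t.task_count, 0);
  atomic_init(&t.any_pred, 0);
  atomic_init(&t.saw_tasks, 0);
  for (int i = 0; i < nthreads; i++) {
    args[i] = (struct worker_arg) { &t, i, creator };
    pthread_create(&tid[i], NULL, worker, &args[i]);
  }
  for (int i = 0; i < nthreads; i++)
    pthread_join(tid[i], NULL);
  pthread_barrier_destroy(&t.bar);
  return atomic_load(&t.saw_tasks);
}
```

Calling run_team(4, 2) lets thread 2 create a task, and every thread then enters the task-processing branch; calling run_team(4, -1) creates no task, so every thread skips it. The point of the reduction is that all threads take the same branch after the barrier even if only one of them observed a nonzero task count.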