From: Sylvain Noiry
To: gcc@gcc.gnu.org
CC: Paul Iannetta, Benoit Dinechin
Subject: [RFC] Exposing complex numbers to target backends
Date: Wed, 5 Jul 2023 15:12:58 +0000

Hi,

My name is Sylvain, I am an intern at Kalray and I work on improving the GCC backend for the KVX target. The KVX ISA has dedicated instructions for handling complex numbers, which GCC cannot select because of how complex numbers are handled internally. My goal is to make GCC able to expose new patterns dealing with complex numbers to machine description files. I already have a proof of concept, which can also increase performance on other backends such as x86 if the new patterns are implemented.

My approach is to prevent the lowering of complex operations when the backend can handle them natively, and to work directly on complex modes (SC, DC, CDI, CSI, CHI, CQI). The cplxlower pass looks for supported optabs related to complex numbers and uses them directly. Another advantage is that native operations can now go through all GIMPLE passes and preserve most optimisations, such as FMA generation.

Vectorization is also preserved with native complex operands, although some functions had to be updated. Because vectorization assumes that inner elements are scalar, and complex numbers cannot be considered scalar, some functions which only take scalars have been adapted or duplicated to handle complex elements.
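
To make the lowering discussed above concrete, here is a minimal C sketch (an illustration only, not code from the patch) contrasting the two forms of a single-precision complex multiplication. The names mul_native, mul_lowered and check_mul_forms_agree are hypothetical. mul_lowered only shows the textbook scalar expansion; it does not reproduce the NaN/infinity recovery that __mulsc3 performs, so it is closer to what one would expect with -ffast-math.

```c
#include <assert.h>
#include <complex.h>

/* Native form: the multiplication stays one complex operation, which is
   the shape a backend pattern on a complex mode could match.  */
float _Complex mul_native(float _Complex a, float _Complex b)
{
    return a * b;
}

/* Lowered form: the same product written on scalar parts,
   (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br).
   Assumption: finite inputs only, no special-value handling.  */
float _Complex mul_lowered(float _Complex a, float _Complex b)
{
    float ar = crealf(a), ai = cimagf(a);
    float br = crealf(b), bi = cimagf(b);
    float re = ar * br - ai * bi;
    float im = ar * bi + ai * br;
    return re + im * I;
}

/* Small self-check: (1 + 2i) * (3 + 4i) = -5 + 10i in both forms.  */
void check_mul_forms_agree(void)
{
    float _Complex a = 1.0f + 2.0f * I;
    float _Complex b = 3.0f + 4.0f * I;
    float _Complex n = mul_native(a, b);
    float _Complex l = mul_lowered(a, b);
    assert(crealf(n) == crealf(l));
    assert(cimagf(n) == cimagf(l));
}
```

Once the operation is in the lowered form, the real and imaginary parts are separate scalar values, which is why a single complex instruction is hard to select afterwards.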

I've also changed the representation of complex numbers during the expand pass. READ_COMPLEX_PART and WRITE_COMPLEX_PART have been transformed into target hooks, and a new hook GEN_RTX_COMPLEX allows each backend to choose its preferred complex representation in RTL. The default one uses CONCAT as before, but the KVX backend uses registers with a complex mode containing both the real and imaginary parts.

Each backend can now add its own native complex operations with patterns in its machine description. The following example implements a complex multiplication with mode SC on the KVX backend:

(define_insn "mulsc3"
  [(set (match_operand:SC 0 "register_operand" "=r")
        (mult:SC (match_operand:SC 1 "register_operand" "r")
                 (match_operand:SC 2 "register_operand" "r")))]
  ""
  "fmulwc %0 = %1, %2"
  [(set_attr "type" "mau_fpu")]
)

The main patch affects around 1400 lines of generic code, mostly located in expr.cc and tree-complex.cc. These are mainly additions, or result from moving READ_COMPLEX_PART and WRITE_COMPLEX_PART from expr.cc to target hooks.

I know that ARM developers have added partial support for complex instructions. However, since they operate during vectorization, and promote operations on vectors of floating-point numbers that look like operations on (vectors of) complex numbers, their approach misses simple cases. At this point they create operations working on vectors of floating-point numbers, which are caught later by dedicated define_expand patterns. Our approach, on the other hand, propagates complex numbers through the whole middle-end, so we have an easier time recombining the operations and recognizing what ARM does. Some choices will be needed to merge our two approaches, although I've already reused their work on complex rotations in my implementation.

Results:

I have tested my implementation on multiple code samples, as well as a few FFTs.
On a simple in-place radix-2 FFT with precomputed twiddle seeds (2 complex mults, 1 add, and 1 sub per loop), the compute time has been divided by 3 when compiling with -O3 (because calls to __mulsc3 are replaced by native instructions) and shortened by 20% with -ffast-math. In both cases, the achieved performance is now on par with another version coded using intrinsics. These improvements do not come exclusively from the newly generated hardware instructions: replacing CONCATs with registers also prevents GCC from generating instructions to extract the real and imaginary parts into their own registers and recombine them later.

This new approach can also bring a performance uplift to other backends. I have tried reusing the same complex representation in RTL as KVX for x86, along with a few patterns. Although I still get useless moves on large programs, simple examples like the one below already show a performance uplift.

_Complex float add(_Complex float a, _Complex float b)
{
  return a + b;
}

Using "-O2", the assembly produced is now on par with LLVM and looks like:

add:
        addps  %xmm1, %xmm0
        ret

Choices to be made:

  - Currently, ARM uses optabs which start with "c", like "cmul", to distinguish between real floating-point numbers and complex numbers. Since we keep complex modes, this could simply be done with mul.
  - Currently the parser does some early optimizations and lowering that could be moved into the cplxlower pass. For example, I've slightly changed how complex rotations by 90° and 270°, which are recognized in fold-const.cc, are processed. A call to a new COMPLEX_ROT90/270 internal function is now inserted, which is then lowered or kept in the cplxlower pass. Finally, the widening_mul pass can generate COMPLEX_ADD_ROT90/270 internal functions, which are expanded using the cadd90/270 optabs; otherwise COMPLEX_ROT90/270 are expanded using the new crot90/270 optabs.
  - Currently, we have to duplicate preferred_simd_mode since it only accepts scalar modes. If we unify enough, we could have a new type that would be a union of scalar_mode and complex_mode, but we did not do this since it would require many modifications.
  - Declaration of complex vectors through attribute directives; this would be a new C extension (clang does not support it either).
  - The KVX ISA supports some operations fused with a conjugate (e.g. a + conjf(b)), which are caught directly in the combine pass if the corresponding pattern is present in the backend. This solution is simple, but they could also be caught in the middle-end, like FMAs.

Currently supported patterns:

  - all basic arithmetic operations for scalar and vector complex modes (add, mul, neg, ...)
  - conj for the conjugate operation, using a new conj_optab
  - crot90/crot270 for complex rotations, using new optabs

I would like to have your opinion on my approach. I can send you the patch if you want.

Best regards,

Sylvain Noiry