From: Sylvain Noiry
To: gcc@gcc.gnu.org
Cc: piannetta@kalrayinc.com
Date: Mon, 16 Oct 2023 11:14:28 +0200
Subject: Complex numbers support: discussions summary

Hi,

We are trying to update our patches on complex numbers to take into account what has been discussed.

The main change from our previous patches is that vectors of complex types are replaced with classical vectors of real types (e.g. V4SF instead of V2SC), combined with the existing complex opcodes (like .COMPLEX_MUL) when vectorizing. Non-vectorized complex modes are also replaced with vectors of two reals at the end of the middle-end (e.g. SC becomes V2SF), so that already existing patterns can be reused. As a result, operations that are not complex-specific, like an addition, no longer require a dedicated pattern, and already implemented patterns like cmul, cmul_conj, cadd90, ... can be used.

To do so, the cplxlower pass has been split into two passes:
  - The first one replaces complex-specific operations with dedicated opcodes (like .COMPLEX_MUL replacing a MUL_EXPR in SC mode), but keeps the complex modes at this point. Operations that are not supported natively are also lowered here: we assume it is better to lower them early and rely on the standard middle-end optimizations than to attempt a vectorization with a near-zero chance of success and only lower them afterwards.
  - The second one does little more than remap the non-vectorized complex modes into vectors of two reals (like SC to V2SF).

So the vectorizer takes complex modes as input but vectorizes with vectors of real modes (e.g. the V4SF vector mode for SC).
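As a minimal sketch of why the SC to V2SF remapping lets operations that are not complex-specific reuse existing patterns, the C below models a _Complex float as a two-float vector using the GCC/Clang vector extension. The type and function names (v2sf, sc_to_v2sf, v2sf_cadd, v2sf_cmul, v2sf_example) are hypothetical illustrations, not code from the patches. It only shows that addition is element-wise on the {real, imag} layout while multiplication is not, which is why the multiplication must already carry a complex-specific opcode before the modes are remapped.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Hypothetical stand-in for a V2SF value: two floats in one 8-byte vector,
   written with the GCC/Clang vector extension. */
typedef float v2sf __attribute__((vector_size(8)));

/* C guarantees float _Complex has the same representation as float[2]
   = {real, imag}, so the SC -> V2SF remapping is a pure bit copy. */
_Static_assert(sizeof(float _Complex) == sizeof(v2sf), "SC and V2SF sizes differ");

static v2sf sc_to_v2sf(float _Complex z)
{
    v2sf v;
    memcpy(&v, &z, sizeof v);
    return v;
}

/* Complex addition is an ordinary element-wise vector addition,
   so a plain V2SF add pattern can be reused as-is. */
static v2sf v2sf_cadd(v2sf a, v2sf b)
{
    return a + b;
}

/* Complex multiplication mixes the two lanes; an element-wise a * b would
   be wrong, so it needs a dedicated operation (the role of .COMPLEX_MUL
   and of a cmul-style pattern). */
static v2sf v2sf_cmul(v2sf a, v2sf b)
{
    v2sf r = { a[0] * b[0] - a[1] * b[1],
               a[0] * b[1] + a[1] * b[0] };
    return r;
}

/* Usage check with exactly representable values: (1+2i) and (3+4i). */
static void v2sf_example(void)
{
    v2sf a = sc_to_v2sf(1.0f + 2.0f * I);
    v2sf b = sc_to_v2sf(3.0f + 4.0f * I);
    v2sf s = v2sf_cadd(a, b);
    v2sf p = v2sf_cmul(a, b);
    v2sf naive = a * b;   /* element-wise {3, 8}: not a complex product */
    assert(s[0] == 4.0f && s[1] == 6.0f);
    assert(p[0] == -5.0f && p[1] == 10.0f);
    assert(naive[0] == 3.0f && naive[1] == 8.0f);
}
```

The same {real, imag} interleaving extends to V4SF holding two SC values, which is the layout the vectorized loop below operates on.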
Because the complex-specific opcodes have already been introduced by the first pass, no confusion with real operations is possible. We could also use vectors of two reals as inputs to the vectorizer, but vectorizing small vector modes into bigger ones (like V2SF into V4SF) is not possible.

Here are some advantages of this new approach:
  - No more vectors of complex modes.
  - The vectorization of complex operations is improved, because split and unified vectorized statements can easily be mixed, as they use the same vector type. We can also imagine testing multiple options (first: native vectorized, second: split vectorized, third: unified scalar, ...).
  - It reuses the patterns for vectors of two reals for operations that are not complex-specific, as well as already existing complex patterns like cmul implemented on aarch64, which could mean almost free performance gains on many targets.

On the performance side, we can still exploit the full potential of complex instructions on KVX.

To illustrate the gains on aarch64 without rewriting any patterns (except a mov), here is the assembly generated for a vectorized complex multiply-multiply-add with -O2 -mcpu=neoverse-v1 (and without -ffast-math, unlike with SLP):

    #ifndef N
    #define N 32  /* assumed: matches the "cmp x4, 256" bound below (256 bytes / 8 bytes per _Complex float) */
    #endif

    void vfmma (_Complex float a[restrict N], _Complex float b[restrict N],
                _Complex float c[restrict N], _Complex float d[restrict N])
    {
      for (int i = 0; i < N; i++)
        c[i] += a[i] * b[i] * d[i];
    }

    vfmma:
            movi    v3.4s, 0
            mov     x4, 0
            .align  5
    .L2:
            ldr     q2, [x1, x4]
            mov     v1.16b, v3.16b
            ldr     q0, [x0, x4]
            fcmla   v1.4s, v0.4s, v2.4s, #0
            fcmla   v1.4s, v0.4s, v2.4s, #90
            ldr     q0, [x2, x4]
            ldr     q2, [x3, x4]
            fcmla   v0.4s, v2.4s, v1.4s, #0
            fcmla   v0.4s, v2.4s, v1.4s, #90
            str     q0, [x2, x4]
            add     x4, x4, 16
            cmp     x4, 256
            bne     .L2
            ret

We have only done some experimentation with this approach. If you think it could be interesting, we will try to develop it further.
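To make the FCMLA pairs in the listing easier to follow, here is a scalar C model of one complex lane of the vfmma loop. It assumes the Arm FCMLA semantics as described in the Arm architecture reference: rotation #0 accumulates the products involving the real part of the first multiplicand, rotation #90 those involving its imaginary part, so #0 followed by #90 adds a full complex product. The names (lane, fcmla_rot0, fcmla_rot90, fcmla_pair, vfmma_lane) are hypothetical, and the model uses separate multiplies and adds whereas the hardware fuses each multiply-add, so results may differ in the last bit for inexact inputs.

```c
#include <assert.h>
#include <complex.h>

/* One complex lane {re, im} of a V4SF register (hypothetical helper type). */
typedef struct { float re, im; } lane;

/* Model of FCMLA Vd, Vn, Vm, #0 on one lane: the products using n.re. */
static lane fcmla_rot0(lane d, lane n, lane m)
{
    d.re += n.re * m.re;
    d.im += n.re * m.im;
    return d;
}

/* Model of FCMLA Vd, Vn, Vm, #90 on one lane: the products using n.im. */
static lane fcmla_rot90(lane d, lane n, lane m)
{
    d.re -= n.im * m.im;
    d.im += n.im * m.re;
    return d;
}

/* #0 followed by #90 computes d + n * m. */
static lane fcmla_pair(lane d, lane n, lane m)
{
    return fcmla_rot90(fcmla_rot0(d, n, m), n, m);
}

/* One element of the vfmma loop, in the order of the listing:
   v1 = 0 + a*b (v1 zeroed via v3), then c = c + d*v1. */
static lane vfmma_lane(lane a, lane b, lane c, lane d)
{
    lane zero = { 0.0f, 0.0f };
    lane ab = fcmla_pair(zero, a, b);
    return fcmla_pair(c, d, ab);
}

/* Usage check against ordinary C complex arithmetic, using small integers
   so every intermediate result is exact in float. */
static void vfmma_lane_example(void)
{
    lane a = { 1, 2 }, b = { 3, 4 }, c = { 5, 6 }, d = { 7, 8 };
    lane r = vfmma_lane(a, b, c, d);
    float _Complex ref = (1.0f + 2.0f * I) * (3.0f + 4.0f * I) * (7.0f + 8.0f * I)
                         + (5.0f + 6.0f * I);
    assert(r.re == crealf(ref) && r.im == cimagf(ref));
    assert(r.re == -110.0f && r.im == 36.0f);
}
```

Splitting the complex product over two rotations is what allows a single instruction pair per vector to replace the shuffle-heavy code a generic lowering would need.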
Thanks,
Sylvain