From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <Tamar.Christina@arm.com>
Received: from EUR04-DB3-obe.outbound.protection.outlook.com
 (mail-eopbgr60042.outbound.protection.outlook.com [40.107.6.42])
 by sourceware.org (Postfix) with ESMTPS id 5FC223857823
 for <gcc-patches@gcc.gnu.org>; Fri, 15 Jan 2021 15:30:33 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 5FC223857823
Received: from AS8PR04CA0196.eurprd04.prod.outlook.com (2603:10a6:20b:2f3::21)
 by PA4PR08MB6174.eurprd08.prod.outlook.com (2603:10a6:102:e6::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3763.10; Fri, 15 Jan
 2021 15:30:31 +0000
Received: from VE1EUR03FT006.eop-EUR03.prod.protection.outlook.com
 (2603:10a6:20b:2f3:cafe::e3) by AS8PR04CA0196.outlook.office365.com
 (2603:10a6:20b:2f3::21) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3763.9 via Frontend
 Transport; Fri, 15 Jan 2021 15:30:31 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123)
 smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified)
 header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=pass action=none
 header.from=arm.com;
Received-SPF: Pass (protection.outlook.com: domain of arm.com designates
 63.35.35.123 as permitted sender) receiver=protection.outlook.com;
 client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com;
Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by
 VE1EUR03FT006.mail.protection.outlook.com (10.152.18.116) with
 Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.3763.12 via Frontend Transport; Fri, 15 Jan 2021 15:30:30 +0000
Received: ("Tessian outbound 2b57fdd78668:v71");
 Fri, 15 Jan 2021 15:30:30 +0000
X-CheckRecipientChecked: true
X-CR-MTA-CID: c6441e6c2e7a2e97
X-CR-MTA-TID: 64aa7808
Received: from e20ccc495de4.2
 by 64aa7808-outbound-1.mta.getcheckrecipient.com id
 AE2F0DD5-2B95-478D-8E98-7ED0278515EE.1; 
 Fri, 15 Jan 2021 15:30:25 +0000
Received: from EUR03-AM5-obe.outbound.protection.outlook.com
 by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id e20ccc495de4.2
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384);
 Fri, 15 Jan 2021 15:30:25 +0000
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=ZP1POM0dLB+PEbYgkYO59DO0d615d+Z/XdJfTMaQfQtRmydfQwdygHgcO8dyet+kOtkt1ldoQsJPbLvKKMA6Hm8B762+t7kV9S9lf/wccBqqMkQpdkj1fNQXvBHqzCf5ItSeQ4jrPyIuAXWzwnW2rvAKZn4pczeG2pDi7VoHu2FnQzJUIQY/8XRXJr/ZyRSUBuH8wuYwshcuxOnkdEF+3qUnRWN5UV88uDWN+i9SFCdUcMJtK3asVKnCEPk14BfT+1s3YT4HylQ+CHjBgq6QhTvX3JIqf30GtvJs/4/8Itq7VW1FwRjonF+lcv6U+sgILBB6yZPIt8HkKemYD5K5Aw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=zIxIP7PAIuirhXqPI2ILJ2Oty/iOlGKErONn8hSttGM=;
 b=ErEzxsS/53Nq3RyXp98+Ug/7ipoty23V5OO4+DvMfd3Z2Abpm+bCH3kRchhPA12MwQKYzkINOoDekxjR74FZpKqG+YE0Rmf0iyzsp9psDm3mTBsREY2YTJvz39NOcWlruIxmW9HxNamnhQbCOT82l6fOhOFrE6tauJCLPFX7CsTYbjS8/QtznOpwjstLduvLu6gEEi/XShSO4quwidSqErLwaj13bUUNOuXpj7kReftFW+dhTTjCGlK5gK/PU3xuu+mCa6QdP/8LC/i1SANj39TbFBLPGP7NOmlllVEKRtPq/fjisCPf47F9nmXe5juoX2SIMd4tZEREFI8dZIVUGg==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass
 header.d=arm.com; arc=none
Authentication-Results-Original: gcc.gnu.org; dkim=none (message not signed)
 header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com;
Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17)
 by VI1PR08MB3168.eurprd08.prod.outlook.com (2603:10a6:803:49::14)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3763.10; Fri, 15 Jan
 2021 15:30:22 +0000
Received: from VI1PR08MB5325.eurprd08.prod.outlook.com
 ([fe80::ed1e:9499:4501:2118]) by VI1PR08MB5325.eurprd08.prod.outlook.com
 ([fe80::ed1e:9499:4501:2118%8]) with mapi id 15.20.3763.012; Fri, 15 Jan 2021
 15:30:22 +0000
Date: Fri, 15 Jan 2021 15:30:19 +0000
From: Tamar Christina <tamar.christina@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com,
 Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Multiply,
 FMS and FMA. 
Message-ID: <patch-13907-tamar@arm.com>
Content-Type: multipart/mixed; boundary="wRRV7LY7NUeQGEoC"
Content-Disposition: inline
User-Agent: Mutt/1.9.4 (2018-02-28)
X-Originating-IP: [217.140.106.53]
X-ClientProxiedBy: LO2P265CA0503.GBRP265.PROD.OUTLOOK.COM
 (2603:10a6:600:13b::10) To VI1PR08MB5325.eurprd08.prod.outlook.com
 (2603:10a6:803:13e::17)
MIME-Version: 1.0
X-MS-Exchange-MessageSentRepresentingType: 1
Received: from arm.com (217.140.106.53) by
 LO2P265CA0503.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:13b::10) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.3763.10 via Frontend Transport; Fri, 15 Jan 2021 15:30:21 +0000
X-MS-PublicTrafficType: Email
X-MS-Office365-Filtering-HT: Tenant
X-MS-Office365-Filtering-Correlation-Id: 648d2f57-9dbc-4ef9-e103-08d8b96a7e5e
X-MS-TrafficTypeDiagnostic: VI1PR08MB3168:|PA4PR08MB6174:
X-MS-Exchange-Transport-Forked: True
X-Microsoft-Antispam-PRVS: <PA4PR08MB61748ED9596D33B9619435F5FFA70@PA4PR08MB6174.eurprd08.prod.outlook.com>
x-checkrecipientrouted: true
NoDisclaimer: true
X-MS-Oob-TLC-OOBClassifiers: OLM:3513;OLM:3513;
X-MS-Exchange-SenderADCheck: 1
X-Microsoft-Antispam-Untrusted: BCL:0;
X-Microsoft-Antispam-Message-Info-Original: BpNMEBIG2VJAdNOFI8MPsIsL/UkhICsTNDn8cnVXNAZ+90/+BrDq9OB5NB8oewJpocMJ0+No73s9q64Y1rBIZx4BHLo3w3mNZzTL4xDo/bHkfpLWGpSB/kxyErkwVojbwwVWK50trwm7vZI2gDYjJQt/mYBs0OdaUK2KQiLmc17O+LITyCEB0Okwshp9qGkYhZyd1vG3+0wPZpxsJnU6OStOvzoemya4RpTGMMFB0a9s+cByHrf7GsiyOpUn2w2WcjwCVMkkpI0SABwnh5G+V6LSjac9WcsidbiZm3HTyVqCAoIGcMiiHcPv/Jk7tYKsjlCSef5ARfqosRrzy9qT/W7ulgipw5MYHAL0k8LcD+UOOh1WdFCRnyJpX0JOSc1kntkXJGQn2QtLKok26ze+H7+TsSSsnmXVQ4xErP9os84NiGke22E2EE2zhQ958c0qkgqWOdOqnOSpRYM7XlgunJvn0spzEwE/W1Q3Dki/DBNnE2d2drEnKt9MoVs/hE/a
X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en;
 SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com;
 PTR:; CAT:NONE;
 SFS:(4636009)(136003)(366004)(396003)(346002)(39860400002)(376002)(83380400001)(16526019)(4326008)(8676002)(8936002)(86362001)(66616009)(235185007)(30864003)(5660300002)(186003)(44832011)(7696005)(8886007)(2906002)(316002)(956004)(26005)(6916009)(36756003)(66556008)(4743002)(66946007)(66476007)(52116002)(2616005)(478600001)(55016002)(44144004)(33964004)(4216001)(2700100001)(357404004);
 DIR:OUT; SFP:1101; 
X-MS-Exchange-AntiSpam-MessageData: =?utf-8?B?TWFzOUdOWHExeGxaSENRbU5mNDFub1kvdk1VQ1YvM2hqSmM1ZFY1OHBEU0pu?=
 =?utf-8?B?K3lsTEx2U2Rwa2pHNzY5NE5BME82dUNMRmhyV00xb1BEeGcvMnd0eTd2T2dv?=
 =?utf-8?B?VWtBTDJ0YXBsSFpxdk85a1lrbGhIbE1LOWF0MXR0eE4rQ3A1VU83ZWVRajZT?=
 =?utf-8?B?cmNBdER0ZGFHM29NOTlvd1hYb0k3Skk1SSsycXkrank0eXh2d1QzS2Z2dWUw?=
 =?utf-8?B?ZGFGRzhKZ2tiK2tVMngvSWVmVEZHbXUvSzFlQ1BJbHVSM0c5Ny9hMkliNVdp?=
 =?utf-8?B?bU9Fb2lEczMrYmJYcnI2T0FpU2l2cU40bjkvYUt3aE1BSzJmZjRQazhTSFVz?=
 =?utf-8?B?bzNXcjBISWFUQmh0TTErTUl1NG1QR3h0QVVpY3MrMlNQZUtISWdWMndMaDFq?=
 =?utf-8?B?QUtnVlJoK0t1N0pIaEJkSnJIQzNjaTVFaUxKQS9ERmI0MEd6dUxzMW5MU21L?=
 =?utf-8?B?VEpBZEgwT000RXBmZjVNUFk2LzN5UGo2OTU2VjhmK3NrMGJjNXFaaFNKRW42?=
 =?utf-8?B?ZWdsRjlEM00xcDY5VzN3UHgvanFYampQSG1vY2V6anlnN3RkZURnS290aDVu?=
 =?utf-8?B?d1A4U05GK2F5QlN4dEFiM1FEdnJBRnc0dHpaQUVGbmpJUmprRHF5c3BUQURx?=
 =?utf-8?B?aFNPZ0V5NHFsU1doQW9aVE8xb2lxazFncjhLa0FuM2tjWFo5VmhGd2JaOEN3?=
 =?utf-8?B?TEl3a1Y4ZTgyLzB4d2lwYWdlU3A2VlBlbjdmM0FIbHNOSGlBZWxrbjZnSHBF?=
 =?utf-8?B?SjMrYjFDaG9JZE1ad1B1MFNSd0E5QjRURjJHUm1XdXlrTHl5UUJkeDRnc1Z5?=
 =?utf-8?B?ZmQyYU1wdkhYeTNwc3l0ZTBZN04xWWIrajVSUm94WC9EbE1zRHRPQ1UreTJv?=
 =?utf-8?B?bkllTTZ5d1pmZkVIUngzUW1WdmdBZnhZeVhPR0NmVkFvS2FzSnhNT0hoTC8y?=
 =?utf-8?B?OTdwSk11YkpMSlZTQTZxejludmxUTG50cnhqcDRSWURBbTVJNENLR3N2ZkZ4?=
 =?utf-8?B?UzNTbzRRZXVPblFIYk9wZTlaTmRIWGwzSk8yRDVSK25yZk9iSFNZbkpHWndM?=
 =?utf-8?B?a1hQL1JvS2dNMWRSOVNTelVlV3dzaFFDSjZOOE9ZR0pBK1gwWmh3dEhtQnUv?=
 =?utf-8?B?ckd1OTIwckJvdWtRMWFmS3dqM0llNUtkUkwwSTZKeHFlMU9mYUZlM3lXcXhs?=
 =?utf-8?B?c0ZyYTZkdUZZdjE3NFl4WC9xRGpIU2F1bjJIWm9BOTF4M0JaWXJvYmRsVVdJ?=
 =?utf-8?B?eTlqOWwrN24rdUdiamdHWk9WaVlIQ05tMVBQblpOTHVWRFdpRm1rUmYxb05G?=
 =?utf-8?Q?+wnNIMiuG9DVQJuLAS7GLz1KKo7RiLLpR+?=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB3168
Original-Authentication-Results: gcc.gnu.org; dkim=none (message not signed)
 header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com;
X-EOPAttributedMessage: 0
X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT006.eop-EUR03.prod.protection.outlook.com
X-MS-Office365-Filtering-Correlation-Id-Prvs: dbe45e68-2cee-4bd3-f6f2-08d8b96a78d1
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info: IuCvUfOEiGBMrrkO3hVi5VsEmYnMDA8I7gNWExSwwbMYP4E+pQnRNdSmpZOFg07n3V6gyuT4p6x1DPonLTPa/vGVblV95AGODny/mr7xWst3bbGSJZb2+s8wyLhqjE0O5+jrrVzxNLoqfxQ48eFnqReB+/Z4NUEnSdSdim6+SoOWAnotZt5Up4X4o+f8KkcfHO5b3Cuh3FKtidKk0vJ0BVvzUOB+1ORy3ufHprIhuDzfDPoGu878b4RdhC0ZhrnM4xVUHWSHKfMibvCwdaphpootNwJ8nIZvE0X2bpqsdoWWgzGeW1PBnY+kN4TL9aCFkVJt6kowW7Zsmto1sMAVwOeDXcrD1JhrNl7ofOX8+S4jVp4liv/supziGuFV5adpx1gg+RHdewqu4fzTqDW7xQUftajgDqCjsLhHhSy2yej/7g9IhlG1IiRYMBi2Spr10vLmtyoQbE8oFhy+7MNYXn9Ws7NntNyfqiBzmiXz1yTGhewq4B3rpf0/lJD43igpPHthKnbMVB5R6Is1xzthO4Tn+CoaDgg/TbnCOQtYl1XQQvS3eGCuiTzBiKK1Yjr81SIraHv54MNZIchY7ckVMzJmOAMktpOy9hYDQ/kC9sFBN8wtUkI3ypZ2BqFL6K6eU5GDMrbK95iVWcwcCkhFhG+8mtcPFwbRKReI1hsj6cij2rpVC28dbP63BZxgkz/WZBcu0gA60F2DxIEuBB4qOw==
X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:;
 IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com;
 PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE;
 SFS:(4636009)(39860400002)(376002)(346002)(136003)(396003)(46966006)(8676002)(82740400003)(70586007)(4743002)(356005)(336012)(36756003)(44832011)(235185007)(2616005)(8936002)(8886007)(82310400003)(478600001)(47076005)(316002)(66616009)(70206006)(81166007)(956004)(34020700004)(83380400001)(55016002)(26005)(44144004)(7696005)(2906002)(86362001)(16526019)(186003)(33964004)(4326008)(6916009)(5660300002)(30864003)(4216001)(2700100001)(357404004);
 DIR:OUT; SFP:1101; 
X-OriginatorOrg: arm.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Jan 2021 15:30:30.8975 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 648d2f57-9dbc-4ef9-e103-08d8b96a7e5e
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123];
 Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT006.eop-EUR03.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: PA4PR08MB6174
X-Spam-Status: No, score=-13.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_LOTSOFHASH,
 MSGID_FROM_MTA_HEADER, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS,
 SPF_PASS, TXREP,
 UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 15 Jan 2021 15:30:37 -0000
Message-ID: <20210115153019.ntt-ry12tsuCVWjxeaUVMs2f3yE2G_IlrVxidGtmVig@z>

--wRRV7LY7NUeQGEoC
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

Hi All,

This adds implementation for the optabs for complex operations.  With this the
following C code:

  void g (float complex a[restrict N], float complex b[restrict N],
	  float complex c[restrict N])
  {
    for (int i=0; i < N; i++)
      c[i] =  a[i] * b[i];
  }

generates


NEON:

g:
        movi    v3.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        mov     v0.16b, v3.16b
        ldr     q2, [x1, x3]
        ldr     q1, [x0, x3]
        fcmla   v0.4s, v1.4s, v2.4s, #0
        fcmla   v0.4s, v1.4s, v2.4s, #90
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, 1600
        bne     .L2
        ret

SVE:

g:
        mov     x3, 0
        mov     x4, 400
        ptrue   p1.b, all
        whilelo p0.s, xzr, x4
        mov     z3.s, #0
        .p2align 3,,7
.L2:
        ld1w    z1.s, p0/z, [x0, x3, lsl 2]
        ld1w    z2.s, p0/z, [x1, x3, lsl 2]
        movprfx z0, z3
        fcmla   z0.s, p1/m, z1.s, z2.s, #0
        fcmla   z0.s, p1/m, z1.s, z2.s, #90
        st1w    z0.s, p0, [x2, x3, lsl 2]
        incw    x3
        whilelo p0.s, x3, x4
        b.any   .L2
        ret

SVE2 (with int instead of float)
g:
        mov     x3, 0
        mov     x4, 400
        mov     z3.b, #0
        whilelo p0.s, xzr, x4
        .p2align 3,,7
.L2:
        ld1w    z1.s, p0/z, [x0, x3, lsl 2]
        ld1w    z2.s, p0/z, [x1, x3, lsl 2]
        movprfx z0, z3
        cmla    z0.s, z1.s, z2.s, #0
        cmla    z0.s, z1.s, z2.s, #90
        st1w    z0.s, p0, [x2, x3, lsl 2]
        incw    x3
        whilelo p0.s, x3, x4
        b.any   .L2
        ret


It defined a new iterator VALL_ARITH which contains types for which we can do
general arithmetic (excludes bfloat16).

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Checked with armv8-a+sve2+fp16 and no issues.  Note that sue to a mid-end
limitation SLP for SVE currently fails for some permutes.  The tests have these
marked as XFAIL.  I do intend to fix this soon.

Execution tests verified with QEMU.

Matching tests for these are in the mid-end patches.  This I will turn on for
these patterns in a separate patch.

Ok for master?

Thanks,
Tamar


gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cml<fcmac1><rot_op><mode>4,
	cmul<rot_op><mode>3): New.
	* config/aarch64/iterators.md (VALL_ARITH, UNSPEC_FCMUL,
	UNSPEC_FCMUL180, UNSPEC_FCMLA_CONJ, UNSPEC_FCMLA180_CONJ,
	UNSPEC_CMLA_CONJ, UNSPEC_CMLA180_CONJ, UNSPEC_CMUL, UNSPEC_CMUL180,
	FCMLA_OP, FCMUL_OP, rot_op, rotsplit1, rotsplit2, fcmac1, sve_rot1,
	sve_rot2, SVE2_INT_CMLA_OP, SVE2_INT_CMUL_OP, SVE2_INT_CADD_OP): New.
	(rot): Add UNSPEC_FCMUL, UNSPEC_FCMUL180.
	* config/aarch64/aarch64-sve.md (cml<fcmac1><rot_op><mode>4,
	cmul<rot_op><mode>3): New.
	* config/aarch64/aarch64-sve2.md (cml<fcmac1><rot_op><mode>4,
	cmul<rot_op><mode>3): New.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4b869ded918fd91ffd41e6ba068239a752b331e5..8a5f1dad224a99a8ba30669139259922a1250d0e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -516,6 +516,47 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
   [(set_attr "type" "neon_fcmla")]
 )
 
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml<fcmac1><rot_op><mode>4"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
+		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
+				   (match_operand:VHSDF 3 "register_operand")]
+				   FCMLA_OP)))]
+  "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
+						 operands[3], operands[2]));
+  emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
+						 operands[3], operands[2]));
+  DONE;
+})
+
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul<rot_op><mode>3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+		       (match_operand:VHSDF 2 "register_operand")]
+		       FCMUL_OP))]
+  "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  rtx res1 = gen_reg_rtx (<MODE>mode);
+  emit_move_insn (tmp, CONST0_RTX (<MODE>mode));
+  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
+						 operands[2], operands[1]));
+  emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
+						 operands[2], operands[1]));
+  DONE;
+})
+
+
+
 ;; These instructions map to the __builtins for the Dot Product operations.
 (define_insn "aarch64_<sur>dot<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index da15bd8788507feb12d52894c14e099370f34108..9dfe6a3f4512a20ba4f1e66a105ee0ae5d6949ea 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -7243,6 +7243,62 @@ (define_insn "@aarch64_pred_<optab><mode>"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml<fcmac1><rot_op><mode>4"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 4)
+	   (match_dup 5)
+	   (match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")
+	   (match_operand:SVE_FULL_F 3 "register_operand")]
+	  FCMLA_OP))]
+  "TARGET_SVE"
+{
+  operands[4] = aarch64_ptrue_reg (<VPRED>mode);
+  operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn
+    (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, operands[4],
+					     operands[3], operands[2],
+					     operands[1], operands[5]));
+  emit_insn
+    (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], operands[4],
+					     operands[3], operands[2],
+					     tmp, operands[5]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul<rot_op><mode>3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	   [(match_operand:SVE_FULL_F 1 "register_operand")
+	    (match_operand:SVE_FULL_F 2 "register_operand")]
+	  FCMUL_OP))]
+  "TARGET_SVE"
+{
+  rtx pred_reg = aarch64_ptrue_reg (<VPRED>mode);
+  rtx gp_mode = gen_int_mode (SVE_RELAXED_GP, SImode);
+  rtx accum = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn
+    (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, pred_reg,
+					     operands[2], operands[1],
+					     accum, gp_mode));
+  emit_insn
+    (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], pred_reg,
+					     operands[2], operands[1],
+					     tmp, gp_mode));
+  DONE;
+})
+
 ;; Predicated FCMLA with merging.
 (define_expand "@cond_<optab><mode>"
   [(set (match_operand:SVE_FULL_F 0 "register_operand")
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 5cb9144da98af2d02b83043511a99b5723d7e8c0..b96708d03f4458726b32ec46c0078499e00b8549 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1848,6 +1848,48 @@ (define_insn "@aarch64_<optab>_lane_<mode>"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml<fcmac1><rot_op><mode>4"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand")
+	  (unspec:SVE_FULL_I
+	    [(match_operand:SVE_FULL_I 2 "register_operand")
+	     (match_operand:SVE_FULL_I 3 "register_operand")]
+	    SVE2_INT_CMLA_OP)))]
+  "TARGET_SVE2"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_aarch64_sve_cmla<sve_rot1><mode> (tmp, operands[1],
+						   operands[3], operands[2]));
+  emit_insn (gen_aarch64_sve_cmla<sve_rot2><mode> (operands[0], tmp,
+						   operands[3], operands[2]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul<rot_op><mode>3"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(unspec:SVE_FULL_I
+	  [(match_operand:SVE_FULL_I 1 "register_operand")
+	   (match_operand:SVE_FULL_I 2 "register_operand")]
+	  SVE2_INT_CMUL_OP))]
+  "TARGET_SVE2"
+{
+  rtx accum = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_aarch64_sve_cmla<sve_rot1><mode> (tmp, accum,
+						   operands[2], operands[1]));
+  emit_insn (gen_aarch64_sve_cmla<sve_rot2><mode> (operands[0], tmp,
+						   operands[2], operands[1]));
+  DONE;
+})
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Complex dot product
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d42a70653edb266f2b76924b75a814db25f08f23..3f61fc8e380abd922d39973f40a966b7ce64fa40 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF])
 ;; All Advanced SIMD modes on which we support any arithmetic operations.
 (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
 
+;; All Advanced SIMD modes suitable for performing arithmetics.
+(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
+				  (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST")
+				  V2SF V4SF V2DF])
+
 ;; All Advanced SIMD modes suitable for moving, loading, and storing.
 (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
 				V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
@@ -712,6 +717,10 @@ (define_c_enum "unspec"
     UNSPEC_FCMLA90	; Used in aarch64-simd.md.
     UNSPEC_FCMLA180	; Used in aarch64-simd.md.
     UNSPEC_FCMLA270	; Used in aarch64-simd.md.
+    UNSPEC_FCMUL	; Used in aarch64-simd.md.
+    UNSPEC_FCMUL_CONJ	; Used in aarch64-simd.md.
+    UNSPEC_FCMLA_CONJ	; Used in aarch64-simd.md.
+    UNSPEC_FCMLA180_CONJ	; Used in aarch64-simd.md.
     UNSPEC_ASRD		; Used in aarch64-sve.md.
     UNSPEC_ADCLB	; Used in aarch64-sve2.md.
     UNSPEC_ADCLT	; Used in aarch64-sve2.md.
@@ -730,6 +739,10 @@ (define_c_enum "unspec"
     UNSPEC_CMLA180	; Used in aarch64-sve2.md.
     UNSPEC_CMLA270	; Used in aarch64-sve2.md.
     UNSPEC_CMLA90	; Used in aarch64-sve2.md.
+    UNSPEC_CMLA_CONJ	; Used in aarch64-sve2.md.
+    UNSPEC_CMLA180_CONJ	; Used in aarch64-sve2.md.
+    UNSPEC_CMUL		; Used in aarch64-sve2.md.
+    UNSPEC_CMUL_CONJ	; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTLT	; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTNT	; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTX	; Used in aarch64-sve2.md.
@@ -1291,7 +1304,7 @@ (define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")
 
 ;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF.
 (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
-			  (V2SI "2d") (V16QI "8h") 
+			  (V2SI "2d") (V16QI "8h")
 			  (V8HI "4s") (V4SI "2d")
 			  (V8HF "4s") (V4SF "2d")])
 
@@ -1313,7 +1326,7 @@ (define_mode_attr Vewtype [(VNx16QI "h")
 
 ;; Widened mode register suffixes for VDW/VQW.
 (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
-			   (V2SI ".2d") (V16QI ".8h") 
+			   (V2SI ".2d") (V16QI ".8h")
 			   (V8HI ".4s") (V4SI ".2d")
 			   (V4HF ".4s") (V2SF ".2d")
 			   (SI   "")    (HI   "")])
@@ -2611,6 +2624,19 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA
 				    UNSPEC_SQRDCMLAH180
 				    UNSPEC_SQRDCMLAH270])
 
+;; Unlike the normal CMLA instructions these represent the actual operation you
+;; to be performed.  They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA
+				       UNSPEC_CMLA_CONJ
+				       UNSPEC_CMLA180])
+
+;; Unlike the normal CMLA instructions these represent the actual operation you
+;; to be performed.  They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMUL_OP [UNSPEC_CMUL
+				       UNSPEC_CMUL_CONJ])
+
 ;; Same as SVE2_INT_CADD but exclude the saturating instructions
 (define_int_iterator SVE2_INT_CADD_OP [UNSPEC_CADD90
 				       UNSPEC_CADD270])
@@ -2725,6 +2751,14 @@ (define_int_iterator FMMLA [UNSPEC_FMMLA])
 (define_int_iterator BF_MLA [UNSPEC_BFMLALB
 			     UNSPEC_BFMLALT])
 
+(define_int_iterator FCMLA_OP [UNSPEC_FCMLA
+			       UNSPEC_FCMLA180
+			       UNSPEC_FCMLA_CONJ
+			       UNSPEC_FCMLA180_CONJ])
+
+(define_int_iterator FCMUL_OP [UNSPEC_FCMUL
+			       UNSPEC_FCMUL_CONJ])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -3435,7 +3469,79 @@ (define_int_attr rot [(UNSPEC_CADD90 "90")
 		      (UNSPEC_COND_FCMLA "0")
 		      (UNSPEC_COND_FCMLA90 "90")
 		      (UNSPEC_COND_FCMLA180 "180")
-		      (UNSPEC_COND_FCMLA270 "270")])
+		      (UNSPEC_COND_FCMLA270 "270")
+		      (UNSPEC_FCMUL "0")
+		      (UNSPEC_FCMUL_CONJ "180")])
+
+;; A conjucate is a negation of the imaginary component
+;; The number in the unspecs are the rotation component of the instruction, e.g
+;; FCMLA180 means use the instruction with #180.
+;; The iterator is used to produce the right name mangling for the function.
+(define_int_attr rot_op [(UNSPEC_FCMLA180 "")
+			 (UNSPEC_FCMLA180_CONJ "_conj")
+			 (UNSPEC_FCMLA "")
+			 (UNSPEC_FCMLA_CONJ "_conj")
+			 (UNSPEC_FCMUL "")
+			 (UNSPEC_FCMUL_CONJ "_conj")
+			 (UNSPEC_CMLA "")
+			 (UNSPEC_CMLA180 "")
+			 (UNSPEC_CMLA_CONJ "_conj")
+			 (UNSPEC_CMUL "")
+			 (UNSPEC_CMUL_CONJ "_conj")])
+
+;; The complex operations when performed on a real complex number require two
+;; instructions to perform the operation. e.g. complex multiplication requires
+;; two FCMUL with a particular rotation value.
+;;
+;; These values can be looked up in rotsplit1 and rotsplit2.  as an example
+;; FCMUL needs the first instruction to use #0 and the second #90.
+(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0")
+			    (UNSPEC_FCMLA_CONJ "0")
+			    (UNSPEC_FCMUL "0")
+			    (UNSPEC_FCMUL_CONJ "0")
+			    (UNSPEC_FCMLA180 "270")
+			    (UNSPEC_FCMLA180_CONJ "90")])
+
+(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90")
+			    (UNSPEC_FCMLA_CONJ "270")
+			    (UNSPEC_FCMUL "90")
+			    (UNSPEC_FCMUL_CONJ "270")
+			    (UNSPEC_FCMLA180 "180")
+			    (UNSPEC_FCMLA180_CONJ "180")])
+
+;; SVE has slightly different namings from NEON so we have to split these
+;; iterators.
+(define_int_attr sve_rot1 [(UNSPEC_FCMLA "")
+			   (UNSPEC_FCMLA_CONJ "")
+			   (UNSPEC_FCMUL "")
+			   (UNSPEC_FCMUL_CONJ "")
+			   (UNSPEC_FCMLA180 "270")
+			   (UNSPEC_FCMLA180_CONJ "90")
+			   (UNSPEC_CMLA "")
+			   (UNSPEC_CMLA_CONJ "")
+			   (UNSPEC_CMUL "")
+			   (UNSPEC_CMUL_CONJ "")
+			   (UNSPEC_CMLA180 "270")
+			   (UNSPEC_CMLA180_CONJ "90")])
+
+(define_int_attr sve_rot2 [(UNSPEC_FCMLA "90")
+			   (UNSPEC_FCMLA_CONJ "270")
+			   (UNSPEC_FCMUL "90")
+			   (UNSPEC_FCMUL_CONJ "270")
+			   (UNSPEC_FCMLA180 "180")
+			   (UNSPEC_FCMLA180_CONJ "180")
+			   (UNSPEC_CMLA "90")
+			   (UNSPEC_CMLA_CONJ "270")
+			   (UNSPEC_CMUL "90")
+			   (UNSPEC_CMUL_CONJ "270")
+			   (UNSPEC_CMLA180 "180")
+			   (UNSPEC_CMLA180_CONJ "180")])
+
+
+(define_int_attr fcmac1 [(UNSPEC_FCMLA "a") (UNSPEC_FCMLA_CONJ "a")
+			 (UNSPEC_FCMLA180 "s") (UNSPEC_FCMLA180_CONJ "s")
+			 (UNSPEC_CMLA "a") (UNSPEC_CMLA_CONJ "a")
+			 (UNSPEC_CMLA180 "s") (UNSPEC_CMLA180_CONJ "s")])
 
 (define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla")
 			      (UNSPEC_COND_FMLS "fmls")


-- 

--wRRV7LY7NUeQGEoC
Content-Type: text/x-diff; charset=utf-8
Content-Disposition: attachment; filename="rb13907.patch"

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4b869ded918fd91ffd41e6ba068239a752b331e5..8a5f1dad224a99a8ba30669139259922a1250d0e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -516,6 +516,47 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
   [(set_attr "type" "neon_fcmla")]
 )
 
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml<fcmac1><rot_op><mode>4"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
+		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
+				   (match_operand:VHSDF 3 "register_operand")]
+				   FCMLA_OP)))]
+  "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
+						 operands[3], operands[2]));
+  emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
+						 operands[3], operands[2]));
+  DONE;
+})
+
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul<rot_op><mode>3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+		       (match_operand:VHSDF 2 "register_operand")]
+		       FCMUL_OP))]
+  "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  rtx res1 = gen_reg_rtx (<MODE>mode);
+  emit_move_insn (tmp, CONST0_RTX (<MODE>mode));
+  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
+						 operands[2], operands[1]));
+  emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
+						 operands[2], operands[1]));
+  DONE;
+})
+
+
+
 ;; These instructions map to the __builtins for the Dot Product operations.
 (define_insn "aarch64_<sur>dot<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index da15bd8788507feb12d52894c14e099370f34108..9dfe6a3f4512a20ba4f1e66a105ee0ae5d6949ea 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -7243,6 +7243,62 @@ (define_insn "@aarch64_pred_<optab><mode>"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml<fcmac1><rot_op><mode>4"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 4)
+	   (match_dup 5)
+	   (match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")
+	   (match_operand:SVE_FULL_F 3 "register_operand")]
+	  FCMLA_OP))]
+  "TARGET_SVE"
+{
+  operands[4] = aarch64_ptrue_reg (<VPRED>mode);
+  operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn
+    (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, operands[4],
+					     operands[3], operands[2],
+					     operands[1], operands[5]));
+  emit_insn
+    (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], operands[4],
+					     operands[3], operands[2],
+					     tmp, operands[5]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul<rot_op><mode>3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	   [(match_operand:SVE_FULL_F 1 "register_operand")
+	    (match_operand:SVE_FULL_F 2 "register_operand")]
+	  FCMUL_OP))]
+  "TARGET_SVE"
+{
+  rtx pred_reg = aarch64_ptrue_reg (<VPRED>mode);
+  rtx gp_mode = gen_int_mode (SVE_RELAXED_GP, SImode);
+  rtx accum = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn
+    (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, pred_reg,
+					     operands[2], operands[1],
+					     accum, gp_mode));
+  emit_insn
+    (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], pred_reg,
+					     operands[2], operands[1],
+					     tmp, gp_mode));
+  DONE;
+})
+
 ;; Predicated FCMLA with merging.
 (define_expand "@cond_<optab><mode>"
   [(set (match_operand:SVE_FULL_F 0 "register_operand")
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 5cb9144da98af2d02b83043511a99b5723d7e8c0..b96708d03f4458726b32ec46c0078499e00b8549 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1848,6 +1848,48 @@ (define_insn "@aarch64_<optab>_lane_<mode>"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml<fcmac1><rot_op><mode>4"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand")
+	  (unspec:SVE_FULL_I
+	    [(match_operand:SVE_FULL_I 2 "register_operand")
+	     (match_operand:SVE_FULL_I 3 "register_operand")]
+	    SVE2_INT_CMLA_OP)))]
+  "TARGET_SVE2"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_aarch64_sve_cmla<sve_rot1><mode> (tmp, operands[1],
+						   operands[3], operands[2]));
+  emit_insn (gen_aarch64_sve_cmla<sve_rot2><mode> (operands[0], tmp,
+						   operands[3], operands[2]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul<rot_op><mode>3"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(unspec:SVE_FULL_I
+	  [(match_operand:SVE_FULL_I 1 "register_operand")
+	   (match_operand:SVE_FULL_I 2 "register_operand")]
+	  SVE2_INT_CMUL_OP))]
+  "TARGET_SVE2"
+{
+  rtx accum = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_aarch64_sve_cmla<sve_rot1><mode> (tmp, accum,
+						   operands[2], operands[1]));
+  emit_insn (gen_aarch64_sve_cmla<sve_rot2><mode> (operands[0], tmp,
+						   operands[2], operands[1]));
+  DONE;
+})
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Complex dot product
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d42a70653edb266f2b76924b75a814db25f08f23..3f61fc8e380abd922d39973f40a966b7ce64fa40 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF])
 ;; All Advanced SIMD modes on which we support any arithmetic operations.
 (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
 
+;; All Advanced SIMD modes suitable for performing arithmetics.
+(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
+				  (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST")
+				  V2SF V4SF V2DF])
+
 ;; All Advanced SIMD modes suitable for moving, loading, and storing.
 (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
 				V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
@@ -712,6 +717,10 @@ (define_c_enum "unspec"
     UNSPEC_FCMLA90	; Used in aarch64-simd.md.
     UNSPEC_FCMLA180	; Used in aarch64-simd.md.
     UNSPEC_FCMLA270	; Used in aarch64-simd.md.
+    UNSPEC_FCMUL	; Used in aarch64-simd.md.
+    UNSPEC_FCMUL_CONJ	; Used in aarch64-simd.md.
+    UNSPEC_FCMLA_CONJ	; Used in aarch64-simd.md.
+    UNSPEC_FCMLA180_CONJ	; Used in aarch64-simd.md.
     UNSPEC_ASRD		; Used in aarch64-sve.md.
     UNSPEC_ADCLB	; Used in aarch64-sve2.md.
     UNSPEC_ADCLT	; Used in aarch64-sve2.md.
@@ -730,6 +739,10 @@ (define_c_enum "unspec"
     UNSPEC_CMLA180	; Used in aarch64-sve2.md.
     UNSPEC_CMLA270	; Used in aarch64-sve2.md.
     UNSPEC_CMLA90	; Used in aarch64-sve2.md.
+    UNSPEC_CMLA_CONJ	; Used in aarch64-sve2.md.
+    UNSPEC_CMLA180_CONJ	; Used in aarch64-sve2.md.
+    UNSPEC_CMUL		; Used in aarch64-sve2.md.
+    UNSPEC_CMUL_CONJ	; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTLT	; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTNT	; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTX	; Used in aarch64-sve2.md.
@@ -1291,7 +1304,7 @@ (define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")
 
 ;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF.
 (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
-			  (V2SI "2d") (V16QI "8h") 
+			  (V2SI "2d") (V16QI "8h")
 			  (V8HI "4s") (V4SI "2d")
 			  (V8HF "4s") (V4SF "2d")])
 
@@ -1313,7 +1326,7 @@ (define_mode_attr Vewtype [(VNx16QI "h")
 
 ;; Widened mode register suffixes for VDW/VQW.
 (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
-			   (V2SI ".2d") (V16QI ".8h") 
+			   (V2SI ".2d") (V16QI ".8h")
 			   (V8HI ".4s") (V4SI ".2d")
 			   (V4HF ".4s") (V2SF ".2d")
 			   (SI   "")    (HI   "")])
@@ -2611,6 +2624,19 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA
 				    UNSPEC_SQRDCMLAH180
 				    UNSPEC_SQRDCMLAH270])
 
+;; Unlike the normal CMLA instructions these represent the actual operation you
+;; to be performed.  They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA
+				       UNSPEC_CMLA_CONJ
+				       UNSPEC_CMLA180])
+
+;; Unlike the normal CMLA instructions these represent the actual operation you
+;; to be performed.  They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMUL_OP [UNSPEC_CMUL
+				       UNSPEC_CMUL_CONJ])
+
 ;; Same as SVE2_INT_CADD but exclude the saturating instructions
 (define_int_iterator SVE2_INT_CADD_OP [UNSPEC_CADD90
 				       UNSPEC_CADD270])
@@ -2725,6 +2751,14 @@ (define_int_iterator FMMLA [UNSPEC_FMMLA])
 (define_int_iterator BF_MLA [UNSPEC_BFMLALB
 			     UNSPEC_BFMLALT])
 
+(define_int_iterator FCMLA_OP [UNSPEC_FCMLA
+			       UNSPEC_FCMLA180
+			       UNSPEC_FCMLA_CONJ
+			       UNSPEC_FCMLA180_CONJ])
+
+(define_int_iterator FCMUL_OP [UNSPEC_FCMUL
+			       UNSPEC_FCMUL_CONJ])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -3435,7 +3469,79 @@ (define_int_attr rot [(UNSPEC_CADD90 "90")
 		      (UNSPEC_COND_FCMLA "0")
 		      (UNSPEC_COND_FCMLA90 "90")
 		      (UNSPEC_COND_FCMLA180 "180")
-		      (UNSPEC_COND_FCMLA270 "270")])
+		      (UNSPEC_COND_FCMLA270 "270")
+		      (UNSPEC_FCMUL "0")
+		      (UNSPEC_FCMUL_CONJ "180")])
+
+;; A conjucate is a negation of the imaginary component
+;; The number in the unspecs are the rotation component of the instruction, e.g
+;; FCMLA180 means use the instruction with #180.
+;; The iterator is used to produce the right name mangling for the function.
+(define_int_attr rot_op [(UNSPEC_FCMLA180 "")
+			 (UNSPEC_FCMLA180_CONJ "_conj")
+			 (UNSPEC_FCMLA "")
+			 (UNSPEC_FCMLA_CONJ "_conj")
+			 (UNSPEC_FCMUL "")
+			 (UNSPEC_FCMUL_CONJ "_conj")
+			 (UNSPEC_CMLA "")
+			 (UNSPEC_CMLA180 "")
+			 (UNSPEC_CMLA_CONJ "_conj")
+			 (UNSPEC_CMUL "")
+			 (UNSPEC_CMUL_CONJ "_conj")])
+
+;; The complex operations when performed on a real complex number require two
+;; instructions to perform the operation. e.g. complex multiplication requires
+;; two FCMUL with a particular rotation value.
+;;
+;; These values can be looked up in rotsplit1 and rotsplit2.  as an example
+;; FCMUL needs the first instruction to use #0 and the second #90.
+(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0")
+			    (UNSPEC_FCMLA_CONJ "0")
+			    (UNSPEC_FCMUL "0")
+			    (UNSPEC_FCMUL_CONJ "0")
+			    (UNSPEC_FCMLA180 "270")
+			    (UNSPEC_FCMLA180_CONJ "90")])
+
+(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90")
+			    (UNSPEC_FCMLA_CONJ "270")
+			    (UNSPEC_FCMUL "90")
+			    (UNSPEC_FCMUL_CONJ "270")
+			    (UNSPEC_FCMLA180 "180")
+			    (UNSPEC_FCMLA180_CONJ "180")])
+
+;; SVE has slightly different namings from NEON so we have to split these
+;; iterators.
+(define_int_attr sve_rot1 [(UNSPEC_FCMLA "")
+			   (UNSPEC_FCMLA_CONJ "")
+			   (UNSPEC_FCMUL "")
+			   (UNSPEC_FCMUL_CONJ "")
+			   (UNSPEC_FCMLA180 "270")
+			   (UNSPEC_FCMLA180_CONJ "90")
+			   (UNSPEC_CMLA "")
+			   (UNSPEC_CMLA_CONJ "")
+			   (UNSPEC_CMUL "")
+			   (UNSPEC_CMUL_CONJ "")
+			   (UNSPEC_CMLA180 "270")
+			   (UNSPEC_CMLA180_CONJ "90")])
+
+(define_int_attr sve_rot2 [(UNSPEC_FCMLA "90")
+			   (UNSPEC_FCMLA_CONJ "270")
+			   (UNSPEC_FCMUL "90")
+			   (UNSPEC_FCMUL_CONJ "270")
+			   (UNSPEC_FCMLA180 "180")
+			   (UNSPEC_FCMLA180_CONJ "180")
+			   (UNSPEC_CMLA "90")
+			   (UNSPEC_CMLA_CONJ "270")
+			   (UNSPEC_CMUL "90")
+			   (UNSPEC_CMUL_CONJ "270")
+			   (UNSPEC_CMLA180 "180")
+			   (UNSPEC_CMLA180_CONJ "180")])
+
+
+(define_int_attr fcmac1 [(UNSPEC_FCMLA "a") (UNSPEC_FCMLA_CONJ "a")
+			 (UNSPEC_FCMLA180 "s") (UNSPEC_FCMLA180_CONJ "s")
+			 (UNSPEC_CMLA "a") (UNSPEC_CMLA_CONJ "a")
+			 (UNSPEC_CMLA180 "s") (UNSPEC_CMLA180_CONJ "s")])
 
 (define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla")
 			      (UNSPEC_COND_FMLS "fmls")


--wRRV7LY7NUeQGEoC--