From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR03-VE1-obe.outbound.protection.outlook.com (mail-eopbgr50080.outbound.protection.outlook.com [40.107.5.80]) by sourceware.org (Postfix) with ESMTPS id ED19B386F44F for ; Thu, 10 Dec 2020 17:00:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org ED19B386F44F Received: from AM6P192CA0012.EURP192.PROD.OUTLOOK.COM (2603:10a6:209:83::25) by VI1PR0802MB2351.eurprd08.prod.outlook.com (2603:10a6:800:a0::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3632.19; Thu, 10 Dec 2020 17:00:17 +0000 Received: from VE1EUR03FT059.eop-EUR03.prod.protection.outlook.com (2603:10a6:209:83:cafe::c5) by AM6P192CA0012.outlook.office365.com (2603:10a6:209:83::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3654.12 via Frontend Transport; Thu, 10 Dec 2020 17:00:17 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by VE1EUR03FT059.mail.protection.outlook.com (10.152.19.60) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3654.12 via Frontend Transport; Thu, 10 Dec 2020 17:00:16 +0000 Received: ("Tessian outbound 76bd5a04122f:v71"); Thu, 10 Dec 2020 17:00:16 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: c6441e6c2e7a2e97 X-CR-MTA-TID: 64aa7808 Received: from ce5ae83fb9ea.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 6857171B-4B50-4BAC-BDEF-2A0C7E819C18.1; Thu, 10 Dec 2020 
16:59:59 +0000 Received: from EUR04-HE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id ce5ae83fb9ea.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 10 Dec 2020 16:59:59 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=DG/KrpZvCSufSxb/3O+wRerZf8wCEFExOgXf8hAjK/D3G7LoMNlrvYjdRXV7PyRvYOJKtzBc1f+lt1NG1F+LX3F18tIDHuCAK2jJ5X8LODTF36Qyz3bf4NfFwPUXB/trcLphtXakhGgr0rShX0+XmczKE2MrxlVQK9KvJsmxpZW9UoBlpRn1uopvEbM6xbRIsEvTH0uKLqqDbfizrBHkjPFAyHV93uJqn9PoctBoKSv+kkluhn6bBbeQZwspQXod+4RPSO2M8qhLS0S6c+gxnnKekJ0IHNa3DzX1WCMuBym3H08vBxgw32GCxsz0GIGX2ruh9vvVrYyqjYKVnz0KTA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=8BPUtg6uYp7y4psF8m9X78zn3yoK6oYqtkb5r+e2m7A=; b=mPQ1FaVIQTWhiSCrwpIUZ+a8j7r/b3nnHOGgvceOBnW1xrOieUTF+APjdxyXh05sS10hlB11xHBHRG1e/UfwFxEMBrofG6MJmGA9LGnVwAJHyE/lbu2nQ3Izr80ZBPPkHaKvcc8mqkT9Lgx/JNabBQggcHzFCJrLRKgvOp4btWAVV0oaE5GO9jpfiJpjRdvqeba/aqf2pU4jI/NP90sMsCoKJT39qflJKdOfuPnxdBTFrVFuR4WBGorC+/hyAn6RNzCy6bBgRwl7JY4hsDIw+26LECTTOW25oSeiouOd8bnI/O+Su44fJ7JkWm6qUjtAK9rRk8NpIVG6wKU7reg4NQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by VI1PR08MB5326.eurprd08.prod.outlook.com (2603:10a6:803:12d::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3632.17; Thu, 10 Dec 2020 16:59:56 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::f937:5b3:12e1:8297]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::f937:5b3:12e1:8297%6]) 
with mapi id 15.20.3632.023; Thu, 10 Dec 2020 16:59:56 +0000 Date: Thu, 10 Dec 2020 16:59:53 +0000 From: Tamar Christina To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com Subject: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Complex Addition, Multiply and FMA. Message-ID: Content-Type: multipart/mixed; boundary="gKMricLos+KVdGMg" Content-Disposition: inline User-Agent: Mutt/1.9.4 (2018-02-28) X-Originating-IP: [217.140.106.53] X-ClientProxiedBy: LO2P265CA0226.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:b::22) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Received: from arm.com (217.140.106.53) by LO2P265CA0226.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:b::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3654.12 via Frontend Transport; Thu, 10 Dec 2020 16:59:55 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-HT: Tenant X-MS-Office365-Filtering-Correlation-Id: 98e88f14-fbb5-4bc2-3478-08d89d2d11cf X-MS-TrafficTypeDiagnostic: VI1PR08MB5326:|VI1PR0802MB2351: X-MS-Exchange-Transport-Forked: True X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true NoDisclaimer: true X-MS-Oob-TLC-OOBClassifiers: OLM:1284;OLM:1284; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 13gs+YVp0f8CqIqtR/ZDs99nsy8qnVjCtCmWc0qHYdA/etFCudUqBgp5TbRovj/ssnyLTZ33/4h5CjiPffTvkTuNe7h/szTDM9XHOuc33CSaaSYHiOImp18bnb0FMo+WAno4WLUw/V65brWueF07wc4Z38mtgeZXdKEE6I+nJboNxE479kbbRp3enfeUEmHF5dBxk73FnkIyC0U0X8Z1kHXV7mtWYGtYAJTBlr7zYDC6NJhvmoJgzSxatJp7h3xCR6Mt4kHBurjtf/Xd4tPbqVAlmoRXR72AEFT1/Az/NuwIcmDL4IeVEPafbM5yFnAq691Vik46kAQpYSd0b/tbO57Qsswf1wcHr72BmpAGsSRzbAZf2A3TAuyYJUf1xwRbJ4h2Jn9F8hzeXjwlIdLiog== X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; 
SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(136003)(346002)(39860400002)(396003)(366004)(376002)(52116002)(55016002)(8676002)(86362001)(33964004)(66616009)(316002)(44832011)(44144004)(83380400001)(26005)(956004)(6916009)(2616005)(16526019)(66556008)(186003)(8936002)(8886007)(66476007)(4326008)(7696005)(2906002)(478600001)(36756003)(235185007)(5660300002)(66946007)(4743002)(4216001)(2700100001); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData: =?utf-8?B?MGc5eFRyWXJmQUZ5R3lTOERnWHRTdWJQM3lFYmxrckdJcjJKYS9CdFBVdXFQ?= =?utf-8?B?NGN1cmMrMDRxMk13Z3NLa1d6VW5QUlBqS3lxYVExSWk1MFBDbW5SbVBZREln?= =?utf-8?B?RUtlVHdFRXZVUi8vWlVvNlJsWVFCSFRHZThmbU9kUXk0WWZjcURqT1BDbTVp?= =?utf-8?B?ekdKeTJQOW42WEtTNlBaSGRwVzNXaHF0KzRxVXNCeUJ0djFqTWF0dFVvbDI3?= =?utf-8?B?OHN2em5SZjRxSWtYaUZLNWdZT0luVUh4citaajBHcjlzU0c1RHR5OW8wZlRa?= =?utf-8?B?OEN6NUtraXRIbXovcGdIL2VubnRzSWE1WHprVWp3ejlPbzRmOFVtdDA0TlhM?= =?utf-8?B?UURaenNySlNjYzllMThLblpiVzB6NGFJeHpIOGxaekQrRkZZNzRnQlR1UWVu?= =?utf-8?B?b2t4dXVCSzB6L3FDUEJtVlRIaGFlb00wVXQ4NlBhK1FFQ1RUTWZtb1c4a3Bw?= =?utf-8?B?VDRrSnZydUFTOVJONGVBTzVjdE5hWkx0czV3Rnc2RzZqbnB3dDRIVGVPaWNI?= =?utf-8?B?bDNUQ3M4OEhPa1kwcHZ0bHZWOUZuZWNPQUpWeWRndUViemZPWTlHQ3RnQWYv?= =?utf-8?B?SXBjZzhMNWZiOW1hdkhTaExMb242QU9zeC80d3hISjBXcUZKRkJ1MHVTM09a?= =?utf-8?B?UkgrT2F2NnFrek9UNXdDcW9mdTE5Zjh5MFdBZS9vN3hCcGV0UTEzK1MvZWNz?= =?utf-8?B?OUdkbkxCZ2pqSVFEYVozZlp0dGkxTEJXVmFVYzVIbjRyVGhSVjFFSDlPb0g3?= =?utf-8?B?WnpnakNaWndYZmJTNUNRMThhM0J5NzVHYkNWSmNxejVSRjliYldBS21va0lr?= =?utf-8?B?cWJhK2FNd3MzSGdJbE9XdGNWRUFmL3NzV0ZrOGVDcDkySmhTMUpXbHJpaEJD?= =?utf-8?B?M3dES1NBcysvdTg5d2k1U1UxakFOb1Z2aytnUXUxcUxUd2FiWmh3VmhBUDQ0?= =?utf-8?B?TExONXBxNmdOY1RHb1RZbmtUQk5zYkVmVXRaRkhremk5ZjBDeDlMWHNwS2FG?= =?utf-8?B?N0pGdmJFZmRIbjhnQjFoQnNlWnVhcm9aSlo3YUh3VWVJWGt2WWpiejBGbXBP?= =?utf-8?B?WG1XVDJ6T0ZhYzZwTmIvcE0wcFFlOGU4cDVLZHVyMjcrQnZNNmVnSVc5TzQz?= =?utf-8?B?U1ordCszbXhlUVM4UXhzZ0JJVVViSm5pb0RqN21YcUpLdWdTdHdIbkZOZ1dI?= 
=?utf-8?B?ZDBqZUVjZjFTQjYrZUZMeFJLd2xUVkI0ZkhvS2NMMlVDNEYxSThSQWVLVnFM?= =?utf-8?B?dUFHbjRPcFdpRFRuMW82QlNsMDJZOWtsMStkczE1ckZTVG5LNk1CK3VvUDlZ?= =?utf-8?Q?fhOlRbKVfvzVpEfNlvWHfYo4CVlf9mVEDp?= X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB5326 Original-Authentication-Results: gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT059.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 69565b8d-9ff0-4c7a-4ec8-08d89d2d057e X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: zxdNAj8P9V7kO8iEJABjh6g2zoQ7CxmHGtIRjqaVQU+KMdOzSIxADp55kwoMahsvrfP49ehQWqcaRSayrpYbX6ZdXsSAOaKLl8V3iYc9A7ivLjB3pE1Cnee89Co/oTnSlAa5E2K0VjFIhfHYid/XoozhiSR0yaOnOkKVVkrOtGZAoxu2ipeWRSsmTgypPJo4L/SElCZ/y64Vj3EKD4v6PFFMc8O83kt5rWIHxt3t8o66zlTVA4TMti45xzMit2hXph244oNynM4siiy1qmnAcVklVhlmqSYcETsimZcnkOMkGL+NdzUv5/4VZTsf9nSeZIo6TJGDoHvaJOneHepUFsgqat9aNL/wBRtyF66/um5sNJAQu9iB2aeIcgKvSGOMTLaN2QoFDcTDMN4Obnx18atozEvay2z5v3N5i8i35mq9L5frjejdsPk5tkon6jQX X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(346002)(396003)(39860400002)(136003)(376002)(46966005)(356005)(66616009)(55016002)(336012)(82310400003)(86362001)(44832011)(186003)(82740400003)(8936002)(2616005)(70586007)(47076004)(4326008)(956004)(44144004)(70206006)(33964004)(83380400001)(81166007)(36756003)(4743002)(6916009)(235185007)(2906002)(26005)(7696005)(8886007)(478600001)(8676002)(316002)(16526019)(5660300002)(4216001)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 Dec 2020 17:00:16.9486 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 98e88f14-fbb5-4bc2-3478-08d89d2d11cf X-MS-Exchange-CrossTenant-Id: 
f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT059.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR0802MB2351 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_LOTSOFHASH, MSGID_FROM_MTA_HEADER, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Dec 2020 17:00:23 -0000 --gKMricLos+KVdGMg Content-Type: text/plain; charset=utf-8 Content-Disposition: inline

Hi All,

This adds implementations of the optabs for complex operations.  With this,
the following C code:

  void f90 (float complex a[restrict N], float complex b[restrict N],
            float complex c[restrict N])
  {
    for (int i=0; i < N; i++)
      c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
          mov     x3, 0
          .p2align 3,,7
  .L2:
          ldr     q0, [x0, x3]
          ldr     q1, [x1, x3]
          fcadd   v0.4s, v0.4s, v1.4s, #90
          str     q0, [x2, x3]
          add     x3, x3, 16
          cmp     x3, 1600
          bne     .L2
          ret

instead of

  f90:
          add     x3, x1, 1600
          .p2align 3,,7
  .L2:
          ld2     {v4.4s - v5.4s}, [x0], 32
          ld2     {v2.4s - v3.4s}, [x1], 32
          fsub    v0.4s, v4.4s, v3.4s
          fadd    v1.4s, v5.4s, v2.4s
          st2     {v0.4s - v1.4s}, [x2], 32
          cmp     x3, x1
          bne     .L2
          ret

It also defines a new iterator, VALL_ARITH, which contains the types for which
we can do general arithmetic (it excludes bfloat16).

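For readers who want to check the arithmetic, here is a minimal scalar sketch of what the loop computes per element. It assumes the usual reading of a rotation by 90 degrees (adding b[i] * I subtracts b's imaginary part from the real lane and adds b's real part to the imaginary lane), which is also what the fsub/fadd pair in the second listing does. The names fcadd90 and f90_ref are hypothetical and are not part of the patch.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical scalar model of one complex lane of "a + b * I":
   real part is re(a) - im(b), imaginary part is im(a) + re(b).  */
float complex
fcadd90 (float complex a, float complex b)
{
  float re = crealf (a) - cimagf (b);
  float im = cimagf (a) + crealf (b);
  return re + im * I;
}

/* Reference version of the loop in the example, written with the model
   above instead of the complex arithmetic operators.  */
void
f90_ref (size_t n, const float complex *a, const float complex *b,
         float complex *c)
{
  for (size_t i = 0; i < n; i++)
    c[i] = fcadd90 (a[i], b[i]);
}
```

For example, with a = 1 + 2i and b = 3 + 4i the model yields -3 + 5i, the same value as a + b * I.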
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Checked with armv8-a+sve2+fp16 with no issues.  Note that due to a mid-end
limitation, SLP for SVE currently fails for some permutes.  The tests have
these marked as XFAIL.  I do intend to fix this soon.

Matching tests for these are in the mid-end patches.

Note that the mid-end patches are still being respun and I may need to change
the order of some parameters, but no other change is expected and I would like
to decrease the size of future patches.  As such...

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cadd3, cml4, cmul3): New.
	* config/aarch64/iterators.md (VALL_ARITH, UNSPEC_FCMUL,
	UNSPEC_FCMUL180, UNSPEC_FCMLS, UNSPEC_FCMLS180, UNSPEC_CMLS,
	UNSPEC_CMLS180, UNSPEC_CMUL, UNSPEC_CMUL180, FCMLA_OP, FCMUL_OP, rot_op,
	rotsplit1, rotsplit2, fcmac1, sve_rot1, sve_rot2, SVE2_INT_CMLA_OP,
	SVE2_INT_CMUL_OP, SVE2_INT_CADD_OP): New.
	(rot): Add UNSPEC_FCMLS, UNSPEC_FCMUL, UNSPEC_FCMUL180.
	* config/aarch64/aarch64-sve.md (cadd3, cml4, cmul3): New.
	* config/aarch64/aarch64-sve2.md (cadd3, cml4, cmul3): New.

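As a rough illustration of why the cmul expanders in the attached patch emit two instructions: each FCMLA performs only half of a complex multiply-accumulate, so a full multiply is a zeroed accumulator followed by one FCMLA with rotation #0 and one with rotation #90 (the values the patch lists in rotsplit1 and rotsplit2 for FCMUL). The sketch below is a scalar model of a single complex lane under that reading of the instruction. The names fcmla_model and cmul_model are hypothetical and do not appear in the patch.

```c
#include <complex.h>

/* Hypothetical scalar model of one complex lane of FCMLA.
   rot is the rotation in degrees: 0, 90, 180 or 270.
   Each rotation uses only one component of n, so a single call
   contributes half of a complex multiply-accumulate.  */
float complex
fcmla_model (float complex acc, float complex n, float complex m, int rot)
{
  float re = crealf (acc);
  float im = cimagf (acc);

  switch (rot)
    {
    case 0:
      re += crealf (n) * crealf (m);
      im += crealf (n) * cimagf (m);
      break;
    case 90:
      re -= cimagf (n) * cimagf (m);
      im += cimagf (n) * crealf (m);
      break;
    case 180:
      re -= crealf (n) * crealf (m);
      im -= crealf (n) * cimagf (m);
      break;
    case 270:
      re += cimagf (n) * cimagf (m);
      im -= cimagf (n) * crealf (m);
      break;
    }
  return re + im * I;
}

/* Full complex multiply modelled as two half steps on a zeroed
   accumulator: rotation #0 first, then rotation #90.  */
float complex
cmul_model (float complex a, float complex b)
{
  float complex tmp = fcmla_model (0.0f, a, b, 0);
  return fcmla_model (tmp, a, b, 90);
}
```

With a = 1 + 2i and b = 3 + 4i, the first step gives 3 + 4i and the second gives -5 + 10i, which is a * b.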
-- --gKMricLos+KVdGMg Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="rb13907.patch" diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 68baf416045178b0ebcfeb8de2d201f625f1c317..1aa74beeee154e054f2a01f8843dfed218fe850b 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -449,6 +449,14 @@ (define_insn "aarch64_fcadd" [(set_attr "type" "neon_fcadd")] ) +(define_expand "cadd3" + [(set (match_operand:VHSDF 0 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand") + (match_operand:VHSDF 2 "register_operand")] + FCADD))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +) + (define_insn "aarch64_fcmla" [(set (match_operand:VHSDF 0 "register_operand" "=w") (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0") @@ -508,6 +516,47 @@ (define_insn "aarch64_fcmlaq_lane" [(set_attr "type" "neon_fcmla")] ) +;; The complex mla/mls operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:VHSDF 0 "register_operand") + (plus:VHSDF (match_operand:VHSDF 1 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand") + (match_operand:VHSDF 3 "register_operand")] + FCMLA_OP)))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_fcmla (tmp, operands[1], + operands[2], operands[3])); + emit_insn (gen_aarch64_fcmla (operands[0], tmp, + operands[2], operands[3])); + DONE; +}) + +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. 
+(define_expand "cmul3" + [(set (match_operand:VHSDF 0 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand") + (match_operand:VHSDF 2 "register_operand")] + FCMUL_OP))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + rtx res1 = gen_reg_rtx (mode); + emit_move_insn (tmp, CONST0_RTX (mode)); + emit_insn (gen_aarch64_fcmla (res1, tmp, + operands[1], operands[2])); + emit_insn (gen_aarch64_fcmla (operands[0], res1, + operands[1], operands[2])); + DONE; +}) + + + ;; These instructions map to the __builtins for the Dot Product operations. (define_insn "aarch64_dot" [(set (match_operand:VS 0 "register_operand" "=w") diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index 6359c40bdecda6c126bd70bef66561dd1da44dc9..7d27a84016d687cb6c019f98b99a7aacf8b3a031 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -5480,6 +5480,20 @@ (define_expand "@cond_" "TARGET_SVE" ) +;; Predicated FCADD using ptrue for unpredicated optab for auto-vectorizer +(define_expand "@cadd3" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_dup 3) + (const_int SVE_RELAXED_GP) + (match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand")] + SVE_COND_FCADD))] + "TARGET_SVE" +{ + operands[3] = aarch64_ptrue_reg (mode); +}) + ;; Predicated FCADD, merging with the first input. (define_insn_and_rewrite "*cond__2_relaxed" [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w, ?&w") @@ -7152,6 +7166,64 @@ (define_insn "@aarch64_pred_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +;; The complex mla/mls operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. 
+(define_expand "cml4" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_dup 4) + (match_dup 5) + (match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand") + (match_operand:SVE_FULL_F 3 "register_operand")] + FCMLA_OP))] + "TARGET_SVE && !BYTES_BIG_ENDIAN" +{ + operands[4] = aarch64_ptrue_reg (mode); + operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode); + rtx tmp = gen_reg_rtx (mode); + emit_insn ( + gen_aarch64_pred_fcmla (tmp, operands[4], + operands[1], operands[2], + operands[3], operands[5])); + emit_insn ( + gen_aarch64_pred_fcmla (operands[0], operands[4], + tmp, operands[2], + operands[3], operands[5])); + DONE; +}) + +;; unpredicated optab pattern for auto-vectorizer +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cmul3" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_dup 3) + (match_dup 4) + (match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand") + (match_dup 5)] + FCMUL_OP))] + "TARGET_SVE && !BYTES_BIG_ENDIAN" +{ + operands[3] = aarch64_ptrue_reg (mode); + operands[4] = gen_int_mode (SVE_RELAXED_GP, SImode); + operands[5] = force_reg (mode, CONST0_RTX (mode)); + rtx tmp = gen_reg_rtx (mode); + emit_insn ( + gen_aarch64_pred_fcmla (tmp, operands[3], operands[1], + operands[2], operands[5], operands[4])); + emit_insn ( + gen_aarch64_pred_fcmla (operands[0], operands[3], operands[1], + operands[2], tmp, + operands[4])); + DONE; +}) + ;; Predicated FCMLA with merging. 
(define_expand "@cond_" [(set (match_operand:SVE_FULL_F 0 "register_operand") diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index 772c35079c9441448534471fba4dba622322b8fc..58594f985e5a98a188f32d96c6f71c9f4e0a6f05 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -1799,6 +1799,16 @@ (define_insn "@aarch64_sve_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +(define_expand "cadd3" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 1 "register_operand") + (match_operand:SVE_FULL_I 2 "register_operand")] + SVE2_INT_CADD_OP))] + "TARGET_SVE2 && !BYTES_BIG_ENDIAN" +) + ;; ------------------------------------------------------------------------- ;; ---- [INT] Complex ternary operations ;; ------------------------------------------------------------------------- @@ -1838,6 +1848,49 @@ (define_insn "@aarch64__lane_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +;; The complex mla/mls operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 2 "register_operand") + (match_operand:SVE_FULL_I 3 "register_operand")] + SVE2_INT_CMLA_OP)))] + "TARGET_SVE2 && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_cmla (tmp, operands[1], + operands[2], operands[3])); + emit_insn (gen_aarch64_sve_cmla (operands[0], tmp, + operands[2], operands[3])); + DONE; +}) + +;; unpredicated optab pattern for auto-vectorizer +;; The complex mul operations always need to expand to two instructions. 
+;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cmul3" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 1 "register_operand") + (match_operand:SVE_FULL_I 2 "register_operand") + (match_dup 3)] + SVE2_INT_CMUL_OP))] + "TARGET_SVE2 && !BYTES_BIG_ENDIAN" +{ + operands[3] = force_reg (mode, CONST0_RTX (mode)); + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_cmla (tmp, operands[3], + operands[1], operands[2])); + emit_insn (gen_aarch64_sve_cmla (operands[0], tmp, + operands[1], operands[2])); + DONE; +}) + ;; ------------------------------------------------------------------------- ;; ---- [INT] Complex dot product ;; ------------------------------------------------------------------------- diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index fb1426b7752890848cb49722ef7442d96cb1408b..dd88e63f4e3a60ffe0d0276f13c6068161511cb9 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF]) ;; All Advanced SIMD modes on which we support any arithmetic operations. (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF]) +;; All Advanced SIMD modes suitable for performing arithmetics. +(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI + (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") + V2SF V4SF V2DF]) + ;; All Advanced SIMD modes suitable for moving, loading, and storing. (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V4HF V8HF V4BF V8BF V2SF V4SF V2DF]) @@ -708,6 +713,10 @@ (define_c_enum "unspec" UNSPEC_FCMLA90 ; Used in aarch64-simd.md. UNSPEC_FCMLA180 ; Used in aarch64-simd.md. UNSPEC_FCMLA270 ; Used in aarch64-simd.md. + UNSPEC_FCMUL ; Used in aarch64-simd.md. + UNSPEC_FCMUL180 ; Used in aarch64-simd.md. 
+ UNSPEC_FCMLS ; Used in aarch64-simd.md. + UNSPEC_FCMLS180 ; Used in aarch64-simd.md. UNSPEC_ASRD ; Used in aarch64-sve.md. UNSPEC_ADCLB ; Used in aarch64-sve2.md. UNSPEC_ADCLT ; Used in aarch64-sve2.md. @@ -726,6 +735,10 @@ (define_c_enum "unspec" UNSPEC_CMLA180 ; Used in aarch64-sve2.md. UNSPEC_CMLA270 ; Used in aarch64-sve2.md. UNSPEC_CMLA90 ; Used in aarch64-sve2.md. + UNSPEC_CMLS ; Used in aarch64-sve2.md. + UNSPEC_CMLS180 ; Used in aarch64-sve2.md. + UNSPEC_CMUL ; Used in aarch64-sve2.md. + UNSPEC_CMUL180 ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTLT ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTNT ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTX ; Used in aarch64-sve2.md. @@ -2598,6 +2611,23 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA UNSPEC_SQRDCMLAH180 UNSPEC_SQRDCMLAH270]) +;; Unlike the normal CMLA instructions these represent the actual operation you +;; to be performed. They will always need to be expanded into multiple +;; sequences consisting of CMLA. +(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA + UNSPEC_CMLA180 + UNSPEC_CMLS]) + +;; Unlike the normal CMLA instructions these represent the actual operation you +;; to be performed. They will always need to be expanded into multiple +;; sequences consisting of CMLA. +(define_int_iterator SVE2_INT_CMUL_OP [UNSPEC_CMUL + UNSPEC_CMUL180]) + +;; Same as SVE2_INT_CADD but exclude the saturating instructions +(define_int_iterator SVE2_INT_CADD_OP [UNSPEC_CADD90 + UNSPEC_CADD270]) + (define_int_iterator SVE2_INT_CDOT [UNSPEC_CDOT UNSPEC_CDOT90 UNSPEC_CDOT180 @@ -2708,6 +2738,14 @@ (define_int_iterator FMMLA [UNSPEC_FMMLA]) (define_int_iterator BF_MLA [UNSPEC_BFMLALB UNSPEC_BFMLALT]) +(define_int_iterator FCMLA_OP [UNSPEC_FCMLA + UNSPEC_FCMLA180 + UNSPEC_FCMLS + UNSPEC_FCMLS180]) + +(define_int_iterator FCMUL_OP [UNSPEC_FCMUL + UNSPEC_FCMUL180]) + ;; Iterators for atomic operations. 
(define_int_iterator ATOMIC_LDOP @@ -3403,6 +3441,7 @@ (define_int_attr rot [(UNSPEC_CADD90 "90") (UNSPEC_CMLA270 "270") (UNSPEC_FCADD90 "90") (UNSPEC_FCADD270 "270") + (UNSPEC_FCMLS "0") (UNSPEC_FCMLA "0") (UNSPEC_FCMLA90 "90") (UNSPEC_FCMLA180 "180") @@ -3418,7 +3457,85 @@ (define_int_attr rot [(UNSPEC_CADD90 "90") (UNSPEC_COND_FCMLA "0") (UNSPEC_COND_FCMLA90 "90") (UNSPEC_COND_FCMLA180 "180") - (UNSPEC_COND_FCMLA270 "270")]) + (UNSPEC_COND_FCMLA270 "270") + (UNSPEC_FCMUL "0") + (UNSPEC_FCMUL180 "180")]) + +;; A conjucate is a negation of the imaginary component +;; The number in the inspecs are the rotation component of the instruction, e.g +;; FCMLS180 means use the instruction with #180. +;; The iterator is used to produce the right name mangling for the function. +;; +;; The rotation value does not directly correlate to a rotation along the argant +;; plane as the instructions only perform half the computation. +;; +;; For the implementation we threat any rotation by 0 as normal and 180 as +;; conjucate. This is only for implementing the vectorizer patterns. +(define_int_attr rot_op [(UNSPEC_FCMLS "") + (UNSPEC_FCMLS180 "_conj") + (UNSPEC_FCMLA "") + (UNSPEC_FCMLA180 "_conj") + (UNSPEC_FCMUL "") + (UNSPEC_FCMUL180 "_conj") + (UNSPEC_CMLS "") + (UNSPEC_CMLA "") + (UNSPEC_CMLA180 "_conj") + (UNSPEC_CMUL "") + (UNSPEC_CMUL180 "_conj")]) + +;; The complex operations when performed on a real complex number require two +;; instructions to perform the operation. e.g. complex multiplication requires +;; two FCMUL with a particular rotation value. +;; +;; These values can be looked up in rotsplit1 and rotsplit2. as an example +;; FCMUL needs the first instruction to use #0 and the second #90. 
+(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0") + (UNSPEC_FCMLA180 "0") + (UNSPEC_FCMUL "0") + (UNSPEC_FCMUL180 "0") + (UNSPEC_FCMLS "270") + (UNSPEC_FCMLS180 "90")]) + +(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90") + (UNSPEC_FCMLA180 "270") + (UNSPEC_FCMUL "90") + (UNSPEC_FCMUL180 "270") + (UNSPEC_FCMLS "180") + (UNSPEC_FCMLS180 "180")]) + +;; SVE has slightly different namings from NEON so we have to split these +;; iterators. +(define_int_attr sve_rot1 [(UNSPEC_FCMLA "") + (UNSPEC_FCMLA180 "") + (UNSPEC_FCMUL "") + (UNSPEC_FCMUL180 "") + (UNSPEC_FCMLS "270") + (UNSPEC_FCMLS180 "90") + (UNSPEC_CMLA "") + (UNSPEC_CMLA180 "") + (UNSPEC_CMUL "") + (UNSPEC_CMUL180 "") + (UNSPEC_CMLS "270") + (UNSPEC_CMLS180 "90")]) + +(define_int_attr sve_rot2 [(UNSPEC_FCMLA "90") + (UNSPEC_FCMLA180 "270") + (UNSPEC_FCMUL "90") + (UNSPEC_FCMUL180 "270") + (UNSPEC_FCMLS "180") + (UNSPEC_FCMLS180 "180") + (UNSPEC_CMLA "90") + (UNSPEC_CMLA180 "270") + (UNSPEC_CMUL "90") + (UNSPEC_CMUL180 "270") + (UNSPEC_CMLS "180") + (UNSPEC_CMLS180 "180")]) + + +(define_int_attr fcmac1 [(UNSPEC_FCMLA "a") (UNSPEC_FCMLA180 "a") + (UNSPEC_FCMLS "s") (UNSPEC_FCMLS180 "s") + (UNSPEC_CMLA "a") (UNSPEC_CMLA180 "a") + (UNSPEC_CMLS "s") (UNSPEC_CMLS180 "s")]) (define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla") (UNSPEC_COND_FMLS "fmls") diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 592af35f038f48b5f4ac622a0ed944ffc2a140f2..43e1ebd87cf69e716474bb6ee9bcdd405523d8da 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -712,7 +712,7 @@ (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi") (DI "di") (V2DI "v2di") (V2SF "v2si") (V4SF "v4si")]) -;; Get element type from double-width mode, for operations where we +;; Get element type from double-width mode, for operations where we ;; don't care about signedness. 
(define_mode_attr V_if_elem [(V8QI "i8") (V16QI "i8") (V4HI "i16") (V8HI "i16") @@ -1180,7 +1180,49 @@ (define_int_attr rot [(UNSPEC_VCADD90 "90") (UNSPEC_VCMLA "0") (UNSPEC_VCMLA90 "90") (UNSPEC_VCMLA180 "180") - (UNSPEC_VCMLA270 "270")]) + (UNSPEC_VCMLA270 "270") + (UNSPEC_VCMUL "0") + (UNSPEC_VCMUL180 "180")]) + +;; A conjucate is a negation of the imaginary component +;; The number in the inspecs are the rotation component of the instruction, e.g +;; FCMLS180 means use the instruction with #180. +;; The iterator is used to produce the right name mangling for the function. +;; +;; The rotation value does not directly correlate to a rotation along the argant +;; plane as the instructions only perform half the computation. +;; +;; For the implementation we threat any rotation by 0 as normal and 180 as +;; conjucate. This is only for implementing the vectorizer patterns. +(define_int_attr rot_op [(UNSPEC_VCMLS "") + (UNSPEC_VCMLS180 "_conj") + (UNSPEC_VCMLA "") + (UNSPEC_VCMLA180 "_conj") + (UNSPEC_VCMUL "") + (UNSPEC_VCMUL180 "_conj")]) + +;; The complex operations when performed on a real complex number require two +;; instructions to perform the operation. e.g. complex multiplication requires +;; two FCMUL with a particular rotation value. +;; +;; These values can be looked up in rotsplit1 and rotsplit2. as an example +;; FCMUL needs the first instruction to use #0 and the second #90. 
+(define_int_attr rotsplit1 [(UNSPEC_VCMLA "0") + (UNSPEC_VCMLA180 "0") + (UNSPEC_VCMUL "0") + (UNSPEC_VCMUL180 "0") + (UNSPEC_VCMLS "270") + (UNSPEC_VCMLS180 "90")]) + +(define_int_attr rotsplit2 [(UNSPEC_VCMLA "90") + (UNSPEC_VCMLA180 "270") + (UNSPEC_VCMUL "90") + (UNSPEC_VCMUL180 "270") + (UNSPEC_VCMLS "180") + (UNSPEC_VCMLS180 "180")]) + +(define_int_attr fcmac1 [(UNSPEC_VCMLA "a") (UNSPEC_VCMLA180 "a") + (UNSPEC_VCMLS "s") (UNSPEC_VCMLS180 "s")]) (define_int_attr simd32_op [(UNSPEC_QADD8 "qadd8") (UNSPEC_QSUB8 "qsub8") (UNSPEC_SHADD8 "shadd8") (UNSPEC_SHSUB8 "shsub8") --gKMricLos+KVdGMg--
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 15 Jan 2021 15:30:19 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Multiply, FMS and FMA.
Content-Type: multipart/mixed; boundary="wRRV7LY7NUeQGEoC"
Content-Disposition: inline
User-Agent: Mutt/1.9.4 (2018-02-28)
MIME-Version: 1.0
List-Id: Gcc-patches mailing list
Message-ID: <20210115153019.ntt-ry12tsuCVWjxeaUVMs2f3yE2G_IlrVxidGtmVig@z>

--wRRV7LY7NUeQGEoC
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

Hi All,

This adds implementations of the optabs for complex operations. With this, the
following C code:

#include <complex.h>

/* N is inferred from the loop bounds in the generated code below.  */
#define N 200

void g (float complex a[restrict N], float complex b[restrict N],
        float complex c[restrict N])
{
  for (int i=0; i < N; i++)
    c[i] = a[i] * b[i];
}

generates

NEON:

g:
        movi    v3.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        mov     v0.16b, v3.16b
        ldr     q2, [x1, x3]
        ldr     q1, [x0, x3]
        fcmla   v0.4s, v1.4s, v2.4s, #0
        fcmla   v0.4s, v1.4s, v2.4s, #90
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, 1600
        bne     .L2
        ret

SVE:

g:
        mov     x3, 0
        mov     x4, 400
        ptrue   p1.b, all
        whilelo p0.s, xzr, x4
        mov     z3.s, #0
        .p2align 3,,7
.L2:
        ld1w    z1.s, p0/z, [x0, x3, lsl 2]
        ld1w    z2.s, p0/z, [x1, x3, lsl 2]
        movprfx z0, z3
        fcmla   z0.s, p1/m, z1.s, z2.s, #0
        fcmla   z0.s, p1/m, z1.s, z2.s, #90
        st1w    z0.s, p0, [x2, x3, lsl 2]
        incw    x3
        whilelo p0.s, x3, x4
        b.any   .L2
        ret

SVE2 (with int instead of float):

g:
        mov     x3, 0
        mov     x4, 400
        mov     z3.b, #0
        whilelo p0.s, xzr, x4
        .p2align 3,,7
.L2:
        ld1w    z1.s, p0/z, [x0, x3, lsl 2]
        ld1w    z2.s, p0/z, [x1, x3, lsl 2]
        movprfx z0, z3
        cmla    z0.s, z1.s, z2.s, #0
        cmla    z0.s, z1.s, z2.s, #90
        st1w    z0.s, p0, [x2, x3, lsl 2]
        incw    x3
        whilelo p0.s, x3, x4
        b.any   .L2
        ret

It defines a new iterator, VALL_ARITH, which contains the types on which we can
do general arithmetic (it excludes bfloat16).

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Also checked with armv8-a+sve2+fp16 with no issues. Note that due to a mid-end
limitation, SLP for SVE currently fails for some permutes. The tests have these
marked as XFAIL. I intend to fix this soon.

Execution tests verified with QEMU.

The matching tests for these are in the mid-end patches; I will turn them on
for these patterns in a separate patch.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cml4, cmul3): New.
	* config/aarch64/iterators.md (VALL_ARITH, UNSPEC_FCMUL,
	UNSPEC_FCMUL_CONJ, UNSPEC_FCMLA_CONJ, UNSPEC_FCMLA180_CONJ,
	UNSPEC_CMLA_CONJ, UNSPEC_CMLA180_CONJ, UNSPEC_CMUL, UNSPEC_CMUL_CONJ,
	FCMLA_OP, FCMUL_OP, rot_op, rotsplit1, rotsplit2, fcmac1, sve_rot1,
	sve_rot2, SVE2_INT_CMLA_OP, SVE2_INT_CMUL_OP, SVE2_INT_CADD_OP): New.
	(rot): Add UNSPEC_FCMUL, UNSPEC_FCMUL_CONJ.
	* config/aarch64/aarch64-sve.md (cml4, cmul3): New.
	* config/aarch64/aarch64-sve2.md (cml4, cmul3): New.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 4b869ded918fd91ffd41e6ba068239a752b331e5..8a5f1dad224a99a8ba30669139259922a1250d0e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -516,6 +516,47 @@ (define_insn "aarch64_fcmlaq_lane" [(set_attr "type" "neon_fcmla")] ) +;; The complex mla/mls operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. 
+(define_expand "cml4" + [(set (match_operand:VHSDF 0 "register_operand") + (plus:VHSDF (match_operand:VHSDF 1 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand") + (match_operand:VHSDF 3 "register_operand")] + FCMLA_OP)))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_fcmla (tmp, operands[1], + operands[3], operands[2])); + emit_insn (gen_aarch64_fcmla (operands[0], tmp, + operands[3], operands[2])); + DONE; +}) + +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cmul3" + [(set (match_operand:VHSDF 0 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand") + (match_operand:VHSDF 2 "register_operand")] + FCMUL_OP))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + rtx res1 = gen_reg_rtx (mode); + emit_move_insn (tmp, CONST0_RTX (mode)); + emit_insn (gen_aarch64_fcmla (res1, tmp, + operands[2], operands[1])); + emit_insn (gen_aarch64_fcmla (operands[0], res1, + operands[2], operands[1])); + DONE; +}) + + + ;; These instructions map to the __builtins for the Dot Product operations. (define_insn "aarch64_dot" [(set (match_operand:VS 0 "register_operand" "=w") diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index da15bd8788507feb12d52894c14e099370f34108..9dfe6a3f4512a20ba4f1e66a105ee0ae5d6949ea 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -7243,6 +7243,62 @@ (define_insn "@aarch64_pred_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +;; The complex mla/mls operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. 
+(define_expand "cml4" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_dup 4) + (match_dup 5) + (match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand") + (match_operand:SVE_FULL_F 3 "register_operand")] + FCMLA_OP))] + "TARGET_SVE" +{ + operands[4] = aarch64_ptrue_reg (mode); + operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode); + rtx tmp = gen_reg_rtx (mode); + emit_insn + (gen_aarch64_pred_fcmla (tmp, operands[4], + operands[3], operands[2], + operands[1], operands[5])); + emit_insn + (gen_aarch64_pred_fcmla (operands[0], operands[4], + operands[3], operands[2], + tmp, operands[5])); + DONE; +}) + +;; unpredicated optab pattern for auto-vectorizer +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cmul3" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand")] + FCMUL_OP))] + "TARGET_SVE" +{ + rtx pred_reg = aarch64_ptrue_reg (mode); + rtx gp_mode = gen_int_mode (SVE_RELAXED_GP, SImode); + rtx accum = force_reg (mode, CONST0_RTX (mode)); + rtx tmp = gen_reg_rtx (mode); + emit_insn + (gen_aarch64_pred_fcmla (tmp, pred_reg, + operands[2], operands[1], + accum, gp_mode)); + emit_insn + (gen_aarch64_pred_fcmla (operands[0], pred_reg, + operands[2], operands[1], + tmp, gp_mode)); + DONE; +}) + ;; Predicated FCMLA with merging. 
(define_expand "@cond_" [(set (match_operand:SVE_FULL_F 0 "register_operand") diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index 5cb9144da98af2d02b83043511a99b5723d7e8c0..b96708d03f4458726b32ec46c0078499e00b8549 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -1848,6 +1848,48 @@ (define_insn "@aarch64__lane_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +;; The complex mla/mls operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 2 "register_operand") + (match_operand:SVE_FULL_I 3 "register_operand")] + SVE2_INT_CMLA_OP)))] + "TARGET_SVE2" +{ + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_cmla (tmp, operands[1], + operands[3], operands[2])); + emit_insn (gen_aarch64_sve_cmla (operands[0], tmp, + operands[3], operands[2])); + DONE; +}) + +;; unpredicated optab pattern for auto-vectorizer +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. 
+(define_expand "cmul3" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 1 "register_operand") + (match_operand:SVE_FULL_I 2 "register_operand")] + SVE2_INT_CMUL_OP))] + "TARGET_SVE2" +{ + rtx accum = force_reg (mode, CONST0_RTX (mode)); + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_cmla (tmp, accum, + operands[2], operands[1])); + emit_insn (gen_aarch64_sve_cmla (operands[0], tmp, + operands[2], operands[1])); + DONE; +}) + ;; ------------------------------------------------------------------------- ;; ---- [INT] Complex dot product ;; ------------------------------------------------------------------------- diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index d42a70653edb266f2b76924b75a814db25f08f23..3f61fc8e380abd922d39973f40a966b7ce64fa40 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF]) ;; All Advanced SIMD modes on which we support any arithmetic operations. (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF]) +;; All Advanced SIMD modes suitable for performing arithmetic. +(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI + (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") + V2SF V4SF V2DF]) + ;; All Advanced SIMD modes suitable for moving, loading, and storing. (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V4HF V8HF V4BF V8BF V2SF V4SF V2DF]) @@ -712,6 +717,10 @@ (define_c_enum "unspec" UNSPEC_FCMLA90 ; Used in aarch64-simd.md. UNSPEC_FCMLA180 ; Used in aarch64-simd.md. UNSPEC_FCMLA270 ; Used in aarch64-simd.md. + UNSPEC_FCMUL ; Used in aarch64-simd.md. + UNSPEC_FCMUL_CONJ ; Used in aarch64-simd.md. + UNSPEC_FCMLA_CONJ ; Used in aarch64-simd.md. + UNSPEC_FCMLA180_CONJ ; Used in aarch64-simd.md. UNSPEC_ASRD ; Used in aarch64-sve.md. UNSPEC_ADCLB ; Used in aarch64-sve2.md. 
UNSPEC_ADCLT ; Used in aarch64-sve2.md. @@ -730,6 +739,10 @@ (define_c_enum "unspec" UNSPEC_CMLA180 ; Used in aarch64-sve2.md. UNSPEC_CMLA270 ; Used in aarch64-sve2.md. UNSPEC_CMLA90 ; Used in aarch64-sve2.md. + UNSPEC_CMLA_CONJ ; Used in aarch64-sve2.md. + UNSPEC_CMLA180_CONJ ; Used in aarch64-sve2.md. + UNSPEC_CMUL ; Used in aarch64-sve2.md. + UNSPEC_CMUL_CONJ ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTLT ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTNT ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTX ; Used in aarch64-sve2.md. @@ -1291,7 +1304,7 @@ (define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf") ;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF. (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s") - (V2SI "2d") (V16QI "8h") + (V2SI "2d") (V16QI "8h") (V8HI "4s") (V4SI "2d") (V8HF "4s") (V4SF "2d")]) @@ -1313,7 +1326,7 @@ (define_mode_attr Vewtype [(VNx16QI "h") ;; Widened mode register suffixes for VDW/VQW. (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s") - (V2SI ".2d") (V16QI ".8h") + (V2SI ".2d") (V16QI ".8h") (V8HI ".4s") (V4SI ".2d") (V4HF ".4s") (V2SF ".2d") (SI "") (HI "")]) @@ -2611,6 +2624,19 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA UNSPEC_SQRDCMLAH180 UNSPEC_SQRDCMLAH270]) +;; Unlike the normal CMLA instructions these represent the actual operation +;; to be performed. They will always need to be expanded into multiple +;; sequences consisting of CMLA. +(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA + UNSPEC_CMLA_CONJ + UNSPEC_CMLA180]) + +;; Unlike the normal CMLA instructions these represent the actual operation +;; to be performed. They will always need to be expanded into multiple +;; sequences consisting of CMLA. 
+(define_int_iterator SVE2_INT_CMUL_OP [UNSPEC_CMUL + UNSPEC_CMUL_CONJ]) + ;; Same as SVE2_INT_CADD but exclude the saturating instructions (define_int_iterator SVE2_INT_CADD_OP [UNSPEC_CADD90 UNSPEC_CADD270]) @@ -2725,6 +2751,14 @@ (define_int_iterator FMMLA [UNSPEC_FMMLA]) (define_int_iterator BF_MLA [UNSPEC_BFMLALB UNSPEC_BFMLALT]) +(define_int_iterator FCMLA_OP [UNSPEC_FCMLA + UNSPEC_FCMLA180 + UNSPEC_FCMLA_CONJ + UNSPEC_FCMLA180_CONJ]) + +(define_int_iterator FCMUL_OP [UNSPEC_FCMUL + UNSPEC_FCMUL_CONJ]) + ;; Iterators for atomic operations. (define_int_iterator ATOMIC_LDOP @@ -3435,7 +3469,79 @@ (define_int_attr rot [(UNSPEC_CADD90 "90") (UNSPEC_COND_FCMLA "0") (UNSPEC_COND_FCMLA90 "90") (UNSPEC_COND_FCMLA180 "180") - (UNSPEC_COND_FCMLA270 "270")]) + (UNSPEC_COND_FCMLA270 "270") + (UNSPEC_FCMUL "0") + (UNSPEC_FCMUL_CONJ "180")]) + +;; A conjugate is a negation of the imaginary component. +;; The numbers in the unspecs are the rotation component of the instruction, e.g. +;; FCMLA180 means use the instruction with #180. +;; The attribute is used to produce the right name mangling for the function. +(define_int_attr rot_op [(UNSPEC_FCMLA180 "") + (UNSPEC_FCMLA180_CONJ "_conj") + (UNSPEC_FCMLA "") + (UNSPEC_FCMLA_CONJ "_conj") + (UNSPEC_FCMUL "") + (UNSPEC_FCMUL_CONJ "_conj") + (UNSPEC_CMLA "") + (UNSPEC_CMLA180 "") + (UNSPEC_CMLA_CONJ "_conj") + (UNSPEC_CMUL "") + (UNSPEC_CMUL_CONJ "_conj")]) + +;; The complex operations, when performed on a full complex number, require two +;; instructions to perform the operation, e.g. complex multiplication requires +;; two FCMLA instructions, each with a particular rotation value. +;; +;; These values can be looked up in rotsplit1 and rotsplit2. As an example, +;; FCMUL needs the first instruction to use #0 and the second #90. 
+(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0") + (UNSPEC_FCMLA_CONJ "0") + (UNSPEC_FCMUL "0") + (UNSPEC_FCMUL_CONJ "0") + (UNSPEC_FCMLA180 "270") + (UNSPEC_FCMLA180_CONJ "90")]) + +(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90") + (UNSPEC_FCMLA_CONJ "270") + (UNSPEC_FCMUL "90") + (UNSPEC_FCMUL_CONJ "270") + (UNSPEC_FCMLA180 "180") + (UNSPEC_FCMLA180_CONJ "180")]) + +;; SVE has slightly different namings from NEON so we have to split these +;; iterators. +(define_int_attr sve_rot1 [(UNSPEC_FCMLA "") + (UNSPEC_FCMLA_CONJ "") + (UNSPEC_FCMUL "") + (UNSPEC_FCMUL_CONJ "") + (UNSPEC_FCMLA180 "270") + (UNSPEC_FCMLA180_CONJ "90") + (UNSPEC_CMLA "") + (UNSPEC_CMLA_CONJ "") + (UNSPEC_CMUL "") + (UNSPEC_CMUL_CONJ "") + (UNSPEC_CMLA180 "270") + (UNSPEC_CMLA180_CONJ "90")]) + +(define_int_attr sve_rot2 [(UNSPEC_FCMLA "90") + (UNSPEC_FCMLA_CONJ "270") + (UNSPEC_FCMUL "90") + (UNSPEC_FCMUL_CONJ "270") + (UNSPEC_FCMLA180 "180") + (UNSPEC_FCMLA180_CONJ "180") + (UNSPEC_CMLA "90") + (UNSPEC_CMLA_CONJ "270") + (UNSPEC_CMUL "90") + (UNSPEC_CMUL_CONJ "270") + (UNSPEC_CMLA180 "180") + (UNSPEC_CMLA180_CONJ "180")]) + + +(define_int_attr fcmac1 [(UNSPEC_FCMLA "a") (UNSPEC_FCMLA_CONJ "a") + (UNSPEC_FCMLA180 "s") (UNSPEC_FCMLA180_CONJ "s") + (UNSPEC_CMLA "a") (UNSPEC_CMLA_CONJ "a") + (UNSPEC_CMLA180 "s") (UNSPEC_CMLA180_CONJ "s")]) (define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla") (UNSPEC_COND_FMLS "fmls") -- --wRRV7LY7NUeQGEoC Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="rb13907.patch" diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 4b869ded918fd91ffd41e6ba068239a752b331e5..8a5f1dad224a99a8ba30669139259922a1250d0e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -516,6 +516,47 @@ (define_insn "aarch64_fcmlaq_lane" [(set_attr "type" "neon_fcmla")] ) +;; The complex mla/mls operations always need to expand to two instructions. 
+;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:VHSDF 0 "register_operand") + (plus:VHSDF (match_operand:VHSDF 1 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand") + (match_operand:VHSDF 3 "register_operand")] + FCMLA_OP)))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_fcmla (tmp, operands[1], + operands[3], operands[2])); + emit_insn (gen_aarch64_fcmla (operands[0], tmp, + operands[3], operands[2])); + DONE; +}) + +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cmul3" + [(set (match_operand:VHSDF 0 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand") + (match_operand:VHSDF 2 "register_operand")] + FCMUL_OP))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + rtx res1 = gen_reg_rtx (mode); + emit_move_insn (tmp, CONST0_RTX (mode)); + emit_insn (gen_aarch64_fcmla (res1, tmp, + operands[2], operands[1])); + emit_insn (gen_aarch64_fcmla (operands[0], res1, + operands[2], operands[1])); + DONE; +}) + + + ;; These instructions map to the __builtins for the Dot Product operations. (define_insn "aarch64_dot" [(set (match_operand:VS 0 "register_operand" "=w") diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index da15bd8788507feb12d52894c14e099370f34108..9dfe6a3f4512a20ba4f1e66a105ee0ae5d6949ea 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -7243,6 +7243,62 @@ (define_insn "@aarch64_pred_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +;; The complex mla/mls operations always need to expand to two instructions. 
+;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_dup 4) + (match_dup 5) + (match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand") + (match_operand:SVE_FULL_F 3 "register_operand")] + FCMLA_OP))] + "TARGET_SVE" +{ + operands[4] = aarch64_ptrue_reg (mode); + operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode); + rtx tmp = gen_reg_rtx (mode); + emit_insn + (gen_aarch64_pred_fcmla (tmp, operands[4], + operands[3], operands[2], + operands[1], operands[5])); + emit_insn + (gen_aarch64_pred_fcmla (operands[0], operands[4], + operands[3], operands[2], + tmp, operands[5])); + DONE; +}) + +;; unpredicated optab pattern for auto-vectorizer +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cmul3" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand")] + FCMUL_OP))] + "TARGET_SVE" +{ + rtx pred_reg = aarch64_ptrue_reg (mode); + rtx gp_mode = gen_int_mode (SVE_RELAXED_GP, SImode); + rtx accum = force_reg (mode, CONST0_RTX (mode)); + rtx tmp = gen_reg_rtx (mode); + emit_insn + (gen_aarch64_pred_fcmla (tmp, pred_reg, + operands[2], operands[1], + accum, gp_mode)); + emit_insn + (gen_aarch64_pred_fcmla (operands[0], pred_reg, + operands[2], operands[1], + tmp, gp_mode)); + DONE; +}) + ;; Predicated FCMLA with merging. 
(define_expand "@cond_" [(set (match_operand:SVE_FULL_F 0 "register_operand") diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index 5cb9144da98af2d02b83043511a99b5723d7e8c0..b96708d03f4458726b32ec46c0078499e00b8549 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -1848,6 +1848,48 @@ (define_insn "@aarch64__lane_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +;; The complex mla/mls operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 2 "register_operand") + (match_operand:SVE_FULL_I 3 "register_operand")] + SVE2_INT_CMLA_OP)))] + "TARGET_SVE2" +{ + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_cmla (tmp, operands[1], + operands[3], operands[2])); + emit_insn (gen_aarch64_sve_cmla (operands[0], tmp, + operands[3], operands[2])); + DONE; +}) + +;; unpredicated optab pattern for auto-vectorizer +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. 
+(define_expand "cmul3" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 1 "register_operand") + (match_operand:SVE_FULL_I 2 "register_operand")] + SVE2_INT_CMUL_OP))] + "TARGET_SVE2" +{ + rtx accum = force_reg (mode, CONST0_RTX (mode)); + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_cmla (tmp, accum, + operands[2], operands[1])); + emit_insn (gen_aarch64_sve_cmla (operands[0], tmp, + operands[2], operands[1])); + DONE; +}) + ;; ------------------------------------------------------------------------- ;; ---- [INT] Complex dot product ;; ------------------------------------------------------------------------- diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index d42a70653edb266f2b76924b75a814db25f08f23..3f61fc8e380abd922d39973f40a966b7ce64fa40 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF]) ;; All Advanced SIMD modes on which we support any arithmetic operations. (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF]) +;; All Advanced SIMD modes suitable for performing arithmetics. +(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI + (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") + V2SF V4SF V2DF]) + ;; All Advanced SIMD modes suitable for moving, loading, and storing. (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V4HF V8HF V4BF V8BF V2SF V4SF V2DF]) @@ -712,6 +717,10 @@ (define_c_enum "unspec" UNSPEC_FCMLA90 ; Used in aarch64-simd.md. UNSPEC_FCMLA180 ; Used in aarch64-simd.md. UNSPEC_FCMLA270 ; Used in aarch64-simd.md. + UNSPEC_FCMUL ; Used in aarch64-simd.md. + UNSPEC_FCMUL_CONJ ; Used in aarch64-simd.md. + UNSPEC_FCMLA_CONJ ; Used in aarch64-simd.md. + UNSPEC_FCMLA180_CONJ ; Used in aarch64-simd.md. UNSPEC_ASRD ; Used in aarch64-sve.md. UNSPEC_ADCLB ; Used in aarch64-sve2.md. 
UNSPEC_ADCLT ; Used in aarch64-sve2.md. @@ -730,6 +739,10 @@ (define_c_enum "unspec" UNSPEC_CMLA180 ; Used in aarch64-sve2.md. UNSPEC_CMLA270 ; Used in aarch64-sve2.md. UNSPEC_CMLA90 ; Used in aarch64-sve2.md. + UNSPEC_CMLA_CONJ ; Used in aarch64-sve2.md. + UNSPEC_CMLA180_CONJ ; Used in aarch64-sve2.md. + UNSPEC_CMUL ; Used in aarch64-sve2.md. + UNSPEC_CMUL_CONJ ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTLT ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTNT ; Used in aarch64-sve2.md. UNSPEC_COND_FCVTX ; Used in aarch64-sve2.md. @@ -1291,7 +1304,7 @@ (define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf") ;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF. (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s") - (V2SI "2d") (V16QI "8h") + (V2SI "2d") (V16QI "8h") (V8HI "4s") (V4SI "2d") (V8HF "4s") (V4SF "2d")]) @@ -1313,7 +1326,7 @@ (define_mode_attr Vewtype [(VNx16QI "h") ;; Widened mode register suffixes for VDW/VQW. (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s") - (V2SI ".2d") (V16QI ".8h") + (V2SI ".2d") (V16QI ".8h") (V8HI ".4s") (V4SI ".2d") (V4HF ".4s") (V2SF ".2d") (SI "") (HI "")]) @@ -2611,6 +2624,19 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA UNSPEC_SQRDCMLAH180 UNSPEC_SQRDCMLAH270]) +;; Unlike the normal CMLA instructions these represent the actual operation you +;; to be performed. They will always need to be expanded into multiple +;; sequences consisting of CMLA. +(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA + UNSPEC_CMLA_CONJ + UNSPEC_CMLA180]) + +;; Unlike the normal CMLA instructions these represent the actual operation you +;; to be performed. They will always need to be expanded into multiple +;; sequences consisting of CMLA. 
+(define_int_iterator SVE2_INT_CMUL_OP [UNSPEC_CMUL
+                                       UNSPEC_CMUL_CONJ])
+
 ;; Same as SVE2_INT_CADD but exclude the saturating instructions
 (define_int_iterator SVE2_INT_CADD_OP [UNSPEC_CADD90
                                        UNSPEC_CADD270])
@@ -2725,6 +2751,14 @@ (define_int_iterator FMMLA [UNSPEC_FMMLA])
 (define_int_iterator BF_MLA [UNSPEC_BFMLALB
                              UNSPEC_BFMLALT])
 
+(define_int_iterator FCMLA_OP [UNSPEC_FCMLA
+                               UNSPEC_FCMLA180
+                               UNSPEC_FCMLA_CONJ
+                               UNSPEC_FCMLA180_CONJ])
+
+(define_int_iterator FCMUL_OP [UNSPEC_FCMUL
+                               UNSPEC_FCMUL_CONJ])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -3435,7 +3469,79 @@ (define_int_attr rot [(UNSPEC_CADD90 "90")
                       (UNSPEC_COND_FCMLA "0")
                       (UNSPEC_COND_FCMLA90 "90")
                       (UNSPEC_COND_FCMLA180 "180")
-                      (UNSPEC_COND_FCMLA270 "270")])
+                      (UNSPEC_COND_FCMLA270 "270")
+                      (UNSPEC_FCMUL "0")
+                      (UNSPEC_FCMUL_CONJ "180")])
+
+;; A conjugate is a negation of the imaginary component.
+;; The numbers in the unspecs are the rotation component of the instruction,
+;; e.g. FCMLA180 means use the instruction with #180.
+;; This attribute is used to produce the right name mangling for the function.
+(define_int_attr rot_op [(UNSPEC_FCMLA180 "")
+                         (UNSPEC_FCMLA180_CONJ "_conj")
+                         (UNSPEC_FCMLA "")
+                         (UNSPEC_FCMLA_CONJ "_conj")
+                         (UNSPEC_FCMUL "")
+                         (UNSPEC_FCMUL_CONJ "_conj")
+                         (UNSPEC_CMLA "")
+                         (UNSPEC_CMLA180 "")
+                         (UNSPEC_CMLA_CONJ "_conj")
+                         (UNSPEC_CMUL "")
+                         (UNSPEC_CMUL_CONJ "_conj")])
+
+;; A complex operation on a full complex number requires two instructions,
+;; e.g. complex multiplication requires two FCMLA instructions, each with a
+;; particular rotation value.
+;;
+;; These values can be looked up in rotsplit1 and rotsplit2.  As an example,
+;; FCMUL needs the first instruction to use #0 and the second #90.
+(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0")
+                            (UNSPEC_FCMLA_CONJ "0")
+                            (UNSPEC_FCMUL "0")
+                            (UNSPEC_FCMUL_CONJ "0")
+                            (UNSPEC_FCMLA180 "270")
+                            (UNSPEC_FCMLA180_CONJ "90")])
+
+(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90")
+                            (UNSPEC_FCMLA_CONJ "270")
+                            (UNSPEC_FCMUL "90")
+                            (UNSPEC_FCMUL_CONJ "270")
+                            (UNSPEC_FCMLA180 "180")
+                            (UNSPEC_FCMLA180_CONJ "180")])
+
+;; SVE has slightly different namings from NEON so we have to split these
+;; iterators.
+(define_int_attr sve_rot1 [(UNSPEC_FCMLA "")
+                           (UNSPEC_FCMLA_CONJ "")
+                           (UNSPEC_FCMUL "")
+                           (UNSPEC_FCMUL_CONJ "")
+                           (UNSPEC_FCMLA180 "270")
+                           (UNSPEC_FCMLA180_CONJ "90")
+                           (UNSPEC_CMLA "")
+                           (UNSPEC_CMLA_CONJ "")
+                           (UNSPEC_CMUL "")
+                           (UNSPEC_CMUL_CONJ "")
+                           (UNSPEC_CMLA180 "270")
+                           (UNSPEC_CMLA180_CONJ "90")])
+
+(define_int_attr sve_rot2 [(UNSPEC_FCMLA "90")
+                           (UNSPEC_FCMLA_CONJ "270")
+                           (UNSPEC_FCMUL "90")
+                           (UNSPEC_FCMUL_CONJ "270")
+                           (UNSPEC_FCMLA180 "180")
+                           (UNSPEC_FCMLA180_CONJ "180")
+                           (UNSPEC_CMLA "90")
+                           (UNSPEC_CMLA_CONJ "270")
+                           (UNSPEC_CMUL "90")
+                           (UNSPEC_CMUL_CONJ "270")
+                           (UNSPEC_CMLA180 "180")
+                           (UNSPEC_CMLA180_CONJ "180")])
+
+
+(define_int_attr fcmac1 [(UNSPEC_FCMLA "a") (UNSPEC_FCMLA_CONJ "a")
+                         (UNSPEC_FCMLA180 "s") (UNSPEC_FCMLA180_CONJ "s")
+                         (UNSPEC_CMLA "a") (UNSPEC_CMLA_CONJ "a")
+                         (UNSPEC_CMLA180 "s") (UNSPEC_CMLA180_CONJ "s")])
 
 (define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla")
                               (UNSPEC_COND_FMLS "fmls")
--wRRV7LY7NUeQGEoC--
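For reviewers who want to sanity-check the rotsplit1/rotsplit2 pairs above, here is a rough scalar model of the two-step expansion. It is not part of the patch. The helper names fcmla and expand are hypothetical, and the per-rotation arithmetic is my reading of the architectural FCMLA description for a single (real, imaginary) lane, so treat it as a sketch under those assumptions. Only the rotation pairs themselves are copied from the attributes in the patch.

```python
# Hypothetical scalar model of one FCMLA lane; NOT code from the patch.
# Assumption: rotation semantics follow the architectural description,
# with n the first and m the second source operand.
def fcmla(acc, n, m, rot):
    """Return acc after one FCMLA step with rotation `rot` (degrees)."""
    if rot == 0:
        return complex(acc.real + n.real * m.real, acc.imag + n.real * m.imag)
    if rot == 90:
        return complex(acc.real - n.imag * m.imag, acc.imag + n.imag * m.real)
    if rot == 180:
        return complex(acc.real - n.real * m.real, acc.imag - n.real * m.imag)
    if rot == 270:
        return complex(acc.real + n.imag * m.imag, acc.imag - n.imag * m.real)
    raise ValueError("rotation must be 0, 90, 180 or 270")


# (rotsplit1, rotsplit2) pairs copied from the attributes in the patch.
ROTSPLIT = {
    "FCMLA": (0, 90),
    "FCMLA_CONJ": (0, 270),
    "FCMUL": (0, 90),
    "FCMUL_CONJ": (0, 270),
    "FCMLA180": (270, 180),
    "FCMLA180_CONJ": (90, 180),
}


def expand(op, acc, n, m):
    """Model the two-instruction expansion of one complex operation."""
    rot1, rot2 = ROTSPLIT[op]
    if op.startswith("FCMUL"):
        acc = 0j  # a multiply is modelled as a multiply-add onto zero
    tmp = fcmla(acc, n, m, rot1)
    return fcmla(tmp, n, m, rot2)


if __name__ == "__main__":
    n, m, acc = complex(1, 2), complex(3, -4), complex(5, 6)
    for op in ROTSPLIT:
        print(op, expand(op, acc, n, m))
```

In this model the _CONJ variants conjugate the first multiplicand n, and the 180 variants subtract the product from the accumulator, which matches the "a"/"s" split in fcmac1.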