Date: Fri, 15 Jan 2021 15:30:19 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Multiply, FMS and FMA.
List-Id: Gcc-patches mailing list
Message-ID:
 <20210115153019.ntt-ry12tsuCVWjxeaUVMs2f3yE2G_IlrVxidGtmVig@z>

Hi All,

This adds the implementation of the optabs for complex operations. With this,
the following C code:

  void g (float complex a[restrict N], float complex b[restrict N],
          float complex c[restrict N])
  {
    for (int i=0; i < N; i++)
      c[i] = a[i] * b[i];
  }

generates

NEON:

g:
        movi    v3.4s, 0
        mov     x3, 0
        .p2align 3,,7
.L2:
        mov     v0.16b, v3.16b
        ldr     q2, [x1, x3]
        ldr     q1, [x0, x3]
        fcmla   v0.4s, v1.4s, v2.4s, #0
        fcmla   v0.4s, v1.4s, v2.4s, #90
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, 1600
        bne     .L2
        ret

SVE:

g:
        mov     x3, 0
        mov     x4, 400
        ptrue   p1.b, all
        whilelo p0.s, xzr, x4
        mov     z3.s, #0
        .p2align 3,,7
.L2:
        ld1w    z1.s, p0/z, [x0, x3, lsl 2]
        ld1w    z2.s, p0/z, [x1, x3, lsl 2]
        movprfx z0, z3
        fcmla   z0.s, p1/m, z1.s, z2.s, #0
        fcmla   z0.s, p1/m, z1.s, z2.s, #90
        st1w    z0.s, p0, [x2, x3, lsl 2]
        incw    x3
        whilelo p0.s, x3, x4
        b.any   .L2
        ret

SVE2 (with int instead of float):

g:
        mov     x3, 0
        mov     x4, 400
        mov     z3.b, #0
        whilelo p0.s, xzr, x4
        .p2align 3,,7
.L2:
        ld1w    z1.s, p0/z, [x0, x3, lsl 2]
        ld1w    z2.s, p0/z, [x1, x3, lsl 2]
        movprfx z0, z3
        cmla    z0.s, z1.s, z2.s, #0
        cmla    z0.s, z1.s, z2.s, #90
        st1w    z0.s, p0, [x2, x3, lsl 2]
        incw    x3
        whilelo p0.s, x3, x4
        b.any   .L2
        ret

It defines a new iterator, VALL_ARITH, which contains the types on which we can
do general arithmetic (i.e. it excludes bfloat16).

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Checked with armv8-a+sve2+fp16 with no issues.

Note that due to a mid-end limitation SLP for SVE currently fails for some
permutes. The tests have these marked as XFAIL. I do intend to fix this soon.

Execution tests verified with QEMU.

Matching tests for these are in the mid-end patches. I will turn them on for
these patterns in a separate patch.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cml4, cmul3): New.
	* config/aarch64/iterators.md (VALL_ARITH, UNSPEC_FCMUL,
	UNSPEC_FCMUL_CONJ, UNSPEC_FCMLA_CONJ, UNSPEC_FCMLA180_CONJ,
	UNSPEC_CMLA_CONJ, UNSPEC_CMLA180_CONJ, UNSPEC_CMUL, UNSPEC_CMUL_CONJ,
	FCMLA_OP, FCMUL_OP, rot_op, rotsplit1, rotsplit2, fcmac1, sve_rot1,
	sve_rot2, SVE2_INT_CMLA_OP, SVE2_INT_CMUL_OP, SVE2_INT_CADD_OP): New.
	(rot): Add UNSPEC_FCMUL, UNSPEC_FCMUL_CONJ.
	* config/aarch64/aarch64-sve.md (cml4, cmul3): New.
	* config/aarch64/aarch64-sve2.md (cml4, cmul3): New.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4b869ded918fd91ffd41e6ba068239a752b331e5..8a5f1dad224a99a8ba30669139259922a1250d0e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -516,6 +516,47 @@ (define_insn "aarch64_fcmlaq_lane"
   [(set_attr "type" "neon_fcmla")]
 )
 
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder. Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:VHSDF 0 "register_operand")
+        (plus:VHSDF (match_operand:VHSDF 1 "register_operand")
+                    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
+                                   (match_operand:VHSDF 3 "register_operand")]
+                                   FCMLA_OP)))]
+  "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  emit_insn (gen_aarch64_fcmla (tmp, operands[1],
+                                operands[3], operands[2]));
+  emit_insn (gen_aarch64_fcmla (operands[0], tmp,
+                                operands[3], operands[2]));
+  DONE;
+})
+
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder. Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+        (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+                       (match_operand:VHSDF 2 "register_operand")]
+                       FCMUL_OP))]
+  "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  rtx res1 = gen_reg_rtx (mode);
+  emit_move_insn (tmp, CONST0_RTX (mode));
+  emit_insn (gen_aarch64_fcmla (res1, tmp,
+                                operands[2], operands[1]));
+  emit_insn (gen_aarch64_fcmla (operands[0], res1,
+                                operands[2], operands[1]));
+  DONE;
+})
+
+
+
 ;; These instructions map to the __builtins for the Dot Product operations.
 (define_insn "aarch64_dot"
   [(set (match_operand:VS 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index da15bd8788507feb12d52894c14e099370f34108..9dfe6a3f4512a20ba4f1e66a105ee0ae5d6949ea 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -7243,6 +7243,62 @@ (define_insn "@aarch64_pred_"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder. Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+        (unspec:SVE_FULL_F
+          [(match_dup 4)
+           (match_dup 5)
+           (match_operand:SVE_FULL_F 1 "register_operand")
+           (match_operand:SVE_FULL_F 2 "register_operand")
+           (match_operand:SVE_FULL_F 3 "register_operand")]
+          FCMLA_OP))]
+  "TARGET_SVE"
+{
+  operands[4] = aarch64_ptrue_reg (mode);
+  operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode);
+  rtx tmp = gen_reg_rtx (mode);
+  emit_insn
+    (gen_aarch64_pred_fcmla (tmp, operands[4],
+                             operands[3], operands[2],
+                             operands[1], operands[5]));
+  emit_insn
+    (gen_aarch64_pred_fcmla (operands[0], operands[4],
+                             operands[3], operands[2],
+                             tmp, operands[5]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder. Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+        (unspec:SVE_FULL_F
+          [(match_operand:SVE_FULL_F 1 "register_operand")
+           (match_operand:SVE_FULL_F 2 "register_operand")]
+          FCMUL_OP))]
+  "TARGET_SVE"
+{
+  rtx pred_reg = aarch64_ptrue_reg (mode);
+  rtx gp_mode = gen_int_mode (SVE_RELAXED_GP, SImode);
+  rtx accum = force_reg (mode, CONST0_RTX (mode));
+  rtx tmp = gen_reg_rtx (mode);
+  emit_insn
+    (gen_aarch64_pred_fcmla (tmp, pred_reg,
+                             operands[2], operands[1],
+                             accum, gp_mode));
+  emit_insn
+    (gen_aarch64_pred_fcmla (operands[0], pred_reg,
+                             operands[2], operands[1],
+                             tmp, gp_mode));
+  DONE;
+})
+
 ;; Predicated FCMLA with merging.
 (define_expand "@cond_"
   [(set (match_operand:SVE_FULL_F 0 "register_operand")
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 5cb9144da98af2d02b83043511a99b5723d7e8c0..b96708d03f4458726b32ec46c0078499e00b8549 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1848,6 +1848,48 @@ (define_insn "@aarch64__lane_"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder. Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+        (plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand")
+          (unspec:SVE_FULL_I
+            [(match_operand:SVE_FULL_I 2 "register_operand")
+             (match_operand:SVE_FULL_I 3 "register_operand")]
+            SVE2_INT_CMLA_OP)))]
+  "TARGET_SVE2"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  emit_insn (gen_aarch64_sve_cmla (tmp, operands[1],
+                                   operands[3], operands[2]));
+  emit_insn (gen_aarch64_sve_cmla (operands[0], tmp,
+                                   operands[3], operands[2]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder. Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+        (unspec:SVE_FULL_I
+          [(match_operand:SVE_FULL_I 1 "register_operand")
+           (match_operand:SVE_FULL_I 2 "register_operand")]
+          SVE2_INT_CMUL_OP))]
+  "TARGET_SVE2"
+{
+  rtx accum = force_reg (mode, CONST0_RTX (mode));
+  rtx tmp = gen_reg_rtx (mode);
+  emit_insn (gen_aarch64_sve_cmla (tmp, accum,
+                                   operands[2], operands[1]));
+  emit_insn (gen_aarch64_sve_cmla (operands[0], tmp,
+                                   operands[2], operands[1]));
+  DONE;
+})
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Complex dot product
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d42a70653edb266f2b76924b75a814db25f08f23..3f61fc8e380abd922d39973f40a966b7ce64fa40 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF])
 ;; All Advanced SIMD modes on which we support any arithmetic operations.
 (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
 
+;; All Advanced SIMD modes suitable for performing arithmetic.
+(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
+                                  (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST")
+                                  V2SF V4SF V2DF])
+
 ;; All Advanced SIMD modes suitable for moving, loading, and storing.
 (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
                                 V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
@@ -712,6 +717,10 @@ (define_c_enum "unspec"
     UNSPEC_FCMLA90 ; Used in aarch64-simd.md.
     UNSPEC_FCMLA180 ; Used in aarch64-simd.md.
     UNSPEC_FCMLA270 ; Used in aarch64-simd.md.
+    UNSPEC_FCMUL ; Used in aarch64-simd.md.
+    UNSPEC_FCMUL_CONJ ; Used in aarch64-simd.md.
+    UNSPEC_FCMLA_CONJ ; Used in aarch64-simd.md.
+    UNSPEC_FCMLA180_CONJ ; Used in aarch64-simd.md.
     UNSPEC_ASRD ; Used in aarch64-sve.md.
     UNSPEC_ADCLB ; Used in aarch64-sve2.md.
     UNSPEC_ADCLT ; Used in aarch64-sve2.md.
@@ -730,6 +739,10 @@ (define_c_enum "unspec"
     UNSPEC_CMLA180 ; Used in aarch64-sve2.md.
     UNSPEC_CMLA270 ; Used in aarch64-sve2.md.
     UNSPEC_CMLA90 ; Used in aarch64-sve2.md.
+    UNSPEC_CMLA_CONJ ; Used in aarch64-sve2.md.
+    UNSPEC_CMLA180_CONJ ; Used in aarch64-sve2.md.
+    UNSPEC_CMUL ; Used in aarch64-sve2.md.
+    UNSPEC_CMUL_CONJ ; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTLT ; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTNT ; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTX ; Used in aarch64-sve2.md.
@@ -1291,7 +1304,7 @@ (define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")
 
 ;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF.
 (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
-                          (V2SI "2d") (V16QI "8h")
+                          (V2SI "2d") (V16QI "8h")
                           (V8HI "4s") (V4SI "2d")
                           (V8HF "4s") (V4SF "2d")])
 
@@ -1313,7 +1326,7 @@ (define_mode_attr Vewtype [(VNx16QI "h")
 
 ;; Widened mode register suffixes for VDW/VQW.
 (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
-                           (V2SI ".2d") (V16QI ".8h")
+                           (V2SI ".2d") (V16QI ".8h")
                            (V8HI ".4s") (V4SI ".2d")
                            (V4HF ".4s") (V2SF ".2d")
                            (SI "") (HI "")])
@@ -2611,6 +2624,19 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA
                                     UNSPEC_SQRDCMLAH180
                                     UNSPEC_SQRDCMLAH270])
 
+;; Unlike the normal CMLA instructions these represent the actual operation
+;; to be performed. They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA
+                                       UNSPEC_CMLA_CONJ
+                                       UNSPEC_CMLA180])
+
+;; Unlike the normal CMLA instructions these represent the actual operation
+;; to be performed. They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMUL_OP [UNSPEC_CMUL
+                                       UNSPEC_CMUL_CONJ])
+
 ;; Same as SVE2_INT_CADD but exclude the saturating instructions
 (define_int_iterator SVE2_INT_CADD_OP [UNSPEC_CADD90
                                        UNSPEC_CADD270])
@@ -2725,6 +2751,14 @@ (define_int_iterator FMMLA [UNSPEC_FMMLA])
 (define_int_iterator BF_MLA [UNSPEC_BFMLALB
                              UNSPEC_BFMLALT])
 
+(define_int_iterator FCMLA_OP [UNSPEC_FCMLA
+                               UNSPEC_FCMLA180
+                               UNSPEC_FCMLA_CONJ
+                               UNSPEC_FCMLA180_CONJ])
+
+(define_int_iterator FCMUL_OP [UNSPEC_FCMUL
+                               UNSPEC_FCMUL_CONJ])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -3435,7 +3469,79 @@ (define_int_attr rot [(UNSPEC_CADD90 "90")
                        (UNSPEC_COND_FCMLA "0")
                        (UNSPEC_COND_FCMLA90 "90")
                        (UNSPEC_COND_FCMLA180 "180")
-                       (UNSPEC_COND_FCMLA270 "270")])
+                       (UNSPEC_COND_FCMLA270 "270")
+                       (UNSPEC_FCMUL "0")
+                       (UNSPEC_FCMUL_CONJ "180")])
+
+;; A conjugate is a negation of the imaginary component.
+;; The numbers in the unspecs are the rotation component of the instruction,
+;; e.g. FCMLA180 means use the instruction with #180.
+;; The iterator is used to produce the right name mangling for the function.
+(define_int_attr rot_op [(UNSPEC_FCMLA180 "")
+                         (UNSPEC_FCMLA180_CONJ "_conj")
+                         (UNSPEC_FCMLA "")
+                         (UNSPEC_FCMLA_CONJ "_conj")
+                         (UNSPEC_FCMUL "")
+                         (UNSPEC_FCMUL_CONJ "_conj")
+                         (UNSPEC_CMLA "")
+                         (UNSPEC_CMLA180 "")
+                         (UNSPEC_CMLA_CONJ "_conj")
+                         (UNSPEC_CMUL "")
+                         (UNSPEC_CMUL_CONJ "_conj")])
+
+;; The complex operations when performed on a real complex number require two
+;; instructions to perform the operation, e.g. complex multiplication requires
+;; two FCMLA instructions, each with a particular rotation value.
+;;
+;; These values can be looked up in rotsplit1 and rotsplit2. As an example,
+;; FCMUL needs the first instruction to use #0 and the second #90.
+(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0") + (UNSPEC_FCMLA_CONJ "0") + (UNSPEC_FCMUL "0") + (UNSPEC_FCMUL_CONJ "0") + (UNSPEC_FCMLA180 "270") + (UNSPEC_FCMLA180_CONJ "90")]) + +(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90") + (UNSPEC_FCMLA_CONJ "270") + (UNSPEC_FCMUL "90") + (UNSPEC_FCMUL_CONJ "270") + (UNSPEC_FCMLA180 "180") + (UNSPEC_FCMLA180_CONJ "180")]) + +;; SVE has slightly different namings from NEON so we have to split these +;; iterators. +(define_int_attr sve_rot1 [(UNSPEC_FCMLA "") + (UNSPEC_FCMLA_CONJ "") + (UNSPEC_FCMUL "") + (UNSPEC_FCMUL_CONJ "") + (UNSPEC_FCMLA180 "270") + (UNSPEC_FCMLA180_CONJ "90") + (UNSPEC_CMLA "") + (UNSPEC_CMLA_CONJ "") + (UNSPEC_CMUL "") + (UNSPEC_CMUL_CONJ "") + (UNSPEC_CMLA180 "270") + (UNSPEC_CMLA180_CONJ "90")]) + +(define_int_attr sve_rot2 [(UNSPEC_FCMLA "90") + (UNSPEC_FCMLA_CONJ "270") + (UNSPEC_FCMUL "90") + (UNSPEC_FCMUL_CONJ "270") + (UNSPEC_FCMLA180 "180") + (UNSPEC_FCMLA180_CONJ "180") + (UNSPEC_CMLA "90") + (UNSPEC_CMLA_CONJ "270") + (UNSPEC_CMUL "90") + (UNSPEC_CMUL_CONJ "270") + (UNSPEC_CMLA180 "180") + (UNSPEC_CMLA180_CONJ "180")]) + + +(define_int_attr fcmac1 [(UNSPEC_FCMLA "a") (UNSPEC_FCMLA_CONJ "a") + (UNSPEC_FCMLA180 "s") (UNSPEC_FCMLA180_CONJ "s") + (UNSPEC_CMLA "a") (UNSPEC_CMLA_CONJ "a") + (UNSPEC_CMLA180 "s") (UNSPEC_CMLA180_CONJ "s")]) (define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla") (UNSPEC_COND_FMLS "fmls") -- --wRRV7LY7NUeQGEoC Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename="rb13907.patch" diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 4b869ded918fd91ffd41e6ba068239a752b331e5..8a5f1dad224a99a8ba30669139259922a1250d0e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -516,6 +516,47 @@ (define_insn "aarch64_fcmlaq_lane" [(set_attr "type" "neon_fcmla")] ) +;; The complex mla/mls operations always need to expand to two instructions. 
+;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:VHSDF 0 "register_operand") + (plus:VHSDF (match_operand:VHSDF 1 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand") + (match_operand:VHSDF 3 "register_operand")] + FCMLA_OP)))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_fcmla (tmp, operands[1], + operands[3], operands[2])); + emit_insn (gen_aarch64_fcmla (operands[0], tmp, + operands[3], operands[2])); + DONE; +}) + +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cmul3" + [(set (match_operand:VHSDF 0 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand") + (match_operand:VHSDF 2 "register_operand")] + FCMUL_OP))] + "TARGET_COMPLEX && !BYTES_BIG_ENDIAN" +{ + rtx tmp = gen_reg_rtx (mode); + rtx res1 = gen_reg_rtx (mode); + emit_move_insn (tmp, CONST0_RTX (mode)); + emit_insn (gen_aarch64_fcmla (res1, tmp, + operands[2], operands[1])); + emit_insn (gen_aarch64_fcmla (operands[0], res1, + operands[2], operands[1])); + DONE; +}) + + + ;; These instructions map to the __builtins for the Dot Product operations. (define_insn "aarch64_dot" [(set (match_operand:VS 0 "register_operand" "=w") diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index da15bd8788507feb12d52894c14e099370f34108..9dfe6a3f4512a20ba4f1e66a105ee0ae5d6949ea 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -7243,6 +7243,62 @@ (define_insn "@aarch64_pred_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +;; The complex mla/mls operations always need to expand to two instructions. 
+;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_dup 4) + (match_dup 5) + (match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand") + (match_operand:SVE_FULL_F 3 "register_operand")] + FCMLA_OP))] + "TARGET_SVE" +{ + operands[4] = aarch64_ptrue_reg (mode); + operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode); + rtx tmp = gen_reg_rtx (mode); + emit_insn + (gen_aarch64_pred_fcmla (tmp, operands[4], + operands[3], operands[2], + operands[1], operands[5])); + emit_insn + (gen_aarch64_pred_fcmla (operands[0], operands[4], + operands[3], operands[2], + tmp, operands[5])); + DONE; +}) + +;; unpredicated optab pattern for auto-vectorizer +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cmul3" + [(set (match_operand:SVE_FULL_F 0 "register_operand") + (unspec:SVE_FULL_F + [(match_operand:SVE_FULL_F 1 "register_operand") + (match_operand:SVE_FULL_F 2 "register_operand")] + FCMUL_OP))] + "TARGET_SVE" +{ + rtx pred_reg = aarch64_ptrue_reg (mode); + rtx gp_mode = gen_int_mode (SVE_RELAXED_GP, SImode); + rtx accum = force_reg (mode, CONST0_RTX (mode)); + rtx tmp = gen_reg_rtx (mode); + emit_insn + (gen_aarch64_pred_fcmla (tmp, pred_reg, + operands[2], operands[1], + accum, gp_mode)); + emit_insn + (gen_aarch64_pred_fcmla (operands[0], pred_reg, + operands[2], operands[1], + tmp, gp_mode)); + DONE; +}) + ;; Predicated FCMLA with merging. 
(define_expand "@cond_" [(set (match_operand:SVE_FULL_F 0 "register_operand") diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index 5cb9144da98af2d02b83043511a99b5723d7e8c0..b96708d03f4458726b32ec46c0078499e00b8549 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -1848,6 +1848,48 @@ (define_insn "@aarch64__lane_" [(set_attr "movprfx" "*,yes")] ) +;; unpredicated optab pattern for auto-vectorizer +;; The complex mla/mls operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. +(define_expand "cml4" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 2 "register_operand") + (match_operand:SVE_FULL_I 3 "register_operand")] + SVE2_INT_CMLA_OP)))] + "TARGET_SVE2" +{ + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_cmla (tmp, operands[1], + operands[3], operands[2])); + emit_insn (gen_aarch64_sve_cmla (operands[0], tmp, + operands[3], operands[2])); + DONE; +}) + +;; unpredicated optab pattern for auto-vectorizer +;; The complex mul operations always need to expand to two instructions. +;; The first operation does half the computation and the second does the +;; remainder. Because of this, expand early. 
+(define_expand "cmul3" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 1 "register_operand") + (match_operand:SVE_FULL_I 2 "register_operand")] + SVE2_INT_CMUL_OP))] + "TARGET_SVE2" +{ + rtx accum = force_reg (mode, CONST0_RTX (mode)); + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_cmla (tmp, accum, + operands[2], operands[1])); + emit_insn (gen_aarch64_sve_cmla (operands[0], tmp, + operands[2], operands[1])); + DONE; +}) + ;; ------------------------------------------------------------------------- ;; ---- [INT] Complex dot product ;; ------------------------------------------------------------------------- diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index d42a70653edb266f2b76924b75a814db25f08f23..3f61fc8e380abd922d39973f40a966b7ce64fa40 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF]) ;; All Advanced SIMD modes on which we support any arithmetic operations. (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF]) +;; All Advanced SIMD modes suitable for performing arithmetics. +(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI + (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") + V2SF V4SF V2DF]) + ;; All Advanced SIMD modes suitable for moving, loading, and storing. (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V4HF V8HF V4BF V8BF V2SF V4SF V2DF]) @@ -712,6 +717,10 @@ (define_c_enum "unspec" UNSPEC_FCMLA90 ; Used in aarch64-simd.md. UNSPEC_FCMLA180 ; Used in aarch64-simd.md. UNSPEC_FCMLA270 ; Used in aarch64-simd.md. + UNSPEC_FCMUL ; Used in aarch64-simd.md. + UNSPEC_FCMUL_CONJ ; Used in aarch64-simd.md. + UNSPEC_FCMLA_CONJ ; Used in aarch64-simd.md. + UNSPEC_FCMLA180_CONJ ; Used in aarch64-simd.md. UNSPEC_ASRD ; Used in aarch64-sve.md. UNSPEC_ADCLB ; Used in aarch64-sve2.md. 
     UNSPEC_ADCLT ; Used in aarch64-sve2.md.
@@ -730,6 +739,10 @@ (define_c_enum "unspec"
     UNSPEC_CMLA180 ; Used in aarch64-sve2.md.
     UNSPEC_CMLA270 ; Used in aarch64-sve2.md.
     UNSPEC_CMLA90 ; Used in aarch64-sve2.md.
+    UNSPEC_CMLA_CONJ ; Used in aarch64-sve2.md.
+    UNSPEC_CMLA180_CONJ ; Used in aarch64-sve2.md.
+    UNSPEC_CMUL ; Used in aarch64-sve2.md.
+    UNSPEC_CMUL_CONJ ; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTLT ; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTNT ; Used in aarch64-sve2.md.
     UNSPEC_COND_FCVTX ; Used in aarch64-sve2.md.
@@ -1291,7 +1304,7 @@ (define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")

 ;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF.
 (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
-                          (V2SI "2d") (V16QI "8h")
+                          (V2SI "2d") (V16QI "8h")
                           (V8HI "4s") (V4SI "2d")
                           (V8HF "4s") (V4SF "2d")])

@@ -1313,7 +1326,7 @@ (define_mode_attr Vewtype [(VNx16QI "h")

 ;; Widened mode register suffixes for VDW/VQW.
 (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
-                           (V2SI ".2d") (V16QI ".8h")
+                           (V2SI ".2d") (V16QI ".8h")
                            (V8HI ".4s") (V4SI ".2d")
                            (V4HF ".4s") (V2SF ".2d")
                            (SI "") (HI "")])
@@ -2611,6 +2624,19 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA
                                     UNSPEC_SQRDCMLAH180
                                     UNSPEC_SQRDCMLAH270])

+;; Unlike the normal CMLA instructions these represent the actual operation
+;; to be performed.  They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA
+                                       UNSPEC_CMLA_CONJ
+                                       UNSPEC_CMLA180])
+
+;; Unlike the normal CMLA instructions these represent the actual operation
+;; to be performed.  They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMUL_OP [UNSPEC_CMUL
+                                       UNSPEC_CMUL_CONJ])
+
 ;; Same as SVE2_INT_CADD but exclude the saturating instructions
 (define_int_iterator SVE2_INT_CADD_OP [UNSPEC_CADD90
                                        UNSPEC_CADD270])
@@ -2725,6 +2751,14 @@ (define_int_iterator FMMLA [UNSPEC_FMMLA])
 (define_int_iterator BF_MLA [UNSPEC_BFMLALB
                              UNSPEC_BFMLALT])

+(define_int_iterator FCMLA_OP [UNSPEC_FCMLA
+                               UNSPEC_FCMLA180
+                               UNSPEC_FCMLA_CONJ
+                               UNSPEC_FCMLA180_CONJ])
+
+(define_int_iterator FCMUL_OP [UNSPEC_FCMUL
+                               UNSPEC_FCMUL_CONJ])
+
 ;; Iterators for atomic operations.

 (define_int_iterator ATOMIC_LDOP
@@ -3435,7 +3469,79 @@ (define_int_attr rot [(UNSPEC_CADD90 "90")
                       (UNSPEC_COND_FCMLA "0")
                       (UNSPEC_COND_FCMLA90 "90")
                       (UNSPEC_COND_FCMLA180 "180")
-                      (UNSPEC_COND_FCMLA270 "270")])
+                      (UNSPEC_COND_FCMLA270 "270")
+                      (UNSPEC_FCMUL "0")
+                      (UNSPEC_FCMUL_CONJ "180")])
+
+;; A conjugate is a negation of the imaginary component.
+;; The number in the unspecs is the rotation component of the instruction, e.g.
+;; FCMLA180 means use the instruction with #180.
+;; The iterator is used to produce the right name mangling for the function.
+(define_int_attr rot_op [(UNSPEC_FCMLA180 "")
+                         (UNSPEC_FCMLA180_CONJ "_conj")
+                         (UNSPEC_FCMLA "")
+                         (UNSPEC_FCMLA_CONJ "_conj")
+                         (UNSPEC_FCMUL "")
+                         (UNSPEC_FCMUL_CONJ "_conj")
+                         (UNSPEC_CMLA "")
+                         (UNSPEC_CMLA180 "")
+                         (UNSPEC_CMLA_CONJ "_conj")
+                         (UNSPEC_CMUL "")
+                         (UNSPEC_CMUL_CONJ "_conj")])
+
+;; The complex operations when performed on a real complex number require two
+;; instructions to perform the operation.  e.g. complex multiplication requires
+;; two FCMLA instructions with particular rotation values.
+;;
+;; These values can be looked up in rotsplit1 and rotsplit2.  As an example,
+;; FCMUL needs the first instruction to use #0 and the second #90.
+(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0")
+                            (UNSPEC_FCMLA_CONJ "0")
+                            (UNSPEC_FCMUL "0")
+                            (UNSPEC_FCMUL_CONJ "0")
+                            (UNSPEC_FCMLA180 "270")
+                            (UNSPEC_FCMLA180_CONJ "90")])
+
+(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90")
+                            (UNSPEC_FCMLA_CONJ "270")
+                            (UNSPEC_FCMUL "90")
+                            (UNSPEC_FCMUL_CONJ "270")
+                            (UNSPEC_FCMLA180 "180")
+                            (UNSPEC_FCMLA180_CONJ "180")])
+
+;; SVE has slightly different namings from NEON so we have to split these
+;; iterators.
+(define_int_attr sve_rot1 [(UNSPEC_FCMLA "")
+                           (UNSPEC_FCMLA_CONJ "")
+                           (UNSPEC_FCMUL "")
+                           (UNSPEC_FCMUL_CONJ "")
+                           (UNSPEC_FCMLA180 "270")
+                           (UNSPEC_FCMLA180_CONJ "90")
+                           (UNSPEC_CMLA "")
+                           (UNSPEC_CMLA_CONJ "")
+                           (UNSPEC_CMUL "")
+                           (UNSPEC_CMUL_CONJ "")
+                           (UNSPEC_CMLA180 "270")
+                           (UNSPEC_CMLA180_CONJ "90")])
+
+(define_int_attr sve_rot2 [(UNSPEC_FCMLA "90")
+                           (UNSPEC_FCMLA_CONJ "270")
+                           (UNSPEC_FCMUL "90")
+                           (UNSPEC_FCMUL_CONJ "270")
+                           (UNSPEC_FCMLA180 "180")
+                           (UNSPEC_FCMLA180_CONJ "180")
+                           (UNSPEC_CMLA "90")
+                           (UNSPEC_CMLA_CONJ "270")
+                           (UNSPEC_CMUL "90")
+                           (UNSPEC_CMUL_CONJ "270")
+                           (UNSPEC_CMLA180 "180")
+                           (UNSPEC_CMLA180_CONJ "180")])
+
+
+(define_int_attr fcmac1 [(UNSPEC_FCMLA "a") (UNSPEC_FCMLA_CONJ "a")
+                         (UNSPEC_FCMLA180 "s") (UNSPEC_FCMLA180_CONJ "s")
+                         (UNSPEC_CMLA "a") (UNSPEC_CMLA_CONJ "a")
+                         (UNSPEC_CMLA180 "s") (UNSPEC_CMLA180_CONJ "s")])

 (define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla")
                               (UNSPEC_COND_FMLS "fmls")

--wRRV7LY7NUeQGEoC--
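Reader's note on the rotsplit1/rotsplit2 tables in the patch: the sketch below is a scalar Python model, not GCC code, of why each complex operation is expanded into two rotated multiply-accumulate steps. The per-rotation arithmetic in `fcmla` is an assumption based on my reading of the Arm FCMLA description for a single (real, imaginary) lane pair. The names `fcmla`, `ROTSPLIT` and `expand` are hypothetical. The model conjugates its first multiplicand; the patterns in the patch pass their operands in swapped order, so which source operand ends up conjugated there may differ from this model.

```python
# Scalar model of one FCMLA lane pair. Illustrative only; the rotation
# semantics below are an assumption taken from the Arm FCMLA description.
def fcmla(acc, a, b, rot):
    """Apply one rotated multiply-accumulate step to (real, imag) tuples."""
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b
    if rot == 0:
        return (acc_re + a_re * b_re, acc_im + a_re * b_im)
    if rot == 90:
        return (acc_re - a_im * b_im, acc_im + a_im * b_re)
    if rot == 180:
        return (acc_re - a_re * b_re, acc_im - a_re * b_im)
    if rot == 270:
        return (acc_re + a_im * b_im, acc_im - a_im * b_re)
    raise ValueError("rotation must be 0, 90, 180 or 270")


# (first rotation, second rotation), copied from rotsplit1 and rotsplit2.
ROTSPLIT = {
    "FCMLA": (0, 90),
    "FCMLA_CONJ": (0, 270),
    "FCMUL": (0, 90),
    "FCMUL_CONJ": (0, 270),
    "FCMLA180": (270, 180),
    "FCMLA180_CONJ": (90, 180),
}


def expand(op, acc, a, b):
    """Mimic the two-instruction expansion: the second step reuses tmp."""
    rot1, rot2 = ROTSPLIT[op]
    tmp = fcmla(acc, a, b, rot1)
    return fcmla(tmp, a, b, rot2)


def as_pair(z):
    """Convert a Python complex with integral parts to a (real, imag) tuple."""
    return (int(z.real), int(z.imag))


if __name__ == "__main__":
    acc, a, b = (5, 6), (3, 4), (1, -2)
    for op in ROTSPLIT:
        # The multiply forms start from a zero accumulator, as cmul3 does.
        start = (0, 0) if op.startswith("FCMUL") else acc
        print(op, expand(op, start, a, b))
```

In this model the #0/#90 pair accumulates a*b, #0/#270 accumulates conj(a)*b, and the two FCMLA180 pairs subtract the same products, which is consistent with the "a"/"s" (mla/mls) split in fcmac1.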