From: Wilco Dijkstra
To: Richard Sandiford
CC: Wilco Dijkstra via Gcc-patches
Subject: Re: [PATCH][AArch64] Improve immediate expansion [PR106583]
Date: Wed, 19 Oct 2022 15:33:46 +0000

ping

Hi Richard,

>>> Sounds good, but could you put it before the mode version,
>>> to avoid the forward declaration?
>>
>> I can swap them around but the forward declaration is still required as
>> aarch64_check_bitmask is 5000 lines before aarch64_bitmask_imm.
>
> OK, how about moving them both above aarch64_check_bitmask?

Sure, I've moved them as well as all related helper functions - it makes the diff
quite large but they are all together now which makes sense. I also refactored
aarch64_move_imm to handle the case of a 64-bit immediate being generated
by a 32-bit MOVZ/MOVN - this simplifies aarch64_internal_mov_immediate
and movdi patterns even further.

Cheers,
Wilco

v3: move immediate code together and avoid forward declarations,
further cleanups and simplifications.

Improve immediate expansion of immediates which can be created from a
bitmask immediate and 2 MOVKs.  Simplify, refactor and improve
efficiency of bitmask checks and move immediate. Move various immediate
handling functions together to avoid forward declarations.
Include 32-bit MOVZ/N as valid 64-bit immediates.
Add new constraint so the movdi pattern only needs a single alternative
for move immediate.

This reduces the number of 4-instruction immediates in SPECINT/FP by 10-15%.

Passes bootstrap & regress, OK for commit?

gcc/ChangeLog:

        PR target/106583
        * config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)
        Add support for a bitmask immediate with 2 MOVKs.
        (aarch64_check_bitmask): New function after refactorization.
        (aarch64_replicate_bitmask_imm): Remove function, merge into...
        (aarch64_bitmask_imm): Simplify replication of small modes.
        Split function into 64-bit only version for efficiency.
        (aarch64_zeroextended_move_imm): New function.
        (aarch64_move_imm): Refactor code.
        (aarch64_uimm12_shift): Move near other immediate functions.
        (aarch64_clamp_to_uimm12_shift): Likewise.
        (aarch64_movk_shift): Likewise.
        (aarch64_replicate_bitmask_imm): Likewise.
        (aarch64_and_split_imm1): Likewise.
        (aarch64_and_split_imm2): Likewise.
        (aarch64_and_bitmask_imm): Likewise.
        (aarch64_movw_imm): Remove.
        * config/aarch64/aarch64.md (movdi_aarch64): Merge 'N' and 'M'
        constraints into single 'O'.
        (mov_aarch64): Likewise.
        * config/aarch64/aarch64-protos.h (aarch64_move_imm): Use unsigned.
        (aarch64_bitmask_imm): Likewise.
        (aarch64_uimm12_shift): Likewise.
        (aarch64_zeroextended_move_imm): New prototype.
        * config/aarch64/constraints.md: Add 'O' for 32/64-bit immediates,
        limit 'N' to 64-bit only moves.

gcc/testsuite:
        PR target/106583
        * gcc.target/aarch64/pr106583.c: Add new test.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 3e4005c9f4ff1f999f1811c6fb0b2252878dc4ae..b82f9ba7c2bb4cffa16abbf45f87061f72015083 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -755,7 +755,7 @@ void aarch64_post_cfi_startproc (void);
 poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_address_valid_for_prefetch_p (rtx, bool);
-bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
+bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode);
 unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
 unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
 bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode);
@@ -792,7 +792,7 @@ bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
                                         unsigned HOST_WIDE_INT,
                                         unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
-bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
+bool aarch64_move_imm (unsigned HOST_WIDE_INT, machine_mode);
 machine_mode aarch64_sve_int_mode (machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
 machine_mode aarch64_sve_pred_mode (machine_mode);
@@ -842,8 +842,9 @@ bool aarch64_sve_float_arith_immediate_p (rtx, bool);
 bool aarch64_sve_float_mul_immediate_p (rtx);
 bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
-bool aarch64_uimm12_shift (HOST_WIDE_INT);
+bool aarch64_uimm12_shift (unsigned HOST_WIDE_INT);
 int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
+bool aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
 const char *aarch64_output_casesi (rtx *);
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4de55beb067ea8f0be0a90060a785c94bdee708b..785ec07692981d423582051ac0897e5dbc3a001f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -305,7 +305,6 @@ static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
 static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
 static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
                                             aarch64_addr_query_type);
-static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val);
 
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
@@ -5502,6 +5501,142 @@ aarch64_output_sve_vector_inc_dec (const char *operands, rtx x)
                                              factor, nelts_per_vq);
 }
 
+/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.  */
+
+static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
+  {
+    0x0000000100000001ull,
+    0x0001000100010001ull,
+    0x0101010101010101ull,
+    0x1111111111111111ull,
+    0x5555555555555555ull,
+  };
+
+
+/* Return true if 64-bit VAL is a valid bitmask immediate.  */
+static bool
+aarch64_bitmask_imm (unsigned HOST_WIDE_INT val)
+{
+  unsigned HOST_WIDE_INT tmp, mask, first_one, next_one;
+  int bits;
+
+  /* Check for a single sequence of one bits and return quickly if so.
+     The special cases of all ones and all zeroes returns false.  */
+  tmp = val + (val & -val);
+
+  if (tmp == (tmp & -tmp))
+    return (val + 1) > 1;
+
+  /* Invert if the immediate doesn't start with a zero bit - this means we
+     only need to search for sequences of one bits.  */
+  if (val & 1)
+    val = ~val;
+
+  /* Find the first set bit and set tmp to val with the first sequence of one
+     bits removed.  Return success if there is a single sequence of ones.  */
+  first_one = val & -val;
+  tmp = val & (val + first_one);
+
+  if (tmp == 0)
+    return true;
+
+  /* Find the next set bit and compute the difference in bit position.  */
+  next_one = tmp & -tmp;
+  bits = clz_hwi (first_one) - clz_hwi (next_one);
+  mask = val ^ tmp;
+
+  /* Check the bit position difference is a power of 2, and that the first
+     sequence of one bits fits within 'bits' bits.  */
+  if ((mask >> bits) != 0 || bits != (bits & -bits))
+    return false;
+
+  /* Check the sequence of one bits is repeated 64/bits times.  */
+  return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
+}
+
+
+/* Return true if VAL is a valid bitmask immediate for any mode.  */
+bool
+aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
+{
+  if (mode == DImode)
+    return aarch64_bitmask_imm (val);
+
+  if (mode == SImode)
+    return aarch64_bitmask_imm ((val & 0xffffffff) | (val << 32));
+
+  /* Replicate small immediates to fit 64 bits.  */
+  int size = GET_MODE_UNIT_PRECISION (mode);
+  val &= (HOST_WIDE_INT_1U << size) - 1;
+  val *= bitmask_imm_mul[__builtin_clz (size) - 26];
+
+  return aarch64_bitmask_imm (val);
+}
+
+/* Return true if the immediate VAL can be a bitfield immediate
+   by changing the given MASK bits in VAL to zeroes, ones or bits
+   from the other half of VAL.  Return the new immediate in VAL2.  */
+static inline bool
+aarch64_check_bitmask (unsigned HOST_WIDE_INT val,
+                      unsigned HOST_WIDE_INT &val2,
+                      unsigned HOST_WIDE_INT mask)
+{
+  val2 = val & ~mask;
+  if (val2 != val && aarch64_bitmask_imm (val2))
+    return true;
+  val2 = val | mask;
+  if (val2 != val && aarch64_bitmask_imm (val2))
+    return true;
+  val = val & ~mask;
+  val2 = val | (((val >> 32) | (val << 32)) & mask);
+  if (val2 != val && aarch64_bitmask_imm (val2))
+    return true;
+  val2 = val | (((val >> 16) | (val << 48)) & mask);
+  if (val2 != val && aarch64_bitmask_imm (val2))
+    return true;
+  return false;
+}
+
+/* Return true if immediate VAL can only be created by using a 32-bit
+   zero-extended move immediate, not by a 64-bit move.  */
+bool
+aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT val)
+{
+  if ((val >> 16) == 0 || (val >> 32) != 0 || (val & 0xffff) == 0)
+    return false;
+  return !aarch64_bitmask_imm (val);
+}
+
+/* Return true if VAL is an immediate that can be created by a single
+   MOV instruction.  */
+bool
+aarch64_move_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
+{
+  unsigned HOST_WIDE_INT val2;
+
+  if (val < 65536)
+    return true;
+
+  val2 = val ^ ((HOST_WIDE_INT) val >> 63);
+  if ((val2 >> (__builtin_ctzll (val2) & 48)) < 65536)
+    return true;
+
+  /* Special case 0xyyyyffffffffffff. */
+  if (((val2 + 1) << 16) == 0)
+    return true;
+
+  /* Special case immediates 0xffffyyyy and 0xyyyyffff.  */
+  val2 = (mode == DImode) ? val : val2;
+  if (((val2 + 1) & ~(unsigned HOST_WIDE_INT) 0xffff0000) == 0
+      || (val2 >> 16) == 0xffff)
+    return true;
+
+  if (mode == SImode || (val >> 32) == 0)
+    val = (val & 0xffffffff) | (val << 32);
+  return aarch64_bitmask_imm (val);
+}
+
+
 static int
 aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
                                 scalar_int_mode mode)
@@ -5520,31 +5655,6 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
       return 1;
     }
 
-  /* Check to see if the low 32 bits are either 0xffffXXXX or 0xXXXXffff
-     (with XXXX non-zero). In that case check to see if the move can be done in
-     a smaller mode.  */
-  val2 = val & 0xffffffff;
-  if (mode == DImode
-      && aarch64_move_imm (val2, SImode)
-      && (((val >> 32) & 0xffff) == 0 || (val >> 48) == 0))
-    {
-      if (generate)
-       emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
-
-      /* Check if we have to emit a second instruction by checking to see
-         if any of the upper 32 bits of the original DI mode value is set.  */
-      if (val == val2)
-       return 1;
-
-      i = (val >> 48) ? 48 : 32;
-
-      if (generate)
-        emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-                                   GEN_INT ((val >> i) & 0xffff)));
-
-      return 2;
-    }
-
   if ((val >> 32) == 0 || mode == SImode)
     {
       if (generate)
@@ -5568,26 +5678,20 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
   one_match = ((~val & mask) == 0) + ((~val & (mask << 16)) == 0) +
     ((~val & (mask << 32)) == 0) + ((~val & (mask << 48)) == 0);
 
-  if (zero_match != 2 && one_match != 2)
+  /* Try a bitmask immediate and a movk to generate the immediate
+     in 2 instructions.  */
+  if (zero_match < 2 && one_match < 2)
     {
-      /* Try emitting a bitmask immediate with a movk replacing 16 bits.
-        For a 64-bit bitmask try whether changing 16 bits to all ones or
-        zeroes creates a valid bitmask.  To check any repeated bitmask,
-        try using 16 bits from the other 32-bit half of val.  */
-
-      for (i = 0; i < 64; i += 16, mask <<= 16)
+      for (i = 0; i < 64; i += 16)
         {
-         val2 = val & ~mask;
-         if (val2 != val && aarch64_bitmask_imm (val2, mode))
-           break;
-         val2 = val | mask;
-         if (val2 != val && aarch64_bitmask_imm (val2, mode))
+         if (aarch64_check_bitmask (val, val2, mask << i))
             break;
-         val2 = val2 & ~mask;
-         val2 = val2 | (((val2 >> 32) | (val2 << 32)) & mask);
-         if (val2 != val && aarch64_bitmask_imm (val2, mode))
+
+         val2 = val & ~(mask << i);
+         if ((val2 >> 32) == 0 && aarch64_move_imm (val2, DImode))
             break;
         }
+
       if (i != 64)
         {
           if (generate)
@@ -5600,6 +5704,25 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
         }
     }
 
+  /* Try a bitmask plus 2 movk to generate the immediate in 3 instructions.  */
+  if (zero_match + one_match == 0)
+    {
+      for (i = 0; i < 48; i += 16)
+       for (int j = i + 16; j < 64; j += 16)
+         if (aarch64_check_bitmask (val, val2, (mask << i) | (mask << j)))
+           {
+             if (generate)
+               {
+                 emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
+                 emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+                                            GEN_INT ((val >> i) & 0xffff)));
+                 emit_insn (gen_insv_immdi (dest, GEN_INT (j),
+                                              GEN_INT ((val >> j) & 0xffff)));
+               }
+             return 3;
+           }
+    }
+
   /* Generate 2-4 instructions, skipping 16 bits of all zeroes or ones which
      are emitted by the initial mov.  If one_match > zero_match, skip set bits,
      otherwise skip zero bits.  */
@@ -5643,6 +5766,95 @@ aarch64_mov128_immediate (rtx imm)
          + aarch64_internal_mov_immediate (NULL_RTX, hi, false, DImode) <= 4;
 }
 
+/* Return true if val can be encoded as a 12-bit unsigned immediate with
+   a left shift of 0 or 12 bits.  */
+bool
+aarch64_uimm12_shift (unsigned HOST_WIDE_INT val)
+{
+  return val < 4096 || (val & 0xfff000) == val;
+}
+
+/* Returns the nearest value to VAL that will fit as a 12-bit unsigned immediate
+   that can be created with a left shift of 0 or 12.  */
+static HOST_WIDE_INT
+aarch64_clamp_to_uimm12_shift (unsigned HOST_WIDE_INT val)
+{
+  /* Check to see if the value fits in 24 bits, as that is the maximum we can
+     handle correctly.  */
+  gcc_assert (val < 0x1000000);
+
+  if (val < 4096)
+    return val;
+
+  return val & 0xfff000;
+}
+
+/* Test whether:
+
+     X = (X & AND_VAL) | IOR_VAL;
+
+   can be implemented using:
+
+     MOVK X, #(IOR_VAL >> shift), LSL #shift
+
+   Return the shift if so, otherwise return -1.  */
+int
+aarch64_movk_shift (const wide_int_ref &and_val,
+                   const wide_int_ref &ior_val)
+{
+  unsigned int precision = and_val.get_precision ();
+  unsigned HOST_WIDE_INT mask = 0xffff;
+  for (unsigned int shift = 0; shift < precision; shift += 16)
+    {
+      if (and_val == ~mask && (ior_val & mask) == ior_val)
+       return shift;
+      mask <<= 16;
+    }
+  return -1;
+}
+
+/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.
+   Assumed precondition: VAL_IN Is not zero.  */
+
+unsigned HOST_WIDE_INT
+aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
+{
+  int lowest_bit_set = ctz_hwi (val_in);
+  int highest_bit_set = floor_log2 (val_in);
+  gcc_assert (val_in != 0);
+
+  return ((HOST_WIDE_INT_UC (2) << highest_bit_set) -
+         (HOST_WIDE_INT_1U << lowest_bit_set));
+}
+
+/* Create constant where bits outside of lowest bit set to highest bit set
+   are set to 1.  */
+
+unsigned HOST_WIDE_INT
+aarch64_and_split_imm2 (HOST_WIDE_INT val_in)
+{
+  return val_in | ~aarch64_and_split_imm1 (val_in);
+}
+
+/* Return true if VAL_IN is a valid 'and' bitmask immediate.  */
+
+bool
+aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode)
+{
+  scalar_int_mode int_mode;
+  if (!is_a <scalar_int_mode> (mode, &int_mode))
+    return false;
+
+  if (aarch64_bitmask_imm (val_in, int_mode))
+    return false;
+
+  if (aarch64_move_imm (val_in, int_mode))
+    return false;
+
+  unsigned HOST_WIDE_INT imm2 = aarch64_and_split_imm2 (val_in);
+
+  return aarch64_bitmask_imm (imm2, int_mode);
+}
 
 /* Return the number of temporary registers that aarch64_add_offset_1
    would need to add OFFSET to a register.  */
@@ -10098,208 +10310,6 @@ aarch64_tls_referenced_p (rtx x)
   return false;
 }
 
-
-/* Return true if val can be encoded as a 12-bit unsigned immediate with
-   a left shift of 0 or 12 bits.  */
-bool
-aarch64_uimm12_shift (HOST_WIDE_INT val)
-{
-  return ((val & (((HOST_WIDE_INT) 0xfff) << 0)) == val
-         || (val & (((HOST_WIDE_INT) 0xfff) << 12)) == val
-         );
-}
-
-/* Returns the nearest value to VAL that will fit as a 12-bit unsigned immediate
-   that can be created with a left shift of 0 or 12.  */
-static HOST_WIDE_INT
-aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val)
-{
-  /* Check to see if the value fits in 24 bits, as that is the maximum we can
-     handle correctly.  */
-  gcc_assert ((val & 0xffffff) == val);
-
-  if (((val & 0xfff) << 0) == val)
-    return val;
-
-  return val & (0xfff << 12);
-}
-
-/* Return true if val is an immediate that can be loaded into a
-   register by a MOVZ instruction.  */
-static bool
-aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode)
-{
-  if (GET_MODE_SIZE (mode) > 4)
-    {
-      if ((val & (((HOST_WIDE_INT) 0xffff) << 32)) == val
-         || (val & (((HOST_WIDE_INT) 0xffff) << 48)) == val)
-       return 1;
-    }
-  else
-    {
-=A0=A0=A0=A0=A0 /* Ignore sign extension.=A0 */=0A= -=A0=A0=A0=A0=A0 val &=3D (HOST_WIDE_INT) 0xffffffff;=0A= -=A0=A0=A0 }=0A= -=A0 return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) =3D=3D val=0A= -=A0=A0=A0=A0=A0=A0=A0=A0 || (val & (((HOST_WIDE_INT) 0xffff) << 16)) =3D= =3D val);=0A= -}=0A= -=0A= -/* Test whether:=0A= -=0A= -=A0=A0=A0=A0 X =3D (X & AND_VAL) | IOR_VAL;=0A= -=0A= -=A0=A0 can be implemented using:=0A= -=0A= -=A0=A0=A0=A0 MOVK X, #(IOR_VAL >> shift), LSL #shift=0A= -=0A= -=A0=A0 Return the shift if so, otherwise return -1.=A0 */=0A= -int=0A= -aarch64_movk_shift (const wide_int_ref &and_val,=0A= -=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 const wide_int_ref = &ior_val)=0A= -{=0A= -=A0 unsigned int precision =3D and_val.get_precision ();=0A= -=A0 unsigned HOST_WIDE_INT mask =3D 0xffff;=0A= -=A0 for (unsigned int shift =3D 0; shift < precision; shift +=3D 16)=0A= -=A0=A0=A0 {=0A= -=A0=A0=A0=A0=A0 if (and_val =3D=3D ~mask && (ior_val & mask) =3D=3D ior_va= l)=0A= -=A0=A0=A0=A0=A0=A0 return shift;=0A= -=A0=A0=A0=A0=A0 mask <<=3D 16;=0A= -=A0=A0=A0 }=0A= -=A0 return -1;=0A= -}=0A= -=0A= -/* VAL is a value with the inner mode of MODE.=A0 Replicate it to fill a= =0A= -=A0=A0 64-bit (DImode) integer.=A0 */=0A= -=0A= -static unsigned HOST_WIDE_INT=0A= -aarch64_replicate_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mo= de)=0A= -{=0A= -=A0 unsigned int size =3D GET_MODE_UNIT_PRECISION (mode);=0A= -=A0 while (size < 64)=0A= -=A0=A0=A0 {=0A= -=A0=A0=A0=A0=A0 val &=3D (HOST_WIDE_INT_1U << size) - 1;=0A= -=A0=A0=A0=A0=A0 val |=3D val << size;=0A= -=A0=A0=A0=A0=A0 size *=3D 2;=0A= -=A0=A0=A0 }=0A= -=A0 return val;=0A= -}=0A= -=0A= -/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.=A0 */= =0A= -=0A= -static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =3D=0A= -=A0 {=0A= -=A0=A0=A0 0x0000000100000001ull,=0A= -=A0=A0=A0 0x0001000100010001ull,=0A= -=A0=A0=A0 0x0101010101010101ull,=0A= -=A0=A0=A0 0x1111111111111111ull,=0A= -=A0=A0=A0 
0x5555555555555555ull,=0A= -=A0 };=0A= -=0A= -=0A= -/* Return true if val is a valid bitmask immediate.=A0 */=0A= -=0A= -bool=0A= -aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)=0A= -{=0A= -=A0 unsigned HOST_WIDE_INT val, tmp, mask, first_one, next_one;=0A= -=A0 int bits;=0A= -=0A= -=A0 /* Check for a single sequence of one bits and return quickly if so.= =0A= -=A0=A0=A0=A0 The special cases of all ones and all zeroes returns false.= =A0 */=0A= -=A0 val =3D aarch64_replicate_bitmask_imm (val_in, mode);=0A= -=A0 tmp =3D val + (val & -val);=0A= -=0A= -=A0 if (tmp =3D=3D (tmp & -tmp))=0A= -=A0=A0=A0 return (val + 1) > 1;=0A= -=0A= -=A0 /* Replicate 32-bit immediates so we can treat them as 64-bit.=A0 */= =0A= -=A0 if (mode =3D=3D SImode)=0A= -=A0=A0=A0 val =3D (val << 32) | (val & 0xffffffff);=0A= -=0A= -=A0 /* Invert if the immediate doesn't start with a zero bit - this means = we=0A= -=A0=A0=A0=A0 only need to search for sequences of one bits.=A0 */=0A= -=A0 if (val & 1)=0A= -=A0=A0=A0 val =3D ~val;=0A= -=0A= -=A0 /* Find the first set bit and set tmp to val with the first sequence o= f one=0A= -=A0=A0=A0=A0 bits removed.=A0 Return success if there is a single sequence= of ones.=A0 */=0A= -=A0 first_one =3D val & -val;=0A= -=A0 tmp =3D val & (val + first_one);=0A= -=0A= -=A0 if (tmp =3D=3D 0)=0A= -=A0=A0=A0 return true;=0A= -=0A= -=A0 /* Find the next set bit and compute the difference in bit position.= =A0 */=0A= -=A0 next_one =3D tmp & -tmp;=0A= -=A0 bits =3D clz_hwi (first_one) - clz_hwi (next_one);=0A= -=A0 mask =3D val ^ tmp;=0A= -=0A= -=A0 /* Check the bit position difference is a power of 2, and that the fir= st=0A= -=A0=A0=A0=A0 sequence of one bits fits within 'bits' bits.=A0 */=0A= -=A0 if ((mask >> bits) !=3D 0 || bits !=3D (bits & -bits))=0A= -=A0=A0=A0 return false;=0A= -=0A= -=A0 /* Check the sequence of one bits is repeated 64/bits times.=A0 */=0A= -=A0 return val =3D=3D mask * bitmask_imm_mul[__builtin_clz (bits) - 26];= =0A= -}=0A= 
-=0A= -/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.= =A0 =0A= -=A0=A0 Assumed precondition: VAL_IN Is not zero.=A0 */=0A= -=0A= -unsigned HOST_WIDE_INT=0A= -aarch64_and_split_imm1 (HOST_WIDE_INT val_in)=0A= -{=0A= -=A0 int lowest_bit_set =3D ctz_hwi (val_in);=0A= -=A0 int highest_bit_set =3D floor_log2 (val_in);=0A= -=A0 gcc_assert (val_in !=3D 0);=0A= -=0A= -=A0 return ((HOST_WIDE_INT_UC (2) << highest_bit_set) -=0A= -=A0=A0=A0=A0=A0=A0=A0=A0 (HOST_WIDE_INT_1U << lowest_bit_set));=0A= -}=0A= -=0A= -/* Create constant where bits outside of lowest bit set to highest bit set= =0A= -=A0=A0 are set to 1.=A0 */=0A= -=0A= -unsigned HOST_WIDE_INT=0A= -aarch64_and_split_imm2 (HOST_WIDE_INT val_in)=0A= -{=0A= -=A0 return val_in | ~aarch64_and_split_imm1 (val_in);=0A= -}=0A= -=0A= -/* Return true if VAL_IN is a valid 'and' bitmask immediate.=A0 */=0A= -=0A= -bool=0A= -aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode)= =0A= -{=0A= -=A0 scalar_int_mode int_mode;=0A= -=A0 if (!is_a (mode, &int_mode))=0A= -=A0=A0=A0 return false;=0A= -=0A= -=A0 if (aarch64_bitmask_imm (val_in, int_mode))=0A= -=A0=A0=A0 return false;=0A= -=0A= -=A0 if (aarch64_move_imm (val_in, int_mode))=0A= -=A0=A0=A0 return false;=0A= -=0A= -=A0 unsigned HOST_WIDE_INT imm2 =3D aarch64_and_split_imm2 (val_in);=0A= -=0A= -=A0 return aarch64_bitmask_imm (imm2, int_mode);=0A= -}=0A= -=0A= -/* Return true if val is an immediate that can be loaded into a=0A= -=A0=A0 register in a single instruction.=A0 */=0A= -bool=0A= -aarch64_move_imm (HOST_WIDE_INT val, machine_mode mode)=0A= -{=0A= -=A0 scalar_int_mode int_mode;=0A= -=A0 if (!is_a (mode, &int_mode))=0A= -=A0=A0=A0 return false;=0A= -=0A= -=A0 if (aarch64_movw_imm (val, int_mode) || aarch64_movw_imm (~val, int_mo= de))=0A= -=A0=A0=A0 return 1;=0A= -=A0 return aarch64_bitmask_imm (val, int_mode);=0A= -}=0A= -=0A= =A0static bool=0A= =A0aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx = 
x)=0A= =A0{=0A= diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md= =0A= index 0a7633e5dd6d45282edd7a1088c14b555bc09b40..23ceca48543d23b85beea1f0bf9= 8ef83051d80b6 100644=0A= --- a/gcc/config/aarch64/aarch64.md=0A= +++ b/gcc/config/aarch64/aarch64.md=0A= @@ -1309,16 +1309,15 @@ (define_insn_and_split "*movsi_aarch64"=0A= =A0)=0A= =A0=0A= =A0(define_insn_and_split "*movdi_aarch64"=0A= -=A0 [(set (match_operand:DI 0 "nonimmediate_operand" "=3Dr,k,r,r,r,r,r, r,= w, m,m,=A0=A0 r,=A0 r,=A0 r, w,r,w, w")=0A= -=A0=A0=A0=A0=A0=A0 (match_operand:DI 1 "aarch64_mov_operand"=A0 " r,r,k,N,= M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Dd"))]=0A= +=A0 [(set (match_operand:DI 0 "nonimmediate_operand" "=3Dr,k,r,r,r,r, r,w,= m,m,=A0=A0 r,=A0 r,=A0 r, w,r,w, w")=0A= +=A0=A0=A0=A0=A0=A0 (match_operand:DI 1 "aarch64_mov_operand"=A0 " r,r,k,O,= n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Dd"))]=0A= =A0=A0 "(register_operand (operands[0], DImode)=0A= =A0=A0=A0=A0 || aarch64_reg_or_zero (operands[1], DImode))"=0A= =A0=A0 "@=0A= =A0=A0=A0 mov\\t%x0, %x1=0A= =A0=A0=A0 mov\\t%0, %x1=0A= =A0=A0=A0 mov\\t%x0, %1=0A= -=A0=A0 mov\\t%x0, %1=0A= -=A0=A0 mov\\t%w0, %1=0A= +=A0=A0 * return aarch64_zeroextended_move_imm (INTVAL (operands[1])) ? 
\"m= ov\\t%w0, %1\" : \"mov\\t%x0, %1\";=0A= =A0=A0=A0 #=0A= =A0=A0=A0 * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", oper= ands[1]);=0A= =A0=A0=A0 ldr\\t%x0, %1=0A= @@ -1340,11 +1339,11 @@ (define_insn_and_split "*movdi_aarch64"=0A= =A0=A0=A0=A0=A0=A0=A0 DONE;=0A= =A0=A0=A0=A0 }"=0A= =A0=A0 ;; The "mov_imm" type for CNTD is just a placeholder.=0A= -=A0 [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov= _imm,=0A= +=A0 [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,=0A= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 load_8,load= _8,store_8,store_8,load_8,adr,adr,f_mcr,f_mrc,=0A= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 fmov,neon_m= ove")=0A= -=A0=A0 (set_attr "arch"=A0=A0 "*,*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,si= md")=0A= -=A0=A0 (set_attr "length" "4,4,4,4,4,*,=A0 4,4, 4,4, 4,8,4,4, 4, 4, 4,=A0= =A0 4")]=0A= +=A0=A0 (set_attr "arch"=A0=A0 "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd= ")=0A= +=A0=A0 (set_attr "length" "4,4,4,4,*,=A0 4,4, 4,4, 4,8,4,4, 4, 4, 4,=A0=A0= 4")]=0A= =A0)=0A= =A0=0A= =A0(define_insn "insv_imm"=0A= @@ -1508,7 +1507,7 @@ (define_insn "*mov_aarch64"=0A= =A0=0A= =A0(define_insn "*mov_aarch64"=0A= =A0=A0 [(set (match_operand:DFD 0 "nonimmediate_operand" "=3Dw, w=A0 ,?r,w,= w=A0 ,w=A0 ,w,m,r,m ,r,r")=0A= -=A0=A0=A0=A0=A0=A0 (match_operand:DFD 1 "general_operand"=A0=A0=A0=A0=A0 "= Y , ?rY, w,w,Ufc,Uvi,m,w,m,rY,r,N"))]=0A= +=A0=A0=A0=A0=A0=A0 (match_operand:DFD 1 "general_operand"=A0=A0=A0=A0=A0 "= Y , ?rY, w,w,Ufc,Uvi,m,w,m,rY,r,O"))]=0A= =A0=A0 "TARGET_FLOAT && (register_operand (operands[0], mode)=0A= =A0=A0=A0=A0 || aarch64_reg_or_fp_zero (operands[1], mode))"=0A= =A0=A0 "@=0A= @@ -1523,7 +1522,7 @@ (define_insn "*mov_aarch64"=0A= =A0=A0=A0 ldr\\t%x0, %1=0A= =A0=A0=A0 str\\t%x1, %0=0A= =A0=A0=A0 mov\\t%x0, %x1=0A= -=A0=A0 mov\\t%x0, %1"=0A= +=A0=A0 * return aarch64_zeroextended_move_imm (INTVAL (operands[1])) ? 
\"m= ov\\t%w0, %1\" : \"mov\\t%x0, %1\";"=0A= =A0=A0 [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconstd,neon_move,\=0A= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 f_loadd,f_s= tored,load_8,store_8,mov_reg,\=0A= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 fconstd")= =0A= diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constra= ints.md=0A= index ee7587cca1673208e2bfd6b503a21d0c8b69bf75..e91c7eab0b3674ca34ac2f790c3= 8fcd27986c35f 100644=0A= --- a/gcc/config/aarch64/constraints.md=0A= +++ b/gcc/config/aarch64/constraints.md=0A= @@ -106,6 +106,12 @@ (define_constraint "M"=0A= =A0=0A= =A0(define_constraint "N"=0A= =A0 "A constant that can be used with a 64-bit MOV immediate operation."=0A= + (and (match_code "const_int")=0A= +=A0=A0=A0=A0=A0 (match_test "aarch64_move_imm (ival, DImode)")=0A= +=A0=A0=A0=A0=A0 (match_test "!aarch64_zeroextended_move_imm (ival)")))=0A= +=0A= +(define_constraint "O"=0A= + "A constant that can be used with a 32 or 64-bit MOV immediate operation.= "=0A= =A0 (and (match_code "const_int")=0A= =A0=A0=A0=A0=A0=A0 (match_test "aarch64_move_imm (ival, DImode)")))=0A= =A0=0A= diff --git a/gcc/testsuite/gcc.target/aarch64/pr106583.c b/gcc/testsuite/gc= c.target/aarch64/pr106583.c=0A= new file mode 100644=0A= index 0000000000000000000000000000000000000000..0f931580817d78dc1cc58f03b25= 1bd21bec71f59=0A= --- /dev/null=0A= +++ b/gcc/testsuite/gcc.target/aarch64/pr106583.c=0A= @@ -0,0 +1,41 @@=0A= +/* { dg-do assemble } */=0A= +/* { dg-options "-O2 --save-temps" } */=0A= +=0A= +long f1 (void)=0A= +{=0A= +=A0 return 0x7efefefefefefeff;=0A= +}=0A= +=0A= +long f2 (void)=0A= +{=0A= +=A0 return 0x12345678aaaaaaaa;=0A= +}=0A= +=0A= +long f3 (void)=0A= +{=0A= +=A0 return 0x1234cccccccc5678;=0A= +}=0A= +=0A= +long f4 (void)=0A= +{=0A= +=A0 return 0x7777123456787777;=0A= +}=0A= +=0A= +long f5 (void)=0A= +{=0A= +=A0 return 0x5555555512345678;=0A= +}=0A= +=0A= +long f6 (void)=0A= +{=0A= +=A0 return 
0x1234bbbb5678bbbb;=0A= +}=0A= +=0A= +long f7 (void)=0A= +{=0A= +=A0 return 0x4444123444445678;=0A= +}=0A= +=0A= +=0A= +/* { dg-final { scan-assembler-times {\tmovk\t} 14 } } */=0A= +/* { dg-final { scan-assembler-times {\tmov\t} 7 } } */=0A=
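For anyone who wants to try the simplified immediate checks outside the compiler, here is a minimal standalone sketch of two of the helpers touched above. It is not part of the patch: the names uimm12_shift, movk_shift and self_check are hypothetical stand-ins, plain uint64_t replaces HOST_WIDE_INT and wide_int_ref, and the precision is assumed to be fixed at 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for aarch64_uimm12_shift: true if VAL is a 12-bit
   unsigned immediate, optionally shifted left by 12 bits.  */
static bool
uimm12_shift (uint64_t val)
{
  return val < 4096 || (val & 0xfff000) == val;
}

/* Hypothetical stand-in for aarch64_movk_shift, assuming 64-bit precision.
   Tests whether X = (X & AND_VAL) | IOR_VAL replaces exactly one aligned
   16-bit field; returns that field's shift, or -1 if there is none.  */
static int
movk_shift (uint64_t and_val, uint64_t ior_val)
{
  uint64_t mask = 0xffff;
  for (unsigned int shift = 0; shift < 64; shift += 16)
    {
      /* AND_VAL must clear only this field, and IOR_VAL must set bits
         only inside it.  */
      if (and_val == ~mask && (ior_val & mask) == ior_val)
        return (int) shift;
      mask <<= 16;
    }
  return -1;
}

/* A few spot checks on values at the edges of each range.  */
static void
self_check (void)
{
  assert (uimm12_shift (4095));
  assert (uimm12_shift (0xfff000));
  assert (!uimm12_shift (0x1001));
  assert (movk_shift (~(uint64_t) 0xffff, 0x1234) == 0);
  assert (movk_shift (~((uint64_t) 0xffff << 16), 0x12340000) == 16);
}
```

Calling self_check () from a small main is enough to exercise both helpers. The unsigned parameter is what lets the first test collapse to a single comparison, as in the new aarch64_uimm12_shift above.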