From: Wilco Dijkstra
To: Richard Sandiford
CC: Wilco Dijkstra via Gcc-patches
Subject: Re: [PATCH][AArch64] Improve immediate expansion [PR106583]
Date: Wed, 12 Oct 2022 14:57:00 +0000

Hi Richard,

>>> Sounds good, but could you put it before the mode version,
>>> to avoid the forward declaration?
>>
>> I can swap them around but the forward declaration is still required as
>> aarch64_check_bitmask is 5000 lines before aarch64_bitmask_imm.
>
> OK, how about moving them both above aarch64_check_bitmask?

Sure, I've moved them as well as all related helper functions - it makes the
diff quite large, but they are all together now, which makes sense. I also
refactored aarch64_move_imm to handle the case of a 64-bit immediate being
generated by a 32-bit MOVZ/MOVN - this simplifies
aarch64_internal_mov_immediate and the movdi patterns even further.

Cheers,
Wilco

v3: move immediate code together and avoid forward declarations,
further cleanups and simplifications.

Improve immediate expansion of immediates which can be created from a
bitmask immediate and 2 MOVKs. Simplify, refactor and improve
efficiency of bitmask checks and move immediate. Move various immediate
handling functions together to avoid forward declarations.
Include 32-bit MOVZ/N as valid 64-bit immediates.
Add new constraint so
the movdi pattern only needs a single alternative for move immediate.

This reduces the number of 4-instruction immediates in SPECINT/FP by 10-15%.

Passes bootstrap & regress, OK for commit?

gcc/ChangeLog:

	PR target/106583
	* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate):
	Add support for a bitmask immediate with 2 MOVKs.
	(aarch64_check_bitmask): New function after refactorization.
	(aarch64_replicate_bitmask_imm): Remove function, merge into...
	(aarch64_bitmask_imm): Simplify replication of small modes.
	Split function into 64-bit only version for efficiency.
	(aarch64_zeroextended_move_imm): New function.
	(aarch64_move_imm): Refactor code.
	(aarch64_uimm12_shift): Move near other immediate functions.
	(aarch64_clamp_to_uimm12_shift): Likewise.
	(aarch64_movk_shift): Likewise.
	(aarch64_replicate_bitmask_imm): Likewise.
	(aarch64_and_split_imm1): Likewise.
	(aarch64_and_split_imm2): Likewise.
	(aarch64_and_bitmask_imm): Likewise.
	(aarch64_movw_imm): Remove.
	* config/aarch64/aarch64.md (movdi_aarch64): Merge 'N' and 'M'
	constraints into single 'O'.
	(mov_aarch64): Likewise.
	* config/aarch64/aarch64-protos.h (aarch64_move_imm): Use unsigned.
	(aarch64_bitmask_imm): Likewise.
	(aarch64_uimm12_shift): Likewise.
	(aarch64_zeroextended_move_imm): New prototype.
	* config/aarch64/constraints.md: Add 'O' for 32/64-bit immediates,
	limit 'N' to 64-bit only moves.

gcc/testsuite:
	PR target/106583
	* gcc.target/aarch64/pr106583.c: Add new test.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 3e4005c9f4ff1f999f1811c6fb0b2252878dc4ae..b82f9ba7c2bb4cffa16abbf45f87061f72015083 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -755,7 +755,7 @@ void aarch64_post_cfi_startproc (void);
 poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_address_valid_for_prefetch_p (rtx, bool);
-bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
+bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode);
 unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
 unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
 bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode);
@@ -792,7 +792,7 @@ bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
                                         unsigned HOST_WIDE_INT,
                                         unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
-bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
+bool aarch64_move_imm (unsigned HOST_WIDE_INT, machine_mode);
 machine_mode aarch64_sve_int_mode (machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
 machine_mode aarch64_sve_pred_mode (machine_mode);
@@ -842,8 +842,9 @@ bool aarch64_sve_float_arith_immediate_p (rtx, bool);
 bool aarch64_sve_float_mul_immediate_p (rtx);
 bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
-bool aarch64_uimm12_shift (HOST_WIDE_INT);
+bool aarch64_uimm12_shift (unsigned HOST_WIDE_INT);
 int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
+bool aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
 const char *aarch64_output_casesi (rtx *);

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4de55beb067ea8f0be0a90060a785c94bdee708b..785ec07692981d423582051ac0897e5dbc3a001f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -305,7 +305,6 @@ static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
 static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
 static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
                                             aarch64_addr_query_type);
-static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val);

 /* The processor for which instructions should be scheduled. */
 enum aarch64_processor aarch64_tune = cortexa53;
@@ -5502,6 +5501,142 @@ aarch64_output_sve_vector_inc_dec (const char *operands, rtx x)
                                              factor, nelts_per_vq);
 }

+/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2. */
+
+static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
+  {
+    0x0000000100000001ull,
+    0x0001000100010001ull,
+    0x0101010101010101ull,
+    0x1111111111111111ull,
+    0x5555555555555555ull,
+  };
+
+
+/* Return true if 64-bit VAL is a valid bitmask immediate. */
+static bool
+aarch64_bitmask_imm (unsigned HOST_WIDE_INT val)
+{
+  unsigned HOST_WIDE_INT tmp, mask, first_one, next_one;
+  int bits;
+
+  /* Check for a single sequence of one bits and return quickly if so.
+     The special cases of all ones and all zeroes returns false. */
+  tmp = val + (val & -val);
+
+  if (tmp == (tmp & -tmp))
+    return (val + 1) > 1;
+
+  /* Invert if the immediate doesn't start with a zero bit - this means we
+     only need to search for sequences of one bits. */
+  if (val & 1)
+    val = ~val;
+
+  /* Find the first set bit and set tmp to val with the first sequence of one
+     bits removed. Return success if there is a single sequence of ones. */
+  first_one = val & -val;
+  tmp = val & (val + first_one);
+
+  if (tmp == 0)
+    return true;
+
+  /* Find the next set bit and compute the difference in bit position. */
+  next_one = tmp & -tmp;
+  bits = clz_hwi (first_one) - clz_hwi (next_one);
+  mask = val ^ tmp;
+
+  /* Check the bit position difference is a power of 2, and that the first
+     sequence of one bits fits within 'bits' bits. */
+  if ((mask >> bits) != 0 || bits != (bits & -bits))
+    return false;
+
+  /* Check the sequence of one bits is repeated 64/bits times. */
+  return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
+}
+
+
+/* Return true if VAL is a valid bitmask immediate for any mode. */
+bool
+aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
+{
+  if (mode == DImode)
+    return aarch64_bitmask_imm (val);
+
+  if (mode == SImode)
+    return aarch64_bitmask_imm ((val & 0xffffffff) | (val << 32));
+
+  /* Replicate small immediates to fit 64 bits. */
+  int size = GET_MODE_UNIT_PRECISION (mode);
+  val &= (HOST_WIDE_INT_1U << size) - 1;
+  val *= bitmask_imm_mul[__builtin_clz (size) - 26];
+
+  return aarch64_bitmask_imm (val);
+}
+
+/* Return true if the immediate VAL can be a bitfield immediate
+   by changing the given MASK bits in VAL to zeroes, ones or bits
+   from the other half of VAL. Return the new immediate in VAL2. */
+static inline bool
+aarch64_check_bitmask (unsigned HOST_WIDE_INT val,
+                       unsigned HOST_WIDE_INT &val2,
+                       unsigned HOST_WIDE_INT mask)
+{
+  val2 = val & ~mask;
+  if (val2 != val && aarch64_bitmask_imm (val2))
+    return true;
+  val2 = val | mask;
+  if (val2 != val && aarch64_bitmask_imm (val2))
+    return true;
+  val = val & ~mask;
+  val2 = val | (((val >> 32) | (val << 32)) & mask);
+  if (val2 != val && aarch64_bitmask_imm (val2))
+    return true;
+  val2 = val | (((val >> 16) | (val << 48)) & mask);
+  if (val2 != val && aarch64_bitmask_imm (val2))
+    return true;
+  return false;
+}
+
+/* Return true if immediate VAL can only be created by using a 32-bit
+   zero-extended move immediate, not by a 64-bit move. */
+bool
+aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT val)
+{
+  if ((val >> 16) == 0 || (val >> 32) != 0 || (val & 0xffff) == 0)
+    return false;
+  return !aarch64_bitmask_imm (val);
+}
+
+/* Return true if VAL is an immediate that can be created by a single
+   MOV instruction. */
+bool
+aarch64_move_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
+{
+  unsigned HOST_WIDE_INT val2;
+
+  if (val < 65536)
+    return true;
+
+  val2 = val ^ ((HOST_WIDE_INT) val >> 63);
+  if ((val2 >> (__builtin_ctzll (val2) & 48)) < 65536)
+    return true;
+
+  /* Special case 0xyyyyffffffffffff. */
+  if (((val2 + 1) << 16) == 0)
+    return true;
+
+  /* Special case immediates 0xffffyyyy and 0xyyyyffff. */
+  val2 = (mode == DImode) ? val : val2;
+  if (((val2 + 1) & ~(unsigned HOST_WIDE_INT) 0xffff0000) == 0
+      || (val2 >> 16) == 0xffff)
+    return true;
+
+  if (mode == SImode || (val >> 32) == 0)
+    val = (val & 0xffffffff) | (val << 32);
+  return aarch64_bitmask_imm (val);
+}
+
+
 static int
 aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
                                 scalar_int_mode mode)
@@ -5520,31 +5655,6 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
       return 1;
     }

-  /* Check to see if the low 32 bits are either 0xffffXXXX or 0xXXXXffff
-     (with XXXX non-zero). In that case check to see if the move can be done in
-     a smaller mode. */
-  val2 = val & 0xffffffff;
-  if (mode == DImode
-      && aarch64_move_imm (val2, SImode)
-      && (((val >> 32) & 0xffff) == 0 || (val >> 48) == 0))
-    {
-      if (generate)
-        emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
-
-      /* Check if we have to emit a second instruction by checking to see
-         if any of the upper 32 bits of the original DI mode value is set. */
-      if (val == val2)
-        return 1;
-
-      i = (val >> 48) ? 48 : 32;
-
-      if (generate)
-        emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-                                   GEN_INT ((val >> i) & 0xffff)));
-
-      return 2;
-    }
-
   if ((val >> 32) == 0 || mode == SImode)
     {
       if (generate)
@@ -5568,26 +5678,20 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
   one_match = ((~val & mask) == 0) + ((~val & (mask << 16)) == 0) +
     ((~val & (mask << 32)) == 0) + ((~val & (mask << 48)) == 0);

-  if (zero_match != 2 && one_match != 2)
+  /* Try a bitmask immediate and a movk to generate the immediate
+     in 2 instructions. */
+  if (zero_match < 2 && one_match < 2)
     {
-      /* Try emitting a bitmask immediate with a movk replacing 16 bits.
-         For a 64-bit bitmask try whether changing 16 bits to all ones or
-         zeroes creates a valid bitmask. To check any repeated bitmask,
-         try using 16 bits from the other 32-bit half of val. */
-
-      for (i = 0; i < 64; i += 16, mask <<= 16)
+      for (i = 0; i < 64; i += 16)
         {
-          val2 = val & ~mask;
-          if (val2 != val && aarch64_bitmask_imm (val2, mode))
-            break;
-          val2 = val | mask;
-          if (val2 != val && aarch64_bitmask_imm (val2, mode))
+          if (aarch64_check_bitmask (val, val2, mask << i))
             break;
-          val2 = val2 & ~mask;
-          val2 = val2 | (((val2 >> 32) | (val2 << 32)) & mask);
-          if (val2 != val && aarch64_bitmask_imm (val2, mode))
+
+          val2 = val & ~(mask << i);
+          if ((val2 >> 32) == 0 && aarch64_move_imm (val2, DImode))
             break;
         }
+
       if (i != 64)
         {
           if (generate)
@@ -5600,6 +5704,25 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
         }
     }

+  /* Try a bitmask plus 2 movk to generate the immediate in 3 instructions. */
+  if (zero_match + one_match == 0)
+    {
+      for (i = 0; i < 48; i += 16)
+        for (int j = i + 16; j < 64; j += 16)
+          if (aarch64_check_bitmask (val, val2, (mask << i) | (mask << j)))
+            {
+              if (generate)
+                {
+                  emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
+                  emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+                                             GEN_INT ((val >> i) & 0xffff)));
+                  emit_insn (gen_insv_immdi (dest, GEN_INT (j),
+                                             GEN_INT ((val >> j) & 0xffff)));
+                }
+              return 3;
+            }
+    }
+
   /* Generate 2-4 instructions, skipping 16 bits of all zeroes or ones which
      are emitted by the initial mov. If one_match > zero_match, skip set bits,
      otherwise skip zero bits. */
@@ -5643,6 +5766,95 @@ aarch64_mov128_immediate (rtx imm)
     + aarch64_internal_mov_immediate (NULL_RTX, hi, false, DImode) <= 4;
 }

+/* Return true if val can be encoded as a 12-bit unsigned immediate with
+   a left shift of 0 or 12 bits. */
+bool
+aarch64_uimm12_shift (unsigned HOST_WIDE_INT val)
+{
+  return val < 4096 || (val & 0xfff000) == val;
+}
+
+/* Returns the nearest value to VAL that will fit as a 12-bit unsigned immediate
+   that can be created with a left shift of 0 or 12. */
+static HOST_WIDE_INT
+aarch64_clamp_to_uimm12_shift (unsigned HOST_WIDE_INT val)
+{
+  /* Check to see if the value fits in 24 bits, as that is the maximum we can
+     handle correctly. */
+  gcc_assert (val < 0x1000000);
+
+  if (val < 4096)
+    return val;
+
+  return val & 0xfff000;
+}
+
+/* Test whether:
+
+     X = (X & AND_VAL) | IOR_VAL;
+
+   can be implemented using:
+
+     MOVK X, #(IOR_VAL >> shift), LSL #shift
+
+   Return the shift if so, otherwise return -1. */
+int
+aarch64_movk_shift (const wide_int_ref &and_val,
+                    const wide_int_ref &ior_val)
+{
+  unsigned int precision = and_val.get_precision ();
+  unsigned HOST_WIDE_INT mask = 0xffff;
+  for (unsigned int shift = 0; shift < precision; shift += 16)
+    {
+      if (and_val == ~mask && (ior_val & mask) == ior_val)
+        return shift;
+      mask <<= 16;
+    }
+  return -1;
+}
+
+/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.
+   Assumed precondition: VAL_IN Is not zero. */
+
+unsigned HOST_WIDE_INT
+aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
+{
+  int lowest_bit_set = ctz_hwi (val_in);
+  int highest_bit_set = floor_log2 (val_in);
+  gcc_assert (val_in != 0);
+
+  return ((HOST_WIDE_INT_UC (2) << highest_bit_set) -
+          (HOST_WIDE_INT_1U << lowest_bit_set));
+}
+
+/* Create constant where bits outside of lowest bit set to highest bit set
+   are set to 1. */
+
+unsigned HOST_WIDE_INT
+aarch64_and_split_imm2 (HOST_WIDE_INT val_in)
+{
+  return val_in | ~aarch64_and_split_imm1 (val_in);
+}
+
+/* Return true if VAL_IN is a valid 'and' bitmask immediate. */
+
+bool
+aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode)
+{
+  scalar_int_mode int_mode;
+  if (!is_a <scalar_int_mode> (mode, &int_mode))
+    return false;
+
+  if (aarch64_bitmask_imm (val_in, int_mode))
+    return false;
+
+  if (aarch64_move_imm (val_in, int_mode))
+    return false;
+
+  unsigned HOST_WIDE_INT imm2 = aarch64_and_split_imm2 (val_in);
+
+  return aarch64_bitmask_imm (imm2, int_mode);
+}

 /* Return the number of temporary registers that aarch64_add_offset_1
    would need to add OFFSET to a register. */
@@ -10098,208 +10310,6 @@ aarch64_tls_referenced_p (rtx x)
   return false;
 }

-
-/* Return true if val can be encoded as a 12-bit unsigned immediate with
-   a left shift of 0 or 12 bits. */
-bool
-aarch64_uimm12_shift (HOST_WIDE_INT val)
-{
-  return ((val & (((HOST_WIDE_INT) 0xfff) << 0)) == val
-          || (val & (((HOST_WIDE_INT) 0xfff) << 12)) == val
-          );
-}
-
-/* Returns the nearest value to VAL that will fit as a 12-bit unsigned immediate
-   that can be created with a left shift of 0 or 12. */
-static HOST_WIDE_INT
-aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val)
-{
-  /* Check to see if the value fits in 24 bits, as that is the maximum we can
-     handle correctly. */
-  gcc_assert ((val & 0xffffff) == val);
-
-  if (((val & 0xfff) << 0) == val)
-    return val;
-
-  return val & (0xfff << 12);
-}
-
-/* Return true if val is an immediate that can be loaded into a
-   register by a MOVZ instruction. */
-static bool
-aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode)
-{
-  if (GET_MODE_SIZE (mode) > 4)
-    {
-      if ((val & (((HOST_WIDE_INT) 0xffff) << 32)) == val
-          || (val & (((HOST_WIDE_INT) 0xffff) << 48)) == val)
-        return 1;
-    }
-  else
-    {
-      /* Ignore sign extension. */
-      val &= (HOST_WIDE_INT) 0xffffffff;
-    }
-  return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) == val
-          || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
-}
-
-/* Test whether:
-
-     X = (X & AND_VAL) | IOR_VAL;
-
-   can be implemented using:
-
-     MOVK X, #(IOR_VAL >> shift), LSL #shift
-
-   Return the shift if so, otherwise return -1. */
-int
-aarch64_movk_shift (const wide_int_ref &and_val,
-                    const wide_int_ref &ior_val)
-{
-  unsigned int precision = and_val.get_precision ();
-  unsigned HOST_WIDE_INT mask = 0xffff;
-  for (unsigned int shift = 0; shift < precision; shift += 16)
-    {
-      if (and_val == ~mask && (ior_val & mask) == ior_val)
-        return shift;
-      mask <<= 16;
-    }
-  return -1;
-}
-
-/* VAL is a value with the inner mode of MODE. Replicate it to fill a
-   64-bit (DImode) integer. */
-
-static unsigned HOST_WIDE_INT
-aarch64_replicate_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
-{
-  unsigned int size = GET_MODE_UNIT_PRECISION (mode);
-  while (size < 64)
-    {
-      val &= (HOST_WIDE_INT_1U << size) - 1;
-      val |= val << size;
-      size *= 2;
-    }
-  return val;
-}
-
-/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2. */
-
-static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
-  {
-    0x0000000100000001ull,
-    0x0001000100010001ull,
-    0x0101010101010101ull,
-    0x1111111111111111ull,
-    0x5555555555555555ull,
-  };
-
-
-/* Return true if val is a valid bitmask immediate. */
-
-bool
-aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
-{
-  unsigned HOST_WIDE_INT val, tmp, mask, first_one, next_one;
-  int bits;
-
-  /* Check for a single sequence of one bits and return quickly if so.
-     The special cases of all ones and all zeroes returns false. */
-  val = aarch64_replicate_bitmask_imm (val_in, mode);
-  tmp = val + (val & -val);
-
-  if (tmp == (tmp & -tmp))
-    return (val + 1) > 1;
-
-  /* Replicate 32-bit immediates so we can treat them as 64-bit. */
-  if (mode == SImode)
-    val = (val << 32) | (val & 0xffffffff);
-
-  /* Invert if the immediate doesn't start with a zero bit - this means we
-     only need to search for sequences of one bits. */
-  if (val & 1)
-    val = ~val;
-
-  /* Find the first set bit and set tmp to val with the first sequence of one
-     bits removed. Return success if there is a single sequence of ones. */
-  first_one = val & -val;
-  tmp = val & (val + first_one);
-
-  if (tmp == 0)
-    return true;
-
-  /* Find the next set bit and compute the difference in bit position. */
-  next_one = tmp & -tmp;
-  bits = clz_hwi (first_one) - clz_hwi (next_one);
-  mask = val ^ tmp;
-
-  /* Check the bit position difference is a power of 2, and that the first
-     sequence of one bits fits within 'bits' bits. */
-  if ((mask >> bits) != 0 || bits != (bits & -bits))
-    return false;
-
-  /* Check the sequence of one bits is repeated 64/bits times. */
-  return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
-}
-
-/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.
-   Assumed precondition: VAL_IN Is not zero. */
-
-unsigned HOST_WIDE_INT
-aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
-{
-  int lowest_bit_set = ctz_hwi (val_in);
-  int highest_bit_set = floor_log2 (val_in);
-  gcc_assert (val_in != 0);
-
-  return ((HOST_WIDE_INT_UC (2) << highest_bit_set) -
-          (HOST_WIDE_INT_1U << lowest_bit_set));
-}
-
-/* Create constant where bits outside of lowest bit set to highest bit set
-   are set to 1. */
-
-unsigned HOST_WIDE_INT
-aarch64_and_split_imm2 (HOST_WIDE_INT val_in)
-{
-  return val_in | ~aarch64_and_split_imm1 (val_in);
-}
-
-/* Return true if VAL_IN is a valid 'and' bitmask immediate. */
-
-bool
-aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode)
-{
-  scalar_int_mode int_mode;
-  if (!is_a <scalar_int_mode> (mode, &int_mode))
-    return false;
-
-  if (aarch64_bitmask_imm (val_in, int_mode))
-    return false;
-
-  if (aarch64_move_imm (val_in, int_mode))
-    return false;
-
-  unsigned HOST_WIDE_INT imm2 = aarch64_and_split_imm2 (val_in);
-
-  return aarch64_bitmask_imm (imm2, int_mode);
-}
-
-/* Return true if val is an immediate that can be loaded into a
-   register in a single instruction. */
-bool
-aarch64_move_imm (HOST_WIDE_INT val, machine_mode mode)
-{
-  scalar_int_mode int_mode;
-  if (!is_a <scalar_int_mode> (mode, &int_mode))
-    return false;
-
-  if (aarch64_movw_imm (val, int_mode) || aarch64_movw_imm (~val, int_mode))
-    return 1;
-  return aarch64_bitmask_imm (val, int_mode);
-}
-
 static bool
 aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0a7633e5dd6d45282edd7a1088c14b555bc09b40..23ceca48543d23b85beea1f0bf98ef83051d80b6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1309,16 +1309,15 @@ (define_insn_and_split "*movsi_aarch64"
 )

 (define_insn_and_split "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r, r,w, m,m, r, r, r, w,r,w, w")
-        (match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m,m, r, r, r, w,r,w, w")
+        (match_operand:DI 1 "aarch64_mov_operand" " r,r,k,O,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@
    mov\\t%x0, %x1
    mov\\t%0, %x1
    mov\\t%x0, %1
-   mov\\t%x0, %1
-   mov\\t%w0, %1
+   * return aarch64_zeroextended_move_imm (INTVAL (operands[1])) ?
\"mov\\= t%w0, %1\" : \"mov\\t%x0, %1\";=0A= #=0A= * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1= ]);=0A= ldr\\t%x0, %1=0A= @@ -1340,11 +1339,11 @@ (define_insn_and_split "*movdi_aarch64"=0A= DONE;=0A= }"=0A= ;; The "mov_imm" type for CNTD is just a placeholder.=0A= - [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov_i= mm,=0A= + [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,=0A= load_8,load_8,store_8,store_8,load_8,adr,adr,f_mcr,f_mrc,=0A= fmov,neon_move")=0A= - (set_attr "arch" "*,*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")=0A= - (set_attr "length" "4,4,4,4,4,*, 4,4, 4,4, 4,8,4,4, 4, 4, 4, 4")]=0A= + (set_attr "arch" "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")=0A= + (set_attr "length" "4,4,4,4,*, 4,4, 4,4, 4,8,4,4, 4, 4, 4, 4")]=0A= )=0A= =0A= (define_insn "insv_imm"=0A= @@ -1508,7 +1507,7 @@ (define_insn "*mov_aarch64"=0A= =0A= (define_insn "*mov_aarch64"=0A= [(set (match_operand:DFD 0 "nonimmediate_operand" "=3Dw, w ,?r,w,w ,w = ,w,m,r,m ,r,r")=0A= - (match_operand:DFD 1 "general_operand" "Y , ?rY, w,w,Ufc,Uvi,m,w,m,r= Y,r,N"))]=0A= + (match_operand:DFD 1 "general_operand" "Y , ?rY, w,w,Ufc,Uvi,m,w,m,r= Y,r,O"))]=0A= "TARGET_FLOAT && (register_operand (operands[0], mode)=0A= || aarch64_reg_or_fp_zero (operands[1], mode))"=0A= "@=0A= @@ -1523,7 +1522,7 @@ (define_insn "*mov_aarch64"=0A= ldr\\t%x0, %1=0A= str\\t%x1, %0=0A= mov\\t%x0, %x1=0A= - mov\\t%x0, %1"=0A= + * return aarch64_zeroextended_move_imm (INTVAL (operands[1])) ? 
\"mov\\= t%w0, %1\" : \"mov\\t%x0, %1\";"=0A= [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconstd,neon_move,\=0A= f_loadd,f_stored,load_8,store_8,mov_reg,\=0A= fconstd")=0A= diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constra= ints.md=0A= index ee7587cca1673208e2bfd6b503a21d0c8b69bf75..e91c7eab0b3674ca34ac2f790c3= 8fcd27986c35f 100644=0A= --- a/gcc/config/aarch64/constraints.md=0A= +++ b/gcc/config/aarch64/constraints.md=0A= @@ -106,6 +106,12 @@ (define_constraint "M"=0A= =0A= (define_constraint "N"=0A= "A constant that can be used with a 64-bit MOV immediate operation."=0A= + (and (match_code "const_int")=0A= + (match_test "aarch64_move_imm (ival, DImode)")=0A= + (match_test "!aarch64_zeroextended_move_imm (ival)")))=0A= +=0A= +(define_constraint "O"=0A= + "A constant that can be used with a 32 or 64-bit MOV immediate operation.= "=0A= (and (match_code "const_int")=0A= (match_test "aarch64_move_imm (ival, DImode)")))=0A= =0A= diff --git a/gcc/testsuite/gcc.target/aarch64/pr106583.c b/gcc/testsuite/gc= c.target/aarch64/pr106583.c=0A= new file mode 100644=0A= index 0000000000000000000000000000000000000000..0f931580817d78dc1cc58f03b25= 1bd21bec71f59=0A= --- /dev/null=0A= +++ b/gcc/testsuite/gcc.target/aarch64/pr106583.c=0A= @@ -0,0 +1,41 @@=0A= +/* { dg-do assemble } */=0A= +/* { dg-options "-O2 --save-temps" } */=0A= +=0A= +long f1 (void)=0A= +{=0A= + return 0x7efefefefefefeff;=0A= +}=0A= +=0A= +long f2 (void)=0A= +{=0A= + return 0x12345678aaaaaaaa;=0A= +}=0A= +=0A= +long f3 (void)=0A= +{=0A= + return 0x1234cccccccc5678;=0A= +}=0A= +=0A= +long f4 (void)=0A= +{=0A= + return 0x7777123456787777;=0A= +}=0A= +=0A= +long f5 (void)=0A= +{=0A= + return 0x5555555512345678;=0A= +}=0A= +=0A= +long f6 (void)=0A= +{=0A= + return 0x1234bbbb5678bbbb;=0A= +}=0A= +=0A= +long f7 (void)=0A= +{=0A= + return 0x4444123444445678;=0A= +}=0A= +=0A= +=0A= +/* { dg-final { scan-assembler-times {\tmovk\t} 14 } } */=0A= +/* { dg-final { scan-assembler-times {\tmov\t} 7 
} } */=0A= =0A=