From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Richard Earnshaw
Subject: [PATCH v2] AArch64: Add inline memmove expansion
Date: Mon, 16 Oct 2023 12:27:05 +0000
v2: further cleanups, improved comments

Add support for inline memmove expansion.  The generated code is identical
to that for memcpy, except that all loads are emitted before stores rather
than being interleaved.
The maximum size is 256 bytes, which requires at most 16 registers.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog:
	* config/aarch64/aarch64.opt (aarch64_mops_memmove_size_threshold):
	Change default.
	* config/aarch64/aarch64.md (cpymemdi): Add a parameter.
	(movmemdi): Call aarch64_expand_cpymem.
	* config/aarch64/aarch64.cc (aarch64_copy_one_block): Rename function,
	simplify, support storing generated loads/stores.
	(aarch64_expand_cpymem): Support expansion of memmove.
	* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Add bool arg.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/memmove.c: Add new test.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..0d39622bd2826a3fde54d67b5c5da9ee9286cbbd 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -769,7 +769,7 @@ bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem_mops (rtx *, bool);
-bool aarch64_expand_cpymem (rtx *);
+bool aarch64_expand_cpymem (rtx *, bool);
 bool aarch64_expand_setmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2fa5d09de85d385c1165e399bcc97681ef170916..e19e2d1de2e5b30eca672df05d9dcc1bc106ecc8 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25238,52 +25238,37 @@ aarch64_progress_pointer (rtx pointer)
   return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
 }
 
-/* Copy one MODE sized block from SRC to DST, then progress SRC and DST by
-   MODE bytes.  */
+/* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
 
 static void
-aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
-					      machine_mode mode)
+aarch64_copy_one_block (rtx *load, rtx *store, rtx src, rtx dst,
+			int offset, machine_mode mode)
 {
-  /* Handle 256-bit memcpy separately.  We do this by making 2 adjacent memory
-     address copies using V4SImode so that we can use Q registers.  */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
+  /* Emit explicit load/store pair instructions for 32-byte copies.  */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
     {
       mode = V4SImode;
+      rtx src1 = adjust_address (src, mode, offset);
+      rtx src2 = adjust_address (src, mode, offset + 16);
+      rtx dst1 = adjust_address (dst, mode, offset);
+      rtx dst2 = adjust_address (dst, mode, offset + 16);
       rtx reg1 = gen_reg_rtx (mode);
       rtx reg2 = gen_reg_rtx (mode);
-      /* "Cast" the pointers to the correct mode.  */
-      *src = adjust_address (*src, mode, 0);
-      *dst = adjust_address (*dst, mode, 0);
-      /* Emit the memcpy.  */
-      emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2,
-					aarch64_progress_pointer (*src)));
-      emit_insn (aarch64_gen_store_pair (mode, *dst, reg1,
-					 aarch64_progress_pointer (*dst), reg2));
-      /* Move the pointers forward.  */
-      *src = aarch64_move_pointer (*src, 32);
-      *dst = aarch64_move_pointer (*dst, 32);
+      *load = aarch64_gen_load_pair (mode, reg1, src1, reg2, src2);
+      *store = aarch64_gen_store_pair (mode, dst1, reg1, dst2, reg2);
       return;
     }
 
   rtx reg = gen_reg_rtx (mode);
-
-  /* "Cast" the pointers to the correct mode.  */
-  *src = adjust_address (*src, mode, 0);
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memcpy.  */
-  emit_move_insn (reg, *src);
-  emit_move_insn (*dst, reg);
-  /* Move the pointers forward.  */
-  *src = aarch64_progress_pointer (*src);
-  *dst = aarch64_progress_pointer (*dst);
+  *load = gen_move_insn (reg, adjust_address (src, mode, offset));
+  *store = gen_move_insn (adjust_address (dst, mode, offset), reg);
 }
 
 /* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
    from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
    rather than memcpy.  Return true iff we succeeded.  */
 bool
-aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
+aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove)
 {
   if (!TARGET_MOPS)
     return false;
@@ -25302,51 +25287,48 @@ aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
   return true;
 }
 
-/* Expand cpymem, as if from a __builtin_memcpy.  Return true if
-   we succeed, otherwise return false, indicating that a libcall to
-   memcpy should be emitted.  */
-
+/* Expand cpymem/movmem, as if from a __builtin_memcpy/memmove.
+   OPERANDS are taken from the cpymem/movmem pattern.  IS_MEMMOVE is true
+   if this is a memmove rather than memcpy.  Return true if we succeed,
+   otherwise return false, indicating that a libcall should be emitted.  */
 bool
-aarch64_expand_cpymem (rtx *operands)
+aarch64_expand_cpymem (rtx *operands, bool is_memmove)
 {
-  int mode_bits;
+  int mode_bytes;
   rtx dst = operands[0];
   rtx src = operands[1];
   unsigned align = UINTVAL (operands[3]);
   rtx base;
-  machine_mode cur_mode = BLKmode;
-  bool size_p = optimize_function_for_size_p (cfun);
+  machine_mode cur_mode = BLKmode, next_mode;
 
   /* Variable-sized or strict-align copies may use the MOPS expansion.  */
   if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
-    return aarch64_expand_cpymem_mops (operands);
+    return aarch64_expand_cpymem_mops (operands, is_memmove);
 
   unsigned HOST_WIDE_INT size = UINTVAL (operands[2]);
 
-  /* Try to inline up to 256 bytes.  */
-  unsigned max_copy_size = 256;
-  unsigned mops_threshold = aarch64_mops_memcpy_size_threshold;
+  /* Set inline limits for memmove/memcpy.  MOPS has a separate threshold.  */
+  unsigned max_copy_size = TARGET_SIMD ? 256 : 128;
+  unsigned mops_threshold = is_memmove ? aarch64_mops_memmove_size_threshold
+				       : aarch64_mops_memcpy_size_threshold;
+
+  /* Reduce the maximum size with -Os.  */
+  if (optimize_function_for_size_p (cfun))
+    max_copy_size /= 4;
 
   /* Large copies use MOPS when available or a library call.  */
   if (size > max_copy_size || (TARGET_MOPS && size > mops_threshold))
-    return aarch64_expand_cpymem_mops (operands);
+    return aarch64_expand_cpymem_mops (operands, is_memmove);
 
-  int copy_bits = 256;
+  unsigned copy_max = 32;
 
-  /* Default to 256-bit LDP/STP on large copies, however small copies, no SIMD
-     support or slow 256-bit LDP/STP fall back to 128-bit chunks.  */
+  /* Default to 32-byte LDP/STP on large copies, however small copies, no SIMD
+     support or slow LDP/STP fall back to 16-byte chunks.  */
   if (size <= 24
       || !TARGET_SIMD
       || (aarch64_tune_params.extra_tuning_flags
	  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
-    copy_bits = 128;
-
-  /* Emit an inline load+store sequence and count the number of operations
-     involved.  We use a simple count of just the loads and stores emitted
-     rather than rtx_insn count as all the pointer adjustments and reg copying
-     in this function will get optimized away later in the pipeline.  */
-  start_sequence ();
-  unsigned nops = 0;
+    copy_max = 16;
 
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
@@ -25354,69 +25336,60 @@ aarch64_expand_cpymem (rtx *operands)
   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);
 
-  /* Convert size to bits to make the rest of the code simpler.  */
-  int n = size * BITS_PER_UNIT;
+  const int max_ops = 40;
+  rtx load[max_ops], store[max_ops];
 
-  while (n > 0)
+  int nops, offset;
+
+  for (nops = 0, offset = 0; size > 0; nops++)
     {
       /* Find the largest mode in which to do the copy in without over reading
	 or writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-	if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_bits))
+	if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (size, copy_max))
	  cur_mode = mode_iter.require ();
 
-      gcc_assert (cur_mode != BLKmode);
+      gcc_assert (cur_mode != BLKmode && nops < max_ops);
 
-      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
+      mode_bytes = GET_MODE_SIZE (cur_mode).to_constant ();
 
       /* Prefer Q-register accesses for the last bytes.  */
-      if (mode_bits == 128 && copy_bits == 256)
+      if (mode_bytes == 16 && copy_max == 32)
	cur_mode = V4SImode;
 
-      aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode);
-      /* A single block copy is 1 load + 1 store.  */
-      nops += 2;
-      n -= mode_bits;
+      aarch64_copy_one_block (&load[nops], &store[nops], src, dst, offset, cur_mode);
+      size -= mode_bytes;
+      offset += mode_bytes;
 
       /* Emit trailing copies using overlapping unaligned accesses
-	 (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (n > 0 && n < copy_bits / 2 && !STRICT_ALIGNMENT)
+	 (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
+      if (size > 0 && size < copy_max / 2 && !STRICT_ALIGNMENT)
	{
-	  machine_mode next_mode = smallest_mode_for_size (n, MODE_INT);
-	  int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
-	  gcc_assert (n_bits <= mode_bits);
-	  src = aarch64_move_pointer (src, (n - n_bits) / BITS_PER_UNIT);
-	  dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
-	  n = n_bits;
+	  next_mode = smallest_mode_for_size (size * BITS_PER_UNIT, MODE_INT);
+	  int n_bytes = GET_MODE_SIZE (next_mode).to_constant ();
+	  gcc_assert (n_bytes <= mode_bytes);
+	  offset -= n_bytes - size;
+	  size = n_bytes;
	}
     }
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-  /* MOPS sequence requires 3 instructions for the memory copying + 1 to move
-     the constant size into a register.  */
-  unsigned mops_cost = 3 + 1;
-
-  /* If MOPS is available at this point we don't consider the libcall as it's
-     not a win even on code size.  At this point only consider MOPS if
-     optimizing for size.  For speed optimizations we will have chosen between
-     the two based on copy size already.  */
-  if (TARGET_MOPS)
-    {
-      if (size_p && mops_cost < nops)
-	return aarch64_expand_cpymem_mops (operands);
-      emit_insn (seq);
-      return true;
-    }
 
-  /* A memcpy libcall in the worst case takes 3 instructions to prepare the
-     arguments + 1 for the call.  When MOPS is not available and we're
-     optimizing for size a libcall may be preferable.  */
-  unsigned libcall_cost = 4;
-  if (size_p && libcall_cost < nops)
-    return false;
+  /* Memcpy interleaves loads with stores, memmove emits all loads first.  */
+  int i, j, m, inc;
+  inc = is_memmove ? nops : 3;
+  if (nops == inc + 1)
+    inc = nops / 2;
+  for (i = 0; i < nops; i += inc)
+    {
+      m = inc;
+      if (i + m > nops)
+	m = nops - i;
 
-  emit_insn (seq);
+      for (j = 0; j < m; j++)
+	emit_insn (load[i + j]);
+      for (j = 0; j < m; j++)
+	emit_insn (store[i + j]);
+    }
   return true;
 }
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1cb3a01d6791a48dc0b08df5783d97805448c7f2..18dd629c2456041b1185eae6d39de074709b2a39 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1629,7 +1629,7 @@ (define_expand "cpymemdi"
    (match_operand:DI 3 "immediate_operand")]
   ""
 {
-  if (aarch64_expand_cpymem (operands))
+  if (aarch64_expand_cpymem (operands, false))
     DONE;
   FAIL;
 }
@@ -1673,17 +1673,9 @@ (define_expand "movmemdi"
    (match_operand:BLK 1 "memory_operand")
    (match_operand:DI 2 "general_operand")
    (match_operand:DI 3 "immediate_operand")]
-  "TARGET_MOPS"
+  ""
 {
-  rtx sz_reg = operands[2];
-  /* For constant-sized memmoves check the threshold.
-     FIXME: We should add a non-MOPS memmove expansion for smaller,
-     constant-sized memmove to avoid going to a libcall.  */
-  if (CONST_INT_P (sz_reg)
-      && INTVAL (sz_reg) < aarch64_mops_memmove_size_threshold)
-    FAIL;
-
-  if (aarch64_expand_cpymem_mops (operands, true))
+  if (aarch64_expand_cpymem (operands, true))
     DONE;
   FAIL;
 }
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index f5a518202a157b5b5bc2b2aa14ac1177fded7d66..0ac9d8c578d706e7bf0f0ae399d84544f0c619dc 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -327,7 +327,7 @@ Target Joined UInteger Var(aarch64_mops_memcpy_size_threshold) Init(256) Param
 Constant memcpy size in bytes above which to start using MOPS sequence.
 
 -param=aarch64-mops-memmove-size-threshold=
-Target Joined UInteger Var(aarch64_mops_memmove_size_threshold) Init(0) Param
+Target Joined UInteger Var(aarch64_mops_memmove_size_threshold) Init(256) Param
 Constant memmove size in bytes above which to start using MOPS sequence.
 
 -param=aarch64-mops-memset-size-threshold=
diff --git a/gcc/testsuite/gcc.target/aarch64/memmove.c b/gcc/testsuite/gcc.target/aarch64/memmove.c
new file mode 100644
index 0000000000000000000000000000000000000000..6926a97761eb2578d3f1db7e6eb19dba17b888be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/memmove.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void
+copy1 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 12);
+}
+
+void
+copy2 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 128);
+}
+
+void
+copy3 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 255);
+}
+
+/* { dg-final { scan-assembler-not {\tb\tmemmove} } } */