From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Richard Earnshaw
Subject: [PATCH] AArch64: Add inline memmove expansion
Date: Thu, 21 Sep 2023 16:19:51 +0000
Add support for inline memmove expansions.  The generated code is identical
to that for memcpy, except that all loads are emitted before stores rather
than being interleaved.  The maximum size is 256 bytes, which requires at
most 16 registers.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
	* config/aarch64/aarch64.opt (aarch64_mops_memmove_size_threshold):
	Change default.
	* config/aarch64/aarch64.md (cpymemdi): Add a parameter.
	(movmemdi): Call aarch64_expand_cpymem.
	* config/aarch64/aarch64.cc (aarch64_copy_one_block): Rename function,
	simplify, support storing generated loads/stores.
	(aarch64_expand_cpymem): Support expansion of memmove.
	* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Add bool arg.

gcc/testsuite/ChangeLog/
	* gcc.target/aarch64/memmove.c: Add new test.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e8d91cba30e32e03c4794ccc24254691d135f2dd..e224218600969d9d052128790f1524414bbab5c6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -766,7 +766,7 @@ bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem_mops (rtx *, bool);
-bool aarch64_expand_cpymem (rtx *);
+bool aarch64_expand_cpymem (rtx *, bool);
 bool aarch64_expand_setmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8a12894d6b80de1031d6e7d02dca680c57bce136..a573e3bded2736f5108ad2d4004f530e0f32c99c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25191,48 +25191,35 @@ aarch64_progress_pointer (rtx pointer)
    MODE bytes.  */

 static void
-aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
-					      machine_mode mode)
+aarch64_copy_one_block (rtx *load, rtx *store, rtx src, rtx dst,
+			int offset, machine_mode mode)
 {
   /* Handle 256-bit memcpy separately.  We do this by making 2 adjacent memory
      address copies using V4SImode so that we can use Q registers.  */
   if (known_eq (GET_MODE_BITSIZE (mode), 256))
     {
       mode = V4SImode;
+      rtx src1 = adjust_address (src, mode, offset);
+      rtx src2 = adjust_address (src, mode, offset + 16);
+      rtx dst1 = adjust_address (dst, mode, offset);
+      rtx dst2 = adjust_address (dst, mode, offset + 16);
       rtx reg1 = gen_reg_rtx (mode);
       rtx reg2 = gen_reg_rtx (mode);
-      /* "Cast" the pointers to the correct mode.  */
-      *src = adjust_address (*src, mode, 0);
-      *dst = adjust_address (*dst, mode, 0);
-      /* Emit the memcpy.  */
-      emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2,
-					aarch64_progress_pointer (*src)));
-      emit_insn (aarch64_gen_store_pair (mode, *dst, reg1,
-					 aarch64_progress_pointer (*dst), reg2));
-      /* Move the pointers forward.  */
-      *src = aarch64_move_pointer (*src, 32);
-      *dst = aarch64_move_pointer (*dst, 32);
+      *load = aarch64_gen_load_pair (mode, reg1, src1, reg2, src2);
+      *store = aarch64_gen_store_pair (mode, dst1, reg1, dst2, reg2);
       return;
     }

   rtx reg = gen_reg_rtx (mode);
-
-  /* "Cast" the pointers to the correct mode.  */
-  *src = adjust_address (*src, mode, 0);
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memcpy.  */
-  emit_move_insn (reg, *src);
-  emit_move_insn (*dst, reg);
-  /* Move the pointers forward.  */
-  *src = aarch64_progress_pointer (*src);
-  *dst = aarch64_progress_pointer (*dst);
+  *load = gen_move_insn (reg, adjust_address (src, mode, offset));
+  *store = gen_move_insn (adjust_address (dst, mode, offset), reg);
 }

 /* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
    from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
    rather than memcpy.  Return true iff we succeeded.  */
 bool
-aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
+aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove)
 {
   if (!TARGET_MOPS)
     return false;
@@ -25251,12 +25238,12 @@ aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
   return true;
 }

-/* Expand cpymem, as if from a __builtin_memcpy.  Return true if
-   we succeed, otherwise return false, indicating that a libcall to
-   memcpy should be emitted.  */
-
+/* Expand cpymem/movmem, as if from a __builtin_memcpy/memmove.
+   OPERANDS are taken from the cpymem/movmem pattern.  IS_MEMMOVE is true
+   if this is a memmove rather than memcpy.  Return true if we succeed,
+   otherwise return false, indicating that a libcall should be emitted.  */
 bool
-aarch64_expand_cpymem (rtx *operands)
+aarch64_expand_cpymem (rtx *operands, bool is_memmove)
 {
   int mode_bits;
   rtx dst = operands[0];
@@ -25268,17 +25255,22 @@ aarch64_expand_cpymem (rtx *operands)

   /* Variable-sized or strict-align copies may use the MOPS expansion.  */
   if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
-    return aarch64_expand_cpymem_mops (operands);
+    return aarch64_expand_cpymem_mops (operands, is_memmove);

   unsigned HOST_WIDE_INT size = UINTVAL (operands[2]);

-  /* Try to inline up to 256 bytes.  */
-  unsigned max_copy_size = 256;
-  unsigned mops_threshold = aarch64_mops_memcpy_size_threshold;
+  /* Set inline limits for memmove/memcpy.  MOPS has a separate threshold.  */
+  unsigned max_copy_size = TARGET_SIMD ? 256 : 128;
+  unsigned mops_threshold = is_memmove ? aarch64_mops_memmove_size_threshold
+				       : aarch64_mops_memcpy_size_threshold;
+
+  /* Reduce the maximum size with -Os.  */
+  if (size_p)
+    max_copy_size /= 4;

   /* Large copies use MOPS when available or a library call.  */
   if (size > max_copy_size || (TARGET_MOPS && size > mops_threshold))
-    return aarch64_expand_cpymem_mops (operands);
+    return aarch64_expand_cpymem_mops (operands, is_memmove);

   int copy_bits = 256;

@@ -25290,23 +25282,20 @@ aarch64_expand_cpymem (rtx *operands)
	  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
     copy_bits = 128;

-  /* Emit an inline load+store sequence and count the number of operations
-     involved.  We use a simple count of just the loads and stores emitted
-     rather than rtx_insn count as all the pointer adjustments and reg copying
-     in this function will get optimized away later in the pipeline.  */
-  start_sequence ();
-  unsigned nops = 0;
-
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);

   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);

+  const int max_ops = 40;
+  rtx load[max_ops], store[max_ops];
+
   /* Convert size to bits to make the rest of the code simpler.  */
   int n = size * BITS_PER_UNIT;
+  int nops, offset;

-  while (n > 0)
+  for (nops = 0, offset = 0; n > 0; nops++)
     {
       /* Find the largest mode in which to do the copy in without over reading
	 or writing.  */
@@ -25315,7 +25304,7 @@ aarch64_expand_cpymem (rtx *operands)
	if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_bits))
	  cur_mode = mode_iter.require ();

-      gcc_assert (cur_mode != BLKmode);
+      gcc_assert (cur_mode != BLKmode && nops < max_ops);

       mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();

@@ -25323,49 +25312,38 @@ aarch64_expand_cpymem (rtx *operands)
       if (mode_bits == 128 && copy_bits == 256)
	cur_mode = V4SImode;

-      aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode);
-      /* A single block copy is 1 load + 1 store.  */
-      nops += 2;
+      aarch64_copy_one_block (&load[nops], &store[nops], src, dst, offset, cur_mode);
       n -= mode_bits;
+      offset += mode_bits / BITS_PER_UNIT;

-      /* Emit trailing copies using overlapping unaligned accesses
-	 (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (n > 0 && n < copy_bits / 2 && !STRICT_ALIGNMENT)
+      /* Emit trailing copies using overlapping unaligned accesses -
+	 this is smaller and faster.  */
+      if (n > 0 && n < copy_bits / 2)
	{
	  machine_mode next_mode = smallest_mode_for_size (n, MODE_INT);
	  int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
	  gcc_assert (n_bits <= mode_bits);
-	  src = aarch64_move_pointer (src, (n - n_bits) / BITS_PER_UNIT);
-	  dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
+	  offset -= (n_bits - n) / BITS_PER_UNIT;
	  n = n_bits;
	}
     }
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-  /* MOPS sequence requires 3 instructions for the memory copying + 1 to move
-     the constant size into a register.  */
-  unsigned mops_cost = 3 + 1;
-
-  /* If MOPS is available at this point we don't consider the libcall as it's
-     not a win even on code size.  At this point only consider MOPS if
-     optimizing for size.  For speed optimizations we will have chosen between
-     the two based on copy size already.  */
-  if (TARGET_MOPS)
-    {
-      if (size_p && mops_cost < nops)
-	return aarch64_expand_cpymem_mops (operands);
-      emit_insn (seq);
-      return true;
-    }

-  /* A memcpy libcall in the worst case takes 3 instructions to prepare the
-     arguments + 1 for the call.  When MOPS is not available and we're
-     optimizing for size a libcall may be preferable.  */
-  unsigned libcall_cost = 4;
-  if (size_p && libcall_cost < nops)
-    return false;
+  /* Memcpy interleaves loads with stores, memmove emits all loads first.  */
+  int i, j, m, inc;
+  inc = is_memmove ? nops : 3;
+  if (nops == inc + 1)
+    inc = nops / 2;
+  for (i = 0; i < nops; i += inc)
+    {
+      m = inc;
+      if (i + m > nops)
+	m = nops - i;

-  emit_insn (seq);
+      for (j = 0; j < m; j++)
+	emit_insn (load[i + j]);
+      for (j = 0; j < m; j++)
+	emit_insn (store[i + j]);
+    }
   return true;
 }

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 96508a2580876d1fdbdfa6c67d1a3d02608c1d24..d08598fcdb146dfe0f6283cf57088b224f695c9b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1629,7 +1629,7 @@ (define_expand "cpymemdi"
    (match_operand:DI 3 "immediate_operand")]
   ""
 {
-  if (aarch64_expand_cpymem (operands))
+  if (aarch64_expand_cpymem (operands, false))
     DONE;
   FAIL;
 }
@@ -1673,17 +1673,9 @@ (define_expand "movmemdi"
    (match_operand:BLK 1 "memory_operand")
    (match_operand:DI 2 "general_operand")
    (match_operand:DI 3 "immediate_operand")]
-  "TARGET_MOPS"
+  ""
 {
-  rtx sz_reg = operands[2];
-  /* For constant-sized memmoves check the threshold.
-     FIXME: We should add a non-MOPS memmove expansion for smaller,
-     constant-sized memmove to avoid going to a libcall.  */
-  if (CONST_INT_P (sz_reg)
-      && INTVAL (sz_reg) < aarch64_mops_memmove_size_threshold)
-    FAIL;
-
-  if (aarch64_expand_cpymem_mops (operands, true))
+  if (aarch64_expand_cpymem (operands, true))
     DONE;
   FAIL;
 }
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 4a0580435a8d3c92eca8936515026882c7ea7f48..305923751067ac14d228c5fd51bc24eeaca164dc 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -327,7 +327,7 @@ Target Joined UInteger Var(aarch64_mops_memcpy_size_threshold) Init(256) Param
 Constant memcpy size in bytes above which to start using MOPS sequence.

 -param=aarch64-mops-memmove-size-threshold=
-Target Joined UInteger Var(aarch64_mops_memmove_size_threshold) Init(0) Param
+Target Joined UInteger Var(aarch64_mops_memmove_size_threshold) Init(256) Param
 Constant memmove size in bytes above which to start using MOPS sequence.

 -param=aarch64-mops-memset-size-threshold=
diff --git a/gcc/testsuite/gcc.target/aarch64/memmove.c b/gcc/testsuite/gcc.target/aarch64/memmove.c
new file mode 100644
index 0000000000000000000000000000000000000000..6926a97761eb2578d3f1db7e6eb19dba17b888be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/memmove.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void
+copy1 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 12);
+}
+
+void
+copy2 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 128);
+}
+
+void
+copy3 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 255);
+}
+
+/* { dg-final { scan-assembler-not {\tb\tmemmove} } } */