From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Richard Earnshaw
Subject: [PATCH] AArch64: Add inline memmove expansion
Date: Thu, 21 Sep 2023 16:19:51 +0000
Add support for inline memmove expansions.  The generated code is identical
to that for memcpy, except that all loads are emitted before stores rather
than being interleaved.  The maximum size is 256 bytes, which requires at
most 16 registers.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
	* config/aarch64/aarch64.opt (aarch64_mops_memmove_size_threshold):
	Change default.
	* config/aarch64/aarch64.md (cpymemdi): Add a parameter.
	(movmemdi): Call aarch64_expand_cpymem.
	* config/aarch64/aarch64.cc (aarch64_copy_one_block): Rename function,
	simplify, support storing generated loads/stores.
	(aarch64_expand_cpymem): Support expansion of memmove.
	* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Add bool arg.

gcc/testsuite/ChangeLog/
	* gcc.target/aarch64/memmove.c: Add new test.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e8d91cba30e32e03c4794ccc24254691d135f2dd..e224218600969d9d052128790f1524414bbab5c6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -766,7 +766,7 @@ bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem_mops (rtx *, bool);
-bool aarch64_expand_cpymem (rtx *);
+bool aarch64_expand_cpymem (rtx *, bool);
 bool aarch64_expand_setmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8a12894d6b80de1031d6e7d02dca680c57bce136..a573e3bded2736f5108ad2d4004f530e0f32c99c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25191,48 +25191,35 @@ aarch64_progress_pointer (rtx pointer)
    MODE bytes.  */

 static void
-aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
-					      machine_mode mode)
+aarch64_copy_one_block (rtx *load, rtx *store, rtx src, rtx dst,
+			int offset, machine_mode mode)
 {
   /* Handle 256-bit memcpy separately.  We do this by making 2 adjacent memory
      address copies using V4SImode so that we can use Q registers.  */
   if (known_eq (GET_MODE_BITSIZE (mode), 256))
     {
       mode = V4SImode;
+      rtx src1 = adjust_address (src, mode, offset);
+      rtx src2 = adjust_address (src, mode, offset + 16);
+      rtx dst1 = adjust_address (dst, mode, offset);
+      rtx dst2 = adjust_address (dst, mode, offset + 16);
       rtx reg1 = gen_reg_rtx (mode);
       rtx reg2 = gen_reg_rtx (mode);
-      /* "Cast" the pointers to the correct mode.  */
-      *src = adjust_address (*src, mode, 0);
-      *dst = adjust_address (*dst, mode, 0);
-      /* Emit the memcpy.  */
-      emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2,
-					aarch64_progress_pointer (*src)));
-      emit_insn (aarch64_gen_store_pair (mode, *dst, reg1,
-					 aarch64_progress_pointer (*dst), reg2));
-      /* Move the pointers forward.  */
-      *src = aarch64_move_pointer (*src, 32);
-      *dst = aarch64_move_pointer (*dst, 32);
+      *load = aarch64_gen_load_pair (mode, reg1, src1, reg2, src2);
+      *store = aarch64_gen_store_pair (mode, dst1, reg1, dst2, reg2);
       return;
     }

   rtx reg = gen_reg_rtx (mode);
-
-  /* "Cast" the pointers to the correct mode.  */
-  *src = adjust_address (*src, mode, 0);
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memcpy.  */
-  emit_move_insn (reg, *src);
-  emit_move_insn (*dst, reg);
-  /* Move the pointers forward.  */
-  *src = aarch64_progress_pointer (*src);
-  *dst = aarch64_progress_pointer (*dst);
+  *load = gen_move_insn (reg, adjust_address (src, mode, offset));
+  *store = gen_move_insn (adjust_address (dst, mode, offset), reg);
 }

 /* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
    from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
    rather than memcpy.  Return true iff we succeeded.  */
 bool
-aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
+aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove)
 {
   if (!TARGET_MOPS)
     return false;
@@ -25251,12 +25238,12 @@ aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
   return true;
 }

-/* Expand cpymem, as if from a __builtin_memcpy.  Return true if
-   we succeed, otherwise return false, indicating that a libcall to
-   memcpy should be emitted.  */
-
+/* Expand cpymem/movmem, as if from a __builtin_memcpy/memmove.
+   OPERANDS are taken from the cpymem/movmem pattern.  IS_MEMMOVE is true
+   if this is a memmove rather than memcpy.  Return true if we succeed,
+   otherwise return false, indicating that a libcall should be emitted.  */
 bool
-aarch64_expand_cpymem (rtx *operands)
+aarch64_expand_cpymem (rtx *operands, bool is_memmove)
 {
   int mode_bits;
   rtx dst = operands[0];
@@ -25268,17 +25255,22 @@ aarch64_expand_cpymem (rtx *operands)

   /* Variable-sized or strict-align copies may use the MOPS expansion.  */
   if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
-    return aarch64_expand_cpymem_mops (operands);
+    return aarch64_expand_cpymem_mops (operands, is_memmove);

   unsigned HOST_WIDE_INT size = UINTVAL (operands[2]);

-  /* Try to inline up to 256 bytes.  */
-  unsigned max_copy_size = 256;
-  unsigned mops_threshold = aarch64_mops_memcpy_size_threshold;
+  /* Set inline limits for memmove/memcpy.  MOPS has a separate threshold.  */
+  unsigned max_copy_size = TARGET_SIMD ? 256 : 128;
+  unsigned mops_threshold = is_memmove ? aarch64_mops_memmove_size_threshold
+				       : aarch64_mops_memcpy_size_threshold;
+
+  /* Reduce the maximum size with -Os.  */
+  if (size_p)
+    max_copy_size /= 4;

   /* Large copies use MOPS when available or a library call.  */
   if (size > max_copy_size || (TARGET_MOPS && size > mops_threshold))
-    return aarch64_expand_cpymem_mops (operands);
+    return aarch64_expand_cpymem_mops (operands, is_memmove);

   int copy_bits = 256;

@@ -25290,23 +25282,20 @@ aarch64_expand_cpymem (rtx *operands)
	  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
     copy_bits = 128;

-  /* Emit an inline load+store sequence and count the number of operations
-     involved.  We use a simple count of just the loads and stores emitted
-     rather than rtx_insn count as all the pointer adjustments and reg copying
-     in this function will get optimized away later in the pipeline.  */
-  start_sequence ();
-  unsigned nops = 0;
-
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);

   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);

+  const int max_ops = 40;
+  rtx load[max_ops], store[max_ops];
+
   /* Convert size to bits to make the rest of the code simpler.  */
   int n = size * BITS_PER_UNIT;
+  int nops, offset;

-  while (n > 0)
+  for (nops = 0, offset = 0; n > 0; nops++)
     {
       /* Find the largest mode in which to do the copy in without over reading
	 or writing.  */
@@ -25315,7 +25304,7 @@ aarch64_expand_cpymem (rtx *operands)
	if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_bits))
	  cur_mode = mode_iter.require ();

-      gcc_assert (cur_mode != BLKmode);
+      gcc_assert (cur_mode != BLKmode && nops < max_ops);

       mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();

@@ -25323,49 +25312,38 @@ aarch64_expand_cpymem (rtx *operands)
       if (mode_bits == 128 && copy_bits == 256)
	cur_mode = V4SImode;

-      aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode);
-      /* A single block copy is 1 load + 1 store.  */
-      nops += 2;
+      aarch64_copy_one_block (&load[nops], &store[nops], src, dst, offset, cur_mode);
       n -= mode_bits;
+      offset += mode_bits / BITS_PER_UNIT;

-      /* Emit trailing copies using overlapping unaligned accesses
-	 (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (n > 0 && n < copy_bits / 2 && !STRICT_ALIGNMENT)
+      /* Emit trailing copies using overlapping unaligned accesses -
+	 this is smaller and faster.  */
+      if (n > 0 && n < copy_bits / 2)
	{
	  machine_mode next_mode = smallest_mode_for_size (n, MODE_INT);
	  int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
	  gcc_assert (n_bits <= mode_bits);
-	  src = aarch64_move_pointer (src, (n - n_bits) / BITS_PER_UNIT);
-	  dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
+	  offset -= (n_bits - n) / BITS_PER_UNIT;
	  n = n_bits;
	}
     }
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-  /* MOPS sequence requires 3 instructions for the memory copying + 1 to move
-     the constant size into a register.  */
-  unsigned mops_cost = 3 + 1;
-
-  /* If MOPS is available at this point we don't consider the libcall as it's
-     not a win even on code size.  At this point only consider MOPS if
-     optimizing for size.  For speed optimizations we will have chosen between
-     the two based on copy size already.  */
-  if (TARGET_MOPS)
-    {
-      if (size_p && mops_cost < nops)
-	return aarch64_expand_cpymem_mops (operands);
-      emit_insn (seq);
-      return true;
-    }

-  /* A memcpy libcall in the worst case takes 3 instructions to prepare the
-     arguments + 1 for the call.  When MOPS is not available and we're
-     optimizing for size a libcall may be preferable.  */
-  unsigned libcall_cost = 4;
-  if (size_p && libcall_cost < nops)
-    return false;
+  /* Memcpy interleaves loads with stores, memmove emits all loads first.  */
+  int i, j, m, inc;
+  inc = is_memmove ? nops : 3;
+  if (nops == inc + 1)
+    inc = nops / 2;
+  for (i = 0; i < nops; i += inc)
+    {
+      m = inc;
+      if (i + m > nops)
+	m = nops - i;

-  emit_insn (seq);
+      for (j = 0; j < m; j++)
+	emit_insn (load[i + j]);
+      for (j = 0; j < m; j++)
+	emit_insn (store[i + j]);
+    }
   return true;
 }

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 96508a2580876d1fdbdfa6c67d1a3d02608c1d24..d08598fcdb146dfe0f6283cf57088b224f695c9b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1629,7 +1629,7 @@ (define_expand "cpymemdi"
    (match_operand:DI 3 "immediate_operand")]
   ""
 {
-  if (aarch64_expand_cpymem (operands))
+  if (aarch64_expand_cpymem (operands, false))
     DONE;
   FAIL;
 }
@@ -1673,17 +1673,9 @@ (define_expand "movmemdi"
    (match_operand:BLK 1 "memory_operand")
    (match_operand:DI 2 "general_operand")
    (match_operand:DI 3 "immediate_operand")]
-  "TARGET_MOPS"
+  ""
 {
-  rtx sz_reg = operands[2];
-  /* For constant-sized memmoves check the threshold.
-     FIXME: We should add a non-MOPS memmove expansion for smaller,
-     constant-sized memmove to avoid going to a libcall.  */
-  if (CONST_INT_P (sz_reg)
-      && INTVAL (sz_reg) < aarch64_mops_memmove_size_threshold)
-    FAIL;
-
-  if (aarch64_expand_cpymem_mops (operands, true))
+  if (aarch64_expand_cpymem (operands, true))
     DONE;
   FAIL;
 }
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 4a0580435a8d3c92eca8936515026882c7ea7f48..305923751067ac14d228c5fd51bc24eeaca164dc 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -327,7 +327,7 @@ Target Joined UInteger Var(aarch64_mops_memcpy_size_threshold) Init(256) Param
 Constant memcpy size in bytes above which to start using MOPS sequence.

 -param=aarch64-mops-memmove-size-threshold=
-Target Joined UInteger Var(aarch64_mops_memmove_size_threshold) Init(0) Param
+Target Joined UInteger Var(aarch64_mops_memmove_size_threshold) Init(256) Param
 Constant memmove size in bytes above which to start using MOPS sequence.

 -param=aarch64-mops-memset-size-threshold=
diff --git a/gcc/testsuite/gcc.target/aarch64/memmove.c b/gcc/testsuite/gcc.target/aarch64/memmove.c
new file mode 100644
index 0000000000000000000000000000000000000000..6926a97761eb2578d3f1db7e6eb19dba17b888be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/memmove.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void
+copy1 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 12);
+}
+
+void
+copy2 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 128);
+}
+
+void
+copy3 (int *x, int *y)
+{
+  __builtin_memmove (x, y, 255);
+}
+
+/* { dg-final { scan-assembler-not {\tb\tmemmove} } } */