From: Wilco Dijkstra
To: Richard Earnshaw, Kyrylo Tkachov, GCC Patches
CC: Richard Sandiford, Richard Earnshaw
Subject: Re: [PATCH v3] AArch64: Cleanup memset expansion
Date: Fri, 22 Dec 2023 14:25:16 +0000
References: <372b9689-24b5-41f4-a990-5aee0226e15f@foss.arm.com> <61c6e268-188c-4b35-956d-bd8927d705f2@foss.arm.com>

v3: rebased to latest trunk

Clean up the memset implementation. Similar to memcpy/memmove, use an offset and
bytes throughout. Simplify the complex calculations when optimizing for size
by using a fixed limit.

Passes regress & bootstrap.

gcc/ChangeLog:
        * config/aarch64/aarch64.h (MAX_SET_SIZE): New define.
        * config/aarch64/aarch64.cc (aarch64_progress_pointer): Remove function.
        (aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
        (aarch64_expand_setmem): Clean up implementation, use byte offsets,
        simplify size calculation.

---

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 3ae42be770400da96ea3d9d25d6e1b2d393d034d..dd3b7988d585277181c478cd022fd7b6285929d0 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1178,6 +1178,10 @@ typedef struct
    mode that should actually be used. We allow pairs of registers. */
 #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TImode)
 
+/* Maximum bytes set for an inline memset expansion. With -Os use 3 STP
+   and 1 MOVI/DUP (same size as a call). */
+#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
+
 /* Maximum bytes moved by a single instruction (load/store pair). */
 #define MOVE_MAX (UNITS_PER_WORD * 2)
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f9850320f61c5ddccf47e6583d304e5f405a484f..0909b319d16b9a1587314bcfda0a8112b42a663f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26294,15 +26294,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
                                     next, amount);
 }
 
-/* Return a new RTX holding the result of moving POINTER forward by the
-   size of the mode it points to. */
-
-static rtx
-aarch64_progress_pointer (rtx pointer)
-{
-  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
-}
-
 typedef auto_vec<std::pair<rtx, rtx>, 12> copy_ops;
 
 /* Copy one block of size MODE from SRC to DST at offset OFFSET. */
@@ -26457,45 +26448,21 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   return true;
 }
 
-/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
-   SRC is a register we have created with the duplicated value to be set. */
+/* Set one block of size MODE at DST at offset OFFSET to value in SRC. */
 static void
-aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
-                                            machine_mode mode)
+aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
 {
-  /* If we are copying 128bits or 256bits, we can do that straight from
-     the SIMD register we prepared. */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
-    {
-      mode = GET_MODE (src);
-      /* "Cast" the *dst to the correct mode. */
-      *dst = adjust_address (*dst, mode, 0);
-      /* Emit the memset. */
-      emit_insn (aarch64_gen_store_pair (*dst, src, src));
-
-      /* Move the pointers forward. */
-      *dst = aarch64_move_pointer (*dst, 32);
-      return;
-    }
-  if (known_eq (GET_MODE_BITSIZE (mode), 128))
+  /* Emit explicit store pair instructions for 32-byte writes. */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
     {
-      /* "Cast" the *dst to the correct mode. */
-      *dst = adjust_address (*dst, GET_MODE (src), 0);
-      /* Emit the memset. */
-      emit_move_insn (*dst, src);
-      /* Move the pointers forward. */
-      *dst = aarch64_move_pointer (*dst, 16);
+      mode = V16QImode;
+      rtx dst1 = adjust_address (dst, mode, offset);
+      emit_insn (aarch64_gen_store_pair (dst1, src, src));
       return;
     }
-  /* For copying less, we have to extract the right amount from src. */
-  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
-
-  /* "Cast" the *dst to the correct mode. */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset. */
-  emit_move_insn (*dst, reg);
-  /* Move the pointer forward. */
-  *dst = aarch64_progress_pointer (*dst);
+  if (known_lt (GET_MODE_SIZE (mode), 16))
+    src = lowpart_subreg (mode, src, GET_MODE (src));
+  emit_move_insn (adjust_address (dst, mode, offset), src);
 }
 
 /* Expand a setmem using the MOPS instructions. OPERANDS are the same
@@ -26524,7 +26491,7 @@ aarch64_expand_setmem_mops (rtx *operands)
 bool
 aarch64_expand_setmem (rtx *operands)
 {
-  int n, mode_bits;
+  int mode_bytes;
   unsigned HOST_WIDE_INT len;
   rtx dst = operands[0];
   rtx val = operands[2], src;
@@ -26537,11 +26504,9 @@ aarch64_expand_setmem (rtx *operands)
       || (STRICT_ALIGNMENT && align < 16))
     return aarch64_expand_setmem_mops (operands);
 
-  bool size_p = optimize_function_for_size_p (cfun);
-
   /* Default the maximum to 256-bytes when considering only libcall vs
      SIMD broadcast sequence. */
-  unsigned max_set_size = 256;
+  unsigned max_set_size = MAX_SET_SIZE (optimize_function_for_speed_p (cfun));
   unsigned mops_threshold = aarch64_mops_memset_size_threshold;
 
   len = UINTVAL (operands[1]);
@@ -26550,91 +26515,55 @@ aarch64_expand_setmem (rtx *operands)
   if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
     return aarch64_expand_setmem_mops (operands);
 
-  int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
-  /* The MOPS sequence takes:
-     3 instructions for the memory storing
-     + 1 to move the constant size into a reg
-     + 1 if VAL is a non-zero constant to move into a reg
-     (zero constants can use XZR directly). */
-  unsigned mops_cost = 3 + 1 + cst_val;
-  /* A libcall to memset in the worst case takes 3 instructions to prepare
-     the arguments + 1 for the call. */
-  unsigned libcall_cost = 4;
-
-  /* Attempt a sequence with a vector broadcast followed by stores.
-     Count the number of operations involved to see if it's worth it
-     against the alternatives. A simple counter simd_ops on the
-     algorithmically-relevant operations is used rather than an rtx_insn count
-     as all the pointer adjusmtents and mode reinterprets will be optimized
-     away later. */
-  start_sequence ();
-  unsigned simd_ops = 0;
-
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
 
   /* Prepare the val using a DUP/MOVI v0.16B, val. */
   src = expand_vector_broadcast (V16QImode, val);
   src = force_reg (V16QImode, src);
-  simd_ops++;
-  /* Convert len to bits to make the rest of the code simpler. */
-  n = len * BITS_PER_UNIT;
 
-  /* Maximum amount to copy in one go. We allow 256-bit chunks based on the
-     AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter. */
-  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
-                          & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
-                          ? GET_MODE_BITSIZE (TImode) : 256;
+  /* Set maximum amount to write in one go. We allow 32-byte chunks based
+     on the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter. */
+  unsigned set_max = 32;
+
+  if (len <= 24 || (aarch64_tune_params.extra_tuning_flags
+                    & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
+    set_max = 16;
 
-  while (n > 0)
+  int offset = 0;
+  while (len > 0)
     {
       /* Find the largest mode in which to do the copy without
          over writing. */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-        if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
+        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (len, set_max))
           cur_mode = mode_iter.require ();
 
       gcc_assert (cur_mode != BLKmode);
 
-      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
-      aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
-      simd_ops++;
-      n -= mode_bits;
+      mode_bytes = GET_MODE_SIZE (cur_mode).to_constant ();
+
+      /* Prefer Q-register accesses for the last bytes. */
+      if (mode_bytes == 16)
+        cur_mode = V16QImode;
+
+      aarch64_set_one_block (src, dst, offset, cur_mode);
+      len -= mode_bytes;
+      offset += mode_bytes;
 
       /* Emit trailing writes using overlapping unaligned accesses
-        (when !STRICT_ALIGNMENT) - this is smaller and faster. */
-      if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
+         (when !STRICT_ALIGNMENT) - this is smaller and faster. */
+      if (len > 0 && len < set_max / 2 && !STRICT_ALIGNMENT)
         {
-          next_mode = smallest_mode_for_size (n, MODE_INT);
-          int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
-          gcc_assert (n_bits <= mode_bits);
-          dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
-          n = n_bits;
+          next_mode = smallest_mode_for_size (len * BITS_PER_UNIT, MODE_INT);
+          int n_bytes = GET_MODE_SIZE (next_mode).to_constant ();
+          gcc_assert (n_bytes <= mode_bytes);
+          offset -= n_bytes - len;
+          len = n_bytes;
         }
     }
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-
-  if (size_p)
-    {
-      /* When optimizing for size we have 3 options: the SIMD broadcast sequence,
-         call to memset or the MOPS expansion. */
-      if (TARGET_MOPS
-          && mops_cost <= libcall_cost
-          && mops_cost <= simd_ops)
-        return aarch64_expand_setmem_mops (operands);
-      /* If MOPS is not available or not shorter pick a libcall if the SIMD
-         sequence is too long. */
-      else if (libcall_cost < simd_ops)
-        return false;
-      emit_insn (seq);
-      return true;
-    }
 
-  /* At this point the SIMD broadcast sequence is the best choice when
-     optimizing for speed. */
-  emit_insn (seq);
   return true;
 }
 
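
To make the effect of the new loop in aarch64_expand_setmem easier to see, here is a small standalone C++ model of the store sequence it would plan. This is only a sketch under stated assumptions, not GCC code: Store, inline_limit, largest_width, smallest_width_covering and plan_memset are hypothetical names, power-of-two widths stand in for the MODE_INT iteration, and the MOPS threshold check is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One planned store: byte offset from the destination base and width in bytes.
struct Store {
  unsigned offset;
  unsigned bytes;
};

// Mirrors the MAX_SET_SIZE macro from the patch: 256 bytes for speed, 96 for size.
unsigned inline_limit(bool speed) { return speed ? 256 : 96; }

// Largest power-of-two width (1..32) not exceeding limit.  Hypothetical
// stand-in for the FOR_EACH_MODE_IN_CLASS loop; limit must be at least 1.
static unsigned largest_width(unsigned limit) {
  unsigned w = 1;
  while (w * 2 <= limit && w * 2 <= 32)
    w *= 2;
  return w;
}

// Smallest power-of-two width covering n bytes.  Hypothetical stand-in for
// smallest_mode_for_size.
static unsigned smallest_width_covering(unsigned n) {
  unsigned w = 1;
  while (w < n)
    w *= 2;
  return w;
}

// Plan the stores for a memset of len bytes, assuming len has already been
// checked against inline_limit.  no_ldp_stp_qregs models the
// AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning flag.
std::vector<Store> plan_memset(unsigned len, bool no_ldp_stp_qregs,
                               bool strict_alignment = false) {
  // 32-byte chunks by default; 16 for short sets or when Q-register pairs
  // are disabled by tuning.
  unsigned set_max = (len <= 24 || no_ldp_stp_qregs) ? 16 : 32;
  std::vector<Store> plan;
  unsigned offset = 0;
  while (len > 0) {
    unsigned mode_bytes = largest_width(std::min(len, set_max));
    plan.push_back({offset, mode_bytes});
    len -= mode_bytes;
    offset += mode_bytes;
    // Short tail: back the offset up so one wider store overlaps bytes that
    // were already written, instead of emitting several narrow stores.
    if (len > 0 && len < set_max / 2 && !strict_alignment) {
      unsigned n_bytes = smallest_width_covering(len);
      assert(n_bytes <= mode_bytes);
      offset -= n_bytes - len;
      len = n_bytes;
    }
  }
  return plan;
}
```

In this model, plan_memset (35, false) gives a 32-byte store at offset 0 followed by a 4-byte store at offset 31, so the tail overlaps the previous store by one byte rather than splitting into 2-byte and 1-byte stores. With strict_alignment set, a 23-byte set falls back to 16 + 4 + 2 + 1 bytes with no overlap.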