From: Kyrylo Tkachov
To: Wilco Dijkstra, GCC Patches
CC: Richard Sandiford, Richard Earnshaw
Subject: RE: [PATCH] AArch64: Cleanup memset expansion
Date: Fri, 10 Nov 2023 09:50:59 +0000

Hi Wilco,

> -----Original Message-----
> From: Wilco Dijkstra
> Sent: Monday, November 6, 2023 12:12 PM
> To: GCC Patches
> Cc: Richard Sandiford; Richard Earnshaw
> Subject: Re: [PATCH] AArch64: Cleanup memset expansion
>
> ping
>
> Cleanup memset implementation.  Similar to memcpy/memmove, use an offset and
> bytes throughout.  Simplify the complex calculations when optimizing for size
> by using a fixed limit.
>
> Passes regress/bootstrap, OK for commit?
>

This looks like a good cleanup but I have a question...

> gcc/ChangeLog:
>         * config/aarch64/aarch64.cc (aarch64_progress_pointer): Remove
>         function.
>         (aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
>         (aarch64_expand_setmem): Clean up implementation, use byte offsets,
>         simplify size calculation.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e19e2d1de2e5b30eca672df05d9dcc1bc106ecc8..578a253d6e0e133e19592553fc873b3e73f9f218 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -25229,15 +25229,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
>                                      next, amount);
>  }
>
> -/* Return a new RTX holding the result of moving POINTER forward by the
> -   size of the mode it points to.  */
> -
> -static rtx
> -aarch64_progress_pointer (rtx pointer)
> -{
> -  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
> -}
> -
>  /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
>
>  static void
> @@ -25393,46 +25384,22 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
>    return true;
>  }
>
> -/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
> -   SRC is a register we have created with the duplicated value to be set.  */
> +/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
>  static void
> -aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
> -                                            machine_mode mode)
> -{
> -  /* If we are copying 128bits or 256bits, we can do that straight from
> -     the SIMD register we prepared.  */
> -  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> -    {
> -      mode = GET_MODE (src);
> -      /* "Cast" the *dst to the correct mode.  */
> -      *dst = adjust_address (*dst, mode, 0);
> -      /* Emit the memset.  */
> -      emit_insn (aarch64_gen_store_pair (mode, *dst, src,
> -                                         aarch64_progress_pointer (*dst), src));
> -
> -      /* Move the pointers forward.  */
> -      *dst = aarch64_move_pointer (*dst, 32);
> -      return;
> -    }
> -  if (known_eq (GET_MODE_BITSIZE (mode), 128))
> +aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
> +{
> +  /* Emit explicit store pair instructions for 32-byte writes.  */
> +  if (known_eq (GET_MODE_SIZE (mode), 32))
>      {
> -      /* "Cast" the *dst to the correct mode.  */
> -      *dst = adjust_address (*dst, GET_MODE (src), 0);
> -      /* Emit the memset.  */
> -      emit_move_insn (*dst, src);
> -      /* Move the pointers forward.  */
> -      *dst = aarch64_move_pointer (*dst, 16);
> +      mode = V16QImode;
> +      rtx dst1 = adjust_address (dst, mode, offset);
> +      rtx dst2 = adjust_address (dst, mode, offset + 16);
> +      emit_insn (aarch64_gen_store_pair (mode, dst1, src, dst2, src));
>        return;
>      }
> -  /* For copying less, we have to extract the right amount from src.  */
> -  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
> -
> -  /* "Cast" the *dst to the correct mode.  */
> -  *dst = adjust_address (*dst, mode, 0);
> -  /* Emit the memset.  */
> -  emit_move_insn (*dst, reg);
> -  /* Move the pointer forward.  */
> -  *dst = aarch64_progress_pointer (*dst);
> +  if (known_lt (GET_MODE_SIZE (mode), 16))
> +    src = lowpart_subreg (mode, src, GET_MODE (src));
> +  emit_move_insn (adjust_address (dst, mode, offset), src);
>  }
>
>  /* Expand a setmem using the MOPS instructions.  OPERANDS are the same
> @@ -25461,7 +25428,7 @@ aarch64_expand_setmem_mops (rtx *operands)
>  bool
>  aarch64_expand_setmem (rtx *operands)
>  {
> -  int n, mode_bits;
> +  int mode_bytes;
>    unsigned HOST_WIDE_INT len;
>    rtx dst = operands[0];
>    rtx val = operands[2], src;
> @@ -25474,104 +25441,70 @@ aarch64_expand_setmem (rtx *operands)
>        || (STRICT_ALIGNMENT && align < 16))
>      return aarch64_expand_setmem_mops (operands);
>
> -  bool size_p = optimize_function_for_size_p (cfun);
> -
>    /* Default the maximum to 256-bytes when considering only libcall vs
>       SIMD broadcast sequence.  */
>    unsigned max_set_size = 256;
>    unsigned mops_threshold = aarch64_mops_memset_size_threshold;
>
> +  /* Reduce the maximum size with -Os.  */
> +  if (optimize_function_for_size_p (cfun))
> +    max_set_size = 96;
> +

.... This is a new "magic" number in this code. It looks sensible, but how did you arrive at it?

Thanks,
Kyrill

>    len = UINTVAL (operands[1]);
>
>    /* Large memset uses MOPS when available or a library call.  */
>    if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
>      return aarch64_expand_setmem_mops (operands);
>
> -  int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
> -  /* The MOPS sequence takes:
> -     3 instructions for the memory storing
> -     + 1 to move the constant size into a reg
> -     + 1 if VAL is a non-zero constant to move into a reg
> -    (zero constants can use XZR directly).  */
> -  unsigned mops_cost = 3 + 1 + cst_val;
> -  /* A libcall to memset in the worst case takes 3 instructions to prepare
> -     the arguments + 1 for the call.  */
> -  unsigned libcall_cost = 4;
> -
> -  /* Attempt a sequence with a vector broadcast followed by stores.
> -     Count the number of operations involved to see if it's worth it
> -     against the alternatives.  A simple counter simd_ops on the
> -     algorithmically-relevant operations is used rather than an rtx_insn count
> -     as all the pointer adjusmtents and mode reinterprets will be optimized
> -     away later.  */
> -  start_sequence ();
> -  unsigned simd_ops = 0;
> -
>    base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
>    dst = adjust_automodify_address (dst, VOIDmode, base, 0);
>
>    /* Prepare the val using a DUP/MOVI v0.16B, val.  */
>    src = expand_vector_broadcast (V16QImode, val);
>    src = force_reg (V16QImode, src);
> -  simd_ops++;
> -  /* Convert len to bits to make the rest of the code simpler.  */
> -  n = len * BITS_PER_UNIT;
>
> -  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
> -     AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
> -  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
> -                          & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
> -                          ? GET_MODE_BITSIZE (TImode) : 256;
> +  /* Set maximum amount to write in one go.  We allow 32-byte chunks based
> +     on the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
> +  unsigned set_max = 32;
> +
> +  if (len <= 24 || (aarch64_tune_params.extra_tuning_flags
> +                    & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
> +    set_max = 16;
>
> -  while (n > 0)
> +  int offset = 0;
> +  while (len > 0)
>      {
>        /* Find the largest mode in which to do the copy without
>           over writing.  */
>        opt_scalar_int_mode mode_iter;
>        FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> -        if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> +        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (len, set_max))
>            cur_mode = mode_iter.require ();
>
>        gcc_assert (cur_mode != BLKmode);
>
> -      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> -      aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
> -      simd_ops++;
> -      n -= mode_bits;
> +      mode_bytes = GET_MODE_SIZE (cur_mode).to_constant ();
> +
> +      /* Prefer Q-register accesses for the last bytes.  */
> +      if (mode_bytes == 16)
> +        cur_mode = V16QImode;
> +
> +      aarch64_set_one_block (src, dst, offset, cur_mode);
> +      len -= mode_bytes;
> +      offset += mode_bytes;
>
>        /* Emit trailing writes using overlapping unaligned accesses
> -         (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
> -      if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
> +         (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
> +      if (len > 0 && len < set_max / 2 && !STRICT_ALIGNMENT)
>          {
> -          next_mode = smallest_mode_for_size (n, MODE_INT);
> -          int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
> -          gcc_assert (n_bits <= mode_bits);
> -          dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
> -          n = n_bits;
> +          next_mode = smallest_mode_for_size (len * BITS_PER_UNIT, MODE_INT);
> +          int n_bytes = GET_MODE_SIZE (next_mode).to_constant ();
> +          gcc_assert (n_bytes <= mode_bytes);
> +          offset -= n_bytes - len;
> +          len = n_bytes;
>          }
>      }
> -  rtx_insn *seq = get_insns ();
> -  end_sequence ();
> -
> -  if (size_p)
> -    {
> -      /* When optimizing for size we have 3 options: the SIMD broadcast sequence,
> -         call to memset or the MOPS expansion.  */
> -      if (TARGET_MOPS
> -          && mops_cost <= libcall_cost
> -          && mops_cost <= simd_ops)
> -        return aarch64_expand_setmem_mops (operands);
> -      /* If MOPS is not available or not shorter pick a libcall if the SIMD
> -         sequence is too long.  */
> -      else if (libcall_cost < simd_ops)
> -        return false;
> -      emit_insn (seq);
> -      return true;
> -    }
>
> -  /* At this point the SIMD broadcast sequence is the best choice when
> -     optimizing for speed.  */
> -  emit_insn (seq);
>    return true;
>  }
>
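The new offset/length bookkeeping can be modelled outside GCC to see which stores a given constant length produces. The sketch below is a standalone C model, not GCC code: it replays the new aarch64_expand_setmem loop and records the offset and size of each store it would emit. It assumes the candidate integer modes are 1, 2, 4, 8, 16 and 32 bytes (QImode to OImode), treats smallest_mode_for_size as "next power of two", and ignores the MOPS/libcall early exits and the -Os size cap; store_t, plan_setmem and the two helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of one store: SIZE bytes written at byte OFFSET.  */
typedef struct
{
  unsigned offset;
  unsigned size;
} store_t;

/* Widest power-of-two size <= LIMIT, capped at 32 bytes.  Stands in for the
   FOR_EACH_MODE_IN_CLASS (MODE_INT) search, assuming the candidate integer
   modes are 1, 2, 4, 8, 16 and 32 bytes.  */
static unsigned
largest_pow2_le (unsigned limit)
{
  unsigned size = 1;
  while (size * 2 <= limit && size * 2 <= 32)
    size *= 2;
  return size;
}

/* Narrowest power-of-two size >= N; stands in for smallest_mode_for_size.  */
static unsigned
smallest_pow2_ge (unsigned n)
{
  unsigned size = 1;
  while (size < n)
    size *= 2;
  return size;
}

/* Replay the new expansion loop for a constant LEN.  Writes at most MAX_OUT
   stores to OUT and returns how many were planned.  */
static unsigned
plan_setmem (unsigned len, bool no_ldp_stp_qregs, bool strict_align,
             store_t *out, unsigned max_out)
{
  /* 32-byte chunks (one STP of two Q registers) unless the set is small
     or the tuning flag rules out Q-register pairs.  */
  unsigned set_max = (len <= 24 || no_ldp_stp_qregs) ? 16 : 32;
  unsigned offset = 0;
  unsigned count = 0;

  while (len > 0)
    {
      unsigned mode_bytes = largest_pow2_le (len < set_max ? len : set_max);
      assert (count < max_out);
      out[count].offset = offset;
      out[count].size = mode_bytes;
      count++;
      len -= mode_bytes;
      offset += mode_bytes;

      /* Cover a short tail with one wider store that overlaps bytes
         already written, by moving OFFSET backwards.  */
      if (len > 0 && len < set_max / 2 && !strict_align)
        {
          unsigned n_bytes = smallest_pow2_ge (len);
          assert (n_bytes <= mode_bytes);
          offset -= n_bytes - len;
          len = n_bytes;
        }
    }
  return count;
}
```

Under these assumptions, a 15-byte set becomes two overlapping 8-byte stores at offsets 0 and 7 (four stores of 8, 4, 2 and 1 bytes when strict_align is set), and a 100-byte set becomes three 32-byte chunks followed by a 4-byte store at offset 96.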