From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Richard Earnshaw
Subject: Re: [PATCH] AArch64: Cleanup memset expansion
Date: Mon, 6 Nov 2023 12:11:50 +0000

ping

Clean up memset implementation.  Similar to memcpy/memmove, use an offset and
bytes throughout.  Simplify the complex calculations when optimizing for size
by using a fixed limit.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog:
        * config/aarch64/aarch64.cc (aarch64_progress_pointer): Remove function.
        (aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
        (aarch64_expand_setmem): Clean up implementation, use byte offsets,
        simplify size calculation.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e19e2d1de2e5b30eca672df05d9dcc1bc106ecc8..578a253d6e0e133e19592553fc873b3e73f9f218 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25229,15 +25229,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
                                     next, amount);
 }
 
-/* Return a new RTX holding the result of moving POINTER forward by the
-   size of the mode it points to.  */
-
-static rtx
-aarch64_progress_pointer (rtx pointer)
-{
-  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
-}
-
 /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
 
 static void
@@ -25393,46 +25384,22 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   return true;
 }
 
-/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
-   SRC is a register we have created with the duplicated value to be set.  */
+/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
 static void
-aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
-                                            machine_mode mode)
-{
-  /* If we are copying 128bits or 256bits, we can do that straight from
-     the SIMD register we prepared.  */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
-    {
-      mode = GET_MODE (src);
-      /* "Cast" the *dst to the correct mode.  */
-      *dst = adjust_address (*dst, mode, 0);
-      /* Emit the memset.  */
-      emit_insn (aarch64_gen_store_pair (mode, *dst, src,
-                                         aarch64_progress_pointer (*dst), src));
-
-      /* Move the pointers forward.  */
-      *dst = aarch64_move_pointer (*dst, 32);
-      return;
-    }
-  if (known_eq (GET_MODE_BITSIZE (mode), 128))
+aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
+{
+  /* Emit explicit store pair instructions for 32-byte writes.  */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
     {
-      /* "Cast" the *dst to the correct mode.  */
-      *dst = adjust_address (*dst, GET_MODE (src), 0);
-      /* Emit the memset.  */
-      emit_move_insn (*dst, src);
-      /* Move the pointers forward.  */
-      *dst = aarch64_move_pointer (*dst, 16);
+      mode = V16QImode;
+      rtx dst1 = adjust_address (dst, mode, offset);
+      rtx dst2 = adjust_address (dst, mode, offset + 16);
+      emit_insn (aarch64_gen_store_pair (mode, dst1, src, dst2, src));
       return;
     }
-  /* For copying less, we have to extract the right amount from src.  */
-  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
-
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, reg);
-  /* Move the pointer forward.  */
-  *dst = aarch64_progress_pointer (*dst);
+  if (known_lt (GET_MODE_SIZE (mode), 16))
+    src = lowpart_subreg (mode, src, GET_MODE (src));
+  emit_move_insn (adjust_address (dst, mode, offset), src);
 }
 
 /* Expand a setmem using the MOPS instructions.  OPERANDS are the same
@@ -25461,7 +25428,7 @@ aarch64_expand_setmem_mops (rtx *operands)
 bool
 aarch64_expand_setmem (rtx *operands)
 {
-  int n, mode_bits;
+  int mode_bytes;
   unsigned HOST_WIDE_INT len;
   rtx dst = operands[0];
   rtx val = operands[2], src;
@@ -25474,104 +25441,70 @@ aarch64_expand_setmem (rtx *operands)
       || (STRICT_ALIGNMENT && align < 16))
     return aarch64_expand_setmem_mops (operands);
 
-  bool size_p = optimize_function_for_size_p (cfun);
-
   /* Default the maximum to 256-bytes when considering only libcall vs
      SIMD broadcast sequence.  */
   unsigned max_set_size = 256;
   unsigned mops_threshold = aarch64_mops_memset_size_threshold;
 
+  /* Reduce the maximum size with -Os.  */
+  if (optimize_function_for_size_p (cfun))
+    max_set_size = 96;
+
   len = UINTVAL (operands[1]);
 
   /* Large memset uses MOPS when available or a library call.  */
   if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
     return aarch64_expand_setmem_mops (operands);
 
-  int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
-  /* The MOPS sequence takes:
-     3 instructions for the memory storing
-     + 1 to move the constant size into a reg
-     + 1 if VAL is a non-zero constant to move into a reg
-    (zero constants can use XZR directly).  */
-  unsigned mops_cost = 3 + 1 + cst_val;
-  /* A libcall to memset in the worst case takes 3 instructions to prepare
-     the arguments + 1 for the call.  */
-  unsigned libcall_cost = 4;
-
-  /* Attempt a sequence with a vector broadcast followed by stores.
-     Count the number of operations involved to see if it's worth it
-     against the alternatives.  A simple counter simd_ops on the
-     algorithmically-relevant operations is used rather than an rtx_insn count
-     as all the pointer adjusmtents and mode reinterprets will be optimized
-     away later.  */
-  start_sequence ();
-  unsigned simd_ops = 0;
-
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
 
   /* Prepare the val using a DUP/MOVI v0.16B, val.  */
   src = expand_vector_broadcast (V16QImode, val);
   src = force_reg (V16QImode, src);
-  simd_ops++;
-  /* Convert len to bits to make the rest of the code simpler.  */
-  n = len * BITS_PER_UNIT;
 
-  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
-     AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
-  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
-                          & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
-                          ? GET_MODE_BITSIZE (TImode) : 256;
+  /* Set maximum amount to write in one go.  We allow 32-byte chunks based
+     on the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
+  unsigned set_max = 32;
+
+  if (len <= 24 || (aarch64_tune_params.extra_tuning_flags
+                    & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
+    set_max = 16;
 
-  while (n > 0)
+  int offset = 0;
+  while (len > 0)
     {
       /* Find the largest mode in which to do the copy without
          over writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-        if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
+        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (len, set_max))
           cur_mode = mode_iter.require ();
 
       gcc_assert (cur_mode != BLKmode);
 
-      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
-      aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
-      simd_ops++;
-      n -= mode_bits;
+      mode_bytes = GET_MODE_SIZE (cur_mode).to_constant ();
+
+      /* Prefer Q-register accesses for the last bytes.  */
+      if (mode_bytes == 16)
+        cur_mode = V16QImode;
+
+      aarch64_set_one_block (src, dst, offset, cur_mode);
+      len -= mode_bytes;
+      offset += mode_bytes;
 
       /* Emit trailing writes using overlapping unaligned accesses
-        (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
+         (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
+      if (len > 0 && len < set_max / 2 && !STRICT_ALIGNMENT)
         {
-          next_mode = smallest_mode_for_size (n, MODE_INT);
-          int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
-          gcc_assert (n_bits <= mode_bits);
-          dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
-          n = n_bits;
+          next_mode = smallest_mode_for_size (len * BITS_PER_UNIT, MODE_INT);
+          int n_bytes = GET_MODE_SIZE (next_mode).to_constant ();
+          gcc_assert (n_bytes <= mode_bytes);
+          offset -= n_bytes - len;
+          len = n_bytes;
         }
     }
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-
-  if (size_p)
-    {
-      /* When optimizing for size we have 3 options: the SIMD broadcast sequence,
-         call to memset or the MOPS expansion.  */
-      if (TARGET_MOPS
-          && mops_cost <= libcall_cost
-          && mops_cost <= simd_ops)
-        return aarch64_expand_setmem_mops (operands);
-      /* If MOPS is not available or not shorter pick a libcall if the SIMD
-         sequence is too long.  */
-      else if (libcall_cost < simd_ops)
-        return false;
-      emit_insn (seq);
-      return true;
-    }
 
-  /* At this point the SIMD broadcast sequence is the best choice when
-     optimizing for speed.  */
-  emit_insn (seq);
   return true;
 }
 