From: Wilco Dijkstra
To: Richard Sandiford
CC: Richard Earnshaw, Kyrylo Tkachov, GCC Patches, Richard Earnshaw
Subject: Re: [PATCH v4] AArch64: Cleanup memset expansion
Date: Tue, 9 Jan 2024 20:51:06 +0000
References: <372b9689-24b5-41f4-a990-5aee0226e15f@foss.arm.com> <61c6e268-188c-4b35-956d-bd8927d705f2@foss.arm.com>

Hi Richard,

>> +#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
>
> Since this isn't (AFAIK) a standard macro, there doesn't seem to be
> any need to put it in the header file.  It could just go at the head
> of aarch64.cc instead.

Sure, I've moved it in v4.

>> +  if (len <= 24 || (aarch64_tune_params.extra_tuning_flags
>> +                   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
>> +    set_max = 16;
>
> I think we should take the tuning parameter into account when applying
> the MAX_SET_SIZE limit for -Os.  Shouldn't it be 48 rather than 96 in
> that case?  (Alternatively, I suppose it would make sense to ignore
> the param for -Os, although we don't seem to do that elsewhere.)

That tune is only used by an obsolete core. I ran the memcpy and memset
benchmarks from Optimized Routines on xgene-1 with and without LDP/STP.
There is no measurable penalty for using LDP/STP. I'm not sure why it was
ever added given it does not do anything useful. I'll post a separate patch
to remove it to reduce the maintenance overhead.

Cheers,
Wilco


Here is v4 (move MAX_SET_SIZE definition to aarch64.cc):

Cleanup memset implementation.  Similar to memcpy/memmove, use an offset and
bytes throughout.  Simplify the complex calculations when optimizing for size
by using a fixed limit.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog:
        * config/aarch64/aarch64.cc (MAX_SET_SIZE): New define.
        (aarch64_progress_pointer): Remove function.
        (aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
        (aarch64_expand_setmem): Clean up implementation, use byte offsets,
        simplify size calculation.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a5a6b52730d6c5013346d128e89915883f1707ae..62f4eee429c1c5195d54604f1d341a8a5a499d89 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -101,6 +101,10 @@
 /* Defined for convenience.  */
 #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
 
+/* Maximum bytes set for an inline memset expansion.  With -Os use 3 STP
+   and 1 MOVI/DUP (same size as a call).  */
+#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
+
 /* Flags that describe how a function shares certain architectural state
    with its callers.
 
@@ -26321,15 +26325,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
                                     next, amount);
 }
 
-/* Return a new RTX holding the result of moving POINTER forward by the
-   size of the mode it points to.  */
-
-static rtx
-aarch64_progress_pointer (rtx pointer)
-{
-  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
-}
-
 typedef auto_vec<std::pair<rtx, rtx>, 12> copy_ops;
 
 /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
@@ -26484,45 +26479,21 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   return true;
 }
 
-/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
-   SRC is a register we have created with the duplicated value to be set.  */
+/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
 static void
-aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
-                                            machine_mode mode)
+aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
 {
-  /* If we are copying 128bits or 256bits, we can do that straight from
-     the SIMD register we prepared.  */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
-    {
-      mode = GET_MODE (src);
-      /* "Cast" the *dst to the correct mode.  */
-      *dst = adjust_address (*dst, mode, 0);
-      /* Emit the memset.  */
-      emit_insn (aarch64_gen_store_pair (*dst, src, src));
-
-      /* Move the pointers forward.  */
-      *dst = aarch64_move_pointer (*dst, 32);
-      return;
-    }
-  if (known_eq (GET_MODE_BITSIZE (mode), 128))
+  /* Emit explicit store pair instructions for 32-byte writes.  */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
     {
-      /* "Cast" the *dst to the correct mode.  */
-      *dst = adjust_address (*dst, GET_MODE (src), 0);
-      /* Emit the memset.  */
-      emit_move_insn (*dst, src);
-      /* Move the pointers forward.  */
-      *dst = aarch64_move_pointer (*dst, 16);
+      mode = V16QImode;
+      rtx dst1 = adjust_address (dst, mode, offset);
+      emit_insn (aarch64_gen_store_pair (dst1, src, src));
       return;
     }
-  /* For copying less, we have to extract the right amount from src.  */
-  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
-
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, reg);
-  /* Move the pointer forward.  */
-  *dst = aarch64_progress_pointer (*dst);
+  if (known_lt (GET_MODE_SIZE (mode), 16))
+    src = lowpart_subreg (mode, src, GET_MODE (src));
+  emit_move_insn (adjust_address (dst, mode, offset), src);
 }
 
 /* Expand a setmem using the MOPS instructions.  OPERANDS are the same
@@ -26551,7 +26522,7 @@ aarch64_expand_setmem_mops (rtx *operands)
 bool
 aarch64_expand_setmem (rtx *operands)
 {
-  int n, mode_bits;
+  int mode_bytes;
   unsigned HOST_WIDE_INT len;
   rtx dst = operands[0];
   rtx val = operands[2], src;
@@ -26564,11 +26535,9 @@ aarch64_expand_setmem (rtx *operands)
       || (STRICT_ALIGNMENT && align < 16))
     return aarch64_expand_setmem_mops (operands);
 
-  bool size_p = optimize_function_for_size_p (cfun);
-
   /* Default the maximum to 256-bytes when considering only libcall vs
      SIMD broadcast sequence.  */
-  unsigned max_set_size = 256;
+  unsigned max_set_size = MAX_SET_SIZE (optimize_function_for_speed_p (cfun));
   unsigned mops_threshold = aarch64_mops_memset_size_threshold;
 
   len = UINTVAL (operands[1]);
@@ -26577,91 +26546,55 @@ aarch64_expand_setmem (rtx *operands)
   if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
     return aarch64_expand_setmem_mops (operands);
 
-  int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
-  /* The MOPS sequence takes:
-     3 instructions for the memory storing
-     + 1 to move the constant size into a reg
-     + 1 if VAL is a non-zero constant to move into a reg
-    (zero constants can use XZR directly).  */
-  unsigned mops_cost = 3 + 1 + cst_val;
-  /* A libcall to memset in the worst case takes 3 instructions to prepare
-     the arguments + 1 for the call.  */
-  unsigned libcall_cost = 4;
-
-  /* Attempt a sequence with a vector broadcast followed by stores.
-     Count the number of operations involved to see if it's worth it
-     against the alternatives.  A simple counter simd_ops on the
-     algorithmically-relevant operations is used rather than an rtx_insn count
-     as all the pointer adjusmtents and mode reinterprets will be optimized
-     away later.  */
-  start_sequence ();
-  unsigned simd_ops = 0;
-
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
 
   /* Prepare the val using a DUP/MOVI v0.16B, val.  */
   src = expand_vector_broadcast (V16QImode, val);
   src = force_reg (V16QImode, src);
-  simd_ops++;
-  /* Convert len to bits to make the rest of the code simpler.  */
-  n = len * BITS_PER_UNIT;
 
-  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
-     AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
-  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
-                          & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
-                          ? GET_MODE_BITSIZE (TImode) : 256;
+  /* Set maximum amount to write in one go.  We allow 32-byte chunks based
+     on the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
+  unsigned set_max = 32;
 
-  while (n > 0)
+  if (len <= 24 || (aarch64_tune_params.extra_tuning_flags
+                    & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
+    set_max = 16;
+
+  int offset = 0;
+  while (len > 0)
     {
       /* Find the largest mode in which to do the copy without
          over writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-        if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
+        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (len, set_max))
           cur_mode = mode_iter.require ();
 
       gcc_assert (cur_mode != BLKmode);
 
-      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
-      aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
-      simd_ops++;
-      n -= mode_bits;
+      mode_bytes = GET_MODE_SIZE (cur_mode).to_constant ();
+
+      /* Prefer Q-register accesses for the last bytes.  */
+      if (mode_bytes == 16)
+        cur_mode = V16QImode;
+
+      aarch64_set_one_block (src, dst, offset, cur_mode);
+      len -= mode_bytes;
+      offset += mode_bytes;
 
       /* Emit trailing writes using overlapping unaligned accesses
-        (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
+         (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
+      if (len > 0 && len < set_max / 2 && !STRICT_ALIGNMENT)
         {
-          next_mode = smallest_mode_for_size (n, MODE_INT);
-          int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
-          gcc_assert (n_bits <= mode_bits);
-          dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
-          n = n_bits;
+          next_mode = smallest_mode_for_size (len * BITS_PER_UNIT, MODE_INT);
+          int n_bytes = GET_MODE_SIZE (next_mode).to_constant ();
+          gcc_assert (n_bytes <= mode_bytes);
+          offset -= n_bytes - len;
+          len = n_bytes;
         }
     }
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-
-  if (size_p)
-    {
-      /* When optimizing for size we have 3 options: the SIMD broadcast sequence,
-         call to memset or the MOPS expansion.  */
-      if (TARGET_MOPS
-          && mops_cost <= libcall_cost
-          && mops_cost <= simd_ops)
-        return aarch64_expand_setmem_mops (operands);
-      /* If MOPS is not available or not shorter pick a libcall if the SIMD
-         sequence is too long.  */
-      else if (libcall_cost < simd_ops)
-        return false;
-      emit_insn (seq);
-      return true;
-    }
 
-  /* At this point the SIMD broadcast sequence is the best choice when
-     optimizing for speed.  */
-  emit_insn (seq);
   return true;
 }
 
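For readers who want to see which stores the new loop in
aarch64_expand_setmem picks, here is a small standalone sketch that
models only the offset/size bookkeeping.  It is not part of the patch
and does not use any GCC internals: Store and plan_memset_stores are
hypothetical names, the integer modes are assumed to be the
power-of-two sizes from 1 to 32 bytes, and !STRICT_ALIGNMENT and
len <= MAX_SET_SIZE are assumed throughout.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

/* One emitted store: BYTES bytes written at byte OFFSET from the base.
   (Hypothetical type, for illustration only.)  */
struct Store
{
  unsigned offset;
  unsigned bytes;
};

/* Model of the chunking loop in the patch.  NO_LDP_STP_QREGS stands in
   for the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning flag.  Assumes
   !STRICT_ALIGNMENT and that the caller already checked the size limit.  */
std::vector<Store>
plan_memset_stores (unsigned len, bool no_ldp_stp_qregs = false)
{
  /* 32-byte (STP of Q registers) chunks unless the block is small or
     the tuning flag disables them.  */
  unsigned set_max = (len <= 24 || no_ldp_stp_qregs) ? 16 : 32;

  std::vector<Store> plan;
  unsigned offset = 0;
  while (len > 0)
    {
      /* Largest power-of-two size not exceeding MIN (len, set_max).  */
      unsigned mode_bytes = 1;
      while (mode_bytes * 2 <= std::min (len, set_max))
        mode_bytes *= 2;

      plan.push_back ({offset, mode_bytes});
      len -= mode_bytes;
      offset += mode_bytes;

      /* Short tail: round it up to one power-of-two store and move the
         offset back so that the store overlaps bytes already written.  */
      if (len > 0 && len < set_max / 2)
        {
          unsigned n_bytes = 1;
          while (n_bytes < len)
            n_bytes *= 2;
          assert (n_bytes <= mode_bytes);
          offset -= n_bytes - len;
          len = n_bytes;
        }
    }
  return plan;
}
```

Under these assumptions a 96-byte memset becomes three 32-byte stores,
which lines up with the "3 STP" in the MAX_SET_SIZE comment, and a
35-byte memset becomes a 32-byte store at offset 0 followed by an
overlapping 4-byte store at offset 31.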