From: Wilco Dijkstra
To: Adhemerval Zanella, Noah Goldstein
CC: "H.J. Lu", GNU C Library
Subject: Re: [PATCH v2] x86-64: Optimize bzero
Date: Thu, 10 Feb 2022 13:01:59 +0000
References: <20220208224319.40271-1-hjl.tools@gmail.com>
List-Id: Libc-alpha mailing list

Hi,

>> The saving is in the lane-cross broadcast which is on the critical
>> path for memsets in [VEC_SIZE, 2 * VEC_SIZE] (think 32-64).

What is the speedup in e.g. bench-memset?
Generally the OoO engine will
be able to hide a small increase in latency, so I'd be surprised if it shows up
as a significant gain.

If you can show a good speedup in an important application (or a benchmark
like SPEC2017) then it may be worth pursuing. However, there are other
optimization opportunities that may be easier or give a larger benefit.

>> Agreed it's not clear if it's worth it to start replacing memset calls with
>> bzero calls, but at the very least this will improve existing code that
>> uses bzero.

No code uses bzero, no compiler emits bzero. It died 2 decades ago...

> My point is this is a lot of code and infrastructure for a symbol marked
> as legacy for POSIX.1-2001 and removed on POSIX.1-2008 for the sake of
> marginal gains in specific cases.

Indeed, what we really should discuss is how to remove the last traces of
bcopy and bcmp from GLIBC. Do we need to keep compatibility symbols
or could we just get rid of them altogether?

Cheers,
Wilco