From: Wilco Dijkstra
To: Adhemerval Zanella
CC: 'GNU C Library'
Subject: [RFC PATCH 12/19] riscv: Add accelerated memcpy/memmove routines for RV64
Date: Thu, 9 Feb 2023 11:43:15 +0000

Hi Adhemerval,

> The generic routines still assume that hardware can't issue unaligned
> memory accesses, or that doing so is prohibitively expensive.  However, I
> think we are moving in the direction of adding unaligned variants when it
> makes sense.

There is a _STRING_ARCH_unaligned define that can be set per target. It needs
cleaning up since it's used mostly for premature micro-optimizations (e.g.
getenv.c) where using a fixed-size memcpy would be best (it also appears to
have big-endian bugs).

> Another usual tuning is loop unrolling, which depends on the underlying
> hardware.  Unfortunately we need to explicitly force gcc to unroll some loop
> constructs (for instance, check sysdeps/powerpc/powerpc64/power4/Makefile),
> so this might be another approach you could use to tune the RISCV routines.

Compiler unrolling is unlikely to give improved results, especially on GCC
where the default unroll factor is still 16 times, which will just bloat the
code... So all reasonable unrolling is best done by hand (and doesn't need to
be target-specific).

> The memcpy, memmove, memset and memcmp routines are a slightly different
> subject.  Although the current generic mem routines do use some explicit
> unrolling, they do not take into consideration unaligned access, vector
> instructions, or special instructions (such as cache-clear ones). And these
> usually make a lot of difference.

Indeed. However, it is also quite difficult to make use of all these without
a lot of target-specific code and inline assembler. And at that point you
might as well use assembler...

> What I would expect is that maybe we can use a strategy similar to what
> Google is doing with llvm libc, which bases its work on the automemcpy
> paper [1]. It means that for unaligned access, each architecture will
> reimplement the memory routine block.  Although the project focuses on
> static compiling, I think using hooks over assembly routines might be a
> better approach (you might reuse code blocks or try different strategies
> more easily).
>
> [1] https://storage.googleapis.com/pub-tools-public-publication-data/pdf/4f7c3da72d557ed418828823a8e59942859d677f.pdf

I'm still not convinced about this strategy - it's hard to beat assembler
using generic code. The way it works in LLVM is that you implement a new set
of builtins that inline an optimal memcpy for a fixed size. But you don't know
the alignment, so this only works on targets that support fast unaligned
access. And with different compiler versions/options you get major performance
variations due to code reordering, register allocation differences or failure
to emit load/store pairs...

I believe it is reasonable to ensure the generic string functions are
efficient, to avoid having to write assembler for every string function.
However, it becomes crazy when you set the goal to be as close as possible to
the best assembler version in all cases. Most targets will add assembly
versions for key functions like memcpy, strlen etc.

Cheers,
Wilco
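
To illustrate the fixed-size memcpy suggested for cases like getenv.c, here
is a minimal sketch in C. It is not the glibc code: load_u16 and starts_with_2
are hypothetical names made up for this example. The assumption is that
compilers such as GCC typically expand a memcpy with a small constant size
into a plain load where unaligned access is cheap and into byte loads where it
is not, so neither _STRING_ARCH_unaligned nor endian-specific code is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a glibc function: read a 16-bit value from an
   address with arbitrary alignment.  Going through memcpy avoids the
   undefined behaviour of casting P to uint16_t * and dereferencing it.  */
static inline uint16_t
load_u16 (const void *p)
{
  uint16_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Hypothetical helper: return nonzero if the first two bytes of S are C0
   followed by C1.  S must point to at least two readable bytes.  Both
   operands are built by the same byte-wise copy, so the comparison gives
   the same answer on little- and big-endian targets.  */
static inline int
starts_with_2 (const char *s, char c0, char c1)
{
  const char pat[2] = { c0, c1 };
  assert (s != NULL);
  return load_u16 (s) == load_u16 (pat);
}
```

For example, starts_with_2 (buf + 1, 'H', 'O') is true for a buffer holding
"xHOME=/root", even though buf + 1 is not 2-byte aligned.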
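
The point about fixed-size blocks only working with fast unaligned access can
be seen in a small C sketch. This is an illustration under stated assumptions,
not code from LLVM libc or glibc: copy_8_to_16 is a hypothetical name, and the
technique shown (two 8-byte blocks, the second anchored at the end of the
buffer so that it may overlap the first) is just one common way to build a
variable-size copy from fixed-size pieces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy N bytes, 8 <= N <= 16, using two fixed-size
   8-byte blocks.  The second block covers the last 8 bytes and overlaps
   the first whenever N < 16.  Both blocks are loaded before anything is
   stored, so the result is correct even if SRC and DST overlap.  Unless N
   is 8 or 16, the second block starts at an address that is not 8-byte
   aligned, which is why this is only cheap on targets with fast unaligned
   access.  */
static inline void
copy_8_to_16 (void *dst, const void *src, size_t n)
{
  uint64_t head, tail;
  assert (n >= 8 && n <= 16);
  memcpy (&head, src, 8);
  memcpy (&tail, (const char *) src + n - 8, 8);
  memcpy (dst, &head, 8);
  memcpy ((char *) dst + n - 8, &tail, 8);
}
```

Whether the four fixed-size memcpy calls become two loads and two stores
depends on the compiler and options, which is the kind of variation described
in the reply.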