From: Wilco Dijkstra
To: Andrew Pinski, Siddhesh Poyarekar
CC: Szabolcs Nagy, "Ellcey, Steve", libc-alpha, nd
Subject: Re: Ping: [Patch] aarch64: Thunderx specific memcpy and memmove
Date: Thu, 25 May 2017 17:49:00 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
References: <1493663254.29498.11.camel@cavium.com> <5909E2C5.7090603@arm.com> <1494366305.9224.26.camel@cavium.com> <74006e0a-fb4a-dc36-bc29-77303cef3cfb@gotplt.org> <5925BD04.7000902@arm.com> <0950612b-cff4-2256-6f81-3bacf30ce7e9@gotplt.org>

Andrew Pinski wrote:
>
> One memcpy does not fit all micro-arch.  Just look at x86, where they
> have many different versions and even do selection based on cache size
> (see the current discussion about the memcpy regression).

Given the number of micro architectures already existing, it would be a really
bad situation to end up with one memcpy per micro architecture... Micro
architectures will tend to converge rather than diverge as performance
levels increase. So I believe it's generally best to use the same instructions
for memcpy as for compiled code, as that is what CPUs will actually encounter
and optimize for. For the rare, very large copies we could do something different
if it helps (e.g. prefetch, non-temporals, SIMD registers etc).

> >> - non-thunderx systems are affected: static linked code using
> >> memcpy will start to go through an indirection (iplt) instead
> >> of direct call. if there are complaints about it or other ifunc
> >> related issues come up, then again we will have to reconsider it.
>
> Just to answer this.  This is true on x86 and PowerPC already so there
> should be no difference on aarch64 than those two targets.

An ifunc has a measurable overhead unfortunately, and that would no longer
be trivially avoidable via static linking. Most calls to memcpy tend to be very
small copies. Maybe we should investigate statically linking the small copy part
of memcpy with say -O3?

Cheers,
Wilco
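
To make the last suggestion concrete, here is a rough sketch of what "statically linking the small copy part" might look like. It is only an illustration under assumptions: the name copy_small_inline and the 16-byte threshold SMALL_COPY_MAX are hypothetical and do not come from the patch or from glibc. Small sizes are handled with fixed-size, possibly overlapping block copies, which compilers typically expand inline, so only larger copies reach the library memcpy (and any ifunc indirection).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold: the thread does not specify one. */
#define SMALL_COPY_MAX 16

/* Hypothetical wrapper, not part of glibc.  Copies of up to
   SMALL_COPY_MAX bytes are done with constant-size memcpy calls,
   which compilers normally turn into plain loads and stores.
   Larger copies fall through to the library memcpy.  As with
   memcpy, the source and destination must not overlap.  */
static inline void *copy_small_inline(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n > SMALL_COPY_MAX)
        return memcpy(dst, src, n);   /* out-of-line, possibly via ifunc */

    if (n >= 8) {
        /* 8..16 bytes: copy the first and last 8 bytes; they may overlap. */
        uint64_t head, tail;
        memcpy(&head, s, 8);
        memcpy(&tail, s + n - 8, 8);
        memcpy(d, &head, 8);
        memcpy(d + n - 8, &tail, 8);
    } else if (n >= 4) {
        /* 4..7 bytes: same idea with 4-byte blocks. */
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else if (n >= 2) {
        /* 2..3 bytes: same idea with 2-byte blocks. */
        uint16_t head, tail;
        memcpy(&head, s, 2);
        memcpy(&tail, s + n - 2, 2);
        memcpy(d, &head, 2);
        memcpy(d + n - 2, &tail, 2);
    } else if (n == 1) {
        d[0] = s[0];
    }
    return dst;
}
```

Whether this helps would need measuring: the inline path adds code size at every call site, and the threshold would have to be tuned against real copy-size distributions.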