From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wilco Dijkstra
To: Jakub Jelinek, Richard Biener
CC: nd, "mliska@suse.cz", "ubizjak@gmail.com", GCC Patches, "marc.glisse@inria.fr", "H.J. Lu", Jan Hubicka
Subject: Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
Date: Thu, 12 Apr 2018 15:53:00 -0000

Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
>> Not sure if I missed some important part of the discussion but
>> for the testcase we want to preserve the tailcall, right?
>> So it would be enough to set avoid_libcall to
>> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
>> stpcpy)?

The tailcall issue is just a distraction. Historically the handling of mempcpy
has been horribly inefficient in both GCC and GLIBC for practically all targets.
This is why it was decided to defer to memcpy.

For example small constant mempcpy was not expanded inline like memcpy
until PR70140 was fixed. Except for a few targets which have added an
optimized mempcpy, the default mempcpy implementation in almost all
released GLIBCs is much slower than memcpy (due to using a badly written
C implementation).

Recent GLIBCs now call the optimized memcpy - this is better but still adds
extra call/return overheads. So to improve that the GLIBC headers have an
inline that changes any call to mempcpy into memcpy (this is the default but
can be disabled on a per-target basis).

Obviously it is best to do this optimization in GCC, which is what we finally
do in GCC8. Inlining mempcpy means you sometimes miss a tailcall, but this
is not common - in all of GLIBC the inlining on AArch64 adds 166 extra
instructions and 12 callee-save registers. This is a small codesize cost to
avoid the overhead of calling the generic C version.

> My preference would be to have non-lame mempcpy etc. on all targets, but the
> aarch64 folks disagree.

The question is who is going to write the 30+ mempcpy implementations for all
those targets which don't have one? And who says doing this is actually going to
improve performance? Having mempcpy+memcpy typically means more Icache misses
in code that uses both.

So generally it's a good idea to change mempcpy into memcpy by default. It's
not slower than calling mempcpy even if you have a fast implementation, it's faster
if you use an up to date GLIBC which calls memcpy, and it's significantly better
when using an old GLIBC.

Wilco
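
For readers of the archive, the mempcpy-to-memcpy rewrite discussed above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the actual GLIBC header inline or the GCC expansion: the names mempcpy_via_memcpy and join2 are hypothetical. It also shows why a tailcall can be lost, since the addition has to happen after memcpy returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the GLIBC or GCC code): mempcpy semantics
   expressed through memcpy. Copies n bytes and returns dst + n. */
static inline void *mempcpy_via_memcpy(void *dst, const void *src, size_t n)
{
    /* memcpy returns dst, so the end pointer is dst advanced by n.
       The add runs after the call, so the call is no longer in tail position. */
    return (char *)memcpy(dst, src, n) + n;
}

/* Hypothetical usage: join two strings into buf by chaining end pointers.
   Returns the length of the result, excluding the terminating NUL. */
static size_t join2(char *buf, size_t cap, const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    assert(la + lb + 1 <= cap);   /* caller must supply enough room */
    char *p = mempcpy_via_memcpy(buf, a, la);
    p = mempcpy_via_memcpy(p, b, lb);
    *p = '\0';
    return (size_t)(p - buf);
}
```

Calling join2 with a 16-byte buffer and the strings "mem" and "pcpy" leaves "mempcpy" in the buffer. The only cost over a plain memcpy call is the pointer addition, which is the sort of small code-size overhead described in the message.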