From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Sender: libc-alpha-owner@sourceware.org
MIME-Version: 1.0
References: <9c563a4b-424b-242f-b82f-4650ab2637f7@redhat.com> <28e34264-e8c5-5570-c48c-9125893808b2@redhat.com>
From: Erich Elsen
Date: Tue, 23 May 2017 20:39:00 -0000
Subject: Re: memcpy performance regressions 2.19 -> 2.24(5)
To: "H.J. Lu"
Cc: "Carlos O'Donell", GNU C Library
Content-Type: multipart/mixed; boundary="94eb2c036852a92ecf055036faa6"
X-SW-Source: 2017-05/txt/msg00698.txt.bz2

--94eb2c036852a92ecf055036faa6
Content-Type: text/plain; charset="UTF-8"

I was also thinking that it might be nice to have a TUNABLE that sets
the implementation of memcpy directly. It would be easier to do this
if memcpy.S were memcpy.c. Attached is a patch that does the
conversion but doesn't add the tunables. How would you feel about
this? It has no runtime impact, probably increases the size slightly,
and makes the code easier to read / modify.

On Mon, May 22, 2017 at 8:19 PM, Erich Elsen wrote:
> Here is the patch that slightly refactors how init_cacheinfo is called.
>
> On Mon, May 22, 2017 at 7:24 PM, H.J. Lu wrote:
>> On Mon, May 22, 2017 at 6:23 PM, Erich Elsen wrote:
>>> I definitely think increasing the size in the case of processors with
>>> a large number of cores makes sense. Hopefully with some testing we
>>> can confirm it is a net win and/or find a more empirical number.
>>>
>>> Thanks for that patch with the tunable support. I've just put a
>>> similar patch in review for sharing right now. It adds support in the
>>> case that HAVE_TUNABLES isn't defined like the similar code in arena.c
>>> and also makes a minor change that turns init_cacheinfo into a
>>> init_cacheinfo_impl (a hidden callable). init_cacheinfo is now a
>>> constructor that just calls the impl and passes the cpu_features
>>> struct. This is useful in that it makes the code a bit more modular
>>> (something we'll need in order to test this internally).
>>
>> This sounds like a good idea. I'd also like to add tunable support in
>> init_cpu_features to turn on/off CPU features. non_temporal_threshold
>> will be one of them.
>>
>>
>> --
>> H.J.
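
For illustration, the init_cacheinfo refactoring described in the
quoted text might look roughly like the sketch below. It is not the
patch under review. Every demo_ name, the stand-in features struct,
the hard-coded 8 MiB cache size and the 3/4 threshold policy are
assumptions made up for the example; only the shape (a constructor
that forwards a features struct to a separately callable, hidden impl)
comes from the description above.

```c
#include <stddef.h>

/* Hypothetical stand-in for glibc's struct cpu_features; the real
   structure carries far more state.  */
struct demo_cpu_features
{
  long int shared_cache_size;   /* bytes */
};

/* Hypothetical globals that the string functions would consult.  */
long int demo_shared_cache_size;
long int demo_non_temporal_threshold;

/* The part worth testing: it derives the values purely from the
   struct it is handed, so a test can pass in a fake one.  Hidden
   visibility keeps it out of the exported symbol set.  */
__attribute__ ((visibility ("hidden"))) void
demo_init_cacheinfo_impl (const struct demo_cpu_features *features)
{
  demo_shared_cache_size = features->shared_cache_size;
  /* Assumed policy, for illustration only: 3/4 of the shared cache.  */
  demo_non_temporal_threshold = features->shared_cache_size * 3 / 4;
}

/* Stand-in for the real feature lookup; returns fixed example data.  */
static const struct demo_cpu_features *
demo_get_cpu_features (void)
{
  static const struct demo_cpu_features detected = { 8L * 1024 * 1024 };
  return &detected;
}

/* Thin wrapper: runs once at load time and only forwards to the impl.  */
static void __attribute__ ((constructor))
demo_init_cacheinfo (void)
{
  demo_init_cacheinfo_impl (demo_get_cpu_features ());
}
```

With this split, a test can call demo_init_cacheinfo_impl again with a
hand-built struct and check the resulting globals, without depending on
the machine the test happens to run on.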
--94eb2c036852a92ecf055036faa6
Content-Type: text/x-patch; charset="US-ASCII"; name="0001-add-memcpy.c.patch"
Content-Disposition: attachment; filename="0001-add-memcpy.c.patch"
Content-Transfer-Encoding: 7bit
X-Attachment-Id: f_j320wiqg1

From a2957f5a0b21f9588e8756228b11b86f886b0f4c Mon Sep 17 00:00:00 2001
From: Erich Elsen <eriche@google.com>
Date: Tue, 23 May 2017 12:29:24 -0700
Subject: [PATCH] add memcpy.c

---
 sysdeps/x86_64/multiarch/memcpy.c | 70 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 sysdeps/x86_64/multiarch/memcpy.c

diff --git a/sysdeps/x86_64/multiarch/memcpy.c b/sysdeps/x86_64/multiarch/memcpy.c
new file mode 100644
index 0000000000..b0ff8c71fd
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memcpy.c
@@ -0,0 +1,70 @@
+#include "cpu-features.h"
+#include "init-arch.h"
+#include "shlib-compat.h"
+#include <stdlib.h>
+
+typedef void * (*memcpy_fn)(void *, const void *, size_t);
+
+extern void * __memcpy_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_sse2_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_sse2_unaligned_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_ssse3(void *dest, const void *src, size_t n);
+extern void * __memcpy_ssse3_back(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx_unaligned_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx512_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx512_unaligned_erms(void *dest, const void *src, size_t n);
+
+/* Defined in cacheinfo.c */
+extern long int __x86_shared_cache_size attribute_hidden;
+extern long int __x86_shared_cache_size_half attribute_hidden;
+extern long int __x86_data_cache_size attribute_hidden;
+extern long int __x86_data_cache_size_half attribute_hidden;
+extern long int __x86_shared_non_temporal_threshold attribute_hidden;
+
+static void * select_memcpy_impl(void) {
+  const struct cpu_features* cpu_features_struct_p = __get_cpu_features ();
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Prefer_ERMS)) {
+    return __memcpy_erms;
+  }
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, AVX512F_Usable)) {
+    if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Prefer_No_VZEROUPPER))
+      return __memcpy_avx512_unaligned_erms;
+    return __memcpy_avx512_unaligned;
+  }
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, AVX_Fast_Unaligned_Load)) {
+    if (CPU_FEATURES_CPU_P(cpu_features_struct_p, ERMS)) {
+      return __memcpy_avx_unaligned_erms;
+
+    }
+    return __memcpy_avx_unaligned;
+  }
+  else {
+    if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Fast_Unaligned_Copy)) {
+      if (CPU_FEATURES_CPU_P(cpu_features_struct_p, ERMS)) {
+        return __memcpy_sse2_unaligned_erms;
+
+      }
+      return __memcpy_sse2_unaligned;
+    }
+    else {
+      if (!CPU_FEATURES_CPU_P(cpu_features_struct_p, SSSE3)) {
+        return __memcpy_sse2_unaligned;
+
+      }
+      if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Fast_Copy_Backward)) {
+        return __memcpy_ssse3_back;
+
+      }
+      return __memcpy_ssse3;
+    }
+  }
+}
+
+void *__new_memcpy(void *dest, const void *src, size_t n)
+  __attribute__ ((ifunc ("select_memcpy_impl")));
+
+versioned_symbol(libc, __new_memcpy, memcpy, GLIBC_2_14);
-- 
2.13.0.219.gdb65acc882-goog

--94eb2c036852a92ecf055036faa6--
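
The message proposes a tunable that sets the memcpy implementation
directly, but the attached patch deliberately leaves that part out. As
a rough sketch of the idea only, a selector written in C could consult
an override before falling back to the feature checks. Everything
named demo_ below is hypothetical, the two copy routines are
placeholders for the per-ISA variants, and the override is passed in
as a plain string rather than read through glibc's tunables framework.
The sketch also uses ordinary function pointers instead of IFUNC: a
real resolver runs very early, during relocation processing, so it
most likely could not call library functions such as strcmp the way
this does.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the memcpy_fn typedef in the attached patch.  */
typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Two hypothetical stand-ins for the per-ISA memcpy variants.  */
static void *
demo_copy_bytewise (void *dest, const void *src, size_t n)
{
  unsigned char *d = dest;
  const unsigned char *s = src;
  for (size_t i = 0; i < n; i++)
    d[i] = s[i];
  return dest;
}

static void *
demo_copy_libc (void *dest, const void *src, size_t n)
{
  return memcpy (dest, src, n);
}

/* Table mapping an override name to an implementation.  */
struct demo_named_impl
{
  const char *name;
  memcpy_fn fn;
};

static const struct demo_named_impl demo_impls[] = {
  { "bytewise", demo_copy_bytewise },
  { "libc", demo_copy_libc },
};

/* Placeholder for the CPU-feature checks a real selector would do.  */
static memcpy_fn
demo_select_by_features (void)
{
  return demo_copy_libc;
}

/* A recognized override name wins; NULL or an unknown name falls
   back to the feature-based choice.  */
static memcpy_fn
demo_select_memcpy (const char *override)
{
  if (override != NULL)
    for (size_t i = 0; i < sizeof demo_impls / sizeof demo_impls[0]; i++)
      if (strcmp (override, demo_impls[i].name) == 0)
        return demo_impls[i].fn;
  return demo_select_by_features ();
}
```

A caller might obtain the override from the environment, for example
demo_select_memcpy (getenv ("DEMO_MEMCPY_IMPL")), where the variable
name is again made up for this example. Having the selection logic in
C rather than assembly is what makes a table lookup like this
straightforward to add.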