From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 72587 invoked by alias); 30 Aug 2019 14:27:38 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 72573 invoked by uid 89); 30 Aug 2019 14:27:38 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-15.6 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LOTSOFHASH,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=micro, Leading, 010, locality X-HELO: mail-qk1-f169.google.com Received: from mail-qk1-f169.google.com (HELO mail-qk1-f169.google.com) (209.85.222.169) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 30 Aug 2019 14:27:35 +0000 Received: by mail-qk1-f169.google.com with SMTP id m2so6250340qkd.10; Fri, 30 Aug 2019 07:27:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=6JC8iXcHX6iUIqZMHoOjPQe7INiHWfdGPtS9nTZxP2Q=; b=Mp3BktWvKgr/NgGMZxYnWBSQTy1FcZFTw0fLOTmcFAbKxK3ArUQ25fTBVRVWUCLYA4 oNsPKdFNUcVd033deiUoZ7ThL6/JL8mjOE03u+A31vKmhYkP5V9kzIeixP9YVgzrrlwQ fYr4J9p3zhv+eLTS824EBo3oEmgEIPI1u8Wo4sjqtypK5ibH1/s1Q62l/xt0MOg4RZ/V 2tsN1/yZPuL8pQBcS8U3f7Lbo5W1do7oI3sasSGtcEoLRLUoz2oyT4FQdFjRs42c/Vm4 oSfjtoO9ZH8KMDWmak2PGiEvSk0JdL/rF1UWCHQGOWz+GN6c2YrydPENCwyFuZhRdkhl pU5g== MIME-Version: 1.0 From: Antony Polukhin Date: Fri, 30 Aug 2019 14:39:00 -0000 Message-ID: Subject: [PATCH] Optimize to_chars To: "libstdc++" , gcc-patches List Content-Type: multipart/mixed; boundary="0000000000004f5e420591566cab" X-SW-Source: 2019-08/txt/msg02082.txt.bz2 --0000000000004f5e420591566cab Content-Type: text/plain; charset="UTF-8" Content-length: 2069 Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations 
leads to the same number of CPU cycles per loop iteration (it exchanges two movzx for 3 bit ops, https://godbolt.org/z/RTui7m ). It also saves 129 bytes of data and removes any chance of cache misses on __digits.
* For base == 16, replacing the two-digit lookup in the __digits table with arithmetic computations adds a few instructions, but removes any chance of cache misses on __digits (about 9 fewer cache misses in the worst case) and saves 513 bytes of const data.
* Replacing __first[pos] and __first[pos - 1] with __first[1] and __first[0] on the final iterations saves ~2% of code size.
* Removing the trailing '\0' from the arrays of digits allows the linker to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" can share the same address). This improves data locality and reduces binary size.
* Using __detail::__to_chars_len_2 instead of the generic __detail::__to_chars_len makes the length computation O(1) instead of O(N). It also makes the code about half as long ( https://godbolt.org/z/Peq_PG ).

In sum: this significantly reduces the size of the binary (by about 4 KB for the base-8 conversion alone, https://godbolt.org/z/WPKijS ) and reduces latency from CPU cache misses, without changing the iteration count and without adding costly instructions to the loops.

Changelog:

* include/std/charconv (__detail::__to_chars_8, __detail::__to_chars_16): Replace array of precomputed digits with arithmetic operations to avoid CPU cache misses. Remove zero termination from array of digits to allow symbol merge with generic implementation of __detail::__to_chars. Replace final offsets with constants. Use __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len.
* include/std/charconv (__detail::__to_chars): Remove zero termination from array of digits.
* include/std/charconv (__detail::__to_chars_2): Leading digit is always '1'.
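
For readers who do not want to decode the attachment, here is a minimal standalone sketch of the two ideas above: deriving the digit count from the bit length in O(1), and producing digits with shifts and masks instead of a two-digit lookup table. This is not the patch itself. The names bit_len, to_octal and to_hex are hypothetical, std::string is used only to keep the example short, and __builtin_clzll is assumed to be available (GCC or Clang).

```cpp
#include <climits>
#include <string>

// Hypothetical stand-in for __detail::__to_chars_len_2: number of
// significant bits in a non-zero value. Assumes the GCC/Clang builtin.
inline unsigned bit_len(unsigned long long v)
{
  constexpr unsigned nbits = CHAR_BIT * sizeof(unsigned long long);
  return nbits - static_cast<unsigned>(__builtin_clzll(v));
}

// Base 8: each digit covers 3 bits, so the length is ceil(bits / 3)
// and every digit is simply '0' plus the low 3 bits.
inline std::string to_octal(unsigned long long v)
{
  if (v == 0)
    return "0";  // zero is handled separately, as std::to_chars does
  const unsigned len = (bit_len(v) + 2) / 3;
  std::string out(len, '0');
  unsigned pos = len;
  while (v != 0)
    {
      out[--pos] = static_cast<char>('0' + (v & 7));
      v >>= 3;
    }
  return out;
}

// Base 16: each digit covers 4 bits, so the length is ceil(bits / 4).
// A 16-entry table without a trailing '\0' replaces the 513-byte one.
inline std::string to_hex(unsigned long long v)
{
  if (v == 0)
    return "0";
  static constexpr char digits[] = {
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'a', 'b', 'c', 'd', 'e', 'f'
  };
  const unsigned len = (bit_len(v) + 3) / 4;
  std::string out(len, '0');
  unsigned pos = len;
  while (v != 0)
    {
      out[--pos] = digits[v & 0xF];
      v >>= 4;
    }
  return out;
}
```

The sketch emits one digit per iteration for clarity; the patch keeps the existing two-digits-per-iteration loops and only changes how each digit is produced.
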
-- Best regards, Antony Polukhin --0000000000004f5e420591566cab Content-Type: text/plain; charset="US-ASCII"; name="charconv_patch.txt" Content-Disposition: attachment; filename="charconv_patch.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_jzy7m0nu0 Content-length: 6837 ZGlmZiAtLWdpdCBhL2xpYnN0ZGMrKy12My9DaGFuZ2VMb2cgYi9saWJzdGRj KystdjMvQ2hhbmdlTG9nCmluZGV4IGVhYTZmNzQuLjM1NzA2ZDAgMTAwNjQ0 Ci0tLSBhL2xpYnN0ZGMrKy12My9DaGFuZ2VMb2cKKysrIGIvbGlic3RkYysr LXYzL0NoYW5nZUxvZwpAQCAtMSwzICsxLDE3IEBACisyMDE5LTA4LTMwICBB bnRvbnkgUG9sdWtoaW4gIDxhbnRvc2hra2FAZ21haWwuY29tPgorCisJKiBp bmNsdWRlL3N0ZC9jaGFyY29udiAoX19kZXRhaWw6Ol9fdG9fY2hhcnNfOCwK KwlfX2RldGFpbDo6X190b19jaGFyc18xNik6IFJlcGxhY2UgYXJyYXkgb2Yg cHJlY29tcHV0ZWQgZGlnaXRzCisJd2l0aCBhcml0aG1ldGljIG9wZXJhdGlv bnMgdG8gYXZvaWQgQ1BVIGNhY2hlIG1pc3Nlcy4gUmVtb3ZlCisJemVybyB0 ZXJtaW5hdGlvbiBmcm9tIGFycmF5IG9mIGRpZ2l0cyB0byBhbGxvdyBzeW1i b2wgbWVyZ2Ugd2l0aAorCWdlbmVyaWMgaW1wbGVtZW50YXRpb24gb2YgX19k ZXRhaWw6Ol9fdG9fY2hhcnMuIFJlcGxhY2UgZmluYWwKKwlvZmZzZXRzIHdp dGggY29uc3RhbnRzLiBVc2UgX19kZXRhaWw6Ol9fdG9fY2hhcnNfbGVuXzIg aW5zdGVhZAorCW9mIGEgZ2VuZXJpYyBfX2RldGFpbDo6X190b19jaGFyc19s ZW4uCisJKiBpbmNsdWRlL3N0ZC9jaGFyY29udiAoX19kZXRhaWw6Ol9fdG9f Y2hhcnMpOiBSZW1vdmUKKwl6ZXJvIHRlcm1pbmF0aW9uIGZyb20gYXJyYXkg b2YgZGlnaXRzLgorCSogaW5jbHVkZS9zdGQvY2hhcmNvbnYgKF9fZGV0YWls OjpfX3RvX2NoYXJzXzIpOiBMZWFkaW5nIGRpZ2l0CisJaXMgYWx3YXlzICcx Jy4KKwogMjAxOS0wOC0yOSAgSm9uYXRoYW4gV2FrZWx5ICA8andha2VseUBy ZWRoYXQuY29tPgogCiAJUFIgbGlic3RkYysrLzkxMDY3CmRpZmYgLS1naXQg YS9saWJzdGRjKystdjMvaW5jbHVkZS9zdGQvY2hhcmNvbnYgYi9saWJzdGRj KystdjMvaW5jbHVkZS9zdGQvY2hhcmNvbnYKaW5kZXggNTNhYTYzZS4uNGU5 NGMzOSAxMDA2NDQKLS0tIGEvbGlic3RkYysrLXYzL2luY2x1ZGUvc3RkL2No YXJjb252CisrKyBiL2xpYnN0ZGMrKy12My9pbmNsdWRlL3N0ZC9jaGFyY29u dgpAQCAtMTMxLDcgKzEzMSw3IEBAIG5hbWVzcGFjZSBfX2RldGFpbAogCSAg ICA6IDF1OwogCX0KICAgICAgIGVsc2UKLQlyZXR1cm4gX190b19jaGFyc19s ZW4oX192YWx1ZSwgOCk7CisJcmV0dXJuIChfX3RvX2NoYXJzX2xlbl8yKF9f 
dmFsdWUpICsgMikgLyAzOwogICAgIH0KIAogICAvLyBHZW5lcmljIGltcGxl bWVudGF0aW9uIGZvciBhcmJpdHJhcnkgYmFzZXMuCkBAIC0xNTUsOCArMTU1 LDEyIEBAIG5hbWVzcGFjZSBfX2RldGFpbAogCiAgICAgICB1bnNpZ25lZCBf X3BvcyA9IF9fbGVuIC0gMTsKIAotICAgICAgc3RhdGljIGNvbnN0ZXhwciBj aGFyIF9fZGlnaXRzW10KLQk9ICIwMTIzNDU2Nzg5YWJjZGVmZ2hpamtsbW5v cHFyc3R1dnd4eXoiOworICAgICAgc3RhdGljIGNvbnN0ZXhwciBjaGFyIF9f ZGlnaXRzW10gPSB7CisJJzAnLCAnMScsICcyJywgJzMnLCAnNCcsICc1Jywg JzYnLCAnNycsICc4JywgJzknLAorCSdhJywgJ2InLCAnYycsICdkJywgJ2Un LCAnZicsICdnJywgJ2gnLCAnaScsICdqJywKKwknaycsICdsJywgJ20nLCAn bicsICdvJywgJ3AnLCAncScsICdyJywgJ3MnLCAndCcsCisJJ3UnLCAndics ICd3JywgJ3gnLCAneScsICd6JworICAgICAgfTsKIAogICAgICAgd2hpbGUg KF9fdmFsID49IF9fYmFzZSkKIAl7CkBAIC0xODEsNyArMTg1LDcgQEAgbmFt ZXNwYWNlIF9fZGV0YWlsCiAKICAgICAgIHRvX2NoYXJzX3Jlc3VsdCBfX3Jl czsKIAotICAgICAgY29uc3QgdW5zaWduZWQgX19sZW4gPSBfX3RvX2NoYXJz X2xlbihfX3ZhbCwgMHgxMCk7CisgICAgICBjb25zdCB1bnNpZ25lZCBfX2xl biA9IChfX3RvX2NoYXJzX2xlbl8yKF9fdmFsKSArIDMpIC8gNDsKIAogICAg ICAgaWYgKF9fYnVpbHRpbl9leHBlY3QoKF9fbGFzdCAtIF9fZmlyc3QpIDwg X19sZW4sIDApKQogCXsKQEAgLTE5MCwzMiArMTk0LDMwIEBAIG5hbWVzcGFj ZSBfX2RldGFpbAogCSAgcmV0dXJuIF9fcmVzOwogCX0KIAotICAgICAgc3Rh dGljIGNvbnN0ZXhwciBjaGFyIF9fZGlnaXRzWzUxM10gPQotCSIwMDAxMDIw MzA0MDUwNjA3MDgwOTBhMGIwYzBkMGUwZjEwMTExMjEzMTQxNTE2MTcxODE5 MWExYjFjMWQxZTFmIgotCSIyMDIxMjIyMzI0MjUyNjI3MjgyOTJhMmIyYzJk MmUyZjMwMzEzMjMzMzQzNTM2MzczODM5M2EzYjNjM2QzZTNmIgotCSI0MDQx NDI0MzQ0NDU0NjQ3NDg0OTRhNGI0YzRkNGU0ZjUwNTE1MjUzNTQ1NTU2NTc1 ODU5NWE1YjVjNWQ1ZTVmIgotCSI2MDYxNjI2MzY0NjU2NjY3Njg2OTZhNmI2 YzZkNmU2ZjcwNzE3MjczNzQ3NTc2Nzc3ODc5N2E3YjdjN2Q3ZTdmIgotCSI4 MDgxODI4Mzg0ODU4Njg3ODg4OThhOGI4YzhkOGU4ZjkwOTE5MjkzOTQ5NTk2 OTc5ODk5OWE5YjljOWQ5ZTlmIgotCSJhMGExYTJhM2E0YTVhNmE3YThhOWFh YWJhY2FkYWVhZmIwYjFiMmIzYjRiNWI2YjdiOGI5YmFiYmJjYmRiZWJmIgot CSJjMGMxYzJjM2M0YzVjNmM3YzhjOWNhY2JjY2NkY2VjZmQwZDFkMmQzZDRk NWQ2ZDdkOGQ5ZGFkYmRjZGRkZWRmIgotCSJlMGUxZTJlM2U0ZTVlNmU3ZThl OWVhZWJlY2VkZWVlZmYwZjFmMmYzZjRmNWY2ZjdmOGY5ZmFmYmZjZmRmZWZm 
IjsKKyAgICAgIHN0YXRpYyBjb25zdGV4cHIgY2hhciBfX2RpZ2l0c1tdID0g eworCScwJywgJzEnLCAnMicsICczJywgJzQnLCAnNScsICc2JywgJzcnLCAn OCcsICc5JywKKwknYScsICdiJywgJ2MnLCAnZCcsICdlJywgJ2YnCisgICAg ICB9OwogICAgICAgdW5zaWduZWQgX19wb3MgPSBfX2xlbiAtIDE7CiAgICAg ICB3aGlsZSAoX192YWwgPj0gMHgxMDApCiAJewotCSAgYXV0byBjb25zdCBf X251bSA9IChfX3ZhbCAlIDB4MTAwKSAqIDI7Ci0JICBfX3ZhbCAvPSAweDEw MDsKLQkgIF9fZmlyc3RbX19wb3NdID0gX19kaWdpdHNbX19udW0gKyAxXTsK KwkgIGF1dG8gX19udW0gPSBfX3ZhbCAmIDB4RjsKKwkgIF9fdmFsID4+PSA0 OworCSAgX19maXJzdFtfX3Bvc10gPSBfX2RpZ2l0c1tfX251bV07CisJICBf X251bSA9IF9fdmFsICYgMHhGOworCSAgX192YWwgPj49IDQ7CiAJICBfX2Zp cnN0W19fcG9zIC0gMV0gPSBfX2RpZ2l0c1tfX251bV07CiAJICBfX3BvcyAt PSAyOwogCX0KICAgICAgIGlmIChfX3ZhbCA+PSAweDEwKQogCXsKLQkgIGF1 dG8gY29uc3QgX19udW0gPSBfX3ZhbCAqIDI7Ci0JICBfX2ZpcnN0W19fcG9z XSA9IF9fZGlnaXRzW19fbnVtICsgMV07Ci0JICBfX2ZpcnN0W19fcG9zIC0g MV0gPSBfX2RpZ2l0c1tfX251bV07CisJICBjb25zdCBhdXRvIF9fbnVtID0g X192YWwgJiAweEY7CisJICBfX3ZhbCA+Pj0gNDsKKwkgIF9fZmlyc3RbMV0g PSBfX2RpZ2l0c1tfX251bV07CisJICBfX2ZpcnN0WzBdID0gX19kaWdpdHNb X192YWxdOwogCX0KICAgICAgIGVsc2UKLQlfX2ZpcnN0W19fcG9zXSA9ICIw MTIzNDU2Nzg5YWJjZGVmIltfX3ZhbF07CisJX19maXJzdFswXSA9IF9fZGln aXRzW19fdmFsXTsKICAgICAgIF9fcmVzLnB0ciA9IF9fZmlyc3QgKyBfX2xl bjsKICAgICAgIF9fcmVzLmVjID0ge307CiAgICAgICByZXR1cm4gX19yZXM7 CkBAIC0yNjMsMjggKzI2NSwyNiBAQCBuYW1lc3BhY2UgX19kZXRhaWwKIAkg IHJldHVybiBfX3JlczsKIAl9CiAKLSAgICAgIHN0YXRpYyBjb25zdGV4cHIg Y2hhciBfX2RpZ2l0c1sxMjldID0KLQkiMDAwMTAyMDMwNDA1MDYwNzEwMTEx MjEzMTQxNTE2MTciCi0JIjIwMjEyMjIzMjQyNTI2MjczMDMxMzIzMzM0MzUz NjM3IgotCSI0MDQxNDI0MzQ0NDU0NjQ3NTA1MTUyNTM1NDU1NTY1NyIKLQki NjA2MTYyNjM2NDY1NjY2NzcwNzE3MjczNzQ3NTc2NzciOwogICAgICAgdW5z aWduZWQgX19wb3MgPSBfX2xlbiAtIDE7CiAgICAgICB3aGlsZSAoX192YWwg Pj0gMDEwMCkKIAl7Ci0JICBhdXRvIGNvbnN0IF9fbnVtID0gKF9fdmFsICUg MDEwMCkgKiAyOwotCSAgX192YWwgLz0gMDEwMDsKLQkgIF9fZmlyc3RbX19w b3NdID0gX19kaWdpdHNbX19udW0gKyAxXTsKLQkgIF9fZmlyc3RbX19wb3Mg LSAxXSA9IF9fZGlnaXRzW19fbnVtXTsKKwkgIGF1dG8gX19udW0gPSBfX3Zh 
bCAmIDc7CisJICBfX3ZhbCA+Pj0gMzsKKwkgIF9fZmlyc3RbX19wb3NdID0g JzAnICsgX19udW07CisJICBfX251bSA9IF9fdmFsICYgNzsKKwkgIF9fdmFs ID4+PSAzOworCSAgX19maXJzdFtfX3BvcyAtIDFdID0gJzAnICsgX19udW07 CiAJICBfX3BvcyAtPSAyOwogCX0KICAgICAgIGlmIChfX3ZhbCA+PSAwMTAp CiAJewotCSAgYXV0byBjb25zdCBfX251bSA9IF9fdmFsICogMjsKLQkgIF9f Zmlyc3RbX19wb3NdID0gX19kaWdpdHNbX19udW0gKyAxXTsKLQkgIF9fZmly c3RbX19wb3MgLSAxXSA9IF9fZGlnaXRzW19fbnVtXTsKKwkgIGF1dG8gY29u c3QgX19udW0gPSBfX3ZhbCAmIDc7CisJICBfX3ZhbCA+Pj0gMzsKKwkgIF9f Zmlyc3RbMV0gPSAnMCcgKyBfX251bTsKKwkgIF9fZmlyc3RbMF0gPSAnMCcg KyBfX3ZhbDsKIAl9CiAgICAgICBlbHNlCi0JX19maXJzdFtfX3Bvc10gPSAn MCcgKyBfX3ZhbDsKKwlfX2ZpcnN0WzBdID0gJzAnICsgX192YWw7CiAgICAg ICBfX3Jlcy5wdHIgPSBfX2ZpcnN0ICsgX19sZW47CiAgICAgICBfX3Jlcy5l YyA9IHt9OwogICAgICAgcmV0dXJuIF9fcmVzOwpAQCAtMzE1LDcgKzMxNSwx MCBAQCBuYW1lc3BhY2UgX19kZXRhaWwKIAkgIF9fZmlyc3RbX19wb3MtLV0g PSAnMCcgKyAoX192YWwgJiAxKTsKIAkgIF9fdmFsID4+PSAxOwogCX0KLSAg ICAgICpfX2ZpcnN0ID0gJzAnICsgKF9fdmFsICYgMSk7CisgICAgICAvLyBG aXJzdCBkaWdpdCBpcyBhbHdheXMgJzEnIGJlY2F1c2UgX190b19jaGFyc19s ZW5fMiBza2lwcworICAgICAgLy8gbGVhZGluZyB6ZXJvIGJpdHMgYW5kIHN0 ZDo6dG9fY2hhcnMgaGFuZGxlcyB6ZXJvIHZhbHVlcworICAgICAgLy8gZGly ZWN0bHkuCisgICAgICBfX2ZpcnN0WzBdID0gJzEnOwogCiAgICAgICBfX3Jl cy5wdHIgPSBfX2ZpcnN0ICsgX19sZW47CiAgICAgICBfX3Jlcy5lYyA9IHt9 Owo= --0000000000004f5e420591566cab--