Date: Fri, 5 Mar 2021 21:37:45 +0100
From: Jakub Jelinek
To: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Subject: [PATCH] libstdc++: Improve std::rot[lr] [PR99396]
Message-ID: <20210305203745.GB1837485@tucnak>
Hi!

As can be seen on:

#include <bit>

unsigned char f1 (unsigned char x, int y) { return std::rotl (x, y); }
unsigned char f2 (unsigned char x, int y) { return std::rotr (x, y); }
unsigned short f3 (unsigned short x, int y) { return std::rotl (x, y); }
unsigned short f4 (unsigned short x, int y) { return std::rotr (x, y); }
unsigned int f5 (unsigned int x, int y) { return std::rotl (x, y); }
unsigned int f6 (unsigned int x, int y) { return std::rotr (x, y); }
unsigned long int f7 (unsigned long int x, int y) { return std::rotl (x, y); }
unsigned long int f8 (unsigned long int x, int y) { return std::rotr (x, y); }
unsigned long long int f9 (unsigned long long int x, int y) { return std::rotl (x, y); }
unsigned long long int f10 (unsigned long long int x, int y) { return std::rotr (x, y); }
//unsigned __int128 f11 (unsigned __int128 x, int y) { return std::rotl (x, y); }
//unsigned __int128 f12 (unsigned __int128 x, int y) { return std::rotr (x, y); }
constexpr auto a = std::rotl (1234U, 0);
constexpr auto b = std::rotl (1234U, 5);
constexpr auto c = std::rotl (1234U, -5);
constexpr auto d = std::rotl (1234U, -__INT_MAX__ - 1);

the current definitions of std::__rot[lr] aren't pattern-recognized as
rotates.  They are too long and complex for that: a signed modulo, then a
special case for 0, then separate branches for positive and negative
counts.

For types whose bit width is a power of two, the following patch adds
definitions that the compiler can pattern-recognize and turn into rotate
instructions, e.g. ro[lr][bwlq] on x86_64.  For unusual types such as
unsigned __int20, it keeps the current definitions.
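To make the two shapes concrete, here is a standalone sketch.  The names
rotl_branchy and rotl_pow2 are hypothetical illustrations, not libstdc++
internals.  rotl_branchy follows the structure described above (signed
modulo, a special case for 0, separate positive and negative branches);
rotl_pow2 mirrors the branch-free form, assuming a power-of-two width.  The
constexpr check compares them for every 8-bit value over a spread of counts,
including the INT_MIN extreme used for the constant d above.  Whether a
particular GCC version emits a single rotate instruction for either form is
not verified by this sketch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration; these are not the libstdc++ functions.

// Branchy rotate-left: signed modulo, a special case for 0,
// and separate branches for positive and negative counts.
template <typename T>
constexpr T rotl_branchy(T x, int s) noexcept
{
  constexpr int nd = std::numeric_limits<T>::digits;
  const int r = s % nd;  // in (-nd, nd); may be negative
  if (r == 0)
    return x;
  if (r > 0)
    return static_cast<T>((x << r) | (x >> (nd - r)));
  // Negative count: rotate right by -r.
  return static_cast<T>((x >> -r) | (x << (nd + r)));
}

// Branch-free rotate-left, valid only for power-of-two widths: both shift
// counts are reduced with an unsigned modulo (a mask for a power of two).
template <typename T>
constexpr T rotl_pow2(T x, int s) noexcept
{
  constexpr unsigned nd = std::numeric_limits<T>::digits;
  static_assert((nd & (nd - 1)) == 0, "width must be a power of two");
  const unsigned r = static_cast<unsigned>(s);
  // When r % nd == 0, both shift counts are 0 and the result is x | x == x.
  return static_cast<T>((x << (r % nd)) | (x >> ((-r) % nd)));
}

// Compare both forms for every 8-bit value and a spread of counts.
constexpr bool forms_agree_u8() noexcept
{
  constexpr int counts[] = {std::numeric_limits<int>::min(), -17, -9, -8, -1,
                            0, 1, 7, 8, 9, 17, std::numeric_limits<int>::max()};
  for (unsigned v = 0; v < 256; ++v)
    for (int s : counts)
      if (rotl_branchy<std::uint8_t>(static_cast<std::uint8_t>(v), s)
          != rotl_pow2<std::uint8_t>(static_cast<std::uint8_t>(v), s))
        return false;
  return true;
}

static_assert(forms_agree_u8(), "both rotate forms must agree");
```

For example, rotl_pow2<std::uint32_t>(1234u, -5) yields 2415919142u, the
same as rotating 1234 right by 5.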
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-03-05  Jakub Jelinek

        PR libstdc++/99396
        * include/std/bit (__rotl, __rotr): Add optimized variants for power
        of two _Nd, in which the compiler can pattern match the rotates.

--- libstdc++-v3/include/std/bit.jj	2021-03-05 10:37:36.108378753 +0100
+++ libstdc++-v3/include/std/bit	2021-03-05 12:01:57.926310110 +0100
@@ -68,6 +68,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __rotl(_Tp __x, int __s) noexcept
     {
       constexpr auto _Nd = __gnu_cxx::__int_traits<_Tp>::__digits;
+      if _GLIBCXX17_CONSTEXPR ((_Nd & (_Nd - 1)) == 0)
+        {
+          // Variant for power of two _Nd which the compiler can
+          // easily pattern match.
+          constexpr unsigned __uNd = _Nd;
+          const unsigned __r = __s;
+          return (__x << (__r % __uNd)) | (__x >> ((-__r) % __uNd));
+        }
       const int __r = __s % _Nd;
       if (__r == 0)
         return __x;
@@ -82,6 +90,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __rotr(_Tp __x, int __s) noexcept
     {
       constexpr auto _Nd = __gnu_cxx::__int_traits<_Tp>::__digits;
+      if _GLIBCXX17_CONSTEXPR ((_Nd & (_Nd - 1)) == 0)
+        {
+          // Variant for power of two _Nd which the compiler can
+          // easily pattern match.
+          constexpr unsigned __uNd = _Nd;
+          const unsigned __r = __s;
+          return (__x >> (__r % __uNd)) | (__x << ((-__r) % __uNd));
+        }
       const int __r = __s % _Nd;
       if (__r == 0)
         return __x;

	Jakub
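A note on why the new path is guarded by (_Nd & (_Nd - 1)) == 0: converting
the int count to unsigned reduces it modulo 2^N (N being the width of
unsigned, assumed here to be 32), and (s mod 2^N) mod _Nd equals s mod _Nd
only when _Nd divides 2^N, i.e. when _Nd is a power of two.  The sketch below
checks this with two hypothetical helpers, true_mod (the mathematical residue
in [0, nd)) and unsigned_mod (what the power-of-two path computes).  For a
20-bit type the two disagree, which is why such types keep the old code.

```cpp
#include <limits>

// Hypothetical helpers for illustration only.
// Mathematical residue of s modulo nd, always in [0, nd).
constexpr unsigned true_mod(int s, int nd) noexcept
{
  return static_cast<unsigned>((s % nd + nd) % nd);
}

// What the power-of-two path computes: s is first reduced modulo 2^N by the
// conversion to unsigned (N = 32 assumed), then reduced modulo nd.
constexpr unsigned unsigned_mod(int s, unsigned nd) noexcept
{
  return static_cast<unsigned>(s) % nd;
}

// True if both residues match for a spread of counts, including the extremes.
constexpr bool agrees_for(unsigned nd) noexcept
{
  const int lo = std::numeric_limits<int>::min();
  const int counts[] = {lo, lo + 1, -1000, -65, -33, -21, -20, -1,
                        0, 1, 19, 20, 31, 1000, std::numeric_limits<int>::max()};
  for (int s : counts)
    if (unsigned_mod(s, nd) != true_mod(s, static_cast<int>(nd)))
      return false;
  return true;
}

static_assert(agrees_for(32), "power of two: conversion preserves the residue");
static_assert(!agrees_for(20), "non power of two: residues differ");
```

For nd = 20 and s = -1, unsigned_mod gives 15 while the correct residue is
19, so a rotate by -1 would silently become a rotate by 15.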