From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by sourceware.org (Postfix) with ESMTPS id C207B388B02A; Tue, 9 Jun 2020 20:35:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C207B388B02A Received: by mail-wr1-x429.google.com with SMTP id h5so22805158wrc.7; Tue, 09 Jun 2020 13:35:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=4h7ooGW8Sk5DD7AE0GgJZenH4wOINx/r4fLcTlrynDI=; b=PIPt3mAQ4CvO75OcWrL9EuGvzDmXH5Oaj6Eh3SxIxRjsmCdYwmOIV/kbWDP/Bwr3Om kW2dDbVsap3LQaLM8tyStMLvi/eK1NC3mMpzEhPkTvrLK1WN+aAKf6vQb0WHH8XSpP45 WBxJQBIatRvr9z914R5fzWuGwSDi7IjBczwT4cyM3qzSE+2rr1SAaHD/FICcRj5IFagS 1VCfPffqoOMlsOg1sXjMrTqW2gqewXgQKrhZqFDKeDu+UHjTcx51rRIix5OxQ1Jxf2U+ b0nuUkQ7NzXzH3nT7iZj0XbEb/OeJ0fx7RCMpod/vZcX7tpWH+BLUsbcbksl8E2IqBX7 nwpw== X-Gm-Message-State: AOAM532UdUz1GQBjQPJxArooWine9Hn1kj/hrgHO5+QcN/Q0BUqg5xnd J3IG97x2kY0OUTucc4saz/wELN7WSPM= X-Google-Smtp-Source: ABdhPJzkCGtuCv61mNEnVfJsGDtRWOPYeEXqQpyuIGcpu+Rlv36y0zM+jgeBdfsOL6lDeACwepWbRA== X-Received: by 2002:adf:f0d2:: with SMTP id x18mr6318514wro.250.1591734928468; Tue, 09 Jun 2020 13:35:28 -0700 (PDT) Received: from ?IPv6:2a01:e0a:1dc:b1c0:ac43:34f3:4c5b:4208? ([2a01:e0a:1dc:b1c0:ac43:34f3:4c5b:4208]) by smtp.googlemail.com with ESMTPSA id l1sm5172976wrb.31.2020.06.09.13.35.27 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 09 Jun 2020 13:35:27 -0700 (PDT) Subject: Re: libstdc++: Extend memcmp optimization in std::lexicographical_compare To: Jonathan Wakely Cc: "libstdc++@gcc.gnu.org" , gcc-patches References: <0cfba2ec-cd66-fe64-a41c-4028403d9dea@gmail.com> <20200608182040.GA352899@redhat.com> <20200609102437.GW4137376@redhat.com> From: =?UTF-8?Q?Fran=c3=a7ois_Dumont?= Message-ID: Date: Tue, 9 Jun 2020 22:35:26 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: <20200609102437.GW4137376@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: fr X-Spam-Status: No, score=-8.8 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jun 2020 20:35:31 -0000 On 09/06/20 12:24 pm, Jonathan Wakely wrote: > On 08/06/20 19:20 +0100, Jonathan Wakely wrote: >> On 05/06/20 22:24 +0200, François Dumont via Libstdc++ wrote: >>> Hi >>> >>>     Here is the last of my algo patches this time to extend the >>> memcmp optimization to std::deque iterators and _GLIBCXX_DEBUG mode. >>> >>>     To do so I had to return int in implementation details of >>> lexicographical_compare to make sure we can treat a deque iterator >>> range by chunk in a performant way. >>> >>> Tested under Linux x86_64 normal and debug modes. >>> >>>             * include/bits/stl_algobase.h >>>             (__lexicographical_compare_impl): Return int. >>>             (__lexicographical_compare::__lc): Likewise. >>>             (__lexicographical_compare_aux1(_II1, _II1, >>> _II2, _II2)): New. >>> (__lexicographical_compare_aux1(_Deque_iterator<>, _Deque_iterator<>, >>>             _II2, _II2)): Declare. >>>             (__lexicographical_compare_aux1(_II1, _II1, >>>             _Deque_iterator<>, _Deque_iterator<>)): Declare. >>> (__lexicographical_compare_aux1(_Deque_iterator<>, _Deque_iterator<>, >>>             _Deque_iterator<>, _Deque_iterator<>)): Declare. >>>             (__lexicographical_compare_aux): Adapt, call >>> later. >> >> Is this meant to say "latter"? That's still not correct grammar >> though. I think it would be better to name the function it calls >> explicitly: "Call __lexicographical_compare_aux1." >> >>>            >>> (__lexicographical_compare_aux(_Safe_iterator<>, _Safe_iterator<>, >>>             _II2, _II2)): Declare. >>>             (__lexicographical_compare_aux(_II1, _II1, >>>             _Safe_iterator<>, _Safe_iterator<>)): Declare. >>>            >>> (__lexicographical_compare_aux(_Safe_iterator<>, _Safe_iterator<>, >>>             _Safe_iterator<>, _Safe_iterator<>)): Declare. >>>             (std::lexicographical_compare): Adapt, call >>> later without __niter_base >>>             usage. >>>             * include/bits/deque.tcc (__lex_cmp_dit): New. >>>             (__lexicographical_compare_aux1): New. >>>             * include/debug/safe_iterator.tcc >>>             (__lexicographical_compare_aux(const >>> _Safe_iterator<>&, >>>             const _Safe_iterator<>&, _II2, _II2)): New. >>>             (__lexicographical_compare_aux( >>>             const _Safe_iterator<>&, const_Safe_iterator<>&, >>>             const _Safe_iterator<>&, const >>> _Safe_iterator<>&)): New. >>>             * >>> testsuite/25_algorithms/lexicographical_compare/1.cc (test6, test7): >>>             New. >>>             * >>> testsuite/25_algorithms/lexicographical_compare/deque_iterators/1.cc: >>>             New test. >>> >>> Ok to commit ? >>> >>> François >>> >>> >> >>> diff --git a/libstdc++-v3/include/bits/deque.tcc >>> b/libstdc++-v3/include/bits/deque.tcc >>> index 1d32a1691c7..d7dbe64f3e1 100644 >>> --- a/libstdc++-v3/include/bits/deque.tcc >>> +++ b/libstdc++-v3/include/bits/deque.tcc >>> @@ -1261,6 +1261,98 @@ _GLIBCXX_END_NAMESPACE_CONTAINER >>>      return true; >>>    } >>> >>> +  template >> >> _II is the wrong name here, it mean InputIterator. All callers of this >> function are constrained for random access iterators. This cannot be >> called with an input iterator. Please use _RAIter. >> >>> +    int >>> +    __lex_cmp_dit( >>> +    const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Ref, _Ptr>& __first1, >>> +    const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Ref, _Ptr>& __last1, >>> +    _II __first2, _II __last2) >>> +    { >>> +      typedef _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Ref, _Ptr> _Iter; >>> +      typedef typename _Iter::difference_type difference_type; >>> + >>> +      if (__first1._M_node != __last1._M_node) >>> +    { >>> +      difference_type __len = __last2 - __first2; >>> +      difference_type __flen >> >> What does "flen" mean? Why not use len2 for last2 - first2, as we do >> elsewhere? And then use len for min(len1, len2)? >> >> >>> +        = std::min(__len, __first1._M_last - __first1._M_cur); >>> +      if (int __ret = std::__lexicographical_compare_aux1( >> >> This call (and the three later in this function) will do overload >> resolution again on the full set of __lexicographical_compare_aux1 >> overloads, which includes the ones for deque iterators. But we know >> that __first1._M_cur and __first1._M_last are not deque iterators. >> >> __first2 could be a deque iterator, but I'm not sure if we really want >> one function that has to handle that case anyway. >> >>> +          __first1._M_cur, __first1._M_last, __first2, __first2 + >>> __flen)) >>> +        return __ret; >>> + >>> +      __first2 += __flen; >>> +      __len -= __flen; >>> +      __flen = std::min(__len, _Iter::_S_buffer_size()); >>> +      for (typename _Iter::_Map_pointer __node = __first1._M_node + 1; >>> +           __node != __last1._M_node; >>> +           __first2 += __flen, __len -= __flen, >>> +           __flen = std::min(__len, _Iter::_S_buffer_size()), >>> +           ++__node) >>> +        if (int __ret = std::__lexicographical_compare_aux1( >>> +          *__node, *__node + _Iter::_S_buffer_size(), >>> +          __first2, __first2 + __flen)) >>> +          return __ret; >>> + >>> +      return std::__lexicographical_compare_aux1( >>> +        __last1._M_first, __last1._M_cur, __first2, __last2); >>> +    } >>> + >>> +      return std::__lexicographical_compare_aux1( >>> +          __first1._M_cur, __last1._M_cur, __first2, __last2); >>> +    } >>> + >>> +  template>> +       typename _II2> >> >> This is not an input iterator either. >> >>> +    typename __gnu_cxx::__enable_if< >>> +      __is_random_access_iter<_II2>::__value, int>::__type >> >> This should be inline. >> >> Also, I'm curious why these overloads use __enable_if rather than the >> conventional approach of tag dispatching using iterator_category. >> (The same question applies to th __equal_aux1 and __copy_move_a1 >> overloads for deque iterators as well). It doesn't really matter >> though. >> >>> +    __lexicographical_compare_aux1( >>> +        _GLIBCXX_STD_C::_Deque_iterator<_Tp1, _Ref1, _Ptr1> __first1, >>> +        _GLIBCXX_STD_C::_Deque_iterator<_Tp1, _Ref1, _Ptr1> __last1, >>> +        _II2 __first2, _II2 __last2) >>> +    { return std::__lex_cmp_dit(__first1, __last1, __first2, >>> __last2); } >>> + >>> +  template>> +       typename _Tp2, typename _Ref2, typename _Ptr2> >>> +    int >> >> This should be inline. >> >>> +    __lexicographical_compare_aux1( >>> +        _GLIBCXX_STD_C::_Deque_iterator<_Tp1, _Ref1, _Ptr1> __first1, >>> +        _GLIBCXX_STD_C::_Deque_iterator<_Tp1, _Ref1, _Ptr1> __last1, >>> +        _GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> __first2, >>> +        _GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> __last2) >>> +    { return std::__lex_cmp_dit(__first1, __last1, __first2, >>> __last2); } >> >> I'm not convinced it's useful for this function and the one above it >> to share the same logic. >> >>> +  template>> +       typename _Tp2, typename _Ref2, typename _Ptr2> >>> +    typename __gnu_cxx::__enable_if< >>> +      __is_random_access_iter<_II1>::__value, int>::__type >>> +    __lexicographical_compare_aux1( >>> +        _II1 __first1, _II1 __last1, >>> +        _GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> __first2, >>> +        _GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> __last2) >>> +    { >>> +      typedef _GLIBCXX_STD_C::_Deque_iterator<_Tp2, _Ref2, _Ptr2> >>> _Iter; >>> +      typedef typename _Iter::difference_type difference_type; >>> + >>> +      difference_type __len = __last1 - __first1; >>> +      while (__len > 0) >>> +    { >>> +      const difference_type __flen = __first2._M_node == >>> __last2._M_node >>> +        ? __last2._M_cur - __first2._M_cur >>> +        : __first2._M_last - __first2._M_cur; >>> +      const difference_type __clen = std::min(__len, __flen); >> >> "flen"? "clen"? >> >>> +      if (int __ret = std::__lexicographical_compare_aux1( >>> +                __first1, __first1 + __clen, >>> +                __first2._M_cur, __first2._M_cur + __flen)) >>> +        return __ret; >>> + >>> +      __first1 += __clen; >>> +      __len -= __clen; >>> +      __first2 += __clen; >>> +    } >>> + >>> +      return __first1 == __last1 ? (__first2 != __last2 ? -1 : 0) : 1; >> >> Isn't this function doing basically the same as __lex_cmp_dit? Why >> does it not use the same logic? >> >> Can this whole function just negate the result of __lex_cmp_dit called >> with the arguments switched? >> >>  return -std::__lex_cmp_dit(__first2, __last2, __first1, __last1); >> >> That would also fix the bug that makes the following loop forever: >> >>  std::deque d; >>  int i = 0; >>  std::lexicographical_compare(&i, &i + 1, d.begin(), d.end()); > > Same question for __equal_aux1, why is it implemented twice? Why > doesn't __equal_aux1(_II f1, _II l1, _Deque_iterator<> f2) just call > __equal_aux1(f2, f2 + (l1 - f1), f1) ? > Because I didn't thought about doing it this way. Yet another patch if you don't do it yourself.