From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-472319-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7959 invoked by alias); 30 Jan 2018 21:42:04 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 7927 invoked by uid 89); 30 Jan 2018 21:42:03 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-9.8 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_2,GIT_PATCH_3,KAM_ASCII_DIVIDERS,KAM_LINEPADDING,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=figures, H*f:sk:2w@mail, hesitate, month
X-Spam-User: qpsmtpd, 2 recipients
X-HELO: mail-wm0-f52.google.com
Received: from mail-wm0-f52.google.com (HELO mail-wm0-f52.google.com) (74.125.82.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 30 Jan 2018 21:42:01 +0000
Received: by mail-wm0-f52.google.com with SMTP id v123so4015699wmd.5;        Tue, 30 Jan 2018 13:42:01 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20161025;        h=x-gm-message-state:subject:to:references:from:message-id:date         :user-agent:mime-version:in-reply-to:content-transfer-encoding         :content-language;        bh=kEVU9CBJ/+uzrT/z32fIOC5INuhyXGoVVd4MhPB6boE=;        b=KcsGRhiT7skqhNCs82DDuRkys6x2/ddIPYKG8aMrTlMeMNCqlov9dvc/ybz8YibGKA         Ii1inPdrtTT+trJ8vdwffsyYrdArcSK3zWOvFTzC8/J3gHIgJAxqpLKKW5UHdeNxL0qk         pyv6AmKD4C8i1plH8zA8koQnr5uGGB7z8T5Ml5a+4ZEWum4WDuSgv0A01UZ9r4781yl6         b3FcceiV+lFtiPz5kmwF1F98VlMXGnRzx5UFtTi3QSNgADROrMY5jylTrR9Jvs6r0JSf         nYy+GWYOVHU38H6WFXAdNji14qjcghSEqRnqZIYKdLTKJ9P4aeEznOmLO4sOluNU6xNa         5Mvg==
X-Gm-Message-State: AKwxytdhFMzzxFMgHbut9L57nT8s6leZ18KQTsXrmTe4JIq9iT1bibFO	MtQBmdA8GQg8ECPUWXbMejM/6Q==
X-Google-Smtp-Source: AH8x225TQ1k1LBuEgdL/z/V8ZoH5h2Plcizl0lQxRzWirghngSBUFg+isAZPPgQ2fGqeUtk0o6t7UA==
X-Received: by 10.80.148.217 with SMTP id t25mr53939511eda.121.1517348518877;        Tue, 30 Jan 2018 13:41:58 -0800 (PST)
Received: from [192.168.0.23] (arf62-1-82-237-250-248.fbx.proxad.net. [82.237.250.248])        by smtp.googlemail.com with ESMTPSA id b46sm8131504edd.73.2018.01.30.13.41.57        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);        Tue, 30 Jan 2018 13:41:58 -0800 (PST)
Subject: Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR c++/83938)
To: libstdc++@gcc.gnu.org, gcc-patches <gcc-patches@gcc.gnu.org>
References: <CALnYPH-iONCpF595=MpWP=w_+4xGqU11=Br4Yh=g=wdf2Ken3A@mail.gmail.com> <CADzB+2kDq7A_h_+YdgLHQsqO+rhZabndHzxLmqQ5+bf3jHobxw@mail.gmail.com> <CALnYPH__r8ragj4j0nAY2DKOUFZafYoEEbsjJmMp0Cw-3x=-2w@mail.gmail.com>
From: =?UTF-8?Q?Fran=c3=a7ois_Dumont?= <frs.dumont@gmail.com>
Message-ID: <35c57e89-11b9-e67d-71c3-66ed318d1f29@gmail.com>
Date: Tue, 30 Jan 2018 22:43:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <CALnYPH__r8ragj4j0nAY2DKOUFZafYoEEbsjJmMp0Cw-3x=-2w@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SW-Source: 2018-01/txt/msg02390.txt.bz2

Your change looks good, don't hesitate to re-submit a proper patch with 
the point 1 below fix. You should add gcc-patches@gcc.gnu.org when you 
do so.

Remaining questions will be:
Copyright assignment, your change seems limited enough to avoid it but 
you will need an official maintainer confirmation.

Stage 1: We are in a Fix-Only development phase. Your change brings such 
an important improvement that it could be considered as a bug fix but 
once again an official maintainer will have to decide. Otherwise you 
will have to wait for a couple of month to see you patch committed.

FranÃ§ois

On 25/01/2018 23:37, chang jc wrote:
> Hi:
>
> 1. The __len = (__len + 1) / 2; is as you suggested, need to modify as
> __len = (__len == 1) ? 0 : ((__len + 1) / 2);
>
> 2. The coding gain has been shown  PR c++/83938; I re-post here
>
>
>
>
>    21
>    20
>    19
>    18
>    17
>    16
>
>
>    0.471136
>    0.625695
>    0.767262
>    0.907461
>    1.04838
>    1.19508
>
>
>    0.340845
>    0.48651
>    0.639139
>    0.770133
>    0.898454
>    1.04632
>
> it means when Merge [0, 4325376, 16777216); A is a sorted integer with
> 4325376 & B with 12451840 elements, total with 16M integers
>
> The proposed method has the speed up under given buffer size =, ex
> 2^16, 2^17, ... 2^21 in unit of sizeof(int), for example, 2^16 means
> given sizof(int)*64K bytes.
>
> 3. As your suggestion, _TmpBuf __buf should be rewrite.
>
> 4. It represents a fact that the intuitive idea to split from larger
> part is wrong.
>
> For example, if you have an input sorted array A & B, A has 8 integers
> & B has 24 integers. Given tmp buffer whose capacity = 4 integers.
>
> Current it tries to split from B, right?
>
> Then we have:
>
> A1 | A2  B1 | B2
>
> B1 & B2 has 12 integers each, right?
>
> Current algorithm selects pivot as 13th integer from B, right?
>
> If the corresponding upper bound of A is 6th integer.
>
> Then it split in
>
> A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12
>
> After rotate, we have two arrays to merge
>
> [A1 = 5 | B1 = 12]  & [A2 = 3 | B2 = 12]
>
> Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge.
>
> Sadly, [A1 = 5 | B1 = 12] CANNOT.
>
> So we do rotate again, split & merge the two split arrays from [A1 = 5
> | B1 = 12] again.
>
>
> But wait, if we always split from the smaller one instead of larger one.
>
> After rotate, it promises two split arrays both contain ceiling[small/2].
>
> Since tmp buffer also allocate by starting from sizeof(small) &
> recursively downgrade by ceiling[small/2^(# of fail allocate)].
>
> It means the allocated tmp buffer promises to be sufficient at the
> level of (# of fail allocate).
>
> Instead, you can see if split from large at level (# of fail allocate)
> several split array still CANNOT use  tmp buffer to do buffered merge.
>
>
> As you know, buffered merge is far speed then (split, rotate, and
> merge two sub-arrays) (PR c++/83938 gives the profiling figures),
>
> the way should provide speedup.
>
>
> Thanks.
>
>
>
>
>
>
>
>
>
>
> On 24/01/2018 18:23, FranÃ§ois Dumont wrote:
>
> Hi
>
>
>      It sounds like a very sensitive change to make but nothing worth figures.
> Do you have any bench showing the importance of the gain ?
>
>      At least the memory usage optimization is obvious.
>
> On 19/01/2018 10:43, chang jc wrote:
>
> Current std::inplace_merge() suffers from performance issue by inefficient
>
> logic under limited memory,
>
> It leads to performance downgrade.
>
> Please help to review it.
>
> Index: include/bits/stl_algo.h
> ===================================================================
> --- include/bits/stl_algo.h    (revision 256871)
> +++ include/bits/stl_algo.h    (working copy)
> @@ -2437,7 +2437,7 @@
>          _BidirectionalIterator __second_cut = __middle;
>          _Distance __len11 = 0;
>          _Distance __len22 = 0;
> -      if (__len1 > __len2)
> +      if (__len1 < __len2)
>            {
>              __len11 = __len1 / 2;
>              std::advance(__first_cut, __len11);
> @@ -2539,9 +2539,15 @@
>          const _DistanceType __len1 = std::distance(__first, __middle);
>          const _DistanceType __len2 = std::distance(__middle, __last);
>
> +
>
>          typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
> _TmpBuf;
>
> -      _TmpBuf __buf(__first, __last);
> -
> +      _BidirectionalIterator __start, __end;
> +      if (__len1 < __len2) {
> +    __start = __first; __end = __middle;
> +      } else {
> +    __start = __middle; __end = __last;
> +      }
> +      _TmpBuf __buf(__start, ___end);
>
> Note another optimization, we could introduce a _Temporary_buffer<> constructor
> in order to write:
>
> _TmpBuf __buf(std::min(__len1, __len2), __first);
>
> So that std::distance do not need to be called again.
>
>          if (__buf.begin() == 0)
>        std::__merge_without_buffer
>          (__first, __middle, __last, __len1, __len2, __comp);
> Index: include/bits/stl_tempbuf.h
> ===================================================================
> --- include/bits/stl_tempbuf.h    (revision 256871)
> +++ include/bits/stl_tempbuf.h    (working copy)
> @@ -95,7 +95,7 @@
>                                std::nothrow));
>          if (__tmp != 0)
>            return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
> -      __len /= 2;
> +      __len = (__len + 1) / 2;
>
> This part is problematic, if __len is 1 and allocation fails it will loop
> forever.
>
> It doesn't seem really necessary for your patch.
>
>
> 2018-01-20 4:05 GMT+08:00 Jason Merrill <jason@redhat.com>:
>
>> This is a libstdc++ bug and patch, not the C++ front end.  So I'm
>> adding the libstdc++ list to CC.
>>
>> On Fri, Jan 19, 2018 at 3:02 AM, chang jc <r97922153@gmail.com> wrote:
>>> Current std::inplace_merge() suffers from performance issue by
>> inefficient
>>> logic under limited memory,
>>>
>>> It leads to performance downgrade.
>>>
>>> Please help to review it.
>>>
>>> Index: include/bits/stl_algo.h
>>> ===================================================================
>>> --- include/bits/stl_algo.h     (revision 256871)
>>> +++ include/bits/stl_algo.h     (working copy)
>>> @@ -2437,7 +2437,7 @@
>>>            _BidirectionalIterator __second_cut = __middle;
>>>            _Distance __len11 = 0;
>>>            _Distance __len22 = 0;
>>> -         if (__len1 > __len2)
>>> +         if (__len1 < __len2)
>>>              {
>>>                __len11 = __len1 / 2;
>>>                std::advance(__first_cut, __len11);
>>> @@ -2539,9 +2539,15 @@
>>>         const _DistanceType __len1 = std::distance(__first, __middle);
>>>         const _DistanceType __len2 = std::distance(__middle, __last);
>>>
>>> +
>>>         typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
>> _TmpBuf;
>>> -      _TmpBuf __buf(__first, __last);
>>> -
>>> +      _BidirectionalIterator __start, __end;
>>> +      if (__len1 < __len2) {
>>> +       __start = __first; __end = __middle;
>>> +      } else {
>>> +       __start = __middle; __end = __last;
>>> +      }
>>> +      _TmpBuf __buf(__start, ___end);
>>>         if (__buf.begin() == 0)
>>>          std::__merge_without_buffer
>>>            (__first, __middle, __last, __len1, __len2, __comp);
>>> Index: include/bits/stl_tempbuf.h
>>> ===================================================================
>>> --- include/bits/stl_tempbuf.h  (revision 256871)
>>> +++ include/bits/stl_tempbuf.h  (working copy)
>>> @@ -95,7 +95,7 @@
>>>                                                          std::nothrow));
>>>            if (__tmp != 0)
>>>              return std::pair<_Tp*, ptrdiff_t>(__tmp, __len);
>>> -         __len /= 2;
>>> +         __len = (__len + 1) / 2;
>>>          }
>>>         return std::pair<_Tp*, ptrdiff_t>(static_cast<_Tp*>(0), 0);
>>>       }
>>>
>>>
>>>
>>>
>>> Thanks