From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 7959 invoked by alias); 30 Jan 2018 21:42:04 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 7927 invoked by uid 89); 30 Jan 2018 21:42:03 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-9.8 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_2,GIT_PATCH_3,KAM_ASCII_DIVIDERS,KAM_LINEPADDING,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=figures, H*f:sk:2w@mail, hesitate, month X-Spam-User: qpsmtpd, 2 recipients X-HELO: mail-wm0-f52.google.com Received: from mail-wm0-f52.google.com (HELO mail-wm0-f52.google.com) (74.125.82.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 30 Jan 2018 21:42:01 +0000 Received: by mail-wm0-f52.google.com with SMTP id v123so4015699wmd.5; Tue, 30 Jan 2018 13:42:01 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=kEVU9CBJ/+uzrT/z32fIOC5INuhyXGoVVd4MhPB6boE=; b=KcsGRhiT7skqhNCs82DDuRkys6x2/ddIPYKG8aMrTlMeMNCqlov9dvc/ybz8YibGKA Ii1inPdrtTT+trJ8vdwffsyYrdArcSK3zWOvFTzC8/J3gHIgJAxqpLKKW5UHdeNxL0qk pyv6AmKD4C8i1plH8zA8koQnr5uGGB7z8T5Ml5a+4ZEWum4WDuSgv0A01UZ9r4781yl6 b3FcceiV+lFtiPz5kmwF1F98VlMXGnRzx5UFtTi3QSNgADROrMY5jylTrR9Jvs6r0JSf nYy+GWYOVHU38H6WFXAdNji14qjcghSEqRnqZIYKdLTKJ9P4aeEznOmLO4sOluNU6xNa 5Mvg== X-Gm-Message-State: AKwxytdhFMzzxFMgHbut9L57nT8s6leZ18KQTsXrmTe4JIq9iT1bibFO MtQBmdA8GQg8ECPUWXbMejM/6Q== X-Google-Smtp-Source: AH8x225TQ1k1LBuEgdL/z/V8ZoH5h2Plcizl0lQxRzWirghngSBUFg+isAZPPgQ2fGqeUtk0o6t7UA== X-Received: by 10.80.148.217 with SMTP id t25mr53939511eda.121.1517348518877; Tue, 30 Jan 2018 13:41:58 -0800 (PST) Received: from [192.168.0.23] (arf62-1-82-237-250-248.fbx.proxad.net. [82.237.250.248]) by smtp.googlemail.com with ESMTPSA id b46sm8131504edd.73.2018.01.30.13.41.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Jan 2018 13:41:58 -0800 (PST) Subject: Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR c++/83938) To: libstdc++@gcc.gnu.org, gcc-patches References: From: =?UTF-8?Q?Fran=c3=a7ois_Dumont?= Message-ID: <35c57e89-11b9-e67d-71c3-66ed318d1f29@gmail.com> Date: Tue, 30 Jan 2018 22:43:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-SW-Source: 2018-01/txt/msg02390.txt.bz2 Your change looks good, don't hesitate to re-submit a proper patch with the point 1 below fix. You should add gcc-patches@gcc.gnu.org when you do so. Remaining questions will be: Copyright assignment, your change seems limited enough to avoid it but you will need an official maintainer confirmation. Stage 1: We are in a Fix-Only development phase. Your change brings such an important improvement that it could be considered as a bug fix but once again an official maintainer will have to decide. Otherwise you will have to wait for a couple of month to see you patch committed. François On 25/01/2018 23:37, chang jc wrote: > Hi: > > 1. The __len = (__len + 1) / 2; is as you suggested, need to modify as > __len = (__len == 1) ? 0 : ((__len + 1) / 2); > > 2. The coding gain has been shown PR c++/83938; I re-post here > > > > > 21 > 20 > 19 > 18 > 17 > 16 > > > 0.471136 > 0.625695 > 0.767262 > 0.907461 > 1.04838 > 1.19508 > > > 0.340845 > 0.48651 > 0.639139 > 0.770133 > 0.898454 > 1.04632 > > it means when Merge [0, 4325376, 16777216); A is a sorted integer with > 4325376 & B with 12451840 elements, total with 16M integers > > The proposed method has the speed up under given buffer size =, ex > 2^16, 2^17, ... 2^21 in unit of sizeof(int), for example, 2^16 means > given sizof(int)*64K bytes. > > 3. As your suggestion, _TmpBuf __buf should be rewrite. > > 4. It represents a fact that the intuitive idea to split from larger > part is wrong. > > For example, if you have an input sorted array A & B, A has 8 integers > & B has 24 integers. Given tmp buffer whose capacity = 4 integers. > > Current it tries to split from B, right? > > Then we have: > > A1 | A2 B1 | B2 > > B1 & B2 has 12 integers each, right? > > Current algorithm selects pivot as 13th integer from B, right? > > If the corresponding upper bound of A is 6th integer. > > Then it split in > > A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12 > > After rotate, we have two arrays to merge > > [A1 = 5 | B1 = 12] & [A2 = 3 | B2 = 12] > > Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge. > > Sadly, [A1 = 5 | B1 = 12] CANNOT. > > So we do rotate again, split & merge the two split arrays from [A1 = 5 > | B1 = 12] again. > > > But wait, if we always split from the smaller one instead of larger one. > > After rotate, it promises two split arrays both contain ceiling[small/2]. > > Since tmp buffer also allocate by starting from sizeof(small) & > recursively downgrade by ceiling[small/2^(# of fail allocate)]. > > It means the allocated tmp buffer promises to be sufficient at the > level of (# of fail allocate). > > Instead, you can see if split from large at level (# of fail allocate) > several split array still CANNOT use tmp buffer to do buffered merge. > > > As you know, buffered merge is far speed then (split, rotate, and > merge two sub-arrays) (PR c++/83938 gives the profiling figures), > > the way should provide speedup. > > > Thanks. > > > > > > > > > > > On 24/01/2018 18:23, François Dumont wrote: > > Hi > > > It sounds like a very sensitive change to make but nothing worth figures. > Do you have any bench showing the importance of the gain ? > > At least the memory usage optimization is obvious. > > On 19/01/2018 10:43, chang jc wrote: > > Current std::inplace_merge() suffers from performance issue by inefficient > > logic under limited memory, > > It leads to performance downgrade. > > Please help to review it. > > Index: include/bits/stl_algo.h > =================================================================== > --- include/bits/stl_algo.h (revision 256871) > +++ include/bits/stl_algo.h (working copy) > @@ -2437,7 +2437,7 @@ > _BidirectionalIterator __second_cut = __middle; > _Distance __len11 = 0; > _Distance __len22 = 0; > - if (__len1 > __len2) > + if (__len1 < __len2) > { > __len11 = __len1 / 2; > std::advance(__first_cut, __len11); > @@ -2539,9 +2539,15 @@ > const _DistanceType __len1 = std::distance(__first, __middle); > const _DistanceType __len2 = std::distance(__middle, __last); > > + > > typedef _Temporary_buffer<_BidirectionalIterator, _ValueType> > _TmpBuf; > > - _TmpBuf __buf(__first, __last); > - > + _BidirectionalIterator __start, __end; > + if (__len1 < __len2) { > + __start = __first; __end = __middle; > + } else { > + __start = __middle; __end = __last; > + } > + _TmpBuf __buf(__start, ___end); > > Note another optimization, we could introduce a _Temporary_buffer<> constructor > in order to write: > > _TmpBuf __buf(std::min(__len1, __len2), __first); > > So that std::distance do not need to be called again. > > if (__buf.begin() == 0) > std::__merge_without_buffer > (__first, __middle, __last, __len1, __len2, __comp); > Index: include/bits/stl_tempbuf.h > =================================================================== > --- include/bits/stl_tempbuf.h (revision 256871) > +++ include/bits/stl_tempbuf.h (working copy) > @@ -95,7 +95,7 @@ > std::nothrow)); > if (__tmp != 0) > return std::pair<_Tp*, ptrdiff_t>(__tmp, __len); > - __len /= 2; > + __len = (__len + 1) / 2; > > This part is problematic, if __len is 1 and allocation fails it will loop > forever. > > It doesn't seem really necessary for your patch. > > > 2018-01-20 4:05 GMT+08:00 Jason Merrill : > >> This is a libstdc++ bug and patch, not the C++ front end. So I'm >> adding the libstdc++ list to CC. >> >> On Fri, Jan 19, 2018 at 3:02 AM, chang jc wrote: >>> Current std::inplace_merge() suffers from performance issue by >> inefficient >>> logic under limited memory, >>> >>> It leads to performance downgrade. >>> >>> Please help to review it. >>> >>> Index: include/bits/stl_algo.h >>> =================================================================== >>> --- include/bits/stl_algo.h (revision 256871) >>> +++ include/bits/stl_algo.h (working copy) >>> @@ -2437,7 +2437,7 @@ >>> _BidirectionalIterator __second_cut = __middle; >>> _Distance __len11 = 0; >>> _Distance __len22 = 0; >>> - if (__len1 > __len2) >>> + if (__len1 < __len2) >>> { >>> __len11 = __len1 / 2; >>> std::advance(__first_cut, __len11); >>> @@ -2539,9 +2539,15 @@ >>> const _DistanceType __len1 = std::distance(__first, __middle); >>> const _DistanceType __len2 = std::distance(__middle, __last); >>> >>> + >>> typedef _Temporary_buffer<_BidirectionalIterator, _ValueType> >> _TmpBuf; >>> - _TmpBuf __buf(__first, __last); >>> - >>> + _BidirectionalIterator __start, __end; >>> + if (__len1 < __len2) { >>> + __start = __first; __end = __middle; >>> + } else { >>> + __start = __middle; __end = __last; >>> + } >>> + _TmpBuf __buf(__start, ___end); >>> if (__buf.begin() == 0) >>> std::__merge_without_buffer >>> (__first, __middle, __last, __len1, __len2, __comp); >>> Index: include/bits/stl_tempbuf.h >>> =================================================================== >>> --- include/bits/stl_tempbuf.h (revision 256871) >>> +++ include/bits/stl_tempbuf.h (working copy) >>> @@ -95,7 +95,7 @@ >>> std::nothrow)); >>> if (__tmp != 0) >>> return std::pair<_Tp*, ptrdiff_t>(__tmp, __len); >>> - __len /= 2; >>> + __len = (__len + 1) / 2; >>> } >>> return std::pair<_Tp*, ptrdiff_t>(static_cast<_Tp*>(0), 0); >>> } >>> >>> >>> >>> >>> Thanks