From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 12CA43858005; Fri, 15 Oct 2021 06:54:07 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 12CA43858005
From: "siarhei.siamashka at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug d/102765] New: [11 Regression] GDC11 stopped inlining library
 functions and lambdas used by a binary search one-liner code
Date: Fri, 15 Oct 2021 06:54:06 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: d
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: siarhei.siamashka at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: ibuclaw at gdcproject dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-102765-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 15 Oct 2021 06:54:07 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102765

            Bug ID: 102765
           Summary: [11 Regression] GDC11 stopped inlining library
                    functions and lambdas used by a binary search
                    one-liner code
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: d
          Assignee: ibuclaw at gdcproject dot org
          Reporter: siarhei.siamashka at gmail dot com
  Target Milestone: ---

The performance of the following simple binary search code regressed
significantly starting with GDC 11 (about 3.9x slower than GDC 10 in the
timings below):

/*******************************************************/
import std.algorithm, std.range, std.stdio, std.stdint;

// calculate integer square root using binary search
int64_t isqrt(int64_t x) {
  return iota(0, min(x, 3037000499) + 1)
         .map!(v => (v * v > x))
         .assumeSorted.lowerBound(true)
         .length - 1;
}

// print the sum of 20M square roots
void main() { 20000000.iota.map!isqrt.sum.writeln; }
/*******************************************************/

$ gdc-6.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out
59618479180

real    0m1.924s
user    0m1.924s
sys     0m0.000s

$ gdc-9.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out
59618479180

real    0m2.100s
user    0m2.099s
sys     0m0.000s

$ gdc-10.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out
59618479180

real    0m1.776s
user    0m1.776s
sys     0m0.000s

$ gdc-11.2.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out
59618479180

real    0m6.889s
user    0m6.887s
sys     0m0.000s


My expectation is that the compiler should inline everything here and
generate code for a small and efficient binary search loop. But GDC 11
stopped doing this, as can be confirmed by running
"perf record ./a.out && perf report":

    27.86%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc__T18getTransitionIndexVEQGrQGq12SearchPolicyi3SQHoQHn__TQHkTQHaVQDha5_61203c2062ZQIj3geqTbZQDlMFNaNbNiNfbZm
    15.02%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc__T3geqTbTbZQjMFNaNbNiNfbbZb
    10.34%  a.out    a.out             [.]
_D3std9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQCm5range__T4iotaTiTlZQkFilZ6ResultZQCv7opIndexMFNaNbNiNfmZb
    10.31%  a.out    a.out             [.]
_D3std10functional__T9binaryFunVAyaa5_61203c2062VQra1_61VQza1_62Z__TQBvTbTbZQCdFNaNbNiNfKbKbZb
     3.03%  a.out    a.out             [.]
_D3std5range__T4iotaTiTlZQkFilZ6Result7opIndexMNgFNaNbNiNfmZNgl
     2.34%  a.out    a.out             [.] 0x0000000000031a09
     2.28%  a.out    a.out             [.]
_D4core6atomic__T7casImplTmTxmTmZQqFNaNbNiNePOmxmmZb
     2.11%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc7opSliceMFNaNbNiNfmmZSQGoQGn__TQGkTQGaVQCha5_61203c2062ZQHj
     2.02%  a.out    a.out             [.]
_D3std5range__T12assumeSortedVAyaa5_61203c2062TSQBu9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQEfQEe__T4iotaTiTlZQkFilZ6ResultZQCsZQFdFNaNbNiNfQEjZSQGhQGg__T11SortedRangeTQFlVQGga5_61203c2062ZQBj
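
For comparison, the small binary search loop that the one-liner is expected
to reduce to after full inlining might look roughly like the hand-written
version below. This is only a sketch and was not part of the measured test
case: the name isqrtLoop and the plain foreach loop in main are illustrative
assumptions, and the code GDC actually emits may differ.

```d
import std.algorithm : min;
import std.stdio : writeln;

// Hand-written equivalent of the range-based isqrt (hypothetical name).
long isqrtLoop(long x)
{
    // Same search interval as iota(0, min(x, 3037000499) + 1): [lo, hi)
    long lo = 0;
    long hi = min(x, 3037000499L) + 1;

    // Find the first v in [lo, hi) for which v * v > x holds,
    // which is what assumeSorted.lowerBound(true) computes on the
    // mapped boolean range.
    while (lo < hi)
    {
        long mid = lo + (hi - lo) / 2;
        if (mid * mid > x)
            hi = mid;      // predicate is true: answer is at or before mid
        else
            lo = mid + 1;  // predicate is false: answer is after mid
    }

    // lo is now the number of values v with v * v <= x,
    // i.e. the ".length" of the lower bound; subtract 1 as in the original.
    return lo - 1;
}

// Sum of the first 20M integer square roots, as in the original main().
void main()
{
    long total = 0;
    foreach (i; 0 .. 20_000_000)
        total += isqrtLoop(i);
    writeln(total);
}
```

Since this computes the same sum as the original test case, it should print
59618479180 as well, which makes it usable as a baseline when checking how
close the inlined range-based version gets to a plain loop.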


Using either the -fwhole-program or the -flto command-line option resolves
the performance problem and allows all of these functions to be inlined
again:

$ gdc-11.2.0 -g -O3 -frelease -fno-bounds-check -flto test.d && time ./a.out
59618479180

real    0m2.085s
user    0m2.085s
sys     0m0.000s


But is this expected? Does GDC now require the -flto option to get
reasonable performance starting from version 11? Or is this a real
performance regression, and can something be done to improve the inlining
behaviour?