* [Bug d/102765] New: [11 Regression] GDC11 stopped inlining library functions and lambdas used by a binary search one-liner code
@ 2021-10-15  6:54 siarhei.siamashka at gmail dot com
From: siarhei.siamashka at gmail dot com @ 2021-10-15  6:54 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102765

            Bug ID: 102765
           Summary: [11 Regression] GDC11 stopped inlining library
                    functions and lambdas used by a binary search
                    one-liner code
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: d
          Assignee: ibuclaw at gdcproject dot org
          Reporter: siarhei.siamashka at gmail dot com
  Target Milestone: ---

The performance of the following simple binary search code regressed by
almost 4x starting from GDC 11 (1.776s with gdc-10.3.0 versus 6.889s with
gdc-11.2.0, see the timings below):

/*******************************************************/
import std.algorithm, std.range, std.stdio, std.stdint;

// calculate integer square root using binary search
int64_t isqrt(int64_t x) {
  return iota(0, min(x, 3037000499) + 1)
         .map!(v => (v * v > x))
         .assumeSorted.lowerBound(true)
         .length - 1;
}

// print the sum of 20M square roots
void main() { 20000000.iota.map!isqrt.sum.writeln; }
/*******************************************************/

$ gdc-6.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m1.924s
user    0m1.924s
sys     0m0.000s

$ gdc-9.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m2.100s
user    0m2.099s
sys     0m0.000s

$ gdc-10.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m1.776s
user    0m1.776s
sys     0m0.000s

$ gdc-11.2.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m6.889s
user    0m6.887s
sys     0m0.000s


My expectation is that the compiler should inline everything here and generate
code for a small and efficient binary search loop. GDC 11 no longer does
this, as can be confirmed by running "perf record ./a.out && perf report":

    27.86%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc__T18getTransitionIndexVEQGrQGq12SearchPolicyi3SQHoQHn__TQHkTQHaVQDha5_61203c2062ZQIj3geqTbZQDlMFNaNbNiNfbZm
    15.02%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc__T3geqTbTbZQjMFNaNbNiNfbbZb
    10.34%  a.out    a.out             [.]
_D3std9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQCm5range__T4iotaTiTlZQkFilZ6ResultZQCv7opIndexMFNaNbNiNfmZb
    10.31%  a.out    a.out             [.]
_D3std10functional__T9binaryFunVAyaa5_61203c2062VQra1_61VQza1_62Z__TQBvTbTbZQCdFNaNbNiNfKbKbZb
     3.03%  a.out    a.out             [.]
_D3std5range__T4iotaTiTlZQkFilZ6Result7opIndexMNgFNaNbNiNfmZNgl
     2.34%  a.out    a.out             [.] 0x0000000000031a09
     2.28%  a.out    a.out             [.]
_D4core6atomic__T7casImplTmTxmTmZQqFNaNbNiNePOmxmmZb
     2.11%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc7opSliceMFNaNbNiNfmmZSQGoQGn__TQGkTQGaVQCha5_61203c2062ZQHj
     2.02%  a.out    a.out             [.]
_D3std5range__T12assumeSortedVAyaa5_61203c2062TSQBu9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQEfQEe__T4iotaTiTlZQkFilZ6ResultZQCsZQFdFNaNbNiNfQEjZSQGhQGg__T11SortedRangeTQFlVQGga5_61203c2062ZQBj
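
For comparison, a hand-written loop roughly equivalent to what full inlining
should reduce the one-liner to might look like the sketch below. It is not
taken from any compiler output; the name isqrtLoop is hypothetical and only
used for illustration. It searches the same interval as the original
iota/map/lowerBound chain for the first v with v * v > x, so the printed sum
should match the range-based version.

```d
import std.algorithm : min;
import std.stdio : writeln;

// Hypothetical hand-written counterpart of isqrt from the report:
// binary search for the first v in [0, hi) with v * v > x, minus one.
long isqrtLoop(long x)
{
    long lo = 0;
    long hi = min(x, 3037000499L) + 1; // same upper bound as the original code
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (mid * mid > x)
            hi = mid;     // predicate holds: first match is at mid or to its left
        else
            lo = mid + 1; // predicate fails: first match is to the right of mid
    }
    return lo - 1;
}

// sum the same 20M integer square roots as the original test case
void main()
{
    long total = 0;
    foreach (i; 0 .. 20_000_000)
        total += isqrtLoop(i);
    writeln(total);
}
```

Timing this variant with the same flags gives a rough baseline for how fast
the range-based code could be once everything is inlined.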


Using either the -fwhole-program or the -flto command-line option resolves the
performance problem and allows all of these functions to be inlined again:

$ gdc-11.2.0 -g -O3 -frelease -fno-bounds-check -flto test.d && time ./a.out 
59618479180

real    0m2.085s
user    0m2.085s
sys     0m0.000s


But is this expected? Does GDC, starting from version 11, now require the -flto
option to get reasonable performance? Or is this a real performance
regression, and can something be done to improve the inlining behaviour?


Thread overview: 9+ messages
2021-10-15  6:54 [Bug d/102765] New: [11 Regression] GDC11 stopped inlining library functions and lambdas used by a binary search one-liner code siarhei.siamashka at gmail dot com
2021-11-05 13:33 ` [Bug d/102765] " rguenth at gcc dot gnu.org
2021-11-05 13:47 ` ibuclaw at gdcproject dot org
2021-12-09  2:33 ` siarhei.siamashka at gmail dot com
2022-02-01  3:47 ` siarhei.siamashka at gmail dot com
2022-04-21  7:50 ` rguenth at gcc dot gnu.org
2022-08-09 19:27 ` ibuclaw at gdcproject dot org
2022-10-13  5:45 ` ibuclaw at gdcproject dot org
2023-05-29 10:05 ` jakub at gcc dot gnu.org
