public inbox for gcc-bugs@sourceware.org
From: "siarhei.siamashka at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug d/102765] New: [11 Regression] GDC11 stopped inlining library functions and lambdas used by a binary search one-liner code
Date: Fri, 15 Oct 2021 06:54:06 +0000
Message-ID: <bug-102765-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102765

            Bug ID: 102765
           Summary: [11 Regression] GDC11 stopped inlining library
                    functions and lambdas used by a binary search
                    one-liner code
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: d
          Assignee: ibuclaw at gdcproject dot org
          Reporter: siarhei.siamashka at gmail dot com
  Target Milestone: ---

The performance of the following simple binary search code regressed
significantly starting with GDC11 (about 6.9 s, versus 1.8-2.1 s with GDC 6.3,
9.3 and 10.3 in the timings below):

/*******************************************************/
import std.algorithm, std.range, std.stdio, std.stdint;

// calculate integer square root using binary search
int64_t isqrt(int64_t x) {
  return iota(0, min(x, 3037000499) + 1)
         .map!(v => (v * v > x))
         .assumeSorted.lowerBound(true)
         .length - 1;
}

// print the sum of 20M square roots
void main() { 20000000.iota.map!isqrt.sum.writeln; }
/*******************************************************/

$ gdc-6.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m1.924s
user    0m1.924s
sys     0m0.000s

$ gdc-9.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m2.100s
user    0m2.099s
sys     0m0.000s

$ gdc-10.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m1.776s
user    0m1.776s
sys     0m0.000s

$ gdc-11.2.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m6.889s
user    0m6.887s
sys     0m0.000s


I would expect the compiler to inline everything here and generate a small,
efficient binary search loop. GDC11 no longer does this, as running
"perf record ./a.out && perf report" confirms:

    27.86%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc__T18getTransitionIndexVEQGrQGq12SearchPolicyi3SQHoQHn__TQHkTQHaVQDha5_61203c2062ZQIj3geqTbZQDlMFNaNbNiNfbZm
    15.02%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc__T3geqTbTbZQjMFNaNbNiNfbbZb
    10.34%  a.out    a.out             [.]
_D3std9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQCm5range__T4iotaTiTlZQkFilZ6ResultZQCv7opIndexMFNaNbNiNfmZb
    10.31%  a.out    a.out             [.]
_D3std10functional__T9binaryFunVAyaa5_61203c2062VQra1_61VQza1_62Z__TQBvTbTbZQCdFNaNbNiNfKbKbZb
     3.03%  a.out    a.out             [.]
_D3std5range__T4iotaTiTlZQkFilZ6Result7opIndexMNgFNaNbNiNfmZNgl
     2.34%  a.out    a.out             [.] 0x0000000000031a09
     2.28%  a.out    a.out             [.]
_D4core6atomic__T7casImplTmTxmTmZQqFNaNbNiNePOmxmmZb
     2.11%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc7opSliceMFNaNbNiNfmmZSQGoQGn__TQGkTQGaVQCha5_61203c2062ZQHj
     2.02%  a.out    a.out             [.]
_D3std5range__T12assumeSortedVAyaa5_61203c2062TSQBu9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQEfQEe__T4iotaTiTlZQkFilZ6ResultZQCsZQFdFNaNbNiNfQEjZSQGhQGg__T11SortedRangeTQFlVQGga5_61203c2062ZQBj
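
For comparison, here is a minimal hand-written sketch of the kind of loop one
would expect after full inlining. The name isqrtLoop is hypothetical and not
part of the original report. It assumes the same candidate range as isqrt
(0 .. min(x, 3037000499), where 3037000499 is the largest value whose square
still fits in a signed 64-bit integer) and performs a standard lower-bound
search for the first candidate v with v * v > x.

```d
import std.algorithm.comparison : min;
import std.stdint : int64_t;

// Hypothetical hand-written equivalent of isqrt() from this report.
// Counts the candidates v in [0, min(x, 3037000499)] with v * v <= x,
// then subtracts one, exactly as the range-based one-liner does.
int64_t isqrtLoop(int64_t x)
{
    int64_t lo = 0;                        // first candidate
    int64_t hi = min(x, 3037000499L) + 1;  // one past the last candidate
    // Invariant: v * v <= x for all v < lo, and v * v > x for all v >= hi
    // (within the candidate range).
    while (lo < hi)
    {
        int64_t mid = lo + (hi - lo) / 2;  // mid <= 3037000499, so mid * mid cannot overflow
        if (mid * mid > x)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo - 1;  // lo is the number of candidates with v * v <= x
}

// Same workload as the original main(): sum of 20M integer square roots.
void main()
{
    import std.stdio : writeln;
    int64_t total = 0;
    foreach (int i; 0 .. 20_000_000)
        total += isqrtLoop(i);
    writeln(total);
}
```

Because this version does not go through the Phobos range templates, it can be
built with the same flags as a rough baseline; the report itself gives no
timings for it.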


Using either the -fwhole-program or the -flto command-line option resolves the
performance problem and allows all of these functions to be inlined again:

$ gdc-11.2.0 -g -O3 -frelease -fno-bounds-check -flto test.d && time ./a.out 
59618479180

real    0m2.085s
user    0m2.085s
sys     0m0.000s


But is this expected? Does GDC now require the -flto option to get reasonable
performance starting with version 11? Or is this a real performance regression,
and can something be done to improve the inlining behaviour?


Thread overview: 9+ messages
2021-10-15  6:54 siarhei.siamashka at gmail dot com [this message]
2021-11-05 13:33 ` [Bug d/102765] " rguenth at gcc dot gnu.org
2021-11-05 13:47 ` ibuclaw at gdcproject dot org
2021-12-09  2:33 ` siarhei.siamashka at gmail dot com
2022-02-01  3:47 ` siarhei.siamashka at gmail dot com
2022-04-21  7:50 ` rguenth at gcc dot gnu.org
2022-08-09 19:27 ` ibuclaw at gdcproject dot org
2022-10-13  5:45 ` ibuclaw at gdcproject dot org
2023-05-29 10:05 ` jakub at gcc dot gnu.org
