public inbox for gcc@gcc.gnu.org
* Compiling c++ template is very slow.
From: Fis Trivial @ 2018-03-09  2:33 UTC
  To: gcc


I tried to use C++ templates to generate code for a personal project, but
found that compilation with g++ is much slower (exponentially so) than
with clang++.

Here is a code snippet for testing purposes:

#include <iostream>

template <int a, int b>
struct v : v<a-1, b>, v<a, b-1>
{
  static int constexpr m = a;
  static int constexpr n = b;
  static int constexpr s = a + b;
};

template <int b>
struct v<1, b> : v<1, b-1> 
{
  static int constexpr m = 1;
  static int constexpr n = b;
  static int constexpr s = b + 1;
};
template <int a>
struct v<a, 1> : v<a-1, 1>
{
  static int constexpr m = a;
  static int constexpr n = 1;
  static int constexpr s = a + 1;
};
template <>
struct v<1, 1>
{
  static int constexpr m = 1;
  static int constexpr n = 1;
  static int constexpr s = 2;
};

int main()
{
  std::cout << v<7, 12>::s << std::endl;
  std::cout << v<4, 3>::s << std::endl;
}


Here are the timings:
---
$ time g++ -std=c++11 generate.cc -o bygcc

real	0m39.529s
user	0m39.418s
sys	0m0.053s

$ time clang++ -std=c++11 generate.cc -o byclang

real	0m0.310s
user	0m0.273s
sys	0m0.024s
---

With greater values, gcc requires exponentially more time to compile,
while the time needed by clang grows linearly. For example, changing the
`main` function in the code above to:

int main()
{
  // change 7 to 8 and drop <4, 3>
  std::cout << v<8, 12>::s << std::endl;
}

---
$ time g++ -std=c++11 generate.cc -o bygcc

real	5m20.755s
user	5m8.509s
sys	0m0.260s

$ time clang++ -std=c++11 generate.cc -o byclang

real	0m0.314s
user	0m0.281s
sys	0m0.020s
---


Just for fun, setting the template parameter to 128:
---
$ time g++ -std=c++11 generate.cc -o bygcc
... not gonna happen :).

$ time clang++ -std=c++11 generate.cc -o byclang

real	0m18.549s
user	0m18.410s
sys	0m0.066s
---


I am currently running Fedora 27 with the following versions of gcc and
clang:

$ g++ --version
g++ (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ clang++ --version
clang version 5.0.1 (tags/RELEASE_501/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

* Re: Compiling c++ template is very slow.
From: Richard Biener @ 2018-03-09 10:32 UTC
  To: Fis Trivial; +Cc: gcc

On Fri, Mar 9, 2018 at 3:33 AM, Fis Trivial <ybbs.daans@hotmail.com> wrote:
> $ time g++ -std=c++11 generate.cc -o bygcc
>
> real    5m20.755s
> user    5m8.509s
> sys     0m0.260s

I can confirm this and suggest opening a bug report.

* Re: Compiling c++ template is very slow.
From: Richard Biener @ 2018-03-09 11:51 UTC
  To: Fis Trivial, Jason Merrill; +Cc: gcc

On Fri, Mar 9, 2018 at 11:32 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Mar 9, 2018 at 3:33 AM, Fis Trivial <ybbs.daans@hotmail.com> wrote:
>> $ time g++ -std=c++11 generate.cc -o bygcc
>>
>> real    5m20.755s
>> user    5m8.509s
>> sys     0m0.260s
>
> I can confirm this and suggest opening a bug report.

callgrind shows that propagate_binfo_offsets recursing
to self very many times is likely the issue.  Your templates
build a very deep inheritance chain and it seems that
the binfo offset propagation ends up being exponential here.

I would have expected that any bases already have correct
offsets, so we don't need to recurse into bases of bases?

Ah, so in this case we have an offset of 1 because sizeof is
always nonzero.  So the issue might just be that
BINFO_OFFSET is not relative to the parent?

Anyway,

Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c      (revision 258380)
+++ gcc/cp/class.c      (working copy)
@@ -5755,6 +5755,9 @@ propagate_binfo_offsets (tree binfo, tre
   tree primary_binfo;
   tree base_binfo;

+  if (integer_zerop (offset))
+    return;
+
   /* Update BINFO's offset.  */
   BINFO_OFFSET (binfo)
     = fold_convert (sizetype,

doesn't improve things.  We call this function millions of times
creating millions of INTEGER_CSTs.  Can BINFO_OFFSET be
non-constant?

Richard.

* Re: Compiling c++ template is very slow.
From: Nathan Sidwell @ 2018-03-09 13:08 UTC
  To: Richard Biener, Fis Trivial, Jason Merrill; +Cc: gcc

On 03/09/2018 06:51 AM, Richard Biener wrote:
> On Fri, Mar 9, 2018 at 11:32 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:

> callgrind shows that propagate_binfo_offsets recursing
> to self very many times is likely the issue.  Your templates
> build a very deep inheritance chain and it seems that
> the binfo offset propagation ends up being exponential here.
> 
> I would have expected that any bases already have correct
> offsets, so we don't need to recurse into bases of bases?
> 
> Ah, so in this case we have an offset of 1 because sizeof is
> always nonzero.  So the issue might just be that
> BINFO_OFFSET is not relative to the parent?

Correct, it's relative to the complete object.

> doesn't improve things.  We call this function millions of times
> creating millions of INTEGER_CSTs.  Can BINFO_OFFSET be
> non-constant?

No. (if it's a vbase, it shows the offset in the complete object, IIRC, 
and there's other data to let code generation know some vtable 
inspection is needed when the dynamic type is unknown).

nathan

-- 
Nathan Sidwell

* Re: Compiling c++ template is very slow.
From: Richard Biener @ 2018-03-09 13:42 UTC
  To: Nathan Sidwell; +Cc: Fis Trivial, Jason Merrill, gcc

On Fri, Mar 9, 2018 at 2:08 PM, Nathan Sidwell <nathan@acm.org> wrote:
> On 03/09/2018 06:51 AM, Richard Biener wrote:
>>
>> On Fri, Mar 9, 2018 at 11:32 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>
>
>> callgrind shows that propagate_binfo_offsets recursing
>> to self very many times is likely the issue.  Your templates
>> build a very deep inheritance chain and it seems that
>> the binfo offset propagation ends up being exponential here.
>>
>> I would have expected that any bases already have correct
>> offsets, so we don't need to recurse into bases of bases?
>>
>> Ah, so in this case we have an offset of 1 because sizeof is
>> always nonzero.  So the issue might just be that
>> BINFO_OFFSET is not relative to the parent?
>
>
> Correct, it's relative to the complete object.
>
>> doesn't improve things.  We call this function millions of times
>> creating millions of INTEGER_CSTs.  Can BINFO_OFFSET be
>> non-constant?
>
>
> No. (if it's a vbase, it shows the offset in the complete object, IIRC, and
> there's other data to let code generation know some vtable inspection is
> needed when the dynamic type is unknown).

So there's no multiple inheritance of classes with VLA members then I guess.
Or rather there's no such thing as VLA members ;)

If it's always constant, I suggest making it a non-tree ... I suspect that, while not
addressing the complexity, it would improve compile time a lot...

Oh, and in the testcase there are no virtual methods, so nobody should look
at BINFO_OFFSET anyway?

Richard.

> nathan
>
> --
> Nathan Sidwell

* Re: Compiling c++ template is very slow.
From: Nathan Sidwell @ 2018-03-09 13:49 UTC
  To: Richard Biener; +Cc: Fis Trivial, Jason Merrill, gcc

On 03/09/2018 08:42 AM, Richard Biener wrote:

>> No. (if it's a vbase, it shows the offset in the complete object, IIRC, and
>> there's other data to let code generation know some vtable inspection is
>> needed when the dynamic type is unknown).
> 
> So there's no multiple inheritance of classes with VLA members then I guess.
> Or rather there's no such thing as VLA members ;)

I guess.

> If it's always constant, I suggest making it a non-tree ... I suspect that, while not
> addressing the complexity, it would improve compile time a lot...

Not disagreeing -- I think there's a bunch of stuff in BINFOs that
either doesn't need to be a tree, or doesn't even need to be there.

> Oh, and in the testcase there are no virtual methods, so nobody should look
> at BINFO_OFFSET anyway?

It's used for conversions to bases.  (So it's not immediately clear to 
me that making it not a tree would win -- you'd be pushing the 
int->INTEGER_CST conversions into each base conversion generation. 
Don't forget, small integer_csts are commonized.)

nathan

-- 
Nathan Sidwell

Thread overview: 6+ messages
2018-03-09  2:33 Compiling c++ template is very slow Fis Trivial
2018-03-09 10:32 ` Richard Biener
2018-03-09 11:51   ` Richard Biener
2018-03-09 13:08     ` Nathan Sidwell
2018-03-09 13:42       ` Richard Biener
2018-03-09 13:49         ` Nathan Sidwell
