public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* stdc++ issue: extremely long compile time with large number of string literals
@ 2020-07-09 20:31 Mandeep Sandhu
  2020-07-09 20:49 ` Marc Glisse
  2020-07-09 23:49 ` Jonathan Wakely
  0 siblings, 2 replies; 7+ messages in thread
From: Mandeep Sandhu @ 2020-07-09 20:31 UTC (permalink / raw)
  To: gcc-help

Hi All,

I have an strange (to me) issue, where trying to compile a header
which has a single "std::unordered_set<std::string>" initialized with
around 50K short strings is taking forever.

The set is declared as:
const std::unordered_set<std::string> my_set ({"item1", "item2", ....});

(The header is auto-generated using a script which takes a JSON array
and puts its elements in an unordered_set)

I understand that creation of many strings has an overhead, but this
issue seems to affect compilation time, not runtime.

Can someone explain to me why it takes such a long time to compile?
Keeping the strings to under 5K, makes the program compile in about 8
secs.

I'm using the following compiler on Linux:
$ g++ --version
g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thanks for your time.

-mandeep

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: stdc++ issue: extremely long compile time with large number of string literals
  2020-07-09 20:31 stdc++ issue: extremely long compile time with large number of string literals Mandeep Sandhu
@ 2020-07-09 20:49 ` Marc Glisse
  2020-07-09 21:27   ` Mandeep Sandhu
  2020-07-09 23:49 ` Jonathan Wakely
  1 sibling, 1 reply; 7+ messages in thread
From: Marc Glisse @ 2020-07-09 20:49 UTC (permalink / raw)
  To: Mandeep Sandhu; +Cc: gcc-help

On Thu, 9 Jul 2020, Mandeep Sandhu via Gcc-help wrote:

> I have an strange (to me) issue, where trying to compile a header
> which has a single "std::unordered_set<std::string>" initialized with
> around 50K short strings is taking forever.
>
> The set is declared as:
> const std::unordered_set<std::string> my_set ({"item1", "item2", ....});
>
> (The header is auto-generated using a script which takes a JSON array
> and puts its elements in an unordered_set)
>
> I understand that creation of many strings has an overhead, but this
> issue seems to affect compilation time, not runtime.
>
> Can someone explain to me why it takes such a long time to compile?
> Keeping the strings to under 5K, makes the program compile in about 8
> secs.
>
> I'm using the following compiler on Linux:
> $ g++ --version
> g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
> Copyright (C) 2019 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You didn't say what flags you are using. For anything autogenerated like 
that, going above -O1 is always risky.

One first thing to try is -ftime-report

At -O0
  phase opt and generate             :   3.46 ( 84%)   0.15 ( 45%)   3.62 ( 81%)  188385 kB ( 70%)

without a specific pass that stands out.

At -O1
  phase opt and generate             :  87.96 ( 99%)   6.57 ( 97%)  94.56 ( 99%)24503776 kB (100%)
  callgraph ipa passes               :  85.93 ( 97%)   6.52 ( 96%)  92.47 ( 97%)24413253 kB ( 99%)
  tree eh                            :  84.32 ( 95%)   6.43 ( 95%)  90.78 ( 95%)24340697 kB ( 99%)

so, it looks related to the optimization of exceptions...

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: stdc++ issue: extremely long compile time with large number of string literals
  2020-07-09 20:49 ` Marc Glisse
@ 2020-07-09 21:27   ` Mandeep Sandhu
  0 siblings, 0 replies; 7+ messages in thread
From: Mandeep Sandhu @ 2020-07-09 21:27 UTC (permalink / raw)
  To: gcc-help

> You didn't say what flags you are using. For anything autogenerated like
> that, going above -O1 is always risky.

I run it with just -std=c++11 (not optimization flag, so I guess it
defaults to -O0)
>
> One first thing to try is -ftime-report
>
> At -O0
>   phase opt and generate             :   3.46 ( 84%)   0.15 ( 45%)   3.62 ( 81%)  188385 kB ( 70%)

With 5000 strings, & -O0 I get the following in the time report:

Time variable                                   usr           sys
    wall               GGC
...
phase opt and generate             :   4.12 ( 89%)   0.18 ( 46%)
4.30 ( 86%)   99680 kB ( 58%)
...
 expand vars                        :   2.41 ( 52%)   0.00 (  0%)
2.42 ( 48%)     699 kB (  0%)
...

TBH, not sure what these values mean. Someone mentioned that overload
resolution might be causing the excessive delay, but I can't find any
metrics to verify that.

-mandeep


>
> without a specific pass that stands out.
>
> At -O1
>   phase opt and generate             :  87.96 ( 99%)   6.57 ( 97%)  94.56 ( 99%)24503776 kB (100%)
>   callgraph ipa passes               :  85.93 ( 97%)   6.52 ( 96%)  92.47 ( 97%)24413253 kB ( 99%)
>   tree eh                            :  84.32 ( 95%)   6.43 ( 95%)  90.78 ( 95%)24340697 kB ( 99%)
>
> so, it looks related to the optimization of exceptions...
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: stdc++ issue: extremely long compile time with large number of string literals
  2020-07-09 20:31 stdc++ issue: extremely long compile time with large number of string literals Mandeep Sandhu
  2020-07-09 20:49 ` Marc Glisse
@ 2020-07-09 23:49 ` Jonathan Wakely
       [not found]   ` <CAC+QLdRqjTe4YsUpQRcLG7tpxGda0oh6H788ORZa5MQR3NqRbw@mail.gmail.com>
  1 sibling, 1 reply; 7+ messages in thread
From: Jonathan Wakely @ 2020-07-09 23:49 UTC (permalink / raw)
  To: Mandeep Sandhu; +Cc: gcc-help

On Thu, 9 Jul 2020 at 21:34, Mandeep Sandhu via Gcc-help
<gcc-help@gcc.gnu.org> wrote:
>
> Hi All,
>
> I have an strange (to me) issue, where trying to compile a header
> which has a single "std::unordered_set<std::string>" initialized with
> around 50K short strings is taking forever.
>
> The set is declared as:
> const std::unordered_set<std::string> my_set ({"item1", "item2", ....});

This constructs an enormous array of std::string. If construction of
any element of the array throws, then all previous elements need to be
destroyed.

There are known performance problems with the way G++ handles cases
like this, because the exception handling creates a huge amount of
overhead. You can find several bugs in bugzilla related to the
construction of large arrays of non-trivial types.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: stdc++ issue: extremely long compile time with large number of string literals
       [not found]     ` <CAH6eHdTRiwA3t1K3okfVc+umPM8PJfbePaO7AHdLz+SYPOU9XQ@mail.gmail.com>
@ 2020-07-10 19:36       ` Mandeep Sandhu
  2020-07-10 20:25         ` Jonathan Wakely
  0 siblings, 1 reply; 7+ messages in thread
From: Mandeep Sandhu @ 2020-07-10 19:36 UTC (permalink / raw)
  To: Jonathan Wakely, gcc-help

> Please reply to the mailing list, not just to me.

Sorry, this must've happened by mistake (I almost always to a
Reply-all when replying to messages from mailing lists)

> >
> > Thanks for the explanation. Still surprising that the performance
> > degraded so much! Will this destruction loop run in quadratic time?
>
> No, only the compilation time is quadratic. The runtime performance
> should be linear in the number of elements.

Right, my question was poorly worded. When I said "Will this
destruction loop run in quadratic time", I meant the time the compiler
takes to create the exception handling code for N-1 elements.

> I don't think you mean "raw string literals", do you?
> "Raw string" has a very specific meaning:
> https://en.cppreference.com/w/cpp/language/string_literal

You're right, no I did not mean the one mentioned above (with the R
prefix). Whats the correct term for a const char* (which is what I
meant)?

>
> Your email subject says "large number of string literals" but what
> your code actually does is create a large
> std::initializer_list<std::string>, which is not the same.
>
> If you do create string literals and then construct from that it
> compiles much faster. This is similar to your original:

Just to be sure I follow you, the following invokes
std::initializer_list<std::string>
std::unordered_set<std::string> s ({"a", "b", ...}); // this is what I
reported the issue with

but this does not?
std::unordered_set<std::string> s {"a", "b", ...}; (and is faster?)

>
> $ printf '#include <unordered_set>\n#include
> <string>\nstd::unordered_set<std::string> s{\n' > init.C ; seq
> --format='"%.0f",' 1 50000 >> init.C ; printf '};\n' >> init.C
> $ time g++ init.C -c
>
> real    1m57.054s
> user    1m55.370s
> sys     0m1.117s

On my machine, this too compiles forever (its been compiling for 20+
mins. I'm running on macbook with quad-core i7 & 8GB RAM)

>
> This creates an array of string literals and then constructs from that:
>
> $ printf '#include <unordered_set>\n#include
> <string>\nstd::unordered_set<std::string> make_set() {\nconst char*
> arr[] = {\n' > init2.C ; seq --format='"%.0f",' 1 50000 >> init2.C ;
> printf '};\nreturn std::unordered_set<std::string>{std::begin(arr),
> std::end(arr)};\n}\nauto s = make_set();\n' >> init2.C
> tmp$ time g++ init2.C -c
>
> real    0m0.662s
> user    0m0.591s
> sys     0m0.069s
>
> As you can see, this is almost instant.

Indeed it is! Although I have no idea why. Won't this form also
require the same kind of exception handling code? when its iterating
"arr", won't the compiler have to still ensure that prior objects are
deleted if constructing a string obj from the current element fails?

Or maybe the slowness comes from the way initializer_list specifically
handles this?

Thanks for your time.

-mandeep

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: stdc++ issue: extremely long compile time with large number of string literals
  2020-07-10 19:36       ` Mandeep Sandhu
@ 2020-07-10 20:25         ` Jonathan Wakely
  2020-07-11  6:27           ` Mandeep Sandhu
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Wakely @ 2020-07-10 20:25 UTC (permalink / raw)
  To: Mandeep Sandhu; +Cc: gcc-help

On Fri, 10 Jul 2020 at 20:36, Mandeep Sandhu
<mandeepsandhu.chd@gmail.com> wrote:
>
> > Please reply to the mailing list, not just to me.
>
> Sorry, this must've happened by mistake (I almost always to a
> Reply-all when replying to messages from mailing lists)
>
> > >
> > > Thanks for the explanation. Still surprising that the performance
> > > degraded so much! Will this destruction loop run in quadratic time?
> >
> > No, only the compilation time is quadratic. The runtime performance
> > should be linear in the number of elements.
>
> Right, my question was poorly worded. When I said "Will this
> destruction loop run in quadratic time", I meant the time the compiler
> takes to create the exception handling code for N-1 elements.
>
> > I don't think you mean "raw string literals", do you?
> > "Raw string" has a very specific meaning:
> > https://en.cppreference.com/w/cpp/language/string_literal
>
> You're right, no I did not mean the one mentioned above (with the R
> prefix). Whats the correct term for a const char* (which is what I
> meant)?

Just a string literal. A raw string literal is the special kind with
the R prefix.

> >
> > Your email subject says "large number of string literals" but what
> > your code actually does is create a large
> > std::initializer_list<std::string>, which is not the same.
> >
> > If you do create string literals and then construct from that it
> > compiles much faster. This is similar to your original:
>
> Just to be sure I follow you, the following invokes
> std::initializer_list<std::string>
> std::unordered_set<std::string> s ({"a", "b", ...}); // this is what I
> reported the issue with
>
> but this does not?
> std::unordered_set<std::string> s {"a", "b", ...}; (and is faster?)

They're the same.

> >
> > $ printf '#include <unordered_set>\n#include
> > <string>\nstd::unordered_set<std::string> s{\n' > init.C ; seq
> > --format='"%.0f",' 1 50000 >> init.C ; printf '};\n' >> init.C
> > $ time g++ init.C -c
> >
> > real    1m57.054s
> > user    1m55.370s
> > sys     0m1.117s
>
> On my machine, this too compiles forever (its been compiling for 20+
> mins. I'm running on macbook with quad-core i7 & 8GB RAM)

Yes, that's the slow version.

> >
> > This creates an array of string literals and then constructs from that:
> >
> > $ printf '#include <unordered_set>\n#include
> > <string>\nstd::unordered_set<std::string> make_set() {\nconst char*
> > arr[] = {\n' > init2.C ; seq --format='"%.0f",' 1 50000 >> init2.C ;
> > printf '};\nreturn std::unordered_set<std::string>{std::begin(arr),
> > std::end(arr)};\n}\nauto s = make_set();\n' >> init2.C
> > tmp$ time g++ init2.C -c
> >
> > real    0m0.662s
> > user    0m0.591s
> > sys     0m0.069s
> >
> > As you can see, this is almost instant.
>
> Indeed it is! Although I have no idea why. Won't this form also
> require the same kind of exception handling code? when its iterating
> "arr", won't the compiler have to still ensure that prior objects are
> deleted if constructing a string obj from the current element fails?
>
> Or maybe the slowness comes from the way initializer_list specifically
> handles this?

It's the array of std::string that the std::initializer_list refers to
that causes the problem.

In the faster code there is no array of std::string, it's an array of
const char*. Initializing a const char* can't throw an exception, and
destroying it is a no-op, so there's no exception handling code needed
for the array.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: stdc++ issue: extremely long compile time with large number of string literals
  2020-07-10 20:25         ` Jonathan Wakely
@ 2020-07-11  6:27           ` Mandeep Sandhu
  0 siblings, 0 replies; 7+ messages in thread
From: Mandeep Sandhu @ 2020-07-11  6:27 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: gcc-help

>
> It's the array of std::string that the std::initializer_list refers to
> that causes the problem.
>
> In the faster code there is no array of std::string, it's an array of
> const char*. Initializing a const char* can't throw an exception, and
> destroying it is a no-op, so there's no exception handling code needed
> for the array.

Makes sense. Thanks for the explanation. I guess this is something to
keep in mind when using initializer_list with a large number of
non-trivial types.

-mandeep

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-07-11  6:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-09 20:31 stdc++ issue: extremely long compile time with large number of string literals Mandeep Sandhu
2020-07-09 20:49 ` Marc Glisse
2020-07-09 21:27   ` Mandeep Sandhu
2020-07-09 23:49 ` Jonathan Wakely
     [not found]   ` <CAC+QLdRqjTe4YsUpQRcLG7tpxGda0oh6H788ORZa5MQR3NqRbw@mail.gmail.com>
     [not found]     ` <CAH6eHdTRiwA3t1K3okfVc+umPM8PJfbePaO7AHdLz+SYPOU9XQ@mail.gmail.com>
2020-07-10 19:36       ` Mandeep Sandhu
2020-07-10 20:25         ` Jonathan Wakely
2020-07-11  6:27           ` Mandeep Sandhu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).