* stdc++ issue: extremely long compile time with large number of string literals @ 2020-07-09 20:31 Mandeep Sandhu 2020-07-09 20:49 ` Marc Glisse 2020-07-09 23:49 ` Jonathan Wakely 0 siblings, 2 replies; 7+ messages in thread From: Mandeep Sandhu @ 2020-07-09 20:31 UTC (permalink / raw) To: gcc-help Hi All, I have an strange (to me) issue, where trying to compile a header which has a single "std::unordered_set<std::string>" initialized with around 50K short strings is taking forever. The set is declared as: const std::unordered_set<std::string> my_set ({"item1", "item2", ....}); (The header is auto-generated using a script which takes a JSON array and puts its elements in an unordered_set) I understand that creation of many strings has an overhead, but this issue seems to affect compilation time, not runtime. Can someone explain to me why it takes such a long time to compile? Keeping the strings to under 5K, makes the program compile in about 8 secs. I'm using the following compiler on Linux: $ g++ --version g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Thanks for your time. -mandeep ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: stdc++ issue: extremely long compile time with large number of string literals 2020-07-09 20:31 stdc++ issue: extremely long compile time with large number of string literals Mandeep Sandhu @ 2020-07-09 20:49 ` Marc Glisse 2020-07-09 21:27 ` Mandeep Sandhu 2020-07-09 23:49 ` Jonathan Wakely 1 sibling, 1 reply; 7+ messages in thread From: Marc Glisse @ 2020-07-09 20:49 UTC (permalink / raw) To: Mandeep Sandhu; +Cc: gcc-help On Thu, 9 Jul 2020, Mandeep Sandhu via Gcc-help wrote: > I have an strange (to me) issue, where trying to compile a header > which has a single "std::unordered_set<std::string>" initialized with > around 50K short strings is taking forever. > > The set is declared as: > const std::unordered_set<std::string> my_set ({"item1", "item2", ....}); > > (The header is auto-generated using a script which takes a JSON array > and puts its elements in an unordered_set) > > I understand that creation of many strings has an overhead, but this > issue seems to affect compilation time, not runtime. > > Can someone explain to me why it takes such a long time to compile? > Keeping the strings to under 5K, makes the program compile in about 8 > secs. > > I'm using the following compiler on Linux: > $ g++ --version > g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008 > Copyright (C) 2019 Free Software Foundation, Inc. > This is free software; see the source for copying conditions. There is NO > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. You didn't say what flags you are using. For anything autogenerated like that, going above -O1 is always risky. One first thing to try is -ftime-report At -O0 phase opt and generate : 3.46 ( 84%) 0.15 ( 45%) 3.62 ( 81%) 188385 kB ( 70%) without a specific pass that stands out. At -O1 phase opt and generate : 87.96 ( 99%) 6.57 ( 97%) 94.56 ( 99%)24503776 kB (100%) callgraph ipa passes : 85.93 ( 97%) 6.52 ( 96%) 92.47 ( 97%)24413253 kB ( 99%) tree eh : 84.32 ( 95%) 6.43 ( 95%) 90.78 ( 95%)24340697 kB ( 99%) so, it looks related to the optimization of exceptions... -- Marc Glisse ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: stdc++ issue: extremely long compile time with large number of string literals 2020-07-09 20:49 ` Marc Glisse @ 2020-07-09 21:27 ` Mandeep Sandhu 0 siblings, 0 replies; 7+ messages in thread From: Mandeep Sandhu @ 2020-07-09 21:27 UTC (permalink / raw) To: gcc-help > You didn't say what flags you are using. For anything autogenerated like > that, going above -O1 is always risky. I run it with just -std=c++11 (not optimization flag, so I guess it defaults to -O0) > > One first thing to try is -ftime-report > > At -O0 > phase opt and generate : 3.46 ( 84%) 0.15 ( 45%) 3.62 ( 81%) 188385 kB ( 70%) With 5000 strings, & -O0 I get the following in the time report: Time variable usr sys wall GGC ... phase opt and generate : 4.12 ( 89%) 0.18 ( 46%) 4.30 ( 86%) 99680 kB ( 58%) ... expand vars : 2.41 ( 52%) 0.00 ( 0%) 2.42 ( 48%) 699 kB ( 0%) ... TBH, not sure what these values mean. Someone mentioned that overload resolution might be causing the excessive delay, but I can't find any metrics to verify that. -mandeep > > without a specific pass that stands out. > > At -O1 > phase opt and generate : 87.96 ( 99%) 6.57 ( 97%) 94.56 ( 99%)24503776 kB (100%) > callgraph ipa passes : 85.93 ( 97%) 6.52 ( 96%) 92.47 ( 97%)24413253 kB ( 99%) > tree eh : 84.32 ( 95%) 6.43 ( 95%) 90.78 ( 95%)24340697 kB ( 99%) > > so, it looks related to the optimization of exceptions... > > -- > Marc Glisse ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: stdc++ issue: extremely long compile time with large number of string literals 2020-07-09 20:31 stdc++ issue: extremely long compile time with large number of string literals Mandeep Sandhu 2020-07-09 20:49 ` Marc Glisse @ 2020-07-09 23:49 ` Jonathan Wakely [not found] ` <CAC+QLdRqjTe4YsUpQRcLG7tpxGda0oh6H788ORZa5MQR3NqRbw@mail.gmail.com> 1 sibling, 1 reply; 7+ messages in thread From: Jonathan Wakely @ 2020-07-09 23:49 UTC (permalink / raw) To: Mandeep Sandhu; +Cc: gcc-help On Thu, 9 Jul 2020 at 21:34, Mandeep Sandhu via Gcc-help <gcc-help@gcc.gnu.org> wrote: > > Hi All, > > I have an strange (to me) issue, where trying to compile a header > which has a single "std::unordered_set<std::string>" initialized with > around 50K short strings is taking forever. > > The set is declared as: > const std::unordered_set<std::string> my_set ({"item1", "item2", ....}); This constructs an enormous array of std::string. If construction of any element of the array throws, then all previous elements need to be destroyed. There are known performance problems with the way G++ handles cases like this, because the exception handling creates a huge amount of overhead. You can find several bugs in bugzilla related to the construction of large arrays of non-trivial types. ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <CAC+QLdRqjTe4YsUpQRcLG7tpxGda0oh6H788ORZa5MQR3NqRbw@mail.gmail.com>]
[parent not found: <CAH6eHdTRiwA3t1K3okfVc+umPM8PJfbePaO7AHdLz+SYPOU9XQ@mail.gmail.com>]
* Re: stdc++ issue: extremely long compile time with large number of string literals [not found] ` <CAH6eHdTRiwA3t1K3okfVc+umPM8PJfbePaO7AHdLz+SYPOU9XQ@mail.gmail.com> @ 2020-07-10 19:36 ` Mandeep Sandhu 2020-07-10 20:25 ` Jonathan Wakely 0 siblings, 1 reply; 7+ messages in thread From: Mandeep Sandhu @ 2020-07-10 19:36 UTC (permalink / raw) To: Jonathan Wakely, gcc-help > Please reply to the mailing list, not just to me. Sorry, this must've happened by mistake (I almost always to a Reply-all when replying to messages from mailing lists) > > > > Thanks for the explanation. Still surprising that the performance > > degraded so much! Will this destruction loop run in quadratic time? > > No, only the compilation time is quadratic. The runtime performance > should be linear in the number of elements. Right, my question was poorly worded. When I said "Will this destruction loop run in quadratic time", I meant the time the compiler takes to create the exception handling code for N-1 elements. > I don't think you mean "raw string literals", do you? > "Raw string" has a very specific meaning: > https://en.cppreference.com/w/cpp/language/string_literal You're right, no I did not mean the one mentioned above (with the R prefix). Whats the correct term for a const char* (which is what I meant)? > > Your email subject says "large number of string literals" but what > your code actually does is create a large > std::initializer_list<std::string>, which is not the same. > > If you do create string literals and then construct from that it > compiles much faster. This is similar to your original: Just to be sure I follow you, the following invokes std::initializer_list<std::string> std::unordered_set<std::string> s ({"a", "b", ...}); // this is what I reported the issue with but this does not? std::unordered_set<std::string> s {"a", "b", ...}; (and is faster?) > > $ printf '#include <unordered_set>\n#include > <string>\nstd::unordered_set<std::string> s{\n' > init.C ; seq > --format='"%.0f",' 1 50000 >> init.C ; printf '};\n' >> init.C > $ time g++ init.C -c > > real 1m57.054s > user 1m55.370s > sys 0m1.117s On my machine, this too compiles forever (its been compiling for 20+ mins. I'm running on macbook with quad-core i7 & 8GB RAM) > > This creates an array of string literals and then constructs from that: > > $ printf '#include <unordered_set>\n#include > <string>\nstd::unordered_set<std::string> make_set() {\nconst char* > arr[] = {\n' > init2.C ; seq --format='"%.0f",' 1 50000 >> init2.C ; > printf '};\nreturn std::unordered_set<std::string>{std::begin(arr), > std::end(arr)};\n}\nauto s = make_set();\n' >> init2.C > tmp$ time g++ init2.C -c > > real 0m0.662s > user 0m0.591s > sys 0m0.069s > > As you can see, this is almost instant. Indeed it is! Although I have no idea why. Won't this form also require the same kind of exception handling code? when its iterating "arr", won't the compiler have to still ensure that prior objects are deleted if constructing a string obj from the current element fails? Or maybe the slowness comes from the way initializer_list specifically handles this? Thanks for your time. -mandeep ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: stdc++ issue: extremely long compile time with large number of string literals 2020-07-10 19:36 ` Mandeep Sandhu @ 2020-07-10 20:25 ` Jonathan Wakely 2020-07-11 6:27 ` Mandeep Sandhu 0 siblings, 1 reply; 7+ messages in thread From: Jonathan Wakely @ 2020-07-10 20:25 UTC (permalink / raw) To: Mandeep Sandhu; +Cc: gcc-help On Fri, 10 Jul 2020 at 20:36, Mandeep Sandhu <mandeepsandhu.chd@gmail.com> wrote: > > > Please reply to the mailing list, not just to me. > > Sorry, this must've happened by mistake (I almost always to a > Reply-all when replying to messages from mailing lists) > > > > > > > Thanks for the explanation. Still surprising that the performance > > > degraded so much! Will this destruction loop run in quadratic time? > > > > No, only the compilation time is quadratic. The runtime performance > > should be linear in the number of elements. > > Right, my question was poorly worded. When I said "Will this > destruction loop run in quadratic time", I meant the time the compiler > takes to create the exception handling code for N-1 elements. > > > I don't think you mean "raw string literals", do you? > > "Raw string" has a very specific meaning: > > https://en.cppreference.com/w/cpp/language/string_literal > > You're right, no I did not mean the one mentioned above (with the R > prefix). Whats the correct term for a const char* (which is what I > meant)? Just a string literal. A raw string literal is the special kind with the R prefix. > > > > Your email subject says "large number of string literals" but what > > your code actually does is create a large > > std::initializer_list<std::string>, which is not the same. > > > > If you do create string literals and then construct from that it > > compiles much faster. This is similar to your original: > > Just to be sure I follow you, the following invokes > std::initializer_list<std::string> > std::unordered_set<std::string> s ({"a", "b", ...}); // this is what I > reported the issue with > > but this does not? > std::unordered_set<std::string> s {"a", "b", ...}; (and is faster?) They're the same. > > > > $ printf '#include <unordered_set>\n#include > > <string>\nstd::unordered_set<std::string> s{\n' > init.C ; seq > > --format='"%.0f",' 1 50000 >> init.C ; printf '};\n' >> init.C > > $ time g++ init.C -c > > > > real 1m57.054s > > user 1m55.370s > > sys 0m1.117s > > On my machine, this too compiles forever (its been compiling for 20+ > mins. I'm running on macbook with quad-core i7 & 8GB RAM) Yes, that's the slow version. > > > > This creates an array of string literals and then constructs from that: > > > > $ printf '#include <unordered_set>\n#include > > <string>\nstd::unordered_set<std::string> make_set() {\nconst char* > > arr[] = {\n' > init2.C ; seq --format='"%.0f",' 1 50000 >> init2.C ; > > printf '};\nreturn std::unordered_set<std::string>{std::begin(arr), > > std::end(arr)};\n}\nauto s = make_set();\n' >> init2.C > > tmp$ time g++ init2.C -c > > > > real 0m0.662s > > user 0m0.591s > > sys 0m0.069s > > > > As you can see, this is almost instant. > > Indeed it is! Although I have no idea why. Won't this form also > require the same kind of exception handling code? when its iterating > "arr", won't the compiler have to still ensure that prior objects are > deleted if constructing a string obj from the current element fails? > > Or maybe the slowness comes from the way initializer_list specifically > handles this? It's the array of std::string that the std::initializer_list refers to that causes the problem. In the faster code there is no array of std::string, it's an array of const char*. Initializing a const char* can't throw an exception, and destroying it is a no-op, so there's no exception handling code needed for the array. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: stdc++ issue: extremely long compile time with large number of string literals 2020-07-10 20:25 ` Jonathan Wakely @ 2020-07-11 6:27 ` Mandeep Sandhu 0 siblings, 0 replies; 7+ messages in thread From: Mandeep Sandhu @ 2020-07-11 6:27 UTC (permalink / raw) To: Jonathan Wakely; +Cc: gcc-help > > It's the array of std::string that the std::initializer_list refers to > that causes the problem. > > In the faster code there is no array of std::string, it's an array of > const char*. Initializing a const char* can't throw an exception, and > destroying it is a no-op, so there's no exception handling code needed > for the array. Makes sense. Thanks for the explanation. I guess this is something to keep in mind when using initializer_list with a large number of non-trivial types. -mandeep ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-07-11 6:27 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-09 20:31 stdc++ issue: extremely long compile time with large number of string literals Mandeep Sandhu 2020-07-09 20:49 ` Marc Glisse 2020-07-09 21:27 ` Mandeep Sandhu 2020-07-09 23:49 ` Jonathan Wakely [not found] ` <CAC+QLdRqjTe4YsUpQRcLG7tpxGda0oh6H788ORZa5MQR3NqRbw@mail.gmail.com> [not found] ` <CAH6eHdTRiwA3t1K3okfVc+umPM8PJfbePaO7AHdLz+SYPOU9XQ@mail.gmail.com> 2020-07-10 19:36 ` Mandeep Sandhu 2020-07-10 20:25 ` Jonathan Wakely 2020-07-11 6:27 ` Mandeep Sandhu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).