public inbox for gcc@gcc.gnu.org
* performance of exception handling
@ 2020-05-11  8:14 Thomas Neumann
  2020-05-11 10:40 ` Florian Weimer
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Neumann @ 2020-05-11  8:14 UTC (permalink / raw)
  To: gcc

Hi,

I want to improve the performance of C++ exception handling, and I would
like to get some feedback on how to tackle that.

Currently, exception handling scales poorly due to global mutexes when
throwing. This can be seen with a small demo script here:
https://repl.it/repls/DeliriousPrivateProfiler
Using a thread count >1 is much slower than running single threaded.
This global locking is particularly painful on a machine with more than a
hundred cores, as mutexes are expensive there and contention becomes
much more likely due to the high degree of parallelism.
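
A minimal sketch of the kind of measurement described above, assuming a
plain throw/catch loop per thread. It is not the linked demo script; the
names throw_loop and run_throwers are made up for illustration.

```cpp
#include <chrono>
#include <cstdint>
#include <stdexcept>
#include <thread>
#include <vector>

// Throws and catches `iterations` exceptions on the calling thread and
// returns how many were caught.
static std::uint64_t throw_loop(std::uint64_t iterations) {
    std::uint64_t caught = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        try {
            throw std::runtime_error("demo");
        } catch (const std::runtime_error&) {
            ++caught;
        }
    }
    return caught;
}

struct ThrowResult {
    std::uint64_t caught;  // total exceptions caught across all threads
    double seconds;        // wall-clock time for the whole run
};

// Runs throw_loop on `threads` threads in parallel and times the run.
ThrowResult run_throwers(unsigned threads, std::uint64_t iterations) {
    std::vector<std::uint64_t> counts(threads, 0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counts, t, iterations] {
            counts[t] = throw_loop(iterations);
        });
    }
    for (auto& th : pool) th.join();
    const auto stop = std::chrono::steady_clock::now();

    std::uint64_t total = 0;
    for (std::uint64_t c : counts) total += c;
    return {total, std::chrono::duration<double>(stop - start).count()};
}
```

Comparing run_throwers(1, n) with run_throwers(k, n) for growing k is one
way to see whether the per-throw cost stays flat as threads are added.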

Of course conventional wisdom is not to use exceptions when exceptions
can occur somewhat frequently. But I think that is a silly argument, see
the WG21 paper P0709 for a detailed discussion. In particular since
there is no technical reason why they have to be slow, it is just the
current implementation that is slow.

In the current gcc implementation on Linux the bottleneck is
_Unwind_Find_FDE, or more precisely, the function dl_iterate_phdr,
that is called for every frame and that iterates over all shared
libraries while holding a global lock.
That is inherently slow, both due to global locking and due to the data
structures involved.
And it is not easy to speed that up with, e.g., a thread local cache, as
glibc has no mechanism to notify us if a shared library is added or removed.
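
To illustrate the lookup pattern described above, here is a simplified
sketch that uses dl_iterate_phdr to find the object containing an address
and its PT_GNU_EH_FRAME segment. It is Linux/glibc specific and is not the
libgcc code; ObjectLookup and find_object are hypothetical names.

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info, ElfW (glibc)
#include <cstddef>
#include <cstdint>

struct ObjectLookup {
    std::uintptr_t address;    // input: address to search for
    bool found;                // output: a PT_LOAD segment contains it
    const void* eh_frame_hdr;  // output: PT_GNU_EH_FRAME start, or nullptr
    int objects_visited;       // output: number of callback invocations
};

// Called once per loaded object while the loader holds its lock.
static int lookup_callback(struct dl_phdr_info* info, std::size_t, void* data) {
    auto* out = static_cast<ObjectLookup*>(data);
    ++out->objects_visited;

    bool contains = false;
    const void* hdr = nullptr;
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        const std::uintptr_t start = info->dlpi_addr + ph.p_vaddr;
        if (ph.p_type == PT_LOAD) {
            if (out->address >= start && out->address < start + ph.p_memsz)
                contains = true;
        } else if (ph.p_type == PT_GNU_EH_FRAME) {
            hdr = reinterpret_cast<const void*>(start);
        }
    }
    if (!contains) return 0;  // zero: continue with the next object
    out->found = true;
    out->eh_frame_hdr = hdr;
    return 1;                 // non-zero: stop the iteration
}

ObjectLookup find_object(const void* address) {
    ObjectLookup result{reinterpret_cast<std::uintptr_t>(address), false,
                        nullptr, 0};
    dl_iterate_phdr(lookup_callback, &result);
    return result;
}
```

Every call walks the list of loaded objects from the start, which is why
repeating it for each frame of a deep stack adds up.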

We therefore need a way to locate the exception frames that is
independent from glibc. One way to achieve that would be to explicitly
register exception frames with __register_frame_info_bases in a
constructor function (and deregister them in a destructor function).
Of course probing explicitly registered frames currently uses a global
lock, too, but that implementation is provided by libgcc, and we can
change that to something better, allowing for lock free reads.
In libgcc explicitly registered frames take precedence over the
dl_iterate_phdr mechanism, which means that we could mix future code
that does call __register_frame_info_bases explicitly with code that
does not. Code that does register will unwind faster than code that does
not, but both can coexist in one process.
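
One possible shape for a registry with lock free reads is sketched below,
assuming a copy-on-write sorted array that writers publish through an
atomic pointer. This is only an illustration of the idea, not the proposed
libgcc change; FrameRange and FrameRegistry are hypothetical names, and
old snapshots are deliberately never freed to keep the sketch short.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <mutex>
#include <vector>

struct FrameRange {
    std::uintptr_t begin;  // first PC covered
    std::uintptr_t end;    // one past the last PC covered
    const void* eh_frame;  // unwind data registered for this range
};

class FrameRegistry {
    using Snapshot = std::vector<FrameRange>;

    static bool by_begin(const FrameRange& a, const FrameRange& b) {
        return a.begin < b.begin;
    }

    std::atomic<const Snapshot*> current_{nullptr};
    std::mutex write_mutex_;

public:
    // Writers serialize on a mutex, copy the table, and publish the copy.
    void register_range(const FrameRange& r) {
        std::lock_guard<std::mutex> guard(write_mutex_);
        const Snapshot* old = current_.load(std::memory_order_relaxed);
        Snapshot* next = old ? new Snapshot(*old) : new Snapshot();
        next->insert(std::upper_bound(next->begin(), next->end(), r, by_begin), r);
        current_.store(next, std::memory_order_release);
        // Simplification: `old` is leaked, because a reader may still be
        // using it. A real implementation needs safe reclamation.
    }

    // Readers take no lock: they binary search an immutable snapshot.
    const FrameRange* find(std::uintptr_t pc) const {
        const Snapshot* snap = current_.load(std::memory_order_acquire);
        if (!snap) return nullptr;
        auto it = std::upper_bound(
            snap->begin(), snap->end(), pc,
            [](std::uintptr_t value, const FrameRange& r) { return value < r.begin; });
        if (it == snap->begin()) return nullptr;
        --it;  // last range starting at or before pc
        return pc < it->end ? &*it : nullptr;
    }
};
```

Registration is rare compared to unwinding, so making writers pay for a
copy in exchange for uncontended readers seems a reasonable trade here.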

Does that sound like a viable strategy to speed up exception handling? I
would be willing to contribute code for that, but I first wanted to know
if you are interested and if the strategy makes sense. Also, my
implementation makes use of atomics, which I hope are available on all
platforms that use unwind-dw2-fde.c, but I am not sure.

Thomas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: performance of exception handling
  2020-05-11  8:14 performance of exception handling Thomas Neumann
@ 2020-05-11 10:40 ` Florian Weimer
  2020-05-11 13:59   ` Thomas Neumann
  2020-05-11 14:36   ` performance " David Edelsohn
  0 siblings, 2 replies; 29+ messages in thread
From: Florian Weimer @ 2020-05-11 10:40 UTC (permalink / raw)
  To: Thomas Neumann via Gcc; +Cc: Thomas Neumann

* Thomas Neumann via Gcc:

> Currently, exception handling scales poorly due to global mutexes when
> throwing. This can be seen with a small demo script here:
> https://repl.it/repls/DeliriousPrivateProfiler
> Using a thread count >1 is much slower than running single threaded.
> This global locking is particularly painful on a machine with more than a
> hundred cores, as there mutexes are expensive and contention becomes
> much more likely due to the high degree of parallelism.
>
> Of course conventional wisdom is not to use exceptions when exceptions
> can occur somewhat frequently. But I think that is a silly argument, see
> the WG21 paper P0709 for a detailed discussion.

Link: <https://wg21.link/P0709>

I'm not sure if your summary is correct.

The paper's claim that program bugs should not result in catchable
exceptions also does not match my limited experience with
application servers: They tend to install an exception handler of last
resort to catch unexpected exceptions (“bugs”) from processed requests
and log them, instead of letting them terminate the entire application
server.

> In particular since there is no technical reason why they have to be
> slow, it is just the current implementation that is slow.

I agree, the present state is not inherently due to the exception
handling model, it's a consequence of the current implementation.

> In the current gcc implementation on Linux the bottleneck is
> _Unwind_Find_FDE, or more precisely, the function dl_iterate_phdr,
> that is called for every frame and that iterates over all shared
> libraries while holding a global lock.
> That is inherently slow, both due to global locking and due to the data
> structures involved.

In particular, the libgcc unwinder relies on the global lock to protect
its own cache, so we cannot remove the lock from glibc.

> And it is not easy to speed that up with, e.g., a thread local cache,
> as glibc has no mechanism to notify us if a shared library is added or
> removed.

It is of course possible to change glibc.

My current preferred solution is something that moves the entire code
that locates the relevant FDE table into glibc.  This is all the code in
_Unwind_IteratePhdrCallback until the first read_encoded_value_with_base
call.  And the callback mechanism would be gone, so _Unwind_Find_FDE
would call __dl_ehframe_find (see below) and then the remaining
processing in _Unwind_IteratePhdrCallback.

The glibc interface would look like this:

/* Data returned by __dl_ehframe_find.  */
struct dl_ehframe_info
{
  /* The link map of the object which contains the address.  */
  const struct link_map *dlehf_map;

  /* A pointer to its dynamic section.  This is a null pointer in
     statically linked applications.  */
  const ElfW(Dyn) *dlehf_dynamic;

  /* A pointer to the start of the PT_GNU_EH_FRAME segment for the
     object.  This is a null pointer if the object does not contain
     such a segment.  */
  const void *dlehf_ehframe;

  /* The size of the segment, or zero if not present.  */
  size_t dlehf_ehframe_size;

  /* Text and data base for the DWARF information in the segment.  */
  ElfW(Addr) dlehf_text_base;
  ElfW(Addr) dlehf_data_base;
};

/* Find the PT_GNU_EH_FRAME segment of the object which contains
   ADDRESS and write information about it to *RESULT.  Return -1 if
   nothing was found, or 0 on success.  (*RESULT can be written to on
   failure, too.)  */
int __dl_ehframe_find (ElfW(Addr) __address,
                       struct dl_ehframe_info *__result)
  __THROW __nonnull ((2)) __attribute_warn_unused_result__;

It is the responsibility of the glibc implementation of __dl_ehframe_find
to provide proper synchronization with the dynamic loader.  We can start
out with a lock-based implementation, as we have it today, and optimize
it later.

Based on prior discussions, this works because unwinding with a corrupt
stack or a stack containing unmapped objects is already undefined today,
so the live stack keeps all pointers returned from __dl_ehframe_find
valid.

The cache as it exists today would be removed from libgcc, but we
probably want to add a small cache that avoids the need to call into
glibc while unwinding through the same object (in which case we probably
should add boundary information to struct dl_ehframe_info).
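
The small cache mentioned above could look roughly like the following
sketch, assuming the lookup result carries text boundaries. ObjectInfo and
LastObjectCache are hypothetical names, and the slow lookup is passed in
as a callable because __dl_ehframe_find is only a proposal at this point.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical lookup result, including the boundary information that
// makes caching possible.
struct ObjectInfo {
    std::uintptr_t text_begin;
    std::uintptr_t text_end;   // one past the last text address
    const void* eh_frame_hdr;
};

// One-entry cache; a real unwinder would keep one instance per unwind
// (or per thread) so that no synchronization is needed.
class LastObjectCache {
public:
    using SlowLookup = std::function<bool(std::uintptr_t, ObjectInfo&)>;

    explicit LastObjectCache(SlowLookup slow) : slow_(std::move(slow)) {}

    // Returns true and fills `out` if an object containing `pc` exists.
    bool find(std::uintptr_t pc, ObjectInfo& out) {
        if (valid_ && pc >= cached_.text_begin && pc < cached_.text_end) {
            out = cached_;  // same object as the previous frame
            return true;
        }
        ObjectInfo fresh{};
        if (!slow_(pc, fresh)) return false;  // cache is left unchanged
        cached_ = fresh;
        valid_ = true;
        out = fresh;
        return true;
    }

private:
    SlowLookup slow_;
    ObjectInfo cached_{};
    bool valid_ = false;
};
```

Consecutive frames often belong to the same object, so even a single
entry should remove most calls into the dynamic loader.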

The advantage of doing it this way is that it does not require
recompiling and relinking objects.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: performance of exception handling
  2020-05-11 10:40 ` Florian Weimer
@ 2020-05-11 13:59   ` Thomas Neumann
  2020-05-11 14:22     ` Florian Weimer
  2020-05-11 15:14     ` size of exception handling (Was: performance of exception handling) Moritz Strübe
  2020-05-11 14:36   ` performance " David Edelsohn
  1 sibling, 2 replies; 29+ messages in thread
From: Thomas Neumann @ 2020-05-11 13:59 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Gcc

> Link: <https://wg21.link/P0709>
> 
> I'm not sure if your summary is correct.

I was referring to Section 3.2, where Herb says:

"We must remove all technical reasons for a C++ project to disable
exception handling (e.g., by compiler switch) or ban use of exceptions,
in all or part of their project."

In a way I am disagreeing with the paper, of course, in that I propose
to make the existing exception mechanism faster instead of inventing a
new exception mechanism. But what I agree on with P0709 is that it is
unfortunate that many projects disable exceptions due to performance
concerns. And I think the performance problems can be solved.

> My current preferred solution is something that moves the entire code
> that locates the relevant FDE table into glibc.

That is indeed an option, but I have two concerns there. First, it will
lead to code duplication, as libgcc will have to continue to provide its
own implementation on systems with "old" glibcs lacking
__dl_ehframe_find. And second, libgcc has a second lookup mechanism for
__register_frame_info_bases etc., which is needed for JITed code anyway.
And it seems to be attractive to handle that in the same data structure
that also covers the code from executables and shared libraries. Of
course one could move that part to glibc, too. But the code duplication
problems will persist for a long time, as gcc cannot rely upon the
system glibc being new enough to provide __dl_ehframe_find.

Thomas


* Re: performance of exception handling
  2020-05-11 13:59   ` Thomas Neumann
@ 2020-05-11 14:22     ` Florian Weimer
  2020-05-11 15:14     ` size of exception handling (Was: performance of exception handling) Moritz Strübe
  1 sibling, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2020-05-11 14:22 UTC (permalink / raw)
  To: Thomas Neumann; +Cc: Gcc

* Thomas Neumann:

>> Link: <https://wg21.link/P0709>
>> 
>> I'm not sure if your summary is correct.
>
> I was referring to Section 3.2, where Herb says:
>
> "We must remove all technical reasons for a C++ project to disable
> exception handling (e.g., by compiler switch) or ban use of exceptions,
> in all or part of their project."

Ah, but I think his other papers make it clear that he wants a unified
error handling mechanism for C++, but something that is not based on
exceptions with stack unwinding.  Or has he changed his mind?

> In a way I am disagreeing with the paper, of course, in that I propose
> to make the existing exception mechanism faster instead of inventing a
> new exception mechanism. But what I agree on with P0709 is that it is
> unfortunate that many projects disable exceptions due to performance
> concerns. And I think the performance problems can be solved.

Yes, I think we agree.

>> My current preferred solution is something that moves the entire code
>> that locates the relevant FDE table into glibc.
>
> That is indeed an option, but I have two concerns there. First, it will
> lead to code duplication, as libgcc will have to continue to provide its
> own implementation on systems with "old" glibcs lacking
> __dl_ehframe_find.

That's true for any approach with backwards compatibility.

> And second, libgcc has a second lookup mechanism for
> __register_frame_info_bases etc., which is needed for JITed code anyway.
> And it seems to be attractive to handle that in the same data structure
> that also covers the code from executables and shared libraries.

That would still need dynamic linker changes because currently, it is
not possible to report failure from an ELF constructor, and registration
may fail due to the memory allocations.

> Of course one could move that part to glibc, too. But the code
> duplication problems will persist for a long time, as gcc cannot rely
> upon the system glibc being new enough to provide __dl_ehframe_find.

Sure.  I'm not too worried about the explicit registration code.  We
can optimize that independently, with a fast path for the common case
that no frames have been registered using this mechanism.
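
A fast path of that kind might be sketched as follows, assuming an atomic
counter that is checked before the lock is taken. ExplicitRegistry and its
members are hypothetical names, not the libgcc data structures.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

struct RegisteredObject {
    std::uintptr_t begin;
    std::uintptr_t end;  // one past the last covered address
};

class ExplicitRegistry {
public:
    void add(const RegisteredObject& obj) {
        std::lock_guard<std::mutex> guard(mutex_);
        objects_.push_back(obj);
        count_.store(objects_.size(), std::memory_order_release);
    }

    bool contains(std::uintptr_t pc) {
        // Fast path: nothing was ever registered, so skip the lock.
        if (count_.load(std::memory_order_acquire) == 0) return false;

        std::lock_guard<std::mutex> guard(mutex_);
        ++locked_lookups_;
        for (const RegisteredObject& obj : objects_)
            if (pc >= obj.begin && pc < obj.end) return true;
        return false;
    }

    // Number of lookups that had to take the lock (for illustration only).
    std::size_t locked_lookups() {
        std::lock_guard<std::mutex> guard(mutex_);
        return locked_lookups_;
    }

private:
    std::atomic<std::size_t> count_{0};
    std::mutex mutex_;
    std::vector<RegisteredObject> objects_;
    std::size_t locked_lookups_ = 0;
};
```

Programs that never register frames explicitly then pay only for one
atomic load per lookup.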

Thanks,
Florian



* Re: performance of exception handling
  2020-05-11 10:40 ` Florian Weimer
  2020-05-11 13:59   ` Thomas Neumann
@ 2020-05-11 14:36   ` David Edelsohn
  2020-05-11 14:52     ` Florian Weimer
  2020-05-12  6:08     ` Thomas Neumann
  1 sibling, 2 replies; 29+ messages in thread
From: David Edelsohn @ 2020-05-11 14:36 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Thomas Neumann via Gcc, Thomas Neumann

On Mon, May 11, 2020 at 6:47 AM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:

> My current preferred solution is something that moves the entire code
> that locates the relevant FDE table into glibc.  This is all the code in
> _Unwind_IteratePhdrCallback until the first read_encoded_value_with_base
> call.  And the callback mechanism would be gone, so _Unwind_Find_FDE
> would call __dl_ehframe_find (see below) and then the remaining
> processing in _Unwind_IteratePhdrCallback.

Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
implementation in GLIBC creates its own set of advantages and
disadvantages.

Thanks, David


* Re: performance of exception handling
  2020-05-11 14:36   ` performance " David Edelsohn
@ 2020-05-11 14:52     ` Florian Weimer
  2020-05-11 15:12       ` David Edelsohn
  2020-05-12  6:08     ` Thomas Neumann
  1 sibling, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2020-05-11 14:52 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Thomas Neumann via Gcc, Thomas Neumann

* David Edelsohn:

> On Mon, May 11, 2020 at 6:47 AM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
>
>> My current preferred solution is something that moves the entire code
>> that locates the relevant FDE table into glibc.  This is all the code in
>> _Unwind_IteratePhdrCallback until the first read_encoded_value_with_base
>> call.  And the callback mechanism would be gone, so _Unwind_Find_FDE
>> would call __dl_ehframe_find (see below) and then the remaining
>> processing in _Unwind_IteratePhdrCallback.
>
> Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
> implementation in GLIBC creates its own set of advantages and
> disadvantages.

The new interface is no less generic than dl_iterate_phdr, so other
systems can easily implement it if they want (and reuse the libgcc code
that uses it).

Thanks,
Florian



* Re: performance of exception handling
  2020-05-11 14:52     ` Florian Weimer
@ 2020-05-11 15:12       ` David Edelsohn
  2020-05-11 15:24         ` Florian Weimer
  0 siblings, 1 reply; 29+ messages in thread
From: David Edelsohn @ 2020-05-11 15:12 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Thomas Neumann via Gcc, Thomas Neumann

On Mon, May 11, 2020 at 10:52 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * David Edelsohn:
>
> > On Mon, May 11, 2020 at 6:47 AM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
> >
> >> My current preferred solution is something that moves the entire code
> >> that locates the relevant FDE table into glibc.  This is all the code in
> >> _Unwind_IteratePhdrCallback until the first read_encoded_value_with_base
> >> call.  And the callback mechanism would be gone, so _Unwind_Find_FDE
> >> would call __dl_ehframe_find (see below) and then the remaining
> >> processing in _Unwind_IteratePhdrCallback.
> >
> > Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
> > implementation in GLIBC creates its own set of advantages and
> > disadvantages.
>
> The new interface is no less generic than _dl_iterate_phdr, so other
> systems can easily implement it if they want (and reuse the libgcc code
> that uses it).

The mission of GCC explicitly states that it supports other platforms
and diverse environments.  Other targets may or may not be able to
modify their C Library.  GLIBC is free to implement a more optimal API
and coordinate with GCC, but GCC EH cannot rely on a new, optional, ad
hoc interface provided by GLIBC.

Thanks, David


* size of exception handling (Was: performance of exception handling)
  2020-05-11 13:59   ` Thomas Neumann
  2020-05-11 14:22     ` Florian Weimer
@ 2020-05-11 15:14     ` Moritz Strübe
  2020-05-12  7:20       ` Freddie Chopin
  2020-05-12  9:03       ` size of exception handling Florian Weimer
  1 sibling, 2 replies; 29+ messages in thread
From: Moritz Strübe @ 2020-05-11 15:14 UTC (permalink / raw)
  To: gcc

Hey.

Am 11.05.2020 um 15:59 schrieb Thomas Neumann via Gcc:
> In a way I am disagreeing with the paper, of course, in that I propose
> to make the existing exception mechanism faster instead of inventing a
> new exception mechanism. But what I agree on with P0709 is that it is
> unfortunate that many projects disable exceptions due to performance
> concerns. And I think the performance problems can be solved.

I just wanted to point out that Herbceptions do not only fix performance 
issues, but also code-size problems. While anything below 4GB of RAM is 
considered under-powered for a PC, typical deep embedded environments 
have something around 32k of program memory and 2K of ram. And even 
those running Linux often have around 32MB program memory and 8MB of 
RAM. Also most of these systems are very cost sensitive, so each byte 
matters. Therefore RTTI and exceptions are one of the main reasons why 
parts of the embedded community consider C++ unusable: They do some 
experiments using C++ and the code size explodes.  And even if you know 
what you are doing and turn exceptions and RTTI off, adding a
std::nothrow to every "new" to do decent error handling is pretty 
annoying. Not to mention the parts of the C++ library that don't provide
exception-free error handling.
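
For readers unfamiliar with the idiom, this is roughly what the
std::nothrow pattern looks like; Packet and the two helper functions are
made-up names for illustration.

```cpp
#include <new>

struct Packet {
    unsigned char payload[64];
};

// Returns nullptr instead of throwing std::bad_alloc when allocation
// fails, so the caller must check the result explicitly.
Packet* try_make_packet() {
    return new (std::nothrow) Packet{};
}

void release_packet(Packet* p) {
    delete p;  // deleting a null pointer is a no-op
}
```

Every call site has to spell out the std::nothrow argument and the null
check, which is the repetition complained about above.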

So yes, improving the performance is nice, but ISO C++ will still be 
unusable for most computer systems: There are way more embedded systems 
with less than 32MB program memory out there than PCs, Servers and 
mobile phones together.

Cheers
Morty

-- 
Redheads Ltd. Softwaredienstleistungen
Schillerstr. 14
90409 Nürnberg

Telefon: +49 (0)911 180778-50
E-Mail: moritz.struebe@redheads.de | Web: www.redheads.de

Geschäftsführer: Andreas Hanke
Sitz der Gesellschaft: Lauf
Amtsgericht Nürnberg HRB 22681
Ust-ID: DE 249436843



* Re: performance of exception handling
  2020-05-11 15:12       ` David Edelsohn
@ 2020-05-11 15:24         ` Florian Weimer
  0 siblings, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2020-05-11 15:24 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Thomas Neumann via Gcc, Thomas Neumann

* David Edelsohn:

> On Mon, May 11, 2020 at 10:52 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * David Edelsohn:
>>
>> > On Mon, May 11, 2020 at 6:47 AM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
>> >
>> >> My current preferred solution is something that moves the entire code
>> >> that locates the relevant FDE table into glibc.  This is all the code in
>> >> _Unwind_IteratePhdrCallback until the first read_encoded_value_with_base
>> >> call.  And the callback mechanism would be gone, so _Unwind_Find_FDE
>> >> would call __dl_ehframe_find (see below) and then the remaining
>> >> processing in _Unwind_IteratePhdrCallback.
>> >
>> > Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
>> > implementation in GLIBC creates its own set of advantages and
>> > disadvantages.
>>
>> The new interface is no less generic than _dl_iterate_phdr, so other
>> systems can easily implement it if they want (and reuse the libgcc code
>> that uses it).
>
> The mission of GCC explicitly states that it supports other platforms
> and diverse environments.  Other targets may or may not be able to
> modify their C Library.  GLIBC is free to implement a more optimal API
> and coordinate with GCC, but GCC EH cannot rely on a new, optional, ad
> hoc interface provided by GLIBC.

Sorry, if I gave the impression that we should remove any code from
libgcc/ or start requiring support for dl_iterate_phdr, that was not my
intention.

Florian



* Re: performance of exception handling
  2020-05-11 14:36   ` performance " David Edelsohn
  2020-05-11 14:52     ` Florian Weimer
@ 2020-05-12  6:08     ` Thomas Neumann
  2020-05-12  7:15       ` Richard Biener
  2020-05-12  9:01       ` Richard Sandiford
  1 sibling, 2 replies; 29+ messages in thread
From: Thomas Neumann @ 2020-05-12  6:08 UTC (permalink / raw)
  To: David Edelsohn, Florian Weimer; +Cc: gcc

> Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
> implementation in GLIBC creates its own set of advantages and
> disadvantages.

So what should I do now? Should I try to move the lookup into GLIBC? Or
handle it within libgcc, as I had originally proposed? Or give up due
to the inertia of a large, grown system?

Another concern is memory consumption. I wanted to store the FDE entries
in a b-tree, which allows for fast lookup and low overhead
synchronization. Memory wise that is not really worse than what we have
today (the "linear" and "erratic" arrays). But the current code has a
fallback for when it is unable to allocate these arrays, falling back to
linear search. Is something like that required? It would make the code
much more complicated (but I got from Moritz's mail that some people
really care about memory constrained situations).

Thomas


* Re: performance of exception handling
  2020-05-12  6:08     ` Thomas Neumann
@ 2020-05-12  7:15       ` Richard Biener
  2020-05-12  7:30         ` Thomas Neumann
  2020-05-12  9:01       ` Richard Sandiford
  1 sibling, 1 reply; 29+ messages in thread
From: Richard Biener @ 2020-05-12  7:15 UTC (permalink / raw)
  To: Thomas Neumann; +Cc: David Edelsohn, Florian Weimer, GCC Development

On Tue, May 12, 2020 at 8:14 AM Thomas Neumann via Gcc <gcc@gcc.gnu.org> wrote:
>
> > Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
> > implementation in GLIBC creates its own set of advantages and
> > disadvantages.
>
> so what should I do now? Should I try to move the lookup into GLIBC? Or
> handle it within libgcc, as I had originally proposed? Or give up due
> to the inertia of a large, grown system?
>
> Another concern is memory consumption. I wanted to store the FDE entries
> in a b-tree, which allows for fast lookup and low overhead
> synchronization. Memory wise that is not really worse than what we have
> today (the "linear" and "erratic" arrays). But the current code has a
> fallback for when it is unable to allocate these arrays, falling back to
> linear search. Is something like that required? It would make the code
> much more complicated (but I got from Moritz's mail that some people
> really care about memory constrained situations).

Some people use exceptions to propagate "low memory" up which
made me increase the size of the EH emergency pool (which is
used when malloc cannot even allocate the EH data itself) ...

So yes, people care.  There absolutely has to be a path in
unwinding that allocates no (as little as possible) memory.
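
As a rough illustration of the emergency pool idea, the sketch below
falls back to a fixed, preallocated buffer when the primary allocator
fails. It is a bump allocator without deallocation or locking and is much
simpler than the real libstdc++ pool; EmergencyPool and
allocate_exception_storage are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

class EmergencyPool {
    static constexpr std::size_t kSize = 4096;
    static constexpr std::size_t kAlign = alignof(std::max_align_t);

    alignas(std::max_align_t) unsigned char buffer_[kSize];
    std::size_t used_ = 0;

public:
    // Hands out aligned chunks from the static buffer; nullptr when full.
    void* allocate(std::size_t size) {
        if (size > kSize) return nullptr;
        const std::size_t aligned = (size + kAlign - 1) / kAlign * kAlign;
        if (aligned > kSize - used_) return nullptr;
        void* p = buffer_ + used_;
        used_ += aligned;
        return p;
    }

    // True if `p` points into the pool's buffer.
    bool owns(const void* p) const {
        const auto addr = reinterpret_cast<std::uintptr_t>(p);
        const auto base = reinterpret_cast<std::uintptr_t>(buffer_);
        return addr >= base && addr < base + kSize;
    }
};

using Allocator = void* (*)(std::size_t);

// Tries the primary allocator first and uses the pool only on failure.
void* allocate_exception_storage(std::size_t size, Allocator primary,
                                 EmergencyPool& pool) {
    if (void* p = primary(size)) return p;
    return pool.allocate(size);
}
```

The point is only that the memory is reserved up front, so throwing in a
low-memory situation does not itself depend on a successful malloc.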

Richard.

> Thomas


* Re: size of exception handling (Was: performance of exception handling)
  2020-05-11 15:14     ` size of exception handling (Was: performance of exception handling) Moritz Strübe
@ 2020-05-12  7:20       ` Freddie Chopin
  2020-05-12  7:47         ` Oleg Endo
                           ` (2 more replies)
  2020-05-12  9:03       ` size of exception handling Florian Weimer
  1 sibling, 3 replies; 29+ messages in thread
From: Freddie Chopin @ 2020-05-12  7:20 UTC (permalink / raw)
  To: Moritz Strübe, gcc

On Mon, 2020-05-11 at 17:14 +0200, Moritz Strübe wrote:
> I just wanted to point out that Herbceptions do not only fix
> performance 
> issues, but also code-size problems. While anything below 4GB of RAM
> is 
> considered under-powered for a PC, typical deep embedded
> environments 
> have something around 32k of program memory and 2K of ram. And even 
> those running Linux often have around 32MB program memory and 8MB of 
> RAM. Also most of these systems are very cost sensitive, so each
> byte 
> matters. Therefore RTTI and exceptions are one of the main reasons
> why 
> parts of the embedded community consider C++ unusable: They do some 
> experiments using C++ and the code size explodes.  And even if you
> know 
> what you are doing and turn interrupts and RTTI off, adding a 
> std::nothrow to every "new" to do decent error handling is pretty 
> annoying. Not mentioning the parts of the C++ library that don't
> provide 
> exception-free error-handling.
> 
> So yes, improving the performance is nice, but ISO C++ will still be 
> unusable for most computer systems: There are way more embedded
> systems 
> with less than 32MB program memory out there than PCs, Servers and 
> mobile phones together.

Very nice that Moritz finally mentioned it (; The world of deep
embedded is usually forgotten by all the language committees and people
who are in charge. That's why the proposal by Herb is a real surprise
and I really hope it could be implemented someday.

The numbers above are maybe a bit too strict. A typical ARM Cortex-M
chip can have up to 2 MB of ROM and 512 kB of RAM, but I would say that
usually it has around 128-256 kB of ROM and somewhere around 16-64 kB
of RAM. The problem with C++ exceptions is that even in the most
trivial of the programs and even if you don't explicitly
use/catch/throw them, they instantly eat around 60 kB of ROM and quite
a lot of RAM. With some hacking you can get down to about 20 kB of ROM
(by overriding a lot of string formatting code and overriding
std::terminate()), but this is still too much for a feature you
actually don't use.

I actually have to build my own toolchain instead of the one provided
by ARM, because to really NOT use C++ exceptions, you have to recompile
the whole libstdc++ with `-fno-exceptions -fno-rtti` (yes, I know they
provide the "nano" libraries, but the options they used for newlib
don't suit my needs - this is "too minimized"). If you pass these two
flags during compilation and linking of your own application, this
disables these features only in your code. As libstdc++ is compiled
with exceptions and RTTI enabled, they get pulled during link anyway,
bloating the binary size with a functionality you don't use and can't
use - every throw from the library ends up as std::terminate() anyway.
That's why the statement by Herb that C++ exceptions are never zero-
cost when not used is perfectly true.

Performance is also an issue. An article I read some time ago
mentioned that a trivial throw out of a single function takes about
7000-10000 clock cycles until it is caught, which is unacceptable for
any real-time solution (a normal return with error handling would take at
most a few dozen). But the size issue is a blocker for deep embedded, so
performance and deterministic behaviour are only secondary issues in
such an environment.

Regards,
FCh



* Re: performance of exception handling
  2020-05-12  7:15       ` Richard Biener
@ 2020-05-12  7:30         ` Thomas Neumann
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Neumann @ 2020-05-12  7:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Edelsohn, Florian Weimer, GCC Development

> Some people use exceptions to propagate "low memory" up which
> made me increase the size of the EH emergency pool (which is
> used when malloc cannot even allocate the EH data itself) ...
> 
> So yes, people care.  There absolutely has to be a path in
> unwinding that allocates no (as little as possible) memory.

note that I would not allocate at all in the unwinding path. I would
allocate memory when new frames are registered, but unwinding would be
without any allocations.

Of course there is a trade-off here. We could delay allocating the
lookup structures until the first exception occurs, in order to speed up
programs that never throw any exceptions. But that would effectively
force us to implement a "no memory" fallback, for exactly the reason you
gave, as something like bad_alloc might be the first exception that we
encounter.

Thomas


* Re: size of exception handling (Was: performance of exception handling)
  2020-05-12  7:20       ` Freddie Chopin
@ 2020-05-12  7:47         ` Oleg Endo
  2020-05-13  9:13           ` Jonathan Wakely
  2020-05-12  9:16         ` size of exception handling Florian Weimer
  2020-05-12 11:07         ` size of exception handling (Was: performance of exception handling) Jonathan Wakely
  2 siblings, 1 reply; 29+ messages in thread
From: Oleg Endo @ 2020-05-12  7:47 UTC (permalink / raw)
  To: Freddie Chopin, Moritz Strübe, gcc

On Tue, 2020-05-12 at 09:20 +0200, Freddie Chopin wrote:
> 
> I actually have to build my own toolchain instead of the one provided
> by ARM, because to really NOT use C++ exceptions, you have to recompile
> the whole libstdc++ with `-fno-exceptions -fno-rtti` (yes, I know they
> provide the "nano" libraries, but the options they used for newlib
> don't suit my needs - this is "too minimized"). If you pass these two
> flags during compilation and linking of your own application, this
> disables these features only in your code. As libstdc++ is compiled
> with exceptions and RTTI enabled, ...

IMHO this is a conceptual failure of the whole idea of using
pre-compiled, pre-installed libraries somewhere in the toolchain, in
particular for this kind of cross-compilation scenario.  Like you say,
when we set "exceptions off" it usually means for the whole embedded
app, and the whole embedded app usually means all the OS and runtime
libraries and everything, not just the user code.

One option is to not use the pre-compiled toolchain libstdc++ but build
it from source (or use another c++ std lib of your choice), as part of
the whole project, with the desired project settings.


BTW, just to throw in my 2-cents into the "I'm using MCU" pool of
pain/joy ... in one of my projects I'm using STM32F051K6U6, 32 KB
flash, 8 KB RAM, running all C++ code with shared C++ RPC libraries to
communicate with other (bigger) devices.  Exceptions, RTTI, threads
have to be turned off and only the header-only things from the stdlib
can be used and no heap allocations.  Otherwise the thing doesn't fit. 
Don't feel like rewriting the whole thing either.  There are some
annoyances when turning off exceptions and RTTI which results in
increased code maintenance.  It'd definitely be good and highly
appreciated if there were any improvements in the area of exception
handling.

Cheers,
Oleg



* Re: performance of exception handling
  2020-05-12  6:08     ` Thomas Neumann
  2020-05-12  7:15       ` Richard Biener
@ 2020-05-12  9:01       ` Richard Sandiford
  2020-05-13  1:13         ` Thomas Neumann
  1 sibling, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2020-05-12  9:01 UTC (permalink / raw)
  To: Thomas Neumann via Gcc; +Cc: David Edelsohn, Florian Weimer, Thomas Neumann

Thomas Neumann via Gcc <gcc@gcc.gnu.org> writes:
>> Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
>> implementation in GLIBC creates its own set of advantages and
>> disadvantages.
>
> so what should I do now? Should I try to move the lookup into GLIBC? Or
> handled it within libgcc, as I had originally proposed? Or give up due
> to the inertia of a large, grown system?

Just echoing what David said really, but: if the libgcc changes
are expected to be portable beyond glibc, then the existence of
an alternative option for glibc shouldn't block the libgcc changes.
The two approaches aren't mutually exclusive and each approach
would achieve something that the other one wouldn't.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling
  2020-05-11 15:14     ` size of exception handling (Was: performance of exception handling) Moritz Strübe
  2020-05-12  7:20       ` Freddie Chopin
@ 2020-05-12  9:03       ` Florian Weimer
  1 sibling, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2020-05-12  9:03 UTC (permalink / raw)
  To: Moritz Strübe; +Cc: gcc

* Moritz Strübe:

> Hey.
>
> Am 11.05.2020 um 15:59 schrieb Thomas Neumann via Gcc:
>> In a way I am disagreeing with the paper, of course, in that I propose
>> to make the existing exception mechanism faster instead of inventing a
>> new exception mechanism. But what I agree on with P0709 is that it is
>> unfortunate that many projects disable exceptions due to performance
>> concerns. And I think the performance problems can be solved.
>
> I just wanted to point out that Herbceptions do not only fix
> performance issues, but also code-size problems. While anything below
> 4GB of RAM is considered under-powered for a PC, typical deep embedded
> environments have something around 32k of program memory and 2K of
> ram. And even those running Linux often have around 32MB program
> memory and 8MB of RAM. Also most of these systems are very cost
> sensitive, so each byte matters. Therefore RTTI and exceptions are one
> of the main reasons why parts of the embedded community consider C++
> unusable: They do some experiments using C++ and the code size
> explodes.  And even if you know what you are doing and turn exceptions
> and RTTI off, adding a std::nothrow to every "new" to do decent error
> handling is pretty annoying. Not to mention the parts of the C++
> library that don't provide exception-free error-handling.

The flag-return proposal has costs as well, especially if applied
verbatim without any additional language changes.  There won't be large
tables, but code size will increase across all non-leaf, noexcept(false)
functions.

My worry is that the overhead will be enough to deter embedded users,
and then we are stuck with a third error handling approach for C++.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling
  2020-05-12  7:20       ` Freddie Chopin
  2020-05-12  7:47         ` Oleg Endo
@ 2020-05-12  9:16         ` Florian Weimer
  2020-05-12  9:44           ` Freddie Chopin
  2020-05-12 11:07         ` size of exception handling (Was: performance of exception handling) Jonathan Wakely
  2 siblings, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2020-05-12  9:16 UTC (permalink / raw)
  To: Freddie Chopin; +Cc: Moritz Strübe, gcc

* Freddie Chopin:

> Very nice that Moritz finally mentioned it (; The world of deep
> embedded is usually forgotten by all the language committees and people
> who are in charge.

That can only happen if the embedded people do not bother to show up in
numbers.  Of course the tools will move in different directions.

> That's why the proposal by Herb is a real surprise and I really hope
> it could be implemented someday.

Would you use it if switching from -fno-exceptions to this new approach
resulted in an immediate 20% code size increase, without actually using
the new error handling feature at all?  What about 10%?

> The numbers above are maybe a bit too strict. A typical ARM Cortex-M
> chip can have up to 2 MB of ROM and 512 kB of RAM, but I would say that
> usually it has around 128-256 kB of ROM and somewhere around 16-64 kB
> of RAM. The problem with C++ exceptions is that even in the most
> trivial of the programs and even if you don't explicitly
> use/catch/throw them, they instantly eat around 60 kB of ROM and quite
> a lot of RAM. With some hacking you can get down to about 20 kB of ROM
> (by overriding a lot of string formatting code and overriding
> std::terminate()), but this is still too much for a feature you
> actually don't use.

It might still be interesting to contribute this somewhere.

Is there any effort to bring down the table sizes and reduce the
duplicated code in the cleanup paths?  In most cases, it seems that the
unwinder could actually run (the tail of) the successful return path and
call _Unwind_Resume itself, rather than having the compiler duplicate
the cleanup code with an _Unwind_Resume at the end.

The code optimizations would likely be needed for the flag-return error
handling approach, too.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling
  2020-05-12  9:16         ` size of exception handling Florian Weimer
@ 2020-05-12  9:44           ` Freddie Chopin
  2020-05-12 11:11             ` Jonathan Wakely
  2020-05-12 11:17             ` Moritz Strübe
  0 siblings, 2 replies; 29+ messages in thread
From: Freddie Chopin @ 2020-05-12  9:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Moritz Strübe, gcc

On Tue, 2020-05-12 at 11:16 +0200, Florian Weimer wrote:
> That can only happen if the embedded people do not bother to show up
> in
> numbers.  Of course the tools will move in different directions.

True (;

> > That's why the proposal by Herb is a real surprise and I really
> > hope
> > it could be implemented someday.
> 
> Would you use it if switching from -fno-exceptions to this new
> approach
> resulted in an immediate 20% code size increase, without actually
> using
> the new error handling feature at all?  What about 10%?

As I understand the proposal, it would basically boil down to returning
something like std::pair<XXX, HerbCeption> where XXX is what you return
explicitly in the code. Or something like returning std::variant<> or
std::optional<>. I use these approaches in my own code all the time
(mostly std::pair<int, XXX>, as it's the simplest one). The rest of the
proposal seems to be syntax sugar for catching this "HerbCeption" and
so on. As my code basically does the same now, I guess the increase for
me would be ~0%.
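
A minimal sketch of the std::pair<int, XXX> style described above, to
make the "propagate by hand" cost concrete. The function names
parse_digit and sum_digits are hypothetical and only serve as
illustration; the error code comes first and 0 means success. It
assumes C++17 for structured bindings.

```cpp
#include <cerrno>
#include <utility>

// Hypothetical leaf function: first = error code (0 on success),
// second = the parsed value (only meaningful when first == 0).
std::pair<int, int> parse_digit(char c)
{
  if (c < '0' || c > '9')
    return {EINVAL, 0};
  return {0, c - '0'};
}

// Hypothetical non-leaf function: every call site checks the error
// code and forwards it to the caller. No unwind tables are involved,
// but each call costs an explicit compare-and-branch.
std::pair<int, int> sum_digits(const char* s)
{
  int total = 0;
  for (; *s != '\0'; ++s)
    {
      const auto [err, digit] = parse_digit(*s);
      if (err != 0)
        return {err, 0};
      total += digit;
    }
  return {0, total};
}
```

A caller would write `auto [err, sum] = sum_digits("123");` and test
err before using sum. The check after each call is the per-call-site
cost that the discussion about code size is about.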

I perfectly understand that error handling has some non-zero cost and
the only way to avoid it is to completely ignore the errors (; But it
seems to me that what is proposed there is really very cheap and very
fast. As long as the committee drops std::string and such from
std::error_code (;

With current C++ exceptions the increase is not very proportional to
the size of the rest of the application. It's more like a one-time ~60
kB increase, even if the application without it would be 5 kB total. In
a huge application the increase may be less noticeable, as parts of the
pulled code may be used anyway by the app (things like dynamic
allocation, fprintf() and so on), but this is still ~25% of total
available ROM size.

To summarize. Current C++ exceptions have a very large, mostly "one-time"
cost in size, even if not used at all by the user, mostly due
to std::terminate() and all the string handling code inside it, as well
as the unwind tables. The proposal by Herb seems to be more reasonable
in this regard - the amount of extra code generated under the hood will
probably be proportional to the amount of code involved and actually
similar to what C programmers have done manually for decades.

Regards,
FCh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling (Was: performance of exception handling)
  2020-05-12  7:20       ` Freddie Chopin
  2020-05-12  7:47         ` Oleg Endo
  2020-05-12  9:16         ` size of exception handling Florian Weimer
@ 2020-05-12 11:07         ` Jonathan Wakely
  2020-05-12 20:56           ` Freddie Chopin
  2 siblings, 1 reply; 29+ messages in thread
From: Jonathan Wakely @ 2020-05-12 11:07 UTC (permalink / raw)
  To: Freddie Chopin; +Cc: Moritz Strübe, gcc

On Tue, 12 May 2020 at 09:17, Freddie Chopin wrote:
> The problem with C++ exceptions is that even in the most
> trivial of the programs and even if you don't explicitly
> use/catch/throw them, they instantly eat around 60 kB of ROM and quite
> a lot of RAM. With some hacking you can get down to about 20 kB of ROM
> (by overriding a lot of string formatting code and overriding
> std::terminate()),

You're talking about C++ exceptions in general, but the problems you
mention seem to be issues with specific implementation properties.

If the comments above are referring to the libstdc++ verbose terminate
handler, that's configurable. Configuring GCC with
--disable-libstdcxx-verbose will disable that, and so will building
libstdc++ with -fno-exceptions. That was fixed years ago.

If there are remaining problems where I/O and string routines get
dragged in without exceptions and the verbose terminate handler,
please report bugs against libstdc++. I would expect heroics to be
needed for a tiny footprint, but it should be possible to get a small
footprint just by rebuilding with the right options and flags.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling
  2020-05-12  9:44           ` Freddie Chopin
@ 2020-05-12 11:11             ` Jonathan Wakely
  2020-05-12 11:17             ` Moritz Strübe
  1 sibling, 0 replies; 29+ messages in thread
From: Jonathan Wakely @ 2020-05-12 11:11 UTC (permalink / raw)
  To: Freddie Chopin; +Cc: Florian Weimer, gcc

On Tue, 12 May 2020 at 11:48, Freddie Chopin wrote:
> To summarize. Current C++ exceptions have very huge, mostly "one-time"
> kind, cost on the size, even if not used at all by the user, mostly due
> to std::terminate() and all the string handling code inside it, as well
> as the unwind tables.

There is no string handling code in std::terminate:

namespace std
{
  typedef void (*terminate_handler) ();
  void terminate() _GLIBCXX_USE_NOEXCEPT __attribute__ ((__noreturn__));
}

void
__cxxabiv1::__terminate (std::terminate_handler handler) throw ()
{
  __try
    {
      handler ();
      std::abort ();
    }
  __catch(...)
    { std::abort (); }
}

void
std::terminate () throw()
{
  __terminate (get_terminate ());
}

Please clarify what you're talking about.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling
  2020-05-12  9:44           ` Freddie Chopin
  2020-05-12 11:11             ` Jonathan Wakely
@ 2020-05-12 11:17             ` Moritz Strübe
  2020-05-12 11:29               ` Florian Weimer
  1 sibling, 1 reply; 29+ messages in thread
From: Moritz Strübe @ 2020-05-12 11:17 UTC (permalink / raw)
  To: Freddie Chopin, Florian Weimer; +Cc: gcc



Am 12.05.2020 um 11:44 schrieb Freddie Chopin:
>> Would you use it if switching from -fno-exceptions to this new
>> approach
>> resulted in an immediate 20% code size increase, without actually
>> using
>> the new error handling feature at all?  What about 10%?

I don't think that it will be that much. I agree with Freddie: 
Exceptions are critical errors you need to handle anyway. Thus the code 
size should not increase as the error-handling code should already be 
there.
I can really recommend Herb's talk:
https://www.youtube.com/watch?v=ARYP83yNAWk , where he also talks about
reducing RTTI overhead and making the C++ library mostly exception-free.
In that talk he mentions that it would be possible to use some CPU bit to
return the state. This could result in adding a single instruction for
each call (ret/jmp if the bit is set) and maybe a second to clear that flag.
Considering that you normally have to use at least an extra byte to pass
this information, which then needs to be evaluated, using
Herbceptions might actually result in smaller code than using manual
error handling. You still need run-time type information for catching,
but I'm sure that there are solutions for that, too (e.g. Herb's
solution from the talk *).

Cheers
Morty


* Alternatively it could be possible to only add type information that
is necessary for catching and only for types that can actually be thrown
(as far as I can see, all exception types must be known at link-time,
as they are constructed by the throw-expression).


-- 
Redheads Ltd. Softwaredienstleistungen
Schillerstr. 14
90409 Nürnberg

Telefon: +49 (0)911 180778-50
E-Mail: moritz.struebe@redheads.de | Web: www.redheads.de

Geschäftsführer: Andreas Hanke
Sitz der Gesellschaft: Lauf
Amtsgericht Nürnberg HRB 22681
Ust-ID: DE 249436843


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling
  2020-05-12 11:17             ` Moritz Strübe
@ 2020-05-12 11:29               ` Florian Weimer
  2020-05-12 12:01                 ` Moritz Strübe
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Weimer @ 2020-05-12 11:29 UTC (permalink / raw)
  To: Moritz Strübe; +Cc: Freddie Chopin, gcc

* Moritz Strübe:

>> Would you use it if switching from -fno-exceptions to this new
>> approach resulted in an immediate 20% code size increase, without
>> actually using the new error handling feature at all?  What about
>> 10%?
>
> I don't think that it will be that much.

Why?  Have you simulated the code size changes?  I actually ran some
experiments.

> Exceptions are critical errors you need to handle anyway. Thus the
> code size should not increase as the error-handling code should
> already be there.  I can really recommend Herb's talk:
> https://www.youtube.com/watch?v=ARYP83yNAWk , where he also talks about
> reducing RTTI overhead and making the C++-lib mostly exception free.

I think the proponents generally underestimate how many functions would
need to change their signature so that they can propagate errors using
the new mechanism.  This leads to very optimistic estimations on size
impact.

> In that talk he mentions that it would be possible using some CPU-Bit
> to return the state.

That's quite hard for us because of the stack protector.  It's unclear
if using the flag is actually beneficial from a code size perspective.
Obviously it depends on the ratio between functions and call sites.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling
  2020-05-12 11:29               ` Florian Weimer
@ 2020-05-12 12:01                 ` Moritz Strübe
  0 siblings, 0 replies; 29+ messages in thread
From: Moritz Strübe @ 2020-05-12 12:01 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Freddie Chopin, gcc

Hey.

Am 12.05.2020 um 13:29 schrieb Florian Weimer:
>>> Would you use it if switching from -fno-exceptions to this new
>>> approach resulted in an immediate 20% code size increase, without
>>> actually using the new error handling feature at all?  What about
>>> 10%?
>> I don't think that it will be that much.
> Why?  Have you simulated the code size changes?

No, but I read a lot of C/C++ generated assembler (mainly ARM and
without exceptions. :) ).

> I actually ran some
> experiments.
>
Sweet. Would you mind sharing the results?

Morty



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling (Was: performance of exception handling)
  2020-05-12 11:07         ` size of exception handling (Was: performance of exception handling) Jonathan Wakely
@ 2020-05-12 20:56           ` Freddie Chopin
  2020-05-12 22:39             ` Jonathan Wakely
  0 siblings, 1 reply; 29+ messages in thread
From: Freddie Chopin @ 2020-05-12 20:56 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Moritz Strübe, gcc

On Tue, 2020-05-12 at 12:07 +0100, Jonathan Wakely wrote:
> You're talking about C++ exceptions in general, but the problems you
> mention seem to be issues with specific implementation properties.

Possibly true, but this argument - that all the problems are related to
a specific implementation and thus can be easily fixed - has been the
same for years and yet the problem is still there (; I guess that if this
could be easily fixed, then it would have been done years ago. Along with
the performance and non-deterministic execution issues...

> If the comments above are referring to the libstdc++ verbose
> terminate
> handler, that's configurable. Configuring GCC with
> --disable-libstdcxx-verbose will disable that, and so will building
> libstdc++ with -fno-exceptions. That was fixed years ago.

True, sorry for the confusion, indeed I was talking about the verbose
terminate handler. I check the state of C++ exceptions for MCUs only
once every few years, so that's why I got that mixed up with
std::terminate(). I use my custom compilation with disabled exceptions
(toolchain & libstdc++ built with -fno-exceptions -fno-rtti) and this
works perfectly fine.

Anyway... If you have to recompile the toolchain, the problem is still
there. Most of the people (like 99,666%) will not do that for various
reasons. Some don't know how, some use only Windows, some don't have
time to deal with the compilation (the whole toolchain takes around an
hour here, but this excludes the time to prepare the script that builds
it), some others consider the toolchain provided by the MCU vendor (or by
ARM) as "tested to work correctly" so they don't want to replace that
with their custom built solution, and so on, and so on...

Regards,
FCh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling (Was: performance of exception handling)
  2020-05-12 20:56           ` Freddie Chopin
@ 2020-05-12 22:39             ` Jonathan Wakely
  2020-05-12 22:48               ` Jonathan Wakely
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Wakely @ 2020-05-12 22:39 UTC (permalink / raw)
  To: Freddie Chopin; +Cc: Moritz Strübe, gcc

On Tue, 12 May 2020, 21:57 Freddie Chopin, <freddie_chopin@op.pl> wrote:
>
> On Tue, 2020-05-12 at 12:07 +0100, Jonathan Wakely wrote:
> > You're talking about C++ exceptions in general, but the problems you
> > mention seem to be issues with specific implementation properties.
>
> Possibly true, but this argument - that all the problems are related to
> specific implementation and thus can be easily fixed


I didn't say anything about it being easy to fix.

I'm just trying to stop misinformation about std::terminate requiring
string handling or I/O, which isn't true for C++ in general, and isn't
even true for libstdc++ because it's configurable. If you want a
smaller EH runtime, that's already possible with libstdc++. Could it
be even smaller? Yes, probably, but we need bug reports or concrete
suggestions, not outdated or misleading claims about optional
properties of the libstdc++ runtime.


> - is the same for
> years and yet the problem is still there (; I guess that if this could
> be easily fixed, then it would be done years ago. Along with the
> performance and non-deterministic execution issues...

Nobody said it can easily be fixed though.

> > If the comments above are referring to the libstdc++ verbose
> > terminate
> > handler, that's configurable. Configuring GCC with
> > --disable-libstdcxx-verbose will disable that, and so will building
> > libstdc++ with -fno-exceptions. That was fixed years ago.
>
> True, sorry for the confusion, indeed I was talking about verbose
> terminate handler. I check the state of C++ exceptions for MCUs only
> once every few years, so that's why I got that mixed with
> std::terminate(). I use my custom compilation with disabled exceptions
> (toolchain & libstdc++ built with -fno-exceptions -fno-rtti) and this
> works perfectly fine.

It's been a few years since we changed anything, because disabling the
verbose handler solved one of the biggest issues.

> Anyway... If you have to recompile the toolchain, the problem is still
> there. Most of the people (like 99,666%) will not do that for various
> reasons. Some don't know how, some use only Windows, some don't have
> time to deal with the compilation (the whole toolchain takes around an
> hour here, but this excludes the time to prepare the script that builds
> it), some other consider the toolchain provided by MCU vendor (or by
> ARM) as "tested to work correctly" so they don't want to replace that
> with their custom built solution, and so on, and so on...

There is no one-size-fits-all solution that gives everybody their
ideal set of defaults, so we provide configuration options to tune
things for your needs. Complaining that you have to rebuild things to
get different defaults seems silly. Would you prefer we don't offer
the options at all?

If you have concrete suggestions for improvements or can identify
places we can improve, I'd like to hear them. If you just want to
complain about C++ exceptions, that's not very helpful.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling (Was: performance of exception handling)
  2020-05-12 22:39             ` Jonathan Wakely
@ 2020-05-12 22:48               ` Jonathan Wakely
  2020-05-13  8:04                 ` David Brown
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Wakely @ 2020-05-12 22:48 UTC (permalink / raw)
  To: Freddie Chopin; +Cc: Moritz Strübe, gcc

On Tue, 12 May 2020 at 23:39, Jonathan Wakely wrote:
> On Tue, 12 May 2020, 21:57 Freddie Chopin, <freddie_chopin@op.pl> wrote:
> > Anyway... If you have to recompile the toolchain, the problem is still
> > there. Most of the people (like 99,666%) will not do that for various
> > reasons. Some don't know how, some use only Windows, some don't have
> > time to deal with the compilation (the whole toolchain takes around an
> > hour here, but this excludes the time to prepare the script that builds
> > it), some other consider the toolchain provided by MCU vendor (or by
> > ARM) as "tested to work correctly" so they don't want to replace that
> > with their custom built solution, and so on, and so on...
>
> There is no one-size-fits-all solution that gives everybody their
> ideal set of defaults, so we provide configuration options to tune
> things for your needs. Complaining that you have to rebuild things to
> get different defaults seems silly. Would you prefer we don't offer
> the options at all?

And I also never said that every user should rebuild the toolchain.
The options can be used by vendors providing a toolchain for their
hardware, if the verbose handler (or exceptions in general!) are not
appropriate for their users. Just because something isn't the default,
doesn't mean every user needs to change it themselves.

And if writing a script and waiting an hour is too much effort to
reduce unwanted overhead, then I guess that overhead isn't such a big
deal anyway.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: performance of exception handling
  2020-05-12  9:01       ` Richard Sandiford
@ 2020-05-13  1:13         ` Thomas Neumann
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Neumann @ 2020-05-13  1:13 UTC (permalink / raw)
  To: Gcc, David Edelsohn, Florian Weimer, richard.sandiford

> Just echoing what David said really, but: if the libgcc changes
> are expected to be portable beyond glibc, then the existence of
> an alternative option for glibc shouldn't block the libgcc changes.
> The two approaches aren't mutually exclusive and each approach
> would achieve something that the other one wouldn't.

to make this discussion a bit less abstract I have implemented a
prototype: https://pastebin.com/KtrPhci2
It is not perfect yet, for example frame de-registration is suboptimal,
but it allows us to speak about an actual implementation with real
performance numbers.

To give some numbers I take my silly example from
https://repl.it/repls/DeliriousPrivateProfiler
with 6 * 1,000,000 function calls, where half of the functions throw,
and I execute it either single threaded or multi-threaded (with 6
threads) on an i7-6800K. Note that the effects are even more dramatic on
larger machines.
The "old" implementation is gcc 9.3, the "new" implementation is gcc
git with the patch linked above. (Note that you have to both use the
patched gcc and use LD_LIBRARY_PATH or similar to force the new libgcc
when repeating the experiment.)

The execution times are:

old approach, single threaded: 4.3s
old approach, multi threaded: 6.5s
new approach, single threaded: 3.9s
new approach, multi threaded: 0.7s

This is faster even when single threaded, and it is dramatically faster
when using multiple threads. On machines where atomics are supported,
raising an exception no longer uses a global mutex (except for the first
exception after new exception frames were added), and thus exception
processing scales nicely with the thread count. The code also handles
the out-of-memory condition, falling back to linear search in that case
(just as the old code does).
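
For readers who cannot open the linked script, a workload of this shape
could look roughly like the sketch below. It is not the script from the
link and makes no claim to reproduce the timings above; the names
maybe_throw and run_workload are made up for illustration. Each thread
performs a fixed number of calls, and every second call throws.

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical callee: throws for odd inputs, so half of all calls
// raise an exception.
int maybe_throw(unsigned i)
{
  if (i % 2 != 0)
    throw std::runtime_error("odd");
  return 1;
}

// Runs calls_per_thread calls on each of thread_count threads and
// returns the total number of exceptions that were caught.
unsigned long run_workload(unsigned thread_count, unsigned calls_per_thread)
{
  std::atomic<unsigned long> caught{0};
  auto worker = [&caught, calls_per_thread] {
    unsigned long local = 0;
    for (unsigned i = 0; i < calls_per_thread; ++i)
      {
        try
          {
            maybe_throw(i);
          }
        catch (const std::runtime_error&)
          {
            ++local;  // every throw walks the unwinder
          }
      }
    caught += local;  // one atomic update per thread, not per call
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < thread_count; ++t)
    threads.emplace_back(worker);
  for (auto& th : threads)
    th.join();
  return caught.load();
}
```

Timing run_workload(1, 6000000) against run_workload(6, 1000000), for
example with std::chrono::steady_clock, gives the single-threaded versus
multi-threaded comparison for the same total number of calls.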

Of course this needs more polishing and testing, but would something
like this be acceptable for gcc? It makes exceptions much more useful in
multi-threaded applications.

Thomas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling (Was: performance of exception handling)
  2020-05-12 22:48               ` Jonathan Wakely
@ 2020-05-13  8:04                 ` David Brown
  0 siblings, 0 replies; 29+ messages in thread
From: David Brown @ 2020-05-13  8:04 UTC (permalink / raw)
  To: Jonathan Wakely, Freddie Chopin; +Cc: gcc

On 13/05/2020 00:48, Jonathan Wakely via Gcc wrote:
> On Tue, 12 May 2020 at 23:39, Jonathan Wakely wrote:
>> On Tue, 12 May 2020, 21:57 Freddie Chopin, <freddie_chopin@op.pl> wrote:
>>> Anyway... If you have to recompile the toolchain, the problem is still
>>> there. Most of the people (like 99,666%) will not do that for various
>>> reasons. Some don't know how, some use only Windows, some don't have
>>> time to deal with the compilation (the whole toolchain takes around an
>>> hour here, but this excludes the time to prepare the script that builds
>>> it), some other consider the toolchain provided by MCU vendor (or by
>>> ARM) as "tested to work correctly" so they don't want to replace that
>>> with their custom built solution, and so on, and so on...
>>
>> There is no one-size-fits-all solution that gives everybody their
>> ideal set of defaults, so we provide configuration options to tune
>> things for your needs. Complaining that you have to rebuild things to
>> get different defaults seems silly. Would you prefer we don't offer
>> the options at all?
> 
> And I also never said that every user should rebuild the toolchain.
> The options can be used by vendors providing a toolchain for their
> hardware, if the verbose handler (or exceptions in general!) are not
> appropriate for their users. Just because something isn't the default,
> doesn't mean every user needs to change it themselves.

I think complaining about extra unnecessary code (such as string 
handling for std::terminate) is justified - but the complaints should 
not be directed at the gcc or libstdc++ folks.  As you say, /you/ 
provide the options - if the vendors make poor choices of options, then 
it is /they/ who should get the bug reports and complaints.

One option that would be nice (I don't know if it is realistic), would 
be to say that the code should never stop normally.  On many embedded 
systems, main() never exits.  std::terminate() doesn't need any code 
except perhaps to reset the processor (that will be target-specific, of 
course).  exit() can never be called - there is no need for atexit 
functions, terminate handlers, global destructors, or any of the other 
machinery used for controlled shutdown and ending of a program.
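
As a rough sketch of the "terminate needs almost no code" idea, a
program can install its own handler with std::set_terminate. The names
minimal_terminate and install_minimal_terminate are hypothetical, and
std::abort() merely stands in for a target-specific processor reset.
Note the assumption: installing a handler at run time does not by itself
guarantee that the default verbose handler is left out of the image;
that depends on how libstdc++ was configured (see
--disable-libstdcxx-verbose earlier in the thread).

```cpp
#include <cstdlib>
#include <exception>

// Hypothetical minimal handler: no message formatting and no I/O.
// On a real MCU this would be replaced by a target-specific reset.
[[noreturn]] void minimal_terminate()
{
  std::abort();
}

// Installs the minimal handler and returns the previously installed
// one, so that a caller could restore it if needed.
std::terminate_handler install_minimal_terminate() noexcept
{
  return std::set_terminate(minimal_terminate);
}
```

Calling install_minimal_terminate() early in main() is enough for any
later std::terminate() call to end up in minimal_terminate().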


> 
> And if writing a script and waiting an hour is too much effort to
> reduce unwanted overhead, then I guess that overhead isn't such a big
> deal anyway.
> 

There are, as Freddie mentions, many other reasons for end-users not 
building their own toolchains.  I have built many cross-gcc toolchains 
over the years (starting with a gcc 2.95 m68k toolchain over 20 years 
ago, IIRC).  But for most professional embedded development, pre-built 
toolchains from vendors are a requirement - home-built is simply not an 
acceptable option.  Time and effort don't come into it.  (This is a good 
thing for gcc - a fair number of major gcc developers work for companies 
that earn money selling pre-built toolchains.)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: size of exception handling (Was: performance of exception handling)
  2020-05-12  7:47         ` Oleg Endo
@ 2020-05-13  9:13           ` Jonathan Wakely
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Wakely @ 2020-05-13  9:13 UTC (permalink / raw)
  To: Oleg Endo; +Cc: Freddie Chopin, Moritz Strübe, gcc

On Tue, 12 May 2020, 10:15 Oleg Endo, <oleg.endo@t-online.de> wrote:
>
> On Tue, 2020-05-12 at 09:20 +0200, Freddie Chopin wrote:
> >
> > I actually have to build my own toolchain instead of the one provided
> > by ARM, because to really NOT use C++ exceptions, you have to recompile
> > the whole libstdc++ with `-fno-exceptions -fno-rtti` (yes, I know they
> > provide the "nano" libraries, but the options they used for newlib
> > don't suit my needs - this is "too minimized"). If you pass these two
> > flags during compilation and linking of your own application, this
> > disables these features only in your code. As libstdc++ is compiled
> > with exceptions and RTTI enabled, ...
>
> IMHO this is a conceptual fail of the whole concept of using pre-
> compiled pre-installed libraries somewhere in the toolchain, in
> particular for this kind of cross-compilation scenario.


The concept works well in other scenarios though. Not everybody has
the same use case or the same needs.


>   Like you say,
> when we set "exceptions off" it usually means for the whole embedded
> app, and the whole embedded app usually means all the OS and runtime
> libraries and everything, not just the user code.
>
> One option is to not use the pre-compiled toolchain libstdc++ but build
> it from source (or use another c++ std lib of your choice), as part of
> the whole project, with the desired project settings.


Yes, IMO that's probably the right option if there is no pre-compiled
toolchain that matches your desired configuration.

If there are properties of libstdc++ that make it more difficult than
necessary, we want to know about them.


>
> BTW, just to throw in my 2-cents into the "I'm using MCU" pool of
> pain/joy ... in one of my projects I'm using STM32F051K6U6, 32 KB
> flash, 8 KB RAM, running all C++ code with shared C++ RPC libraries to
> communicate with other (bigger) devices.  Exceptions, RTTI, threads
> have to be turned off and only the header-only things from the stdlib
> can be used and no heap allocations.


Are you using headers that are not part of the freestanding subset? Which ones?

A future version of the C++ standard is probably going to expand the
headers that should be part of freestanding (or replace the concept of
freestanding with something more useful) so it would be good to know
what parts of the standard library people are actually using on
devices like that.


> Otherwise the thing doesn't fit.
> Don't feel like rewriting the whole thing either.  There are some
> annoyances when turning off exceptions and RTTI which results in
> increased code maintenance.


Such as?


> It'd definitely be good and highly
> appreciated if there were any improvements in the area of exception
> handling.
>
> Cheers,
> Oleg
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-05-13  9:13 UTC | newest]

Thread overview: 29+ messages
2020-05-11  8:14 performance of exception handling Thomas Neumann
2020-05-11 10:40 ` Florian Weimer
2020-05-11 13:59   ` Thomas Neumann
2020-05-11 14:22     ` Florian Weimer
2020-05-11 15:14     ` size of exception handling (Was: performance of exception handling) Moritz Strübe
2020-05-12  7:20       ` Freddie Chopin
2020-05-12  7:47         ` Oleg Endo
2020-05-13  9:13           ` Jonathan Wakely
2020-05-12  9:16         ` size of exception handling Florian Weimer
2020-05-12  9:44           ` Freddie Chopin
2020-05-12 11:11             ` Jonathan Wakely
2020-05-12 11:17             ` Moritz Strübe
2020-05-12 11:29               ` Florian Weimer
2020-05-12 12:01                 ` Moritz Strübe
2020-05-12 11:07         ` size of exception handling (Was: performance of exception handling) Jonathan Wakely
2020-05-12 20:56           ` Freddie Chopin
2020-05-12 22:39             ` Jonathan Wakely
2020-05-12 22:48               ` Jonathan Wakely
2020-05-13  8:04                 ` David Brown
2020-05-12  9:03       ` size of exception handling Florian Weimer
2020-05-11 14:36   ` performance " David Edelsohn
2020-05-11 14:52     ` Florian Weimer
2020-05-11 15:12       ` David Edelsohn
2020-05-11 15:24         ` Florian Weimer
2020-05-12  6:08     ` Thomas Neumann
2020-05-12  7:15       ` Richard Biener
2020-05-12  7:30         ` Thomas Neumann
2020-05-12  9:01       ` Richard Sandiford
2020-05-13  1:13         ` Thomas Neumann
