From: Thomas Neumann <thomas.neumann@in.tum.de>
To: "H.J. Lu" <hjl.tools@gmail.com>, Jakub Jelinek <jakub@redhat.com>
Cc: Tamar Christina <Tamar.Christina@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Jason Merrill <jason@redhat.com>,
Florian Weimer <fweimer@redhat.com>,
Jonathan Wakely <jwakely.gcc@gmail.com>
Subject: Re: [PATCH v4] eliminate mutex in fast path of __register_frame
Date: Tue, 22 Nov 2022 01:31:02 +0100 [thread overview]
Message-ID: <91045a34-a534-4436-bb06-cac32d797a36@in.tum.de> (raw)
In-Reply-To: <CAMe9rOoAwZkJjPP23DPsP0P3JcZi_SxwFcNOuB1uxxj2rmaqtg@mail.gmail.com>
Hi,
>>>> When dynamically linking a fast enough machine hides the latency, but when
>>>> Statically linking or on slower devices this change caused a 5x increase in
>>>> Instruction count and 2x increase in cycle count before getting to main.
I have looked at ways to fix that. The problem is that with static
linking unwinding tables are registered dynamically, and with my patch
that registration triggers an eager sort of fde lists. While previously
the lists were sorted when the first exception was thrown. If an
application throws at least one exception there is no downside in eager
sorting, but if the application never throws there is overhead.
The obvious way to improve the situation is to make sorting faster. When
replacing the split+sort+merge logic with a radix sort (which can be
done without additional memory) we get the following timings for your
#include <cstdio>
int main() {}
example (with git stat -r 200):
# pre-patch version, fast
0,06 msec task-clock
272.286 cycles
464.754 instructions
# post-patch version, slow
0,21 msec task-clock
972.876 cycles
3.079.515 instructions
# +radix sort, in the middle
0,13 msec task-clock
604.697 cycles
1.702.930 instructions
The radix sort clearly improves things, but it does not fully eliminate
the overhead.
The question is, how much do we care about that situation (i.e., static
linking, exceptions registered but never thrown). I could change the
code to recognize three states instead of two: no exceptions registered,
exceptions register but never thrown, and full exception mode. But that
would increase code complexity and it would pessimize applications that
do throw, as we now need more checks to guard against concurrent changes.
It makes the code more complex and a bit slower, which is why I am
hesistant. But I can implement that if you think that we need that. Or
we just replace the sort, which is probably a good idea anyway.
Best
Thomas
next prev parent reply other threads:[~2022-11-22 0:31 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-16 10:19 Thomas Neumann
2022-09-16 14:49 ` Jason Merrill
2022-09-18 8:59 ` Dimitar Dimitrov
2022-09-18 9:20 ` Thomas Neumann
2022-09-18 10:02 ` Thomas Neumann
2022-09-19 13:46 ` Stephan Bergmann
2022-09-19 13:55 ` Thomas Neumann
2022-09-19 14:00 ` Stephan Bergmann
2022-09-19 15:33 ` Thomas Neumann
2022-09-20 5:39 ` Stephan Bergmann
2022-11-21 11:14 ` Tamar Christina
2022-11-21 11:22 ` Thomas Neumann
2022-11-21 11:48 ` Jakub Jelinek
2022-11-21 17:13 ` H.J. Lu
2022-11-22 0:31 ` Thomas Neumann [this message]
2022-11-22 8:20 ` Florian Weimer
2022-11-22 9:12 ` Thomas Neumann
2022-12-09 17:34 ` [PATCH] initialize fde objects lazily Thomas Neumann
2022-12-15 16:11 ` Tamar Christina
2022-12-16 17:25 ` Jason Merrill
2023-05-02 14:32 ` [PATCH] release the sorted FDE array when deregistering a frame [PR109685] Thomas Neumann
2023-05-10 10:49 ` [PATCH] fix radix sort on 32bit platforms [PR109670] Thomas Neumann
2023-08-10 11:33 ` [PATCH] preserve base pointer for __deregister_frame [PR110956] Thomas Neumann
2023-08-11 15:21 ` Jeff Law
2024-03-15 10:29 ` [PATCH] handle unwind tables that are embedded within unwinding code, [PR111731] Thomas Neumann
2024-03-20 8:25 ` Richard Biener
2024-03-22 13:35 ` Jeff Law
2024-03-22 13:36 ` Jeff Law
2024-03-22 14:43 ` Thomas Neumann
2022-11-22 8:00 ` [PATCH] speed up end_fde_sort using radix sort Thomas Neumann
2022-12-16 18:02 ` Jason Merrill
2022-11-21 11:49 ` [PATCH v4] eliminate mutex in fast path of __register_frame Tamar Christina
2022-11-21 11:53 ` Thomas Neumann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=91045a34-a534-4436-bb06-cac32d797a36@in.tum.de \
--to=thomas.neumann@in.tum.de \
--cc=Tamar.Christina@arm.com \
--cc=fweimer@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hjl.tools@gmail.com \
--cc=jakub@redhat.com \
--cc=jason@redhat.com \
--cc=jwakely.gcc@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).