public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Thomas Neumann <thomas.neumann@in.tum.de>
To: "H.J. Lu" <hjl.tools@gmail.com>, Jakub Jelinek <jakub@redhat.com>
Cc: Tamar Christina <Tamar.Christina@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Jason Merrill <jason@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Jonathan Wakely <jwakely.gcc@gmail.com>
Subject: Re: [PATCH v4] eliminate mutex in fast path of __register_frame
Date: Tue, 22 Nov 2022 01:31:02 +0100	[thread overview]
Message-ID: <91045a34-a534-4436-bb06-cac32d797a36@in.tum.de> (raw)
In-Reply-To: <CAMe9rOoAwZkJjPP23DPsP0P3JcZi_SxwFcNOuB1uxxj2rmaqtg@mail.gmail.com>

Hi,

>>>> When dynamically linking a fast enough machine hides the latency, but when
>>>> Statically linking or on slower devices this change caused a 5x increase in
>>>> Instruction count and 2x increase in cycle count before getting to main.

I have looked at ways to fix that. The problem is that with static 
linking unwinding tables are registered dynamically, and with my patch 
that registration triggers an eager sort of fde lists. While previously 
the lists were sorted when the first exception was thrown. If an 
application throws at least one exception there is no downside in eager 
sorting, but if the application never throws there is overhead.

The obvious way to improve the situation is to make sorting faster. When 
replacing the split+sort+merge logic with a radix sort (which can be 
done without additional memory) we get the following timings for your
#include <cstdio>
int main() {}
example (with git stat -r 200):

# pre-patch version, fast
               0,06 msec task-clock
            272.286      cycles
            464.754      instructions
# post-patch version, slow
               0,21 msec task-clock
            972.876      cycles
          3.079.515      instructions
# +radix sort, in the middle
               0,13 msec task-clock
            604.697      cycles
          1.702.930      instructions

The radix sort clearly improves things, but it does not fully eliminate 
the overhead.

The question is, how much do we care about that situation (i.e., static 
linking, exceptions registered but never thrown). I could change the 
code to recognize three states instead of two: no exceptions registered, 
exceptions register but never thrown, and full exception mode. But that 
would increase code complexity and it would pessimize applications that 
do throw, as we now need more checks to guard against concurrent changes.

It makes the code more complex and a bit slower, which is why I am 
hesistant. But I can implement that if you think that we need that. Or 
we just replace the sort, which is probably a good idea anyway.

Best

Thomas

  reply	other threads:[~2022-11-22  0:31 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-16 10:19 Thomas Neumann
2022-09-16 14:49 ` Jason Merrill
2022-09-18  8:59 ` Dimitar Dimitrov
2022-09-18  9:20   ` Thomas Neumann
2022-09-18 10:02   ` Thomas Neumann
2022-09-19 13:46 ` Stephan Bergmann
2022-09-19 13:55   ` Thomas Neumann
2022-09-19 14:00     ` Stephan Bergmann
2022-09-19 15:33   ` Thomas Neumann
2022-09-20  5:39     ` Stephan Bergmann
2022-11-21 11:14 ` Tamar Christina
2022-11-21 11:22   ` Thomas Neumann
2022-11-21 11:48     ` Jakub Jelinek
2022-11-21 17:13       ` H.J. Lu
2022-11-22  0:31         ` Thomas Neumann [this message]
2022-11-22  8:20           ` Florian Weimer
2022-11-22  9:12             ` Thomas Neumann
2022-12-09 17:34             ` [PATCH] initialize fde objects lazily Thomas Neumann
2022-12-15 16:11               ` Tamar Christina
2022-12-16 17:25               ` Jason Merrill
2023-05-02 14:32             ` [PATCH] release the sorted FDE array when deregistering a frame [PR109685] Thomas Neumann
2023-05-10 10:49             ` [PATCH] fix radix sort on 32bit platforms [PR109670] Thomas Neumann
2023-08-10 11:33             ` [PATCH] preserve base pointer for __deregister_frame [PR110956] Thomas Neumann
2023-08-11 15:21               ` Jeff Law
2024-03-15 10:29             ` [PATCH] handle unwind tables that are embedded within unwinding code, [PR111731] Thomas Neumann
2024-03-20  8:25               ` Richard Biener
2024-03-22 13:35               ` Jeff Law
2024-03-22 13:36               ` Jeff Law
2024-03-22 14:43                 ` Thomas Neumann
2022-11-22  8:00         ` [PATCH] speed up end_fde_sort using radix sort Thomas Neumann
2022-12-16 18:02           ` Jason Merrill
2022-11-21 11:49     ` [PATCH v4] eliminate mutex in fast path of __register_frame Tamar Christina
2022-11-21 11:53       ` Thomas Neumann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91045a34-a534-4436-bb06-cac32d797a36@in.tum.de \
    --to=thomas.neumann@in.tum.de \
    --cc=Tamar.Christina@arm.com \
    --cc=fweimer@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hjl.tools@gmail.com \
    --cc=jakub@redhat.com \
    --cc=jason@redhat.com \
    --cc=jwakely.gcc@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).