To: gcc@gcc.gnu.org
From: Thomas Neumann <tneumann@users.sourceforge.net>
Subject: performance of exception handling
Message-ID: <0bbdaab7-c083-e14e-6227-27713dab9657@users.sourceforge.net>
Date: Mon, 11 May 2020 10:14:36 +0200

Hi,

I want to improve the performance of C++ exception handling, and I would
like to get some feedback on how to tackle that.

Currently, exception handling scales poorly due to global mutexes when
throwing. This can be seen with a small demo script here:
https://repl.it/repls/DeliriousPrivateProfiler
Using a thread count >1 is much slower than running single-threaded.
This global locking is particularly painful on a machine with more than a
hundred cores, where mutexes are expensive and contention becomes
much more likely due to the high degree of parallelism.
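
For readers who cannot open the link, the kind of measurement meant here
can be sketched as follows. This is not the demo program itself, just a
minimal stand-in under my own assumptions: the names always_throw,
throw_loop, throw_in_parallel and time_throws are hypothetical, and the
absolute timings depend entirely on the machine and the libgcc/glibc
versions in use.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper: always throws. Kept out of line so that every throw
// has to unwind through a real call frame.
[[gnu::noinline]] static void always_throw() {
    throw std::runtime_error("demo");
}

// Throws and catches `iterations` exceptions on the calling thread and
// returns how many were caught.
static std::size_t throw_loop(std::size_t iterations) {
    std::size_t caught = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        try {
            always_throw();
        } catch (const std::runtime_error&) {
            ++caught;
        }
    }
    return caught;
}

// Runs throw_loop concurrently on `threads` threads and returns the total
// number of exceptions caught across all of them.
std::size_t throw_in_parallel(unsigned threads, std::size_t iterations) {
    assert(threads > 0);
    std::atomic<std::size_t> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&] { total += throw_loop(iterations); });
    for (auto& th : pool)
        th.join();
    return total.load();
}

// Wall-clock seconds needed for `threads` threads to each throw and catch
// `iterations` exceptions.
double time_throws(unsigned threads, std::size_t iterations) {
    auto start = std::chrono::steady_clock::now();
    throw_in_parallel(threads, iterations);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Since each thread does the same amount of work, ideal scaling would keep
time_throws(n, k) roughly constant as n grows up to the number of cores.
Comparing time_throws(1, k) with time_throws(n, k) for larger n shows how
far the unwinder is from that.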

Of course conventional wisdom is not to use exceptions when exceptions
can occur somewhat frequently. But I think that is a silly argument, see
the WG21 paper P0709 for a detailed discussion. In particular, there is
no technical reason why exceptions have to be slow; it is just the
current implementation that is slow.

In the current gcc implementation on Linux the bottleneck is
_Unwind_Find_FDE, or more precisely, the function dl_iterate_phdr,
which is called for every frame and iterates over all shared
libraries while holding a global lock.
That is inherently slow, both due to global locking and due to the data
structures involved.
And it is not easy to speed that up with, e.g., a thread-local cache, as
glibc has no mechanism to notify us if a shared library is added or removed.
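
To make the cost concrete, here is a rough sketch of what such a lookup
looks like from the caller's side. It only finds the loaded object whose
PT_LOAD segment contains a given address; the real libgcc callback
additionally locates the PT_GNU_EH_FRAME segment and searches it, which
is omitted here. The names Lookup, visit_object and find_object_for are
made up for this illustration.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lookup state passed through dl_iterate_phdr's void* argument.
struct Lookup {
    std::uintptr_t pc;     // address we are trying to locate
    std::size_t visited;   // number of loaded objects examined so far
    bool found;            // true once an object containing pc was seen
};

// Called by dl_iterate_phdr once per loaded object (main program and each
// shared library). Scans the object's program headers linearly.
static int visit_object(struct dl_phdr_info* info, std::size_t, void* data) {
    assert(info != nullptr && data != nullptr);
    auto* lookup = static_cast<Lookup*>(data);
    ++lookup->visited;
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD)
            continue;
        std::uintptr_t begin = info->dlpi_addr + ph.p_vaddr;
        if (lookup->pc >= begin && lookup->pc < begin + ph.p_memsz) {
            lookup->found = true;
            return 1;   // a non-zero return value stops the iteration
        }
    }
    return 0;           // keep iterating over the remaining objects
}

// Finds the loaded object that contains `pc` by walking all loaded objects.
Lookup find_object_for(const void* pc) {
    Lookup lookup{reinterpret_cast<std::uintptr_t>(pc), 0, false};
    dl_iterate_phdr(visit_object, &lookup);
    return lookup;
}
```

Every call walks the list of loaded objects from the start, so the work
per frame grows with the number of shared libraries, on top of the lock
that glibc holds for the duration of the iteration.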

We therefore need a way to locate the exception frames that is
independent from glibc. One way to achieve that would be to explicitly
register exception frames with __register_frame_info_bases in a
constructor function (and deregister them in a destructor function).
Of course probing explicitly registered frames currently uses a global
lock, too, but that implementation is provided by libgcc, and we can
change that to something better, allowing for lock-free reads.
In libgcc explicitly registered frames take precedence over the
dl_iterate_phdr mechanism, which means that we could mix future code
that does call __register_frame_info_bases explicitly with code that
does not. Code that does register will unwind faster than code that does
not, but both can coexist in one process.
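
To illustrate what "lock-free reads" could mean for such a registry,
here is one possible sketch. It is not libgcc code and not necessarily
the design that would be proposed: FrameRange, FrameRegistry and their
members are hypothetical. Writers (registration and deregistration, which
are rare) take a mutex and publish a fresh sorted copy of the table;
readers (the unwinder, which is hot) do one atomic load and a binary
search. As a deliberate simplification, old snapshots are kept alive
until the registry is destroyed, so memory grows with every change. A
real implementation would need a safe reclamation scheme.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical record: one registered code range and its unwind data.
struct FrameRange {
    std::uintptr_t begin;   // first address covered
    std::uintptr_t end;     // one past the last address covered
    const void* eh_frame;   // opaque pointer to the range's unwind data
};

class FrameRegistry {
    using Snapshot = std::vector<FrameRange>;   // sorted by begin, immutable once published

    std::atomic<const Snapshot*> current_{nullptr};
    std::mutex writer_mutex_;                   // serializes writers only
    // Simplification: every snapshot stays alive, so readers never see freed memory.
    std::vector<std::unique_ptr<const Snapshot>> snapshots_;

    // Stores `next` and makes it visible to readers with a release store.
    void publish(Snapshot next) {
        snapshots_.push_back(std::make_unique<const Snapshot>(std::move(next)));
        current_.store(snapshots_.back().get(), std::memory_order_release);
    }

public:
    FrameRegistry() { publish({}); }

    // Writer path: copy the current table, insert in sorted position, publish.
    void register_range(FrameRange r) {
        assert(r.begin < r.end);
        std::lock_guard<std::mutex> guard(writer_mutex_);
        Snapshot next = *current_.load(std::memory_order_relaxed);
        auto pos = std::upper_bound(
            next.begin(), next.end(), r,
            [](const FrameRange& a, const FrameRange& b) { return a.begin < b.begin; });
        next.insert(pos, r);
        publish(std::move(next));
    }

    // Writer path: copy the current table without the range starting at `begin`.
    void deregister_range(std::uintptr_t begin) {
        std::lock_guard<std::mutex> guard(writer_mutex_);
        Snapshot next = *current_.load(std::memory_order_relaxed);
        next.erase(std::remove_if(next.begin(), next.end(),
                                  [begin](const FrameRange& r) { return r.begin == begin; }),
                   next.end());
        publish(std::move(next));
    }

    // Reader path: one acquire load plus a binary search, no mutex.
    // Returns the unwind data for the range containing pc, or nullptr.
    const void* find(std::uintptr_t pc) const {
        const Snapshot* s = current_.load(std::memory_order_acquire);
        auto it = std::upper_bound(
            s->begin(), s->end(), pc,
            [](std::uintptr_t p, const FrameRange& r) { return p < r.begin; });
        if (it == s->begin())
            return nullptr;
        --it;
        return pc < it->end ? it->eh_frame : nullptr;
    }
};
```

A lookup such as registry.find(pc) never blocks on a writer. Whether the
atomic pointer load is itself lock-free depends on the platform, and that
is an assumption this sketch does not check.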

Does that sound like a viable strategy to speed up exception handling? I
would be willing to contribute code for that, but I first wanted to know
if you are interested and if the strategy makes sense. Also, my
implementation makes use of atomics, which I hope are available on all
platforms that use unwind-dw2-fde.c, but I am not sure.

Thomas