From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/107675] [13 Regression] GCC-13 is significantly slower to startup on C++ programs
Date: Sun, 20 Nov 2022 23:57:20 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107675

--- Comment #10 from Tamar Christina ---
I've bisected this to:

commit 6e80a1d164d1f996ad08a512c000025a7c2ca893
Author: Thomas Neumann
Date:   Tue Mar 1 21:57:35 2022 +0100

    eliminate mutex in fast path of __register_frame

    The __register_frame/__deregister_frame functions are used to register
    unwinding frames from JITed code in a sorted list.
    That list itself is protected by object_mutex, which leads to terrible
    performance in multi-threaded code and is somewhat expensive even if
    single-threaded. There was already a fast-path that avoided taking the
    mutex if no frame was registered at all.

    This commit eliminates both the mutex and the sorted list from
    the atomic fast path, and replaces it with a btree that uses
    optimistic lock coupling during lookup. This allows for fully parallel
    unwinding and is essential to scale exception handling to large core
    counts.

    libgcc/ChangeLog:

            * unwind-dw2-fde.c (release_registered_frames): Cleanup at shutdown.
            (__register_frame_info_table_bases): Use btree in atomic fast path.
            (__deregister_frame_info_bases): Likewise.
            (_Unwind_Find_FDE): Likewise.
            (base_from_object): Make parameter const.
            (classify_object_over_fdes): Add query-only mode.
            (get_pc_range): Compute PC range for lookup.
            * unwind-dw2-fde.h (last_fde): Make parameter const.
            * unwind-dw2-btree.h: New file.

 libgcc/unwind-dw2-btree.h | 953 ++++++++++++++++++++++++++++++++++++++++++++++
 libgcc/unwind-dw2-fde.c   | 194 +++++++---
 libgcc/unwind-dw2-fde.h   |   2 +-
 3 files changed, 1098 insertions(+), 51 deletions(-)
 create mode 100644 libgcc/unwind-dw2-btree.h

Looking at the patch, it now seems to force eager registration of objects at frame-registration time, instead of the lazy initialization that was there before.
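To illustrate why eager registration would surface as a startup cost, here is a minimal, hypothetical C model. It is NOT libgcc code: the names object, scan_fdes, register_lazy, register_eager, lookup and simulate are invented for this sketch. It assumes the per-object cost is one visit per FDE, standing in for the kind of FDE scan needed to compute a PC range. With lazy registration, registering is O(1) and the scan is deferred to the first lookup; with eager registration, every registered object is scanned during registration, even if no exception is ever thrown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of frame registration; NOT libgcc code.
   Each registered object carries fde_count FDEs, and one "unit of work"
   is one FDE visit (a stand-in for scanning FDEs to find a PC range). */
typedef struct object {
  size_t fde_count;
  int initialized;        /* has this object's FDE table been scanned? */
  struct object *next;    /* singly linked list of registered objects */
} object;

typedef struct {
  size_t startup_work;      /* FDE visits spent while registering */
  size_t first_lookup_work; /* FDE visits spent on the first lookup */
} cost;

static size_t work; /* running count of FDE visits */

/* Visit every FDE of an object once and mark it initialized. */
static void scan_fdes(object *ob) {
  for (size_t i = 0; i < ob->fde_count; i++)
    work++;
  ob->initialized = 1;
}

/* Lazy registration: just link the object in; O(1) per object. */
static void register_lazy(object **head, object *ob) {
  assert(ob != NULL);
  ob->initialized = 0;
  ob->next = *head;
  *head = ob;
}

/* Eager registration: link the object in, then scan it immediately. */
static void register_eager(object **head, object *ob) {
  register_lazy(head, ob);
  scan_fdes(ob);
}

/* A lookup needs initialized objects, so it pays any deferred scans.
   (Simplification: a real lookup would also search for a PC.) */
static void lookup(object *head) {
  for (object *ob = head; ob != NULL; ob = ob->next)
    if (!ob->initialized)
      scan_fdes(ob);
}

/* Register n objects (eagerly or lazily), perform one lookup,
   and report where the FDE visits were spent. */
cost simulate(int eager, const size_t *fde_counts, size_t n) {
  object objs[16];
  object *head = NULL;
  cost c;

  assert(n <= sizeof objs / sizeof objs[0]);
  work = 0;
  for (size_t i = 0; i < n; i++) {
    objs[i].fde_count = fde_counts[i];
    if (eager)
      register_eager(&head, &objs[i]);
    else
      register_lazy(&head, &objs[i]);
  }
  c.startup_work = work;

  work = 0;
  lookup(head);
  c.first_lookup_work = work;
  return c;
}
```

For example, with fde_counts = {100, 200, 300}, simulate(0, ...) spends 0 FDE visits during registration and 600 on the first lookup, while simulate(1, ...) spends 600 during registration and 0 afterwards. The total work is the same; eager registration only moves it to program startup, which would match the slower startup reported in this bug if the observation above is right.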