From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mailout2.rbg.tum.de (mailout2.rbg.tum.de [IPv6:2a09:80c0::202]) by sourceware.org (Postfix) with ESMTPS id C6A75385843E for ; Tue, 22 Nov 2022 00:31:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C6A75385843E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=in.tum.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=in.tum.de Received: from mailrelay1.rbg.tum.de (mailrelay1.in.tum.de [IPv6:2a09:80c0:254::14]) by mailout2.rbg.tum.de (Postfix) with ESMTPS id F11524C01F4; Tue, 22 Nov 2022 01:31:04 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=in.tum.de; s=20220209; t=1669077065; bh=BFqq/PJhr/z3QqTrsaoHw8d+I9v5KCHqbHtv30TyelQ=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=aLG1Ou5bZfxxZfqI7dPXjr7lGAr20gWq6bnTdrziV4XouY7WbGJK0eCGDNtJsbpG5 kfSCpWL6If7tv7rgzM8/ZWqIuMv8sX+/+5C/3z5aqwCug4XrDQcpuXdVkbhWgWDUCH r0oPSRdj8H7CM7rRV7kZBJxMLTIH5sV2GQ1VE1lloyY1Kv7sVRcmCEtF07jQ3mwIRo HtoPFlEGFB6Dol1vzWO6ztYC0CEDmCBx6v+sNJ/Engd4vsgERqKfiQ9NgOp1rLncuu SU7YS9rdDzAkgQOVGHSHVKbv718AECI5dZOEH54KEaEpM+l5Bj4Vm88wK8QR2jJl8L D1MWcE4fdLqOQ== Received: by mailrelay1.rbg.tum.de (Postfix, from userid 112) id E868ADD; Tue, 22 Nov 2022 01:31:04 +0100 (CET) Received: from mailrelay1.rbg.tum.de (localhost [127.0.0.1]) by mailrelay1.rbg.tum.de (Postfix) with ESMTP id 95A0E85; Tue, 22 Nov 2022 01:31:04 +0100 (CET) Received: from mail.in.tum.de (vmrbg426.in.tum.de [131.159.0.73]) by mailrelay1.rbg.tum.de (Postfix) with ESMTPS id 9017B22; Tue, 22 Nov 2022 01:31:04 +0100 (CET) Received: by mail.in.tum.de (Postfix, from userid 112) id 87D1F4A021A; Tue, 22 Nov 2022 01:31:04 +0100 (CET) Received: (Authenticated sender: neumann) by mail.in.tum.de (Postfix) with ESMTPSA id 7ECBF4A0113; Tue, 22 Nov 2022 01:31:03 +0100 (CET) (Extended-Queue-bit xtech_em@fff.in.tum.de) Message-ID: <91045a34-a534-4436-bb06-cac32d797a36@in.tum.de> Date: Tue, 22 Nov 2022 01:31:02 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: [PATCH v4] eliminate mutex in fast path of __register_frame To: "H.J. Lu" , Jakub Jelinek Cc: Tamar Christina , "gcc-patches@gcc.gnu.org" , Jason Merrill , Florian Weimer , Jonathan Wakely References: <2a4776b9-9271-bb3c-a626-d5ec22dae6f3@in.tum.de> Content-Language: en-US From: Thomas Neumann In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi, >>>> When dynamically linking a fast enough machine hides the latency, but when >>>> Statically linking or on slower devices this change caused a 5x increase in >>>> Instruction count and 2x increase in cycle count before getting to main. I have looked at ways to fix that. The problem is that with static linking unwinding tables are registered dynamically, and with my patch that registration triggers an eager sort of fde lists. While previously the lists were sorted when the first exception was thrown. If an application throws at least one exception there is no downside in eager sorting, but if the application never throws there is overhead. The obvious way to improve the situation is to make sorting faster. When replacing the split+sort+merge logic with a radix sort (which can be done without additional memory) we get the following timings for your #include int main() {} example (with git stat -r 200): # pre-patch version, fast 0,06 msec task-clock 272.286 cycles 464.754 instructions # post-patch version, slow 0,21 msec task-clock 972.876 cycles 3.079.515 instructions # +radix sort, in the middle 0,13 msec task-clock 604.697 cycles 1.702.930 instructions The radix sort clearly improves things, but it does not fully eliminate the overhead. The question is, how much do we care about that situation (i.e., static linking, exceptions registered but never thrown). I could change the code to recognize three states instead of two: no exceptions registered, exceptions register but never thrown, and full exception mode. But that would increase code complexity and it would pessimize applications that do throw, as we now need more checks to guard against concurrent changes. It makes the code more complex and a bit slower, which is why I am hesistant. But I can implement that if you think that we need that. Or we just replace the sort, which is probably a good idea anyway. Best Thomas