From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 19 Sep 2022 11:33:37 +0200
From: Jakub Jelinek
To: Florian Weimer
Cc: Jason Merrill, Michael Matz, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for
References: <87czbrhg1y.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87czbrhg1y.fsf@oldenburg.str.redhat.com>

On Mon, Sep 19, 2022 at 11:25:13AM +0200, Florian Weimer wrote:
> * Jakub Jelinek:
>
> > The disadvantage of the patch is that touching reg[x].loc and how[x]
> > now means 2 cachelines rather than one as before, and I admit beyond
> > bootstrap/regtest I haven't benchmarked it in any way.  Florian, could
> > you retry whatever you measured to get at the 40% of time spent on the
> > stack clearing to see how the numbers change?
>
> A benchmark that unwinds through 100 frames containing a std::string
> variable goes from (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84):
>
>   min: 24418 ns
>   25%: 24740 ns
>   50%: 24790 ns
>   75%: 24840 ns
>   95%: 24937 ns
>   99%: 26174 ns
>   max: 42530 ns
>   avg: 24826.1 ns
>
> to (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84 with this patch):
>
>   min: 22307 ns
>   25%: 22640 ns
>   50%: 22713 ns
>   75%: 22787 ns
>   95%: 22948 ns
>   99%: 24839 ns
>   max: 52658 ns
>   avg: 22863.4 ns
>
> So 227 ns per frame instead of 248 ns per frame, or ~9% less.

Thanks for doing that.

> Moving cfa_how after how in struct frame_state_reg_info as an 8-bit
> bitfield should avoid zeroing another 8 bytes.  This shaves off another
> 3 ns per frame in my testing (on a Core i9-10900T, so with ERMS).

Good idea.  It won't always help: on some targets the how array could have
a size divisible by the pointer alignment.  But when cfa_how is at the end,
it always increases the size by the alignment of a pointer, while after the
how array it only does so if the size of how is a multiple of the pointer
alignment.

> The REP STOS still dominates uw_frame_state_for execution time, but this
> seems to be a profiling artifact.  Replacing it with PXOR and seven
> MOVUPS instructions makes the hotspot go away, but performance does not
> improve.

Odd.

	Jakub
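
To make the padding argument about cfa_how concrete, here is a minimal
sketch.  It is not the real struct frame_state_reg_info: the struct and
union names, the member types and the register counts 17 and 16 are all
assumptions picked for illustration, and cfa_how is a plain unsigned char
here rather than an 8-bit bitfield.  It only compares the size of a
struct with a one-byte member at the very end against the same struct
with that byte placed right after the how array.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for one reg[] entry.  */
union loc_sketch
{
  unsigned long reg;
  long offset;
  const unsigned char *exp;
};

/* Define two hypothetical layouts for N registers: at_end_N keeps the
   one-byte cfa_how as the last member, after_how_N puts it directly
   after the how[] byte array.  */
#define DEFINE_LAYOUTS(N)                 \
  struct at_end_##N                       \
  {                                       \
    union loc_sketch reg[N];              \
    unsigned char how[N];                 \
    void *prev;                           \
    long cfa_offset;                      \
    unsigned long cfa_reg;                \
    const unsigned char *cfa_exp;         \
    unsigned char cfa_how;                \
  };                                      \
  struct after_how_##N                    \
  {                                       \
    union loc_sketch reg[N];              \
    unsigned char how[N];                 \
    unsigned char cfa_how;                \
    void *prev;                           \
    long cfa_offset;                      \
    unsigned long cfa_reg;                \
    const unsigned char *cfa_exp;         \
  }

DEFINE_LAYOUTS (17);  /* how[] size is not a multiple of pointer alignment */
DEFINE_LAYOUTS (16);  /* how[] size is a multiple of pointer alignment */

/* Bytes saved by moving cfa_how after how[] when how[] has 17 entries.  */
size_t
bytes_saved_17 (void)
{
  return sizeof (struct at_end_17) - sizeof (struct after_how_17);
}

/* Same comparison when how[] has 16 entries.  */
size_t
bytes_saved_16 (void)
{
  return sizeof (struct at_end_16) - sizeof (struct after_how_16);
}

/* Print both differences next to the pointer alignment.  */
void
report_layouts (void)
{
  printf ("pointer alignment: %zu\n", (size_t) _Alignof (void *));
  printf ("saved with 17 regs: %zu\n", bytes_saved_17 ());
  printf ("saved with 16 regs: %zu\n", bytes_saved_16 ());
}
```

Calling report_layouts () from a main function prints the numbers for the
host ABI.  On common ABIs with 4 or 8 byte pointer alignment I would
expect the 17 entry case to save exactly one pointer alignment, because
cfa_how fits into padding that follows how[] anyway, and the 16 entry
case to save nothing, because the extra byte forces a new padded slot in
either position.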