From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 23065 invoked by alias); 6 Aug 2012 00:39:11 -0000
Received: (qmail 23055 invoked by uid 22791); 6 Aug 2012 00:39:10 -0000
X-SWARE-Spam-Status: No, hits=-4.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,KHOP_THREADED,TW_OV
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 06 Aug 2012 00:38:57 +0000
From: "chip at pobox dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter
Date: Mon, 06 Aug 2012 00:39:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: chip at pobox dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.6.4
X-Bugzilla-Changed-Fields: CC
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-08/txt/msg00286.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

Chip Salzenberg changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chip at pobox dot com

--- Comment #15 from Chip Salzenberg 2012-08-06 00:37:36 UTC ---
Ping. I've just run into this with the tip of the gcc 4.7.1 branch. Is there a
workaround? Some way to label the struct as not needing to be stored?
Something like __attribute__((noaddress));

We want to pass and return structs by value as current C++ style recommends,
but the extra register spills are dragging down performance. For small key
classes we've switched to using big integers with masking functions, but for
larger ones there is no workaround that we know of.

Given this code:

    extern val_t foo();
    extern int bar(val_t);
    int main() { return bar(foo()); }

When val_t is a struct of two int64_t on x86_64, the code has two extra stores:

> movq %rax, (%rsp)
> movq %rdx, 8(%rsp)

and the stack frame is larger and there is no tail call optimization.

When val_t is __int128 on x86_64, the code is optimal: tail call, no extra
stores, smaller stack frame (because there is no need to store the value).
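
For reference, the big-integer workaround mentioned above for small key classes might look roughly like the sketch below. This is an illustration only, not the code from our tree: the names key128_t, key_pack, key_hi, key_lo, foo_packed, bar_packed and call_through are hypothetical, the fields are unsigned here to keep the shifts well defined, and unsigned __int128 is a GCC/Clang extension on 64-bit targets such as x86_64. Whether a given compiler version actually drops the extra stores has to be checked in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key type: two 64-bit fields carried in one 128-bit integer
   instead of a two-member struct. */
typedef unsigned __int128 key128_t;

/* Pack two 64-bit fields: hi goes into bits 64..127, lo into bits 0..63. */
static inline key128_t key_pack(uint64_t hi, uint64_t lo)
{
    return ((key128_t)hi << 64) | (key128_t)lo;
}

/* "Masking functions" that recover the individual fields. */
static inline uint64_t key_hi(key128_t k) { return (uint64_t)(k >> 64); }
static inline uint64_t key_lo(key128_t k) { return (uint64_t)(k & UINT64_MAX); }

/* Stand-ins for foo() and bar() from the report (assumed bodies). */
static key128_t foo_packed(void)
{
    return key_pack(0x1122334455667788ULL, 0x99aabbccddeeff00ULL);
}

static int bar_packed(key128_t k)
{
    return key_hi(k) > key_lo(k);
}

/* Same shape as the reported test case: return value passed straight on. */
static int call_through(void)
{
    return bar_packed(foo_packed());
}
```

To compare against the struct version, foo_packed and bar_packed would need to be extern declarations in a separate translation unit, as in the test case above, and the output of something like gcc -O2 -S inspected for the movq stores to (%rsp).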