From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id DB5CB3858D35; Mon, 8 May 2023 16:28:56 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DB5CB3858D35
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1683563336;
 bh=ATrE60m5VoEa8Hht0cDcDKceRwlkRtixY3TlFXgm3Fw=;
 h=From:To:Subject:Date:In-Reply-To:References:From;
 b=EwOsF5/8mpHcjf0CqOYLYI2ZpZcGIRSuPNJ7oppkiQejjJEdpDX+moJdIwgOgLhMG
 ZBWiyWSK4aFNt+naZ4kE8bdKIvIadPhpFJ8piV1eMXr3Doklldde3YfJDvVpd5IR4f
 VMzWZpr6ExD95dpfLg2QVvBlLj792Ck8arLX+CD0=
From: "roger at nextmovesoftware dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/109766] Passing doubles through the stack generates a stack adjustment per each such argument at -Os/-Oz.
Date: Mon, 08 May 2023 16:28:56 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 13.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: roger at nextmovesoftware dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cf_reconfirmed_on everconfirmed bug_status
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109766

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-05-08
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Roger Sayle ---
I believe the problem is in the cprop_hardreg pass, which undoes reload's register assignments (to use DImode GPR registers with
-Os), by propagating DFmode values into *pushdi2_rex64, which then get split during the split3 pass into lea/movq pairs, each of which is larger than a DImode push. The workaround, for this test case, is to use -Os -fno-cprop-registers, which produces code that's shorter than -O2.

0000000000000000 :
   0:   66 48 0f 7e ca          movq   %xmm1,%rdx
   5:   66 48 0f 7e d1          movq   %xmm2,%rcx
   a:   66 48 0f 7e de          movq   %xmm3,%rsi
   f:   50                      push   %rax
  10:   66 49 0f 7e e0          movq   %xmm4,%r8
  15:   66 48 0f 7e c0          movq   %xmm0,%rax
  1a:   66 49 0f 7e e9          movq   %xmm5,%r9
  1f:   66 49 0f 7e f2          movq   %xmm6,%r10
  24:   66 49 0f 7e fb          movq   %xmm7,%r11
  29:   41 53                   push   %r11
  2b:   41 52                   push   %r10
  2d:   41 51                   push   %r9
  2f:   41 50                   push   %r8
  31:   56                      push   %rsi
  32:   51                      push   %rcx
  33:   52                      push   %rdx
  34:   50                      push   %rax
  35:   b0 08                   mov    $0x8,%al
  37:   e8 00 00 00 00          callq  3c
  3c:   48 83 c4 48             add    $0x48,%rsp
  40:   c3                      retq

Now to figure out if there's a way, using target rtx_costs or pushdi2_rex64's constraints/predicates, to prevent hardreg cprop from performing this substitution. Plan B might be to investigate reload's choice of DFmode SSE vs DImode GPR, but this is within one or two bytes of optimal (for four arguments I believe GCC would produce shorter code than clang).
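
For readers without the bug's attachment to hand, the following is a minimal sketch of the kind of source that could yield a listing shaped like the one above. It is a reconstruction inferred from the disassembly (eight incoming xmm arguments, eight 8-byte pushes, and `mov $0x8,%al` before the call), not the test case from the report. The names `foo` and `bar` and the choice of a variadic callee are assumptions made for illustration only.

```c
#include <stdarg.h>

/* Hypothetical callee, assumed for illustration. It is variadic, so the
   x86-64 SysV caller must load %al with the number of vector registers
   used (8 here). It sums its 16 double arguments. To look at the
   caller's code in isolation, this definition would live in a separate
   translation unit so the compiler cannot inline it. */
double bar(double first, ...)
{
    va_list ap;
    double total = first;

    va_start(ap, first);
    for (int i = 0; i < 15; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}

/* Hypothetical caller. The first eight arguments of the call stay in
   %xmm0-%xmm7; the second eight are doubles that have to be passed on
   the stack, which is where the push sequence comes from. */
double foo(double a, double b, double c, double d,
           double e, double f, double g, double h)
{
    return bar(a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h);
}
```

Compiling such a file with `gcc -Os -c` and again with `gcc -Os -fno-cprop-registers -c`, then comparing `objdump -d` output, is one way to see whether the lea/movq pairs or the plain pushes are emitted. The exact output will depend on the GCC version and target.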