From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 496 invoked by alias); 4 Oct 2013 18:24:23 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 465 invoked by uid 48); 4 Oct 2013 18:24:19 -0000
From: "b.grayson at samsung dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/58622] New: With -fomit-frame-pointer, A64 does not generate post-decrement stores
Date: Fri, 04 Oct 2013 18:24:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.9.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: b.grayson at samsung dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cf_gcctarget cf_gccbuild
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-10/txt/msg00238.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58622

            Bug ID: 58622
           Summary: With -fomit-frame-pointer, A64 does not generate
                    post-decrement stores
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: b.grayson at samsung dot com
            Target: AArch64
             Build: 4.9.0 20130602

In A64, if one compiles a simple program under -O3, one gets code like this:

int bar(int i);
int foo() {
    return bar(5)+4;
}

A64 -O3 assembly:

foo:
        stp     x29, x30, [sp, -16]!
        add     x29, sp, 0
        mov     w0, 5
        bl      bar
        add     w0, w0, 4
        ldp     x29, x30, [sp], 16
        ret

Note the use of update-form loads and stores for the SP.

But if one uses -O3 -fomit-frame-pointer, the following is obtained:

foo:
        sub     sp, sp, #16
        mov     w0, 5
        str     x30, [sp]
        bl      bar
        add     w0, w0, 4
        ldr     x30, [sp]
        add     sp, sp, 16
        ret

The sub and str could be merged into str x30, [sp, #-16]!, and the ldr/add
could be merged into ldr x30, [sp], #16 (if I have my assembly correct), as
they were in the with-frame-pointer case.

On some ARM implementations, the base-register updates are "for free", so one
would get better performance with the merged load/store instructions, not to
mention better instruction-cache density.

Note that under A32, identical code (using update-form, pre-decrement stores)
is generated regardless of omit-frame-pointer settings.
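
To make the claimed equivalence concrete, here is a minimal sketch in C that models the stack as an array and SP as a byte offset, then compares the split sub/str and ldr/add sequences against the merged pre-indexed store and post-indexed load. The Machine type and every function name below are hypothetical and exist only for this illustration. It models addressing semantics only and says nothing about what GCC emits or about performance.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model: a 64-byte stack of 8-byte slots, with sp
 * held as a byte offset into it (the stack grows downward). */
enum { STACK_BYTES = 64 };

typedef struct {
    uint64_t mem[STACK_BYTES / 8];
    int64_t  sp;
} Machine;

/* sub sp, sp, #16 ; str x30, [sp] */
static void push_split(Machine *m, uint64_t x30) {
    m->sp -= 16;
    m->mem[m->sp / 8] = x30;
}

/* str x30, [sp, #-16]!   (pre-indexed: store at sp-16, write sp-16 back) */
static void push_merged(Machine *m, uint64_t x30) {
    int64_t addr = m->sp - 16;
    m->mem[addr / 8] = x30;
    m->sp = addr;
}

/* ldr x30, [sp] ; add sp, sp, 16 */
static uint64_t pop_split(Machine *m) {
    uint64_t v = m->mem[m->sp / 8];
    m->sp += 16;
    return v;
}

/* ldr x30, [sp], #16     (post-indexed: load at sp, then write sp+16 back) */
static uint64_t pop_merged(Machine *m) {
    int64_t addr = m->sp;
    uint64_t v = m->mem[addr / 8];
    m->sp = addr + 16;
    return v;
}

/* Returns 1 if the split and merged forms leave identical sp and memory
 * after the push, and restore the same value and sp after the pop. */
int sequences_equivalent(uint64_t x30) {
    Machine a, b;
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    a.sp = b.sp = STACK_BYTES;

    push_split(&a, x30);
    push_merged(&b, x30);
    if (a.sp != b.sp || memcmp(a.mem, b.mem, sizeof a.mem) != 0)
        return 0;

    uint64_t ra = pop_split(&a);
    uint64_t rb = pop_merged(&b);
    return ra == x30 && rb == x30 && a.sp == STACK_BYTES && b.sp == STACK_BYTES;
}
```

Calling sequences_equivalent() with any return-address value should yield 1 in this model, which matches the observation above that the merged forms are drop-in replacements for the two-instruction sequences.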