From: "jiwang at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/62173] [5.0 regression] [AArch64] Performance regression due to r213488
Date: Mon, 24 Nov 2014 12:15:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173

--- Comment #7 from Jiong Wang ---
(In reply to bin.cheng from comment #6)
> Em, is offset valid for [reg+offset] addressing mode? if it is, why don't we
> transform "reg+reg+offset" into "regX <- reg + reg; [regX + offset];"?

That's because, for a local char array A, to address one of its elements, such as A[I], we first take the base address of A, which is

    (plus virtual_stack_vars_rtx, offset)

and then add the index I, which is held in register B:

    (plus (plus virtual_stack_vars_rtx, offset), B)

From my experiments, the above gets canonicalized into:

    (plus (plus virtual_stack_vars_rtx, B), offset)

For any target that defines FRAME_GROWS_DOWNWARD as 1, virtual_stack_vars_rtx is eliminated into (plus frame_pointer, offset1) rather than (plus frame_pointer, const_0), which only happens when FRAME_GROWS_DOWNWARD is 0.

So transforming "reg+reg+offset" into "regX <- reg + reg; [regX + offset];" causes trouble for GCC's RTL optimization, because it is finally split into:

    regA <- frame - offset0
    regA <- regA + regB
    regA <- regA + offset1

and the later RTL passes then fail to fold offset0 and offset1, because the elimination of virtual_stack_vars_rtx happens quite late, in LRA.

So if we find "virtual_stack_vars_rtx + reg + offset", it's better to associate the constant offset with virtual_stack_vars_rtx, which means transforming it into

    regA <- virtual_stack_vars_rtx + offset
    regA <- regA + regB

so that the elimination offset is merged into the array offset automatically in LRA.

I verified that if we add such a transform in aarch64's LEGITIMIZE_ADDRESS hook, we do generate optimized code for Pinski's sample code:

bar:
        stp     x29, x30, [sp, -48]!
        add     x29, sp, 0
        stp     x19, x20, [sp, 16]
        add     x19, x29, 32
        mov     w20, w0
        mov     x0, x19
        bl      g
        ldrb    w0, [x19, w20, sxtw]
        bl      f
        ldp     x19, x20, [sp, 16]
        ldp     x29, x30, [sp], 48
        ret
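
To illustrate why the order of the additions matters, here is a toy model in C. It is not GCC code and does not use any GCC API: the types, the function names (eliminate_and_fold, steps_needed, base_constant) and the offsets used below are all made up for the sketch. It assumes that elimination simply puts "frame + elim_offset" in front of the address computation, and that a later pass can only merge two constant additions when they are adjacent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, not GCC internals: an address is built by applying a
   sequence of additions to the frame base.  Each step adds either a
   constant or a register. */
enum step_kind { ADD_CONST, ADD_REG };

struct step {
    enum step_kind kind;
    long value;            /* the constant, or a register number for ADD_REG */
};

#define MAX_STEPS 8

/* Model of elimination: the virtual base becomes "frame + elim_offset",
   so a constant step is prepended.  Adjacent constant steps are then
   folded into one.  Returns the number of steps left in OUT. */
size_t eliminate_and_fold(const struct step *in, size_t n,
                          long elim_offset, struct step *out)
{
    size_t m = 0;

    assert(n + 1 <= MAX_STEPS);
    out[m].kind = ADD_CONST;
    out[m].value = elim_offset;
    m++;

    for (size_t i = 0; i < n; i++) {
        if (in[i].kind == ADD_CONST && out[m - 1].kind == ADD_CONST)
            out[m - 1].value += in[i].value;   /* constants are adjacent: fold */
        else
            out[m++] = in[i];                  /* a register add blocks folding */
    }
    return m;
}

/* Build "base + array_offset + regB" in one of the two orders and report
   how many additions remain after elimination and folding. */
size_t steps_needed(int const_first, long array_offset, long elim_offset)
{
    struct step c = { ADD_CONST, array_offset };
    struct step r = { ADD_REG, 1 };            /* hypothetical register number */
    struct step in[2], out[MAX_STEPS];

    in[0] = const_first ? c : r;
    in[1] = const_first ? r : c;
    return eliminate_and_fold(in, 2, elim_offset, out);
}

/* The constant that ends up attached to the frame base. */
long base_constant(int const_first, long array_offset, long elim_offset)
{
    struct step c = { ADD_CONST, array_offset };
    struct step r = { ADD_REG, 1 };
    struct step in[2], out[MAX_STEPS];

    in[0] = const_first ? c : r;
    in[1] = const_first ? r : c;
    (void) eliminate_and_fold(in, 2, elim_offset, out);
    return out[0].value;
}
```

With a hypothetical array offset of 32 and elimination offset of -48, steps_needed(0, 32, -48) leaves three additions (the register add sits between the two constants), while steps_needed(1, 32, -48) leaves two, with the constants merged into a single -16 on the frame base. That is the effect the reassociation in LEGITIMIZE_ADDRESS is meant to get from LRA.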