Date: Thu, 04 Dec 2014 11:07:00 -0000
Subject: Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
From: Richard Biener
To: Jiong Wang
Cc: "gcc-patches@gcc.gnu.org"

On Thu, Dec 4, 2014 at 12:07 PM, Richard Biener wrote:
> On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang wrote:
>> For PR 62173, the ideal solution is to resolve the problem in the
>> tree-level ivopts pass.
>>
>> However, apart from the tree-level issue, PR 62173 also exposed two
>> other RTL-level issues. One of them is that RTL-level loop invariant
>> hoisting could be improved by re-shuffling insns.
>>
>> For Seb's testcase:
>>
>> void bar(int i) {
>>   char A[10];
>>   int d = 0;
>>   while (i > 0)
>>     A[d++] = i--;
>>
>>   while (d > 0)
>>     foo(A[d--]);
>> }
>>
>> the insn sequence that calculates A[I]'s address looks like:
>>
>> (insn 76 75 77 22 (set (reg/f:DI 109)
>>         (plus:DI (reg/f:DI 64 sfp)
>>             (reg:DI 108 [ i ]))) seb-pop.c:8 84 {*adddi3_aarch64}
>>      (expr_list:REG_DEAD (reg:DI 108 [ i ])
>>         (nil)))
>> (insn 77 76 78 22 (set (reg:SI 110 [ D.2633 ])
>>         (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 109)
>>                     (const_int -16 [0xfffffffffffffff0])) [0 A S1 A8]))) seb-pop.c:8 76 {*zero_extendqisi2_aarch64}
>>      (expr_list:REG_DEAD (reg/f:DI 109)
>>         (nil)))
>>
>> Since reg + reg addressing is typical on most RISC architectures, we
>> can re-shuffle the instruction sequence into the following:
>>
>> (insn 96 94 97 22 (set (reg/f:DI 129)
>>         (plus:DI (reg/f:DI 64 sfp)
>>             (const_int -16 [0xfffffffffffffff0]))) seb-pop.c:8 84 {*adddi3_aarch64}
>>      (nil))
>> (insn 97 96 98 22 (set (reg:DI 130 [ i ])
>>         (sign_extend:DI (reg/v:SI 97 [ i ]))) seb-pop.c:8 70 {*extendsidi2_aarch64}
>>      (expr_list:REG_DEAD (reg/v:SI 97 [ i ])
>>         (nil)))
>> (insn 98 97 99 22 (set (reg:SI 131 [ D.2633 ])
>>         (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 129)
>>                     (reg:DI 130 [ i ])) [0 A S1 A8]))) seb-pop.c:8 76 {*zero_extendqisi2_aarch64}
>>      (expr_list:REG_DEAD (reg:DI 130 [ i ])
>>         (expr_list:REG_DEAD (reg/f:DI 129)
>>             (nil))))
>>
>> This re-associates the constant immediate with the virtual frame
>> pointer, i.e. it transforms
>>
>>   RA <- fixed_reg + RC
>>   RD <- MEM (RA + const_offset)
>>
>> into:
>>
>>   RA <- fixed_reg + const_offset
>>   RD <- MEM (RA + RC)
>>
>> RA <- fixed_reg + const_offset is then loop invariant, so the later
>> RTL GCSE PRE pass can catch it and hoist it, thereby ameliorating
>> what tree-level ivopts could not sort out.
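As an illustration only, the re-association described above can be sketched on a toy two-insn IR in C. Everything in this sketch is hypothetical: the `struct insn` layout, the `enum op` codes and the functions `reassociate`, `eval_address` and `is_frame_pointer` are made up here and are neither GCC's rtx API nor the patch's `prepare_for_gcse_pre`. It assumes, as the REG_DEAD note in the dump suggests, that RA has no other uses after the load, and it ignores the sign extension seen in insn 97.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR for this sketch only; this is not GCC's rtx. */
enum op {
  OP_ADD_REG_REG,  /* dest <- src1 + src2 */
  OP_ADD_REG_IMM,  /* dest <- src1 + imm */
  OP_LOAD_REG_IMM, /* dest <- MEM (src1 + imm) */
  OP_LOAD_REG_REG  /* dest <- MEM (src1 + src2) */
};

struct insn {
  enum op op;
  int dest, src1, src2; /* register numbers; src2 is -1 in the imm forms */
  long imm;             /* constant offset; 0 in the reg + reg forms */
};

#define NREGS 128

/* Assumption: register 64 stands for the soft frame pointer (sfp). */
static bool is_frame_pointer(int regno) { return regno == 64; }

/* Rewrite
     A: RA <- fixed_reg + RC
     B: RD <- MEM (RA + const_offset)
   into
     A: RA <- fixed_reg + const_offset   (no longer depends on RC)
     B: RD <- MEM (RA + RC)
   Assumes A immediately precedes B and RA dies at B.
   Returns true if the pair was rewritten. */
static bool reassociate(struct insn *a, struct insn *b,
                        bool (*is_fixed)(int)) {
  if (a->op != OP_ADD_REG_REG || b->op != OP_LOAD_REG_IMM)
    return false;
  if (b->src1 != a->dest || !is_fixed(a->src1))
    return false;
  /* RC must still hold its old value at B, so A must not overwrite it. */
  if (a->src2 == a->dest)
    return false;

  int rc = a->src2;
  long off = b->imm;

  a->op = OP_ADD_REG_IMM;
  a->src2 = -1;
  a->imm = off;

  b->op = OP_LOAD_REG_REG;
  b->src2 = rc;
  b->imm = 0;
  return true;
}

/* Execute A on REGS and return the effective address that B loads from. */
static long eval_address(const struct insn *a, const struct insn *b,
                         long regs[NREGS]) {
  assert(a->dest >= 0 && a->dest < NREGS);
  assert(a->src1 >= 0 && a->src1 < NREGS);
  if (a->op == OP_ADD_REG_REG)
    regs[a->dest] = regs[a->src1] + regs[a->src2];
  else
    regs[a->dest] = regs[a->src1] + a->imm;

  if (b->op == OP_LOAD_REG_IMM)
    return regs[b->src1] + b->imm;
  return regs[b->src1] + regs[b->src2];
}
```

For example, with register numbers borrowed from the dump (sfp = r64 holding 4096, RC = r108 holding 7, RA = r109, const_offset = -16), the load address is 4087 both before and after the rewrite; only the first insn changed, and its operands no longer involve the loop-variant RC.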
>
> There is a LIM pass after gimple ivopts - if the invariantness is already
> visible there, why not handle it there, similar to the special cases in
> rewrite_bittest and rewrite_reciprocal?
>
> And of course similar tricks could be applied on the RTL level to
> RTL invariant motion?

Oh, and the patch misses a testcase.

> Thanks,
> Richard.
>
>> This patch only tries to re-shuffle instructions within a single basic
>> block that is an inner loop, which is the performance-critical case.
>>
>> I am reusing the loop info in fwprop because loop info is already
>> available there and fwprop runs before GCSE.
>>
>> Verified on aarch64 and mips64: the array base address is hoisted out
>> of the loop.
>>
>> Bootstrap OK on x86-64 and aarch64.
>>
>> Comments?
>>
>> Thanks.
>>
>> gcc/
>>   PR62173
>>   * fwprop.c (prepare_for_gcse_pre): New function.
>>   (fwprop_done): Call it.
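To make the reported effect (the array base address hoisted out of the loop) concrete, here is a hand-written C analogue of the before/after address computation. It is only an illustration under stated assumptions: `fp` stands in for the frame pointer, `OFF` for A's -16 offset, and `sum_before`/`sum_after` are hypothetical names; the compiler performs this hoisting on RTL, not on source code. The caller must pass an `fp` that points at least 16 bytes into an array, such that the n bytes starting at `fp - 16` and the pointers `fp + i` for i < n all stay inside it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for A's frame offset (-16 in the RTL dump). */
enum { OFF = -16 };

/* Before: RA = fp + i changes every iteration and the constant offset is
   folded into the load, so no part of the address is loop invariant. */
int sum_before(const unsigned char *fp, size_t n) {
  assert(fp != NULL);
  int s = 0;
  for (size_t i = 0; i < n; i++) {
    const unsigned char *ra = fp + i; /* RA <- fixed_reg + RC */
    s += ra[OFF];                     /* RD <- MEM (RA + const_offset) */
  }
  return s;
}

/* After: fp + OFF does not depend on i, so it is computed once, outside
   the loop, and the load inside the loop uses reg + reg addressing. */
int sum_after(const unsigned char *fp, size_t n) {
  assert(fp != NULL);
  int s = 0;
  const unsigned char *ra = fp + OFF; /* RA <- fixed_reg + const_offset */
  for (size_t i = 0; i < n; i++)
    s += ra[i];                       /* RD <- MEM (RA + RC) */
  return s;
}
```

For example, with a 32-byte buffer holding the values 0..31 and fp = buf + 16, both functions read buf[0..15] and return 120; the difference is only where the invariant part of the address is computed.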