From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 5803 invoked by alias); 24 Apr 2015 17:05:17 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 5794 invoked by uid 89); 24 Apr 2015 17:05:16 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 24 Apr 2015 17:05:14 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by uk-mta-27.uk.mimecast.lan; Fri, 24 Apr 2015 18:05:10 +0100 Received: from e104437-lin ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Fri, 24 Apr 2015 18:05:10 +0100 References: <54803EBE.2060607@arm.com> <5480B6D6.2020201@arm.com> <548EFE0D.1070808@arm.com> <548EFE55.6090901@arm.com> <54930811.1020003@arm.com> <20141218220908.GA20720@gate.crashing.org> <5494426A.9010209@naturalbridge.com> <54DB6587.1020207@naturalbridge.com> <54DB9CDB.5090304@arm.com> <552D4D61.9040100@redhat.com> <5539A2A8.6050505@redhat.com> From: Jiong Wang To: Jeff Law Cc: Steven Bosscher , "gcc-patches\@gcc.gnu.org" , Kenneth Zadeck Subject: Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting Date: Fri, 24 Apr 2015 17:05:00 -0000 In-reply-to: <5539A2A8.6050505@redhat.com> Message-ID: MIME-Version: 1.0 X-MC-Unique: LVpm7imPQUOR7qSUfpEFxg-1 Content-Type: text/plain; charset=WINDOWS-1252 Content-Transfer-Encoding: quoted-printable X-SW-Source: 2015-04/txt/msg01518.txt.bz2 Jeff Law writes: > On 04/21/2015 08:24 AM, Jiong Wang wrote: >> >> Jiong Wang writes: >> >>> 2015-04-14 18:24 GMT+01:00 Jeff Law : >>>> On 04/14/2015 10:48 AM, Steven Bosscher wrote: >>>>>> >>>>>> So I think this stage2/3 binary difference is acceptable? >>>>> >>>>> >>>>> No, they should be identical. If there's a difference, then there's a >>>>> bug - which, it seems, you've already found, too. >>>> >>>> RIght. And so the natural question is how to fix. >>>> >>>> At first glance it would seem like having this new code ignore depende= ncies >>>> rising from debug insns would work. >>>> >>>> Which then begs the question, what happens to the debug insn -- it's >>>> certainly not going to be correct anymore if the transformation is mad= e. >>> >>> Exactly. >>> >>> The debug_insn 2776 in my example is to record the base address of a >>> local array. the new code is doing correctly here by not shuffling the >>> operands of insn 2556 and 2557 as there is additional reference of >>> reg:1473 from debug insn, although the code will still execute correctly >>> if we do the transformation. >>> >>> my understanding to fix this: >>> >>> * delete the out-of-date mismatch debug_insn? as there is no guarant= ee >>> to generate accurate debug info under -O2. >>> >>> IMO, this debug_insn may affect "DW_AT_location" field for variable >>> descrption of "classes" in .debug_info section, but it's omitted in >>> the final output already. >>> >>> <3><38a4d>: Abbrev Number: 137 (DW_TAG_variable) >>> <38a4f> DW_AT_name : (indirect string, offset: 0x18db): classes >>> <38a53> DW_AT_decl_file : 1 >>> <38a54> DW_AT_decl_line : 548 >>> <38a56> DW_AT_type : <0x38cb4> >>> >>> * update the debug_insn? if the following change is OK with dwarf st= andard >>> >>> from >>> >>> insn0: reg0 =3D fp + reg1 >>> debug_insn: var_loc =3D reg0 + const_off >>> insn1: reg2 =3D reg0 + const_off >>> >>> to >>> >>> insn0: reg0 =3D fp + const_off >>> debug_insn: var_loc =3D reg0 + reg1 >>> insn1: reg2 =3D reg0 + reg1 >>> >>> Thanks, >>> >> >> And attachment is the new patch which will update debug_insn as >> described in the second solution above. >> >> Now the stage2/3 binary differences on AArch64 gone away. Bootstrap OK. >> >> On AArch64, this patch give 600+ new rtl loop invariants found across >> spec2k6 float. +4.5% perf improvement on 436.cactusADM because four new >> invariants found in the critical function "regex_compile". >> >> The similar improvements may be achieved on other RISC backends like >> powerpc/mips I guess. >> >> One thing to mention, for AArch64, one minor glitch in >> aarch64_legitimize_address needs to be fixed to let this patch take >> effect, I will send out that patch later as it's a seperate issue. >> Powerpc/Mips don't have this glitch in LEGITIMIZE_ADDRESS hook, so >> should be OK, and I verified the base address of local array in the >> testcase given by Seb on pr62173 do hoisted on ppc64 now. I think >> pr62173 is fixed on those 64bit arch by this patch. >> >> Thoughts? >> >> Thanks. >> >> 2015-04-21 Jiong Wang >> >> gcc/ >> * loop-invariant.c (find_defs): Enable DF_DU_CHAIN build. >> (vfp_const_iv): New hash table. >> (expensive_addr_check_p): New boolean. >> (init_inv_motion_data): Initialize new variables.> >> (free_inv_motion_data): Release hash table. >> (create_new_invariant): Set cheap_address to false for iv in >> vfp_const_iv table. >> (find_invariant_insn): Skip dependencies check for iv in vfp_const_iv >> table. >> (use_for_single_du): New function. >> (reshuffle_insn_with_vfp): Likewise. >> (find_invariants_bb): Call reshuffle_insn_with_vfp. >> >> gcc/testsuite/ >> * gcc.dg/pr62173.c: New testcase. > So ultimately isn't this just a specialized version of reassociation of=20 > pointer arithmetic? And if so, don't we have all the usual problems=20 > around introducing overflow? > > > ISTM if this is going to go forward (ie, we somehow convince ourselves=20 > that this kind of reassociation is OK), then should it be made to apply=20 > on pointers in general rather than restricting to stack, frame,=20 > virtual-frame? > Jeff, Thanks for the review. This transformation is not reassociation of pointer arithmetic. =20=20 The idea of this patch is, given two instructions with variable value, we may get new instruction sequences with fixed value by reassociating their operands. And currently GCC only generate such instruction sequences for local array accessing as far as I known. for the statement "D.2707 =3D A[i]", gcc generate the following instruction sequence: (insn 92 91 93 6 (set (reg/f:DI 148) (plus:DI (reg/f:DI 64 sfp) (reg:DI 147 [ i ]))) (expr_list:REG_DEAD (reg:DI 147 [ i ]) (nil))) =20=20=20=20=20=20=20=20 (insn 93 92 94 6 (set (reg:SI 149 [ D.2707 ]) (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 148) (const_int -16 [0xfffffffffffffff0])) [0 A S1 A8]))) (expr_list:REG_DEAD (reg/f:DI 148) (nil))) both insn 92 and insn 93 are with variable value, but "(reg/f:DI 64 sfp)" in insn 92 and "const_int -16" in insn 93 are fixed value. =20=20 So my patch will transform above sequence into the following if "apply_change_group" succeed. Then insn 92 is with fixed value and thus loop invariant. =20=20 (insn 92 88 97 5 (set (reg/f:DI 148) (plus:DI (reg/f:DI 64 sfp) (const_int -16 [0xfffffffffffffff0]))) (nil)) =20=20=20=20=20 (insn 93 131 94 6 (set (reg:SI 149 [ D.2707 ]) (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 148) (reg:DI 147 [ i ])) [0 A S1 A8]))) (expr_list:REG_DEAD (reg:DI 147 [ i ]) (expr_list:REG_DEAD (reg/f:DI 148) (nil)))) This tranformation will only be done if the def of r148 in insn 92 is the= single def for the use of it in insn 93, and the use in insn 93 is also single use. And if there is overflow, then it will happen in the original insn 93 also, then it should not be generated in the first place. So I think there is no need to do overflow check. Does this explanation make sense?=20 =20=20 --=20 Regards, Jiong