From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 117084 invoked by alias); 26 Jun 2019 10:35:16 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 117073 invoked by uid 89); 26 Jun 2019 10:35:16 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-2.9 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.110.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 26 Jun 2019 10:35:14 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DD71F360; Wed, 26 Jun 2019 03:35:12 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.39]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 687AB3F718; Wed, 26 Jun 2019 03:35:12 -0700 (PDT) From: Richard Sandiford To: Prathamesh Kulkarni Mail-Followup-To: Prathamesh Kulkarni ,gcc Patches , richard.sandiford@arm.com Cc: gcc Patches Subject: Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops References: Date: Wed, 26 Jun 2019 10:35:00 -0000 In-Reply-To: (Prathamesh Kulkarni's message of "Wed, 26 Jun 2019 15:40:13 +0530") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-SW-Source: 2019-06/txt/msg01651.txt.bz2 Prathamesh Kulkarni writes: > On Tue, 25 Jun 2019 at 20:05, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > On Mon, 24 Jun 2019 at 21:41, Prathamesh Kulkarni >> > wrote: >> >> >> >> On Mon, 24 Jun 2019 at 19:51, Richard Sandiford >> >> wrote: >> >> > >> >> > Prathamesh Kulkarni writes: >> >> > > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use) >> >> > > if (!def_set) >> >> > > return false; >> >> > > >> >> > > + if (reg_prop_only >> >> > > + && !REG_P (SET_SRC (def_set)) >> >> > > + && !REG_P (SET_DEST (def_set))) >> >> > > + return false; >> >> > >> >> > This should be: >> >> > >> >> > if (reg_prop_only >> >> > && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set)))) >> >> > return false; >> >> > >> >> > so that we return false if either operand isn't a register. >> >> Oops, sorry about that -:( >> >> > >> >> > > + >> >> > > + /* Allow propagations into a loop only for reg-to-reg copies, since >> >> > > + replacing one register by another shouldn't increase the cost. */ >> >> > > + >> >> > > + if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father >> >> > > + && !REG_P (SET_SRC (def_set)) >> >> > > + && !REG_P (SET_DEST (def_set))) >> >> > > + return false; >> >> > >> >> > Same here. >> >> > >> >> > OK with that change, thanks. >> >> Thanks for the review, will make the changes and commit the patch >> >> after re-testing. >> > Hi, >> > Testing the patch showed following failures on 32-bit x86: >> > >> > Executed from: g++.target/i386/i386.exp >> > g++:g++.target/i386/pr88152.C scan-assembler-not vpcmpgt|vpcmpeq|vpsra >> > Executed from: gcc.target/i386/i386.exp >> > gcc:gcc.target/i386/pr66768.c scan-assembler add*.[ \t]%gs: >> > gcc:gcc.target/i386/pr90178.c scan-assembler-times xorl[\\t >> > ]*\\%eax,[\\t ]*%eax 1 >> > >> > The failure of pr88152.C is also seen on x86_64. >> > >> > For pr66768.c, and pr90178.c, forwprop replaces register which is >> > volatile and frame related respectively. >> > To avoid that, the attached patch, makes a stronger constraint that >> > src and dest should be a register >> > and not have frame_related or volatil flags set, which is checked in >> > usable_reg_p(). >> > Which avoids the failures for both the cases. >> > Does it look OK ? >> >> That's not the reason it's a bad transform. In both cases we're >> propagating r2 <- r1 even though >> >> (a) r1 dies in the copy and >> (b) fwprop can't replace all uses of r2, because some have multiple >> definitions >> >> This has the effect of making both values live in cases where only one >> was previously. >> >> In the case of pr66768.c, fwprop2 is undoing the effect of >> cse.c:canon_reg, which tries to pick the best register to use >> (see cse.c:make_regs_eqv). fwprop1 makes the same change, >> and made it even before the patch, but the cse.c choice should win. >> >> A (hopefully) conservative fix would be to propagate the copy only if >> both registers have a single definition, which you can test with: >> >> (DF_REG_DEF_COUNT (regno) == 1 >> && !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (m_fn)), regno)) >> >> In that case, fwprop should see all uses of the destination, and should >> be able to replace it in all cases with the source. > Ah I see, thanks for the explanation! > The above check works to resolve the failure. > IIUC, !bitmap_bit_p (...) above checks that reg isn't used uninitialized ? Right. >> > For g++.target/i386/pr88152.C, the issue is that after the patch, >> > forwprop1 does following propagation (in f10) which wasn't done >> > before: >> > >> > In insn 10, replacing >> > (unspec:SI [ >> > (reg:V2DF 91) >> > ] UNSPEC_MOVMSK) >> > with (unspec:SI [ >> > (subreg:V2DF (reg:V2DI 90) 0) >> > ] UNSPEC_MOVMSK) >> > >> > This later defeats combine because insn 9 gets deleted. >> > Without patch, the following combination takes place: >> > >> > Trying 7 -> 9: >> > 7: r90:V2DI=r89:V2DI>r93:V2DI >> > REG_DEAD r93:V2DI >> > REG_DEAD r89:V2DI >> > 9: r91:V2DF=r90:V2DI#0 >> > REG_DEAD r90:V2DI >> > Successfully matched this instruction: >> > (set (subreg:V2DI (reg:V2DF 91) 0) >> > (gt:V2DI (reg:V2DI 89) >> > (reg:V2DI 93))) >> > allowing combination of insns 7 and 9 >> > >> > and then: >> > Trying 6, 9 -> 10: >> > 6: r89:V2DI=const_vector >> > 9: r91:V2DF#0=r89:V2DI>r93:V2DI >> > REG_DEAD r89:V2DI >> > REG_DEAD r93:V2DI >> > 10: r87:SI=unspec[r91:V2DF] 43 >> > REG_DEAD r91:V2DF >> > Successfully matched this instruction: >> > (set (reg:SI 87) >> > (unspec:SI [ >> > (lt:V2DF (reg:V2DI 93) >> > (const_vector:V2DI [ >> > (const_int 0 [0]) repeated x2 >> > ])) >> > ] UNSPEC_MOVMSK)) >> >> Eh? lt:*V2DF*? Does that mean that it's 0 for false and an all-1 NaN >> for true? > Well looking at .optimized dump: > > vector(2) long int _2; > vector(2) double _3; > int _6; > signed long _7; > long int _8; > signed long _9; > long int _10; > > [local count: 1073741824]: > _7 = BIT_FIELD_REF ; > _8 = _7 < 0 ? -1 : 0; > _9 = BIT_FIELD_REF ; > _10 = _9 < 0 ? -1 : 0; > _2 = {_8, _10}; > _3 = VIEW_CONVERT_EXPR<__m128d>(_2); > _6 = __builtin_ia32_movmskpd (_3); [tail call] > return _6; > > So AFAIU, we're using -1 for representing true and 0 for false > and casting -1 (literally) to double would change it's value to -NaN ? Exactly. And for -ffinite-math-only, we might as well then fold the condition to false. :-) IMO rtl condition results have to have integral modes and not folding (subreg:V2DF (lt:V2DI ...) 0) is the correct behaviour. So I think this is just a latent bug and shouldn't hold up the patch. I'm not sure whether: reinterpret_cast<__m128d> (a > ...) is something we expect users to write, or whether it was just included for completeness. Thanks, Richard > The above insn 10 created by combine is then transformed to following insn 22: > (insn 22 9 16 2 (set (reg:SI 0 ax [87]) > (unspec:SI [ > (reg:V2DF 20 xmm0 [93]) > ] UNSPEC_MOVMSK)) > "../../stage1-build/gcc/include/emmintrin.h":958:34 4222 > {sse2_movmskpd} > (nil)) > > deleting insn 10. > > Thanks, > Prathamesh > >> >> Looks like a bug that we manage to fold to that, and manage to match it. >> >> Thanks, >> Richard >> >> > allowing combination of insns 6, 9 and 10 >> > original costs 4 + 8 + 4 = 16 >> > replacement cost 12 >> > deferring deletion of insn with uid = 9. >> > deferring deletion of insn with uid = 6. >> > which deletes insns 2, 3, 6, 7, 9. >> > >> > With patch, it fails to combine 7->10: >> > Trying 7 -> 10: >> > 7: r90:V2DI=r89:V2DI>r93:V2DI >> > REG_DEAD r93:V2DI >> > REG_DEAD r89:V2DI >> > 10: r87:SI=unspec[r90:V2DI#0] 43 >> > REG_DEAD r90:V2DI >> > Failed to match this instruction: >> > (set (reg:SI 87) >> > (unspec:SI [ >> > (subreg:V2DF (gt:V2DI (reg:V2DI 89) >> > (reg:V2DI 93)) 0) >> > ] UNSPEC_MOVMSK)) >> > >> > and subsequently 6, 7 -> 10 >> > (attached combine dumps before and after patch). >> > >> > So IIUC, the issue is that the target does not have a pattern that can >> > match the above insn ? >> > I tried a simple workaround to "pessimize" the else condition in >> > propagate_rtx_1 in patch, to require old_rtx and new_rtx have same >> > rtx_code, which at least >> > works for this test-case, but not sure if that's the correct approach. >> > Could you suggest how to proceed ? >> > >> > Thanks, >> > Prathamesh >> >> >> >> Thanks, >> >> Prathamesh >> >> > >> >> > Richard