From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 101892 invoked by alias); 26 Jun 2019 13:55:10 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 101884 invoked by uid 89); 26 Jun 2019 13:55:10 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-11.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mail-lf1-f48.google.com Received: from mail-lf1-f48.google.com (HELO mail-lf1-f48.google.com) (209.85.167.48) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 26 Jun 2019 13:55:08 +0000 Received: by mail-lf1-f48.google.com with SMTP id x144so1629720lfa.9 for ; Wed, 26 Jun 2019 06:55:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=TMHcpB4hGCQtgqrBY61h+fNgUNFH4MHf/4oqwP5FwTg=; b=d25/xE7L1kg4A8vnaff9texAhTWni4sOlqYr7ZtOOo9gLXqCkCTQMPgfMmnNVrR3Fn 4tVwWc27r4qszDW/l/CDmVcv1JcWlwN6az0vlyZh8dSvzHP9etOSjgkigOHsMthdg1eY MiK+UZydvkWEvormJqEBVgwWmu1NOrxjafxls8qsDk2LoNOzVgZSSd91Is+PNYVXqPWd lgWKqT43uZhVYmbHQRWa49tblB3kfAFs9hB/OnNyzdlAWihfuFJzqsLZBhhmGIUANZxt U1ud35seGTqci1EetGZw55rzjnFGFqNUBRcktzogRIrtG6jwdqExoTbzvlPBex6cqvxQ YOEA== MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Wed, 26 Jun 2019 13:55:00 -0000 Message-ID: Subject: Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops To: Prathamesh Kulkarni , gcc Patches , Richard Sandiford Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2019-06/txt/msg01667.txt.bz2 On Wed, 26 Jun 2019 at 16:05, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Tue, 25 Jun 2019 at 20:05, Richard Sandiford > > wrote: > >> > >> Prathamesh Kulkarni writes: > >> > On Mon, 24 Jun 2019 at 21:41, Prathamesh Kulkarni > >> > wrote: > >> >> > >> >> On Mon, 24 Jun 2019 at 19:51, Richard Sandiford > >> >> wrote: > >> >> > > >> >> > Prathamesh Kulkarni writes: > >> >> > > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use) > >> >> > > if (!def_set) > >> >> > > return false; > >> >> > > > >> >> > > + if (reg_prop_only > >> >> > > + && !REG_P (SET_SRC (def_set)) > >> >> > > + && !REG_P (SET_DEST (def_set))) > >> >> > > + return false; > >> >> > > >> >> > This should be: > >> >> > > >> >> > if (reg_prop_only > >> >> > && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set)))) > >> >> > return false; > >> >> > > >> >> > so that we return false if either operand isn't a register. > >> >> Oops, sorry about that -:( > >> >> > > >> >> > > + > >> >> > > + /* Allow propagations into a loop only for reg-to-reg copies, since > >> >> > > + replacing one register by another shouldn't increase the cost. */ > >> >> > > + > >> >> > > + if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father > >> >> > > + && !REG_P (SET_SRC (def_set)) > >> >> > > + && !REG_P (SET_DEST (def_set))) > >> >> > > + return false; > >> >> > > >> >> > Same here. > >> >> > > >> >> > OK with that change, thanks. > >> >> Thanks for the review, will make the changes and commit the patch > >> >> after re-testing. > >> > Hi, > >> > Testing the patch showed following failures on 32-bit x86: > >> > > >> > Executed from: g++.target/i386/i386.exp > >> > g++:g++.target/i386/pr88152.C scan-assembler-not vpcmpgt|vpcmpeq|vpsra > >> > Executed from: gcc.target/i386/i386.exp > >> > gcc:gcc.target/i386/pr66768.c scan-assembler add*.[ \t]%gs: > >> > gcc:gcc.target/i386/pr90178.c scan-assembler-times xorl[\\t > >> > ]*\\%eax,[\\t ]*%eax 1 > >> > > >> > The failure of pr88152.C is also seen on x86_64. > >> > > >> > For pr66768.c, and pr90178.c, forwprop replaces register which is > >> > volatile and frame related respectively. > >> > To avoid that, the attached patch, makes a stronger constraint that > >> > src and dest should be a register > >> > and not have frame_related or volatil flags set, which is checked in > >> > usable_reg_p(). > >> > Which avoids the failures for both the cases. > >> > Does it look OK ? > >> > >> That's not the reason it's a bad transform. In both cases we're > >> propagating r2 <- r1 even though > >> > >> (a) r1 dies in the copy and > >> (b) fwprop can't replace all uses of r2, because some have multiple > >> definitions > >> > >> This has the effect of making both values live in cases where only one > >> was previously. > >> > >> In the case of pr66768.c, fwprop2 is undoing the effect of > >> cse.c:canon_reg, which tries to pick the best register to use > >> (see cse.c:make_regs_eqv). fwprop1 makes the same change, > >> and made it even before the patch, but the cse.c choice should win. > >> > >> A (hopefully) conservative fix would be to propagate the copy only if > >> both registers have a single definition, which you can test with: > >> > >> (DF_REG_DEF_COUNT (regno) == 1 > >> && !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (m_fn)), regno)) > >> > >> In that case, fwprop should see all uses of the destination, and should > >> be able to replace it in all cases with the source. > > Ah I see, thanks for the explanation! > > The above check works to resolve the failure. > > IIUC, !bitmap_bit_p (...) above checks that reg isn't used uninitialized ? > > Right. > > >> > For g++.target/i386/pr88152.C, the issue is that after the patch, > >> > forwprop1 does following propagation (in f10) which wasn't done > >> > before: > >> > > >> > In insn 10, replacing > >> > (unspec:SI [ > >> > (reg:V2DF 91) > >> > ] UNSPEC_MOVMSK) > >> > with (unspec:SI [ > >> > (subreg:V2DF (reg:V2DI 90) 0) > >> > ] UNSPEC_MOVMSK) > >> > > >> > This later defeats combine because insn 9 gets deleted. > >> > Without patch, the following combination takes place: > >> > > >> > Trying 7 -> 9: > >> > 7: r90:V2DI=r89:V2DI>r93:V2DI > >> > REG_DEAD r93:V2DI > >> > REG_DEAD r89:V2DI > >> > 9: r91:V2DF=r90:V2DI#0 > >> > REG_DEAD r90:V2DI > >> > Successfully matched this instruction: > >> > (set (subreg:V2DI (reg:V2DF 91) 0) > >> > (gt:V2DI (reg:V2DI 89) > >> > (reg:V2DI 93))) > >> > allowing combination of insns 7 and 9 > >> > > >> > and then: > >> > Trying 6, 9 -> 10: > >> > 6: r89:V2DI=const_vector > >> > 9: r91:V2DF#0=r89:V2DI>r93:V2DI > >> > REG_DEAD r89:V2DI > >> > REG_DEAD r93:V2DI > >> > 10: r87:SI=unspec[r91:V2DF] 43 > >> > REG_DEAD r91:V2DF > >> > Successfully matched this instruction: > >> > (set (reg:SI 87) > >> > (unspec:SI [ > >> > (lt:V2DF (reg:V2DI 93) > >> > (const_vector:V2DI [ > >> > (const_int 0 [0]) repeated x2 > >> > ])) > >> > ] UNSPEC_MOVMSK)) > >> > >> Eh? lt:*V2DF*? Does that mean that it's 0 for false and an all-1 NaN > >> for true? > > Well looking at .optimized dump: > > > > vector(2) long int _2; > > vector(2) double _3; > > int _6; > > signed long _7; > > long int _8; > > signed long _9; > > long int _10; > > > > [local count: 1073741824]: > > _7 = BIT_FIELD_REF ; > > _8 = _7 < 0 ? -1 : 0; > > _9 = BIT_FIELD_REF ; > > _10 = _9 < 0 ? -1 : 0; > > _2 = {_8, _10}; > > _3 = VIEW_CONVERT_EXPR<__m128d>(_2); > > _6 = __builtin_ia32_movmskpd (_3); [tail call] > > return _6; > > > > So AFAIU, we're using -1 for representing true and 0 for false > > and casting -1 (literally) to double would change it's value to -NaN ? > > Exactly. And for -ffinite-math-only, we might as well then fold the > condition to false. :-) > > IMO rtl condition results have to have integral modes and not folding > (subreg:V2DF (lt:V2DI ...) 0) is the correct behaviour. So I think this > is just a latent bug and shouldn't hold up the patch. > > I'm not sure whether: > > reinterpret_cast<__m128d> (a > ...) > > is something we expect users to write, or whether it was just > included for completeness. I will report the issue and commit after re-testing patch. Thanks a lot for the helpful reviews! Thanks, Prathamesh > > Thanks, > Richard > > > The above insn 10 created by combine is then transformed to following insn 22: > > (insn 22 9 16 2 (set (reg:SI 0 ax [87]) > > (unspec:SI [ > > (reg:V2DF 20 xmm0 [93]) > > ] UNSPEC_MOVMSK)) > > "../../stage1-build/gcc/include/emmintrin.h":958:34 4222 > > {sse2_movmskpd} > > (nil)) > > > > deleting insn 10. > > > > Thanks, > > Prathamesh > > > >> > >> Looks like a bug that we manage to fold to that, and manage to match it. > >> > >> Thanks, > >> Richard > >> > >> > allowing combination of insns 6, 9 and 10 > >> > original costs 4 + 8 + 4 = 16 > >> > replacement cost 12 > >> > deferring deletion of insn with uid = 9. > >> > deferring deletion of insn with uid = 6. > >> > which deletes insns 2, 3, 6, 7, 9. > >> > > >> > With patch, it fails to combine 7->10: > >> > Trying 7 -> 10: > >> > 7: r90:V2DI=r89:V2DI>r93:V2DI > >> > REG_DEAD r93:V2DI > >> > REG_DEAD r89:V2DI > >> > 10: r87:SI=unspec[r90:V2DI#0] 43 > >> > REG_DEAD r90:V2DI > >> > Failed to match this instruction: > >> > (set (reg:SI 87) > >> > (unspec:SI [ > >> > (subreg:V2DF (gt:V2DI (reg:V2DI 89) > >> > (reg:V2DI 93)) 0) > >> > ] UNSPEC_MOVMSK)) > >> > > >> > and subsequently 6, 7 -> 10 > >> > (attached combine dumps before and after patch). > >> > > >> > So IIUC, the issue is that the target does not have a pattern that can > >> > match the above insn ? > >> > I tried a simple workaround to "pessimize" the else condition in > >> > propagate_rtx_1 in patch, to require old_rtx and new_rtx have same > >> > rtx_code, which at least > >> > works for this test-case, but not sure if that's the correct approach. > >> > Could you suggest how to proceed ? > >> > > >> > Thanks, > >> > Prathamesh > >> >> > >> >> Thanks, > >> >> Prathamesh > >> >> > > >> >> > Richard