From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-503800-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 101892 invoked by alias); 26 Jun 2019 13:55:10 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 101884 invoked by uid 89); 26 Jun 2019 13:55:10 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-11.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=
X-HELO: mail-lf1-f48.google.com
Received: from mail-lf1-f48.google.com (HELO mail-lf1-f48.google.com) (209.85.167.48) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 26 Jun 2019 13:55:08 +0000
Received: by mail-lf1-f48.google.com with SMTP id x144so1629720lfa.9        for <gcc-patches@gcc.gnu.org>; Wed, 26 Jun 2019 06:55:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=linaro.org; s=google;        h=mime-version:references:in-reply-to:from:date:message-id:subject:to;        bh=TMHcpB4hGCQtgqrBY61h+fNgUNFH4MHf/4oqwP5FwTg=;        b=d25/xE7L1kg4A8vnaff9texAhTWni4sOlqYr7ZtOOo9gLXqCkCTQMPgfMmnNVrR3Fn         4tVwWc27r4qszDW/l/CDmVcv1JcWlwN6az0vlyZh8dSvzHP9etOSjgkigOHsMthdg1eY         MiK+UZydvkWEvormJqEBVgwWmu1NOrxjafxls8qsDk2LoNOzVgZSSd91Is+PNYVXqPWd         lgWKqT43uZhVYmbHQRWa49tblB3kfAFs9hB/OnNyzdlAWihfuFJzqsLZBhhmGIUANZxt         U1ud35seGTqci1EetGZw55rzjnFGFqNUBRcktzogRIrtG6jwdqExoTbzvlPBex6cqvxQ         YOEA==
MIME-Version: 1.0
References: <CAAgBjMndm8upw++qigGPQNTA4nqm+Uck=CcVBpt3cZ74fEGEwA@mail.gmail.com> <mptd0j3ib67.fsf@arm.com> <CAAgBjM=z6g3R6Ai7nRcOShAXEp2dd44kFkmZUUf-=T+jfUFboQ@mail.gmail.com> <mptv9wvgj37.fsf@arm.com> <CAAgBjMngAEz-3O6M45k_cjMaRzTqrPBXLf4aCWfipGsJ09Z=SA@mail.gmail.com> <CAAgBjMmrBvcuJiYu98PzGxSCpNbnQ92rJ39YDM=Eup-tSts2zQ@mail.gmail.com> <mpt5zotg2c9.fsf@arm.com> <CAAgBjMng2QWYv7J2eq2sZRgETtpbAk+sQArRAQGw4ox4UBYbUA@mail.gmail.com> <mpta7e4eisg.fsf@arm.com>
In-Reply-To: <mpta7e4eisg.fsf@arm.com>
From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Date: Wed, 26 Jun 2019 13:55:00 -0000
Message-ID: <CAAgBjMmeShGZ1Jtx3f8haSu6A2H9rkpxoV8uah=tB7AzWquftA@mail.gmail.com>
Subject: Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>, gcc Patches <gcc-patches@gcc.gnu.org>, 	Richard Sandiford <richard.sandiford@arm.com>
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2019-06/txt/msg01667.txt.bz2

On Wed, 26 Jun 2019 at 16:05, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > On Tue, 25 Jun 2019 at 20:05, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> >> > On Mon, 24 Jun 2019 at 21:41, Prathamesh Kulkarni
> >> > <prathamesh.kulkarni@linaro.org> wrote:
> >> >>
> >> >> On Mon, 24 Jun 2019 at 19:51, Richard Sandiford
> >> >> <richard.sandiford@arm.com> wrote:
> >> >> >
> >> >> > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> >> >> > > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use)
> >> >> > >    if (!def_set)
> >> >> > >      return false;
> >> >> > >
> >> >> > > +  if (reg_prop_only
> >> >> > > +      && !REG_P (SET_SRC (def_set))
> >> >> > > +      && !REG_P (SET_DEST (def_set)))
> >> >> > > +    return false;
> >> >> >
> >> >> > This should be:
> >> >> >
> >> >> >   if (reg_prop_only
> >> >> >       && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set))))
> >> >> >     return false;
> >> >> >
> >> >> > so that we return false if either operand isn't a register.
> >> >> Oops, sorry about that  -:(
> >> >> >
> >> >> > > +
> >> >> > > +  /* Allow propagations into a loop only for reg-to-reg copies, since
> >> >> > > +     replacing one register by another shouldn't increase the cost.  */
> >> >> > > +
> >> >> > > +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
> >> >> > > +      && !REG_P (SET_SRC (def_set))
> >> >> > > +      && !REG_P (SET_DEST (def_set)))
> >> >> > > +    return false;
> >> >> >
> >> >> > Same here.
> >> >> >
> >> >> > OK with that change, thanks.
> >> >> Thanks for the review, will make the changes and commit the patch
> >> >> after re-testing.
> >> > Hi,
> >> > Testing the patch showed following failures on 32-bit x86:
> >> >
> >> >   Executed from: g++.target/i386/i386.exp
> >> >     g++:g++.target/i386/pr88152.C   scan-assembler-not vpcmpgt|vpcmpeq|vpsra
> >> >   Executed from: gcc.target/i386/i386.exp
> >> >     gcc:gcc.target/i386/pr66768.c scan-assembler add*.[ \t]%gs:
> >> >     gcc:gcc.target/i386/pr90178.c scan-assembler-times xorl[\\t
> >> > ]*\\%eax,[\\t ]*%eax 1
> >> >
> >> > The failure of pr88152.C is also seen on x86_64.
> >> >
> >> > For pr66768.c, and pr90178.c, forwprop replaces register which is
> >> > volatile and frame related respectively.
> >> > To avoid that, the attached patch, makes a stronger constraint that
> >> > src and dest should be a register
> >> > and not have frame_related or volatil flags set, which is checked in
> >> > usable_reg_p().
> >> > Which avoids the failures for both the cases.
> >> > Does it look OK ?
> >>
> >> That's not the reason it's a bad transform.  In both cases we're
> >> propagating r2 <- r1 even though
> >>
> >> (a) r1 dies in the copy and
> >> (b) fwprop can't replace all uses of r2, because some have multiple
> >>     definitions
> >>
> >> This has the effect of making both values live in cases where only one
> >> was previously.
> >>
> >> In the case of pr66768.c, fwprop2 is undoing the effect of
> >> cse.c:canon_reg, which tries to pick the best register to use
> >> (see cse.c:make_regs_eqv).  fwprop1 makes the same change,
> >> and made it even before the patch, but the cse.c choice should win.
> >>
> >> A (hopefully) conservative fix would be to propagate the copy only if
> >> both registers have a single definition, which you can test with:
> >>
> >>   (DF_REG_DEF_COUNT (regno) == 1
> >>    && !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (m_fn)), regno))
> >>
> >> In that case, fwprop should see all uses of the destination, and should
> >> be able to replace it in all cases with the source.
> > Ah I see, thanks for the explanation!
> > The above check works to resolve the failure.
> > IIUC, !bitmap_bit_p (...) above checks that reg isn't used uninitialized ?
>
> Right.
>
> >> > For g++.target/i386/pr88152.C, the issue is that after the patch,
> >> > forwprop1 does following propagation (in f10) which wasn't done
> >> > before:
> >> >
> >> > In insn 10, replacing
> >> >  (unspec:SI [
> >> >             (reg:V2DF 91)
> >> >         ] UNSPEC_MOVMSK)
> >> >  with (unspec:SI [
> >> >             (subreg:V2DF (reg:V2DI 90) 0)
> >> >         ] UNSPEC_MOVMSK)
> >> >
> >> > This later defeats combine because insn 9 gets deleted.
> >> > Without patch, the following combination takes place:
> >> >
> >> > Trying 7 -> 9:
> >> >     7: r90:V2DI=r89:V2DI>r93:V2DI
> >> >       REG_DEAD r93:V2DI
> >> >       REG_DEAD r89:V2DI
> >> >     9: r91:V2DF=r90:V2DI#0
> >> >       REG_DEAD r90:V2DI
> >> > Successfully matched this instruction:
> >> > (set (subreg:V2DI (reg:V2DF 91) 0)
> >> >     (gt:V2DI (reg:V2DI 89)
> >> >         (reg:V2DI 93)))
> >> > allowing combination of insns 7 and 9
> >> >
> >> > and then:
> >> > Trying 6, 9 -> 10:
> >> >     6: r89:V2DI=const_vector
> >> >     9: r91:V2DF#0=r89:V2DI>r93:V2DI
> >> >       REG_DEAD r89:V2DI
> >> >       REG_DEAD r93:V2DI
> >> >    10: r87:SI=unspec[r91:V2DF] 43
> >> >       REG_DEAD r91:V2DF
> >> > Successfully matched this instruction:
> >> > (set (reg:SI 87)
> >> >     (unspec:SI [
> >> >             (lt:V2DF (reg:V2DI 93)
> >> >                 (const_vector:V2DI [
> >> >                         (const_int 0 [0]) repeated x2
> >> >                     ]))
> >> >         ] UNSPEC_MOVMSK))
> >>
> >> Eh?  lt:*V2DF*?  Does that mean that it's 0 for false and an all-1 NaN
> >> for true?
> > Well looking at .optimized dump:
> >
> >   vector(2) long int _2;
> >   vector(2) double _3;
> >   int _6;
> >   signed long _7;
> >   long int _8;
> >   signed long _9;
> >   long int _10;
> >
> >   <bb 2> [local count: 1073741824]:
> >   _7 = BIT_FIELD_REF <a_4(D), 64, 0>;
> >   _8 = _7 < 0 ? -1 : 0;
> >   _9 = BIT_FIELD_REF <a_4(D), 64, 64>;
> >   _10 = _9 < 0 ? -1 : 0;
> >   _2 = {_8, _10};
> >   _3 = VIEW_CONVERT_EXPR<__m128d>(_2);
> >   _6 = __builtin_ia32_movmskpd (_3); [tail call]
> >   return _6;
> >
> > So AFAIU, we're using -1 for representing true and 0 for false
> > and casting -1 (literally) to double would change it's value to -NaN ?
>
> Exactly.  And for -ffinite-math-only, we might as well then fold the
> condition to false. :-)
>
> IMO rtl condition results have to have integral modes and not folding
> (subreg:V2DF (lt:V2DI ...) 0) is the correct behaviour.  So I think this
> is just a latent bug and shouldn't hold up the patch.
>
> I'm not sure whether:
>
>    reinterpret_cast<__m128d> (a > ...)
>
> is something we expect users to write, or whether it was just
> included for completeness.
I will report the issue and commit after re-testing patch.
Thanks a lot for the helpful reviews!

Thanks,
Prathamesh
>
> Thanks,
> Richard
>
> > The above insn 10 created by combine is then transformed to following insn 22:
> > (insn 22 9 16 2 (set (reg:SI 0 ax [87])
> >         (unspec:SI [
> >                 (reg:V2DF 20 xmm0 [93])
> >             ] UNSPEC_MOVMSK))
> > "../../stage1-build/gcc/include/emmintrin.h":958:34 4222
> > {sse2_movmskpd}
> >      (nil))
> >
> > deleting insn 10.
> >
> > Thanks,
> > Prathamesh
> >
> >>
> >> Looks like a bug that we manage to fold to that, and manage to match it.
> >>
> >> Thanks,
> >> Richard
> >>
> >> > allowing combination of insns 6, 9 and 10
> >> > original costs 4 + 8 + 4 = 16
> >> > replacement cost 12
> >> > deferring deletion of insn with uid = 9.
> >> > deferring deletion of insn with uid = 6.
> >> > which deletes insns 2, 3, 6, 7, 9.
> >> >
> >> > With patch, it fails to combine 7->10:
> >> > Trying 7 -> 10:
> >> >     7: r90:V2DI=r89:V2DI>r93:V2DI
> >> >       REG_DEAD r93:V2DI
> >> >       REG_DEAD r89:V2DI
> >> >    10: r87:SI=unspec[r90:V2DI#0] 43
> >> >       REG_DEAD r90:V2DI
> >> > Failed to match this instruction:
> >> > (set (reg:SI 87)
> >> >     (unspec:SI [
> >> >             (subreg:V2DF (gt:V2DI (reg:V2DI 89)
> >> >                     (reg:V2DI 93)) 0)
> >> >         ] UNSPEC_MOVMSK))
> >> >
> >> > and subsequently 6, 7 -> 10
> >> > (attached combine dumps before and after patch).
> >> >
> >> > So IIUC, the issue is that the target does not have a pattern that can
> >> > match the above insn ?
> >> > I tried a simple workaround to "pessimize" the else condition in
> >> > propagate_rtx_1 in patch, to require old_rtx and new_rtx have same
> >> > rtx_code, which at least
> >> > works for this test-case, but not sure if that's the correct approach.
> >> > Could you suggest how to proceed ?
> >> >
> >> > Thanks,
> >> > Prathamesh
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> > Richard