public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Andrew Pinski <pinskia@gmail.com>
To: Richard Biener <rguenther@suse.de>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
	Tamar Christina <tamar.christina@arm.com>,
	 gcc-patches@gcc.gnu.org, nd <nd@arm.com>,
	jlaw@ventanamicro.com
Subject: Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]
Date: Mon, 9 Oct 2023 00:36:30 -0700	[thread overview]
Message-ID: <CA+=Sn1nToQ0HM_r3n_a85zgGB5bx6kVHOWehjZXiLTDbSJUUmw@mail.gmail.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2310090711380.5561@jbgna.fhfr.qr>

On Mon, Oct 9, 2023 at 12:20 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Sat, 7 Oct 2023, Richard Sandiford wrote:
>
> > Richard Biener <rguenther@suse.de> writes:
> > >> Am 07.10.2023 um 11:23 schrieb Richard Sandiford <richard.sandiford@arm.com>>> Richard Biener <rguenther@suse.de> writes:
> > >>> On Thu, 5 Oct 2023, Tamar Christina wrote:
> > >>>
> > >>>>> I suppose the idea is that -abs(x) might be easier to optimize with other
> > >>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> > >>>>>
> > >>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
> > >>>>> canonical than copysign.
> > >>>>>
> > >>>>>> Should I try removing this?
> > >>>>>
> > >>>>> I'd say yes (and put the reverse canonicalization next to this pattern).
> > >>>>>
> > >>>>
> > >>>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
> > >>>> canonical and allows a target to expand this sequence efficiently.  Such
> > >>>> sequences are common in scientific code working with gradients.
> > >>>>
> > >>>> various optimizations in match.pd only happened on COPYSIGN but not COPYSIGN_ALL
> > >>>> which means they exclude IFN_COPYSIGN.  COPYSIGN however is restricted to only
> > >>>
> > >>> That's not true:
> > >>>
> > >>> (define_operator_list COPYSIGN
> > >>>    BUILT_IN_COPYSIGNF
> > >>>    BUILT_IN_COPYSIGN
> > >>>    BUILT_IN_COPYSIGNL
> > >>>    IFN_COPYSIGN)
> > >>>
> > >>> but they miss the extended float builtin variants like
> > >>> __builtin_copysignf16.  Also see below
> > >>>
> > >>>> the C99 builtins and so doesn't work for vectors.
> > >>>>
> > >>>> The patch expands these optimizations to work on COPYSIGN_ALL.
> > >>>>
> > >>>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
> > >>>> which I remove since this is a less efficient form.  The testsuite is also
> > >>>> updated in light of this.
> > >>>>
> > >>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >>>>
> > >>>> Ok for master?
> > >>>>
> > >>>> Thanks,
> > >>>> Tamar
> > >>>>
> > >>>> gcc/ChangeLog:
> > >>>>
> > >>>>    PR tree-optimization/109154
> > >>>>    * match.pd: Add new neg+abs rule, remove inverse copysign rule and
> > >>>>    expand existing copysign optimizations.
> > >>>>
> > >>>> gcc/testsuite/ChangeLog:
> > >>>>
> > >>>>    PR tree-optimization/109154
> > >>>>    * gcc.dg/fold-copysign-1.c: Updated.
> > >>>>    * gcc.dg/pr55152-2.c: Updated.
> > >>>>    * gcc.dg/tree-ssa/abs-4.c: Updated.
> > >>>>    * gcc.dg/tree-ssa/backprop-6.c: Updated.
> > >>>>    * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> > >>>>    * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> > >>>>    * gcc.target/aarch64/fneg-abs_1.c: New test.
> > >>>>    * gcc.target/aarch64/fneg-abs_2.c: New test.
> > >>>>    * gcc.target/aarch64/fneg-abs_3.c: New test.
> > >>>>    * gcc.target/aarch64/fneg-abs_4.c: New test.
> > >>>>    * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> > >>>>    * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> > >>>>    * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> > >>>>    * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
> > >>>>
> > >>>> --- inline copy of patch ---
> > >>>>
> > >>>> diff --git a/gcc/match.pd b/gcc/match.pd
> > >>>> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
> > >>>> --- a/gcc/match.pd
> > >>>> +++ b/gcc/match.pd
> > >>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >>>>
> > >>>> /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
> > >>>> (for coss (COS COSH)
> > >>>> -     copysigns (COPYSIGN)
> > >>>> - (simplify
> > >>>> -  (coss (copysigns @0 @1))
> > >>>> -   (coss @0)))
> > >>>> + (for copysigns (COPYSIGN_ALL)
> > >>>
> > >>> So this ends up generating for example the match
> > >>> (cosf (copysignl ...)) which doesn't make much sense.
> > >>>
> > >>> The lock-step iteration did
> > >>> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
> > >>> which is leaner but misses the case of
> > >>> (cosf (ifn_copysign ..)) - that's probably what you are
> > >>> after with this change.
> > >>>
> > >>> That said, there isn't a nice solution (without altering the match.pd
> > >>> IL).  There's the explicit solution, spelling out all combinations.
> > >>>
> > >>> So if we want to go with yout pragmatic solution changing this
> > >>> to use COPYSIGN_ALL isn't necessary, only changing the lock-step
> > >>> for iteration to a cross product for iteration is.
> > >>>
> > >>> Changing just this pattern to
> > >>>
> > >>> (for coss (COS COSH)
> > >>> (for copysigns (COPYSIGN)
> > >>>  (simplify
> > >>>   (coss (copysigns @0 @1))
> > >>>   (coss @0))))
> > >>>
> > >>> increases the total number of gimple-match-x.cc lines from
> > >>> 234988 to 235324.
> > >>
> > >> I guess the difference between this and the later suggestions is that
> > >> this one allows builtin copysign to be paired with ifn cos, which would
> > >> be potentially useful in other situations.  (It isn't here because
> > >> ifn_cos is rarely provided.)  How much of the growth is due to that,
> > >> and much of it is from nonsensical combinations like
> > >> (builtin_cosf (builtin_copysignl ...))?
> > >>
> > >> If it's mostly from nonsensical combinations then would it be possible
> > >> to make genmatch drop them?
> > >>
> > >>> The alternative is to do
> > >>>
> > >>> (for coss (COS COSH)
> > >>>     copysigns (COPYSIGN)
> > >>> (simplify
> > >>>  (coss (copysigns @0 @1))
> > >>>   (coss @0))
> > >>> (simplify
> > >>>  (coss (IFN_COPYSIGN @0 @1))
> > >>>   (coss @0)))
> > >>>
> > >>> which properly will diagnose a duplicate pattern.  Ther are
> > >>> currently no operator lists with just builtins defined (that
> > >>> could be fixed, see gencfn-macros.cc), supposed we'd have
> > >>> COS_C we could do
> > >>>
> > >>> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
> > >>>     copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN
> > >>> IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
> > >>> IFN_COPYSIGN)
> > >>> (simplify
> > >>>  (coss (copysigns @0 @1))
> > >>>   (coss @0)))
> > >>>
> > >>> which of course still looks ugly ;) (some syntax extension like
> > >>> allowing to specify IFN_COPYSIGN*8 would be nice here and easy
> > >>> enough to do)
> > >>>
> > >>> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
> > >>> re-do it to only split the fors, keeping COPYSIGN and provide
> > >>> some statistics on the gimple-match-* size?  I think this might
> > >>> be the pragmatic solution for now.
> > >>>
> > >>> Richard - can you think of a clever way to express the desired
> > >>> iteration?  How do RTL macro iterations address cases like this?
> > >>
> > >> I don't think .md files have an equivalent construct, unfortunately.
> > >> (I also regret some of the choices I made for .md iterators, but that's
> > >> another story.)
> > >>
> > >> Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
> > >> with the "..." meaning "fill to match the longest operator list
> > >> in the loop".
> > >
> > > Hm, I?ll think about this.  It would be useful to have a function like
> > >
> > > Internal_fn ifn_for (combined_fn);
> > >
> > > So we can indirectly match all builtins with a switch on the ifn code.
> >
> > There's:
> >
> > extern internal_fn associated_internal_fn (combined_fn, tree);
> > extern internal_fn associated_internal_fn (tree);
> > extern internal_fn replacement_internal_fn (gcall *);
> >
> > where the first one requires the return type, and the second one
> > operates on CALL_EXPRs.
>
> Hmm, for full generality the way we code-generate would need to change
> quite a bit.  Instead I've come up with the following quite limited
> approach.  You can write
>
> (for coss (COS COSH)
>  (simplify
>   (coss (ANY_COPYSIGN @0 @1))
>   (coss @0))))

This optimization is also handled by backprop (gimple-ssa-backprop.cc)
in a better way than the match code handle.
So maybe we don't really need to extend match-and-simplify here.
Right now backprop is only ran once early after inlining. Maybe run it
once more late would help?

Thanks,
Andrew


>
> with it.  For each internal function the following patch adds a
> ANY_<name> identifier.  The use is somewhat limited - you cannot
> use it as the outermost operation in the match part and you cannot
> use it in the replacement part at all.  The nice thing is there's
> no "iteration" at all, the ANY_COPYSIGN doesn't cause any pattern
> duplication, instead we match it via CASE_CFN_<name> so it will
> happily match mis-matched (typewise) calls (but those shouldn't
> be there...).
>
> The patch doesn't contain any defensiveness in the parser for the
> use restriction, but you should get compile failures for misuses
> at least.
>
> It should match quite some of the copysign cases, I suspect its
> of no use for most of the operators so maybe less general handling
> and only specifically introducing ANY_COPYSIGN would be better.
> At least I cannot think of any other functions that are matched
> but disappear in the resulting replacement?
>
> Richard.
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 03d325efdf6..f7d3f51c013 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -524,10 +524,14 @@ class fn_id : public id_base
>  {
>  public:
>    fn_id (enum built_in_function fn_, const char *id_)
> -      : id_base (id_base::FN, id_), fn (fn_) {}
> +      : id_base (id_base::FN, id_), fn (fn_), case_macro (nullptr) {}
>    fn_id (enum internal_fn fn_, const char *id_)
> -      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)) {}
> +      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)),
> +    case_macro (nullptr) {}
> +  fn_id (const char *case_macro_, const char *id_)
> +      : id_base (id_base::FN, id_), fn (-1U), case_macro (case_macro_) {}
>    unsigned int fn;
> +  const char *case_macro;
>  };
>
>  class simplify;
> @@ -3262,6 +3266,10 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
>               if (user_id *u = dyn_cast <user_id *> (e->operation))
>                 for (auto id : u->substitutes)
>                   fprintf_indent (f, indent, "case %s:\n", id->id);
> +             else if (is_a <fn_id *> (e->operation)
> +                      && as_a <fn_id *> (e->operation)->case_macro)
> +               fprintf_indent (f, indent, "%s:\n",
> +                               as_a <fn_id *> (e->operation)->case_macro);
>               else
>                 fprintf_indent (f, indent, "case %s:\n", e->operation->id);
>               /* We need to be defensive against bogus prototypes allowing
> @@ -3337,9 +3345,12 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
>        for (unsigned j = 0; j < generic_fns.length (); ++j)
>         {
>           expr *e = as_a <expr *>(generic_fns[j]->op);
> -         gcc_assert (e->operation->kind == id_base::FN);
> +         fn_id *oper = as_a <fn_id *> (e->operation);
>
> -         fprintf_indent (f, indent, "case %s:\n", e->operation->id);
> +         if (oper->case_macro)
> +           fprintf_indent (f, indent, "%s:\n", oper->case_macro);
> +         else
> +           fprintf_indent (f, indent, "case %s:\n", e->operation->id);
>           fprintf_indent (f, indent, "  if (call_expr_nargs (%s) == %d)\n"
>                                      "    {\n", kid_opname, e->ops.length ());
>           generic_fns[j]->gen (f, indent + 6, false, depth);
> @@ -5496,7 +5507,8 @@ main (int argc, char **argv)
>  #include "builtins.def"
>
>  #define DEF_INTERNAL_FN(CODE, NAME, FNSPEC) \
> -  add_function (IFN_##CODE, "CFN_" #CODE);
> +  add_function (IFN_##CODE, "CFN_" #CODE); \
> +  add_function ("CASE_CFN_" # CODE, "ANY_" # CODE);
>  #include "internal-fn.def"
>
>    /* Parse ahead!  */

  reply	other threads:[~2023-10-09  7:36 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-27  0:50 Tamar Christina
2023-09-27  1:17 ` Andrew Pinski
2023-09-27  2:31   ` Tamar Christina
2023-09-27  7:11     ` Richard Biener
2023-09-27  7:56       ` Tamar Christina
2023-09-27  9:35         ` Tamar Christina
2023-09-27  9:39           ` Richard Biener
2023-10-05 18:11             ` Tamar Christina
2023-10-06  6:24               ` Richard Biener
2023-10-07  9:22                 ` Richard Sandiford
2023-10-07 10:34                   ` Richard Biener
2023-10-07 11:34                     ` Richard Sandiford
2023-10-09  7:20                       ` Richard Biener
2023-10-09  7:36                         ` Andrew Pinski [this message]
2023-10-09  9:06                           ` Richard Biener
2023-09-29 15:00 ` Jeff Law
2023-10-05 18:09   ` Tamar Christina

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+=Sn1nToQ0HM_r3n_a85zgGB5bx6kVHOWehjZXiLTDbSJUUmw@mail.gmail.com' \
    --to=pinskia@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jlaw@ventanamicro.com \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    --cc=tamar.christina@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).