public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math
       [not found] <20230713095401.6C7E53858C41@sourceware.org>
@ 2023-07-14 20:14 ` Andrew Pinski
  2023-07-17  9:30   ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Pinski @ 2023-07-14 20:14 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Thu, Jul 13, 2023 at 2:54 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> The following makes sure that FP x > y ? x : y style max/min operations
> are if-converted at the GIMPLE level.  While we can neither match
> it to MAX_EXPR nor .FMAX as both have different semantics with IEEE
> than the ternary ?: operation we can make sure to maintain this form
> as a COND_EXPR so backends have the chance to match this to instructions
> their ISA offers.
>
> The patch does this in phiopt where we recognize min/max and instead
> of giving up when we have to honor NaNs we alter the generated code
> to a COND_EXPR.
>
> This resolves PR88540 and we can then SLP vectorize the min operation
> for its testcase.  It also resolves part of the regressions observed
> with the change matching bit-inserts of bit-field-refs to vec_perm.
>
> Expansion from a COND_EXPR rather than from compare-and-branch
> regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c
> by producing extra moves while the corresponding min/max operations
> are now already synthesized by RTL expansion, register selection
> isn't optimal.  This can be also provoked without this change by
> altering the operand order in the source.
>
> It regresses gcc.target/i386/pr110170.c where we end up CSEing the
> condition which makes RTL expansion no longer produce the min/max
> directly and code generation is obfuscated enough to confuse
> RTL if-conversion.
>
> It also regresses gcc.target/i386/ssefp-[12].c where oddly one
> variant isn't if-converted and ix86_expand_fp_movcc doesn't
> match directly (the FP constants get expanded twice).  A fix
> could be in emit_conditional_move where both prepare_cmp_insn
> and emit_conditional_move_1 force the constants to (different)
> registers.
>
> Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu.
>
>         PR tree-optimization/88540
>         * tree-ssa-phiopt.cc (minmax_replacement): Do not give up
>         with NaNs but handle the simple case by if-converting to a
>         COND_EXPR.

One thing which I was thinking about adding to phiopt is having the
last pass do the conversion to COND_EXPR if the target supports a
conditional move for that expression. That should fix this one right?
This was one of things I was working towards with the moving to use
match-and-simplify too.

Thanks,
Andrew

>
>         * gcc.target/i386/pr88540.c: New testcase.
>         * gcc.target/i386/pr54855-12.c: Adjust.
>         * gcc.target/i386/pr54855-13.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr54855-12.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr54855-13.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr88540.c    | 10 ++++++++++
>  gcc/tree-ssa-phiopt.cc                     | 21 ++++++++++++++++-----
>  4 files changed, 28 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> index 2f8af392c83..09e8ab8ae39 100644
> --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
> +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mavx512fp16" } */
> -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
>  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
>  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> index 87b4f459a5a..a4f25066f81 100644
> --- a/gcc/testsuite/gcc.target/i386/pr54855-13.c
> +++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mavx512fp16" } */
> -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
>  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
>  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr88540.c b/gcc/testsuite/gcc.target/i386/pr88540.c
> new file mode 100644
> index 00000000000..b927d0c57d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr88540.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
> +{
> +  for (int n = 0; n < 2; ++n)
> +    d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
> +}
> +
> +/* { dg-final { scan-assembler "minpd" } } */
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 467c9fd108a..13ee486831d 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -1580,10 +1580,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
>
>    tree type = TREE_TYPE (PHI_RESULT (phi));
>
> -  /* The optimization may be unsafe due to NaNs.  */
> -  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> -    return false;
> -
>    gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
>    enum tree_code cmp = gimple_cond_code (cond);
>    tree rhs = gimple_cond_rhs (cond);
> @@ -1770,6 +1766,9 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
>        else
>         return false;
>      }
> +  else if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> +    /* The optimization may be unsafe due to NaNs.  */
> +    return false;
>    else if (middle_bb != alt_middle_bb && threeway_p)
>      {
>        /* Recognize the following case:
> @@ -2103,7 +2102,19 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
>    /* Emit the statement to compute min/max.  */
>    gimple_seq stmts = NULL;
>    tree phi_result = PHI_RESULT (phi);
> -  result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
> +
> +  /* When we can't use a MIN/MAX_EXPR still make sure the expression
> +     stays in a form to be recognized by ISA that map to IEEE x > y ? x : y
> +     semantics (that's not IEEE max semantics).  */
> +  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> +    {
> +      result = gimple_build (&stmts, cmp, boolean_type_node,
> +                            gimple_cond_lhs (cond), rhs);
> +      result = gimple_build (&stmts, COND_EXPR, TREE_TYPE (phi_result),
> +                            result, arg_true, arg_false);
> +    }
> +  else
> +    result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
>
>    gsi = gsi_last_bb (cond_bb);
>    gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
> --
> 2.35.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math
  2023-07-14 20:14 ` [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math Andrew Pinski
@ 2023-07-17  9:30   ` Richard Biener
  2023-07-17 20:07     ` Andrew Pinski
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2023-07-17  9:30 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches

On Fri, 14 Jul 2023, Andrew Pinski wrote:

> On Thu, Jul 13, 2023 at 2:54?AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > The following makes sure that FP x > y ? x : y style max/min operations
> > are if-converted at the GIMPLE level.  While we can neither match
> > it to MAX_EXPR nor .FMAX as both have different semantics with IEEE
> > than the ternary ?: operation we can make sure to maintain this form
> > as a COND_EXPR so backends have the chance to match this to instructions
> > their ISA offers.
> >
> > The patch does this in phiopt where we recognize min/max and instead
> > of giving up when we have to honor NaNs we alter the generated code
> > to a COND_EXPR.
> >
> > This resolves PR88540 and we can then SLP vectorize the min operation
> > for its testcase.  It also resolves part of the regressions observed
> > with the change matching bit-inserts of bit-field-refs to vec_perm.
> >
> > Expansion from a COND_EXPR rather than from compare-and-branch
> > regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c
> > by producing extra moves while the corresponding min/max operations
> > are now already synthesized by RTL expansion, register selection
> > isn't optimal.  This can be also provoked without this change by
> > altering the operand order in the source.
> >
> > It regresses gcc.target/i386/pr110170.c where we end up CSEing the
> > condition which makes RTL expansion no longer produce the min/max
> > directly and code generation is obfuscated enough to confuse
> > RTL if-conversion.
> >
> > It also regresses gcc.target/i386/ssefp-[12].c where oddly one
> > variant isn't if-converted and ix86_expand_fp_movcc doesn't
> > match directly (the FP constants get expanded twice).  A fix
> > could be in emit_conditional_move where both prepare_cmp_insn
> > and emit_conditional_move_1 force the constants to (different)
> > registers.
> >
> > Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> >         PR tree-optimization/88540
> >         * tree-ssa-phiopt.cc (minmax_replacement): Do not give up
> >         with NaNs but handle the simple case by if-converting to a
> >         COND_EXPR.
> 
> One thing which I was thinking about adding to phiopt is having the
> last pass do the conversion to COND_EXPR if the target supports a
> conditional move for that expression. That should fix this one right?
> This was one of things I was working towards with the moving to use
> match-and-simplify too.

Note the if-conversion has to happen before BB SLP but the last
phiopt is too late for this (yes, BB SLP could also be enhanced
to handle conditionals and do if-conversion on-the-fly).  For
BB SLP there's also usually jump threading making a mess of
same condition chain of if-convertible ops ...

As for the min + max case that regresses due 
to CSE (gcc.target/i386/pr110170.c) I wonder whether pre-expanding

 _1 = _2 < _3;
 _4 = _1 ? _2 : _3;
 _5 = _1 ? _3 : _2;

to something more clever would be appropriate anyway.  We could
adjust this to either duplicate _1 or expand the COND_EXPRs back
to a single CFG diamond.  I suppose force-duplicating non-vector
compares of COND_EXPRs to make TER work again would fix similar
regressions we might already observe (but I'm not aware of many
COND_EXPR generators).

Richard.

> Thanks,
> Andrew
> 
> >
> >         * gcc.target/i386/pr88540.c: New testcase.
> >         * gcc.target/i386/pr54855-12.c: Adjust.
> >         * gcc.target/i386/pr54855-13.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr54855-12.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr54855-13.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr88540.c    | 10 ++++++++++
> >  gcc/tree-ssa-phiopt.cc                     | 21 ++++++++++++++++-----
> >  4 files changed, 28 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > index 2f8af392c83..09e8ab8ae39 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -mavx512fp16" } */
> > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
> >  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> >  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > index 87b4f459a5a..a4f25066f81 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -mavx512fp16" } */
> > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
> >  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> >  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr88540.c b/gcc/testsuite/gcc.target/i386/pr88540.c
> > new file mode 100644
> > index 00000000000..b927d0c57d5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr88540.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2" } */
> > +
> > +void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
> > +{
> > +  for (int n = 0; n < 2; ++n)
> > +    d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
> > +}
> > +
> > +/* { dg-final { scan-assembler "minpd" } } */
> > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > index 467c9fd108a..13ee486831d 100644
> > --- a/gcc/tree-ssa-phiopt.cc
> > +++ b/gcc/tree-ssa-phiopt.cc
> > @@ -1580,10 +1580,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> >
> >    tree type = TREE_TYPE (PHI_RESULT (phi));
> >
> > -  /* The optimization may be unsafe due to NaNs.  */
> > -  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > -    return false;
> > -
> >    gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> >    enum tree_code cmp = gimple_cond_code (cond);
> >    tree rhs = gimple_cond_rhs (cond);
> > @@ -1770,6 +1766,9 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> >        else
> >         return false;
> >      }
> > +  else if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > +    /* The optimization may be unsafe due to NaNs.  */
> > +    return false;
> >    else if (middle_bb != alt_middle_bb && threeway_p)
> >      {
> >        /* Recognize the following case:
> > @@ -2103,7 +2102,19 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> >    /* Emit the statement to compute min/max.  */
> >    gimple_seq stmts = NULL;
> >    tree phi_result = PHI_RESULT (phi);
> > -  result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
> > +
> > +  /* When we can't use a MIN/MAX_EXPR still make sure the expression
> > +     stays in a form to be recognized by ISA that map to IEEE x > y ? x : y
> > +     semantics (that's not IEEE max semantics).  */
> > +  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > +    {
> > +      result = gimple_build (&stmts, cmp, boolean_type_node,
> > +                            gimple_cond_lhs (cond), rhs);
> > +      result = gimple_build (&stmts, COND_EXPR, TREE_TYPE (phi_result),
> > +                            result, arg_true, arg_false);
> > +    }
> > +  else
> > +    result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
> >
> >    gsi = gsi_last_bb (cond_bb);
> >    gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
> > --
> > 2.35.3
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math
  2023-07-17  9:30   ` Richard Biener
@ 2023-07-17 20:07     ` Andrew Pinski
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Pinski @ 2023-07-17 20:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Mon, Jul 17, 2023 at 2:30 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 14 Jul 2023, Andrew Pinski wrote:
>
> > On Thu, Jul 13, 2023 at 2:54?AM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > The following makes sure that FP x > y ? x : y style max/min operations
> > > are if-converted at the GIMPLE level.  While we can neither match
> > > it to MAX_EXPR nor .FMAX as both have different semantics with IEEE
> > > than the ternary ?: operation we can make sure to maintain this form
> > > as a COND_EXPR so backends have the chance to match this to instructions
> > > their ISA offers.
> > >
> > > The patch does this in phiopt where we recognize min/max and instead
> > > of giving up when we have to honor NaNs we alter the generated code
> > > to a COND_EXPR.
> > >
> > > This resolves PR88540 and we can then SLP vectorize the min operation
> > > for its testcase.  It also resolves part of the regressions observed
> > > with the change matching bit-inserts of bit-field-refs to vec_perm.
> > >
> > > Expansion from a COND_EXPR rather than from compare-and-branch
> > > regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c
> > > by producing extra moves while the corresponding min/max operations
> > > are now already synthesized by RTL expansion, register selection
> > > isn't optimal.  This can be also provoked without this change by
> > > altering the operand order in the source.
> > >
> > > It regresses gcc.target/i386/pr110170.c where we end up CSEing the
> > > condition which makes RTL expansion no longer produce the min/max
> > > directly and code generation is obfuscated enough to confuse
> > > RTL if-conversion.
> > >
> > > It also regresses gcc.target/i386/ssefp-[12].c where oddly one
> > > variant isn't if-converted and ix86_expand_fp_movcc doesn't
> > > match directly (the FP constants get expanded twice).  A fix
> > > could be in emit_conditional_move where both prepare_cmp_insn
> > > and emit_conditional_move_1 force the constants to (different)
> > > registers.
> > >
> > > Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > >         PR tree-optimization/88540
> > >         * tree-ssa-phiopt.cc (minmax_replacement): Do not give up
> > >         with NaNs but handle the simple case by if-converting to a
> > >         COND_EXPR.
> >
> > One thing which I was thinking about adding to phiopt is having the
> > last pass do the conversion to COND_EXPR if the target supports a
> > conditional move for that expression. That should fix this one right?
> > This was one of things I was working towards with the moving to use
> > match-and-simplify too.
>
> Note the if-conversion has to happen before BB SLP but the last
> phiopt is too late for this (yes, BB SLP could also be enhanced
> to handle conditionals and do if-conversion on-the-fly).  For
> BB SLP there's also usually jump threading making a mess of
> same condition chain of if-convertible ops ...

Oh, I didn't think about that. I was thinking more of PR 110170 and PR
106952 when I saw this patch rather than thinking of SLP vectorizer
related stuff.

>
> As for the min + max case that regresses due
> to CSE (gcc.target/i386/pr110170.c) I wonder whether pre-expanding
>
>  _1 = _2 < _3;
>  _4 = _1 ? _2 : _3;
>  _5 = _1 ? _3 : _2;
>
> to something more clever would be appropriate anyway.  We could
> adjust this to either duplicate _1 or expand the COND_EXPRs back
> to a single CFG diamond.  I suppose force-duplicating non-vector
> compares of COND_EXPRs to make TER work again would fix similar
> regressions we might already observe (but I'm not aware of many
> COND_EXPR generators).

Oh yes you had already recorded as PR 105715 too.

Thanks,
Andrew Pinski


>
> Richard.
>
> > Thanks,
> > Andrew
> >
> > >
> > >         * gcc.target/i386/pr88540.c: New testcase.
> > >         * gcc.target/i386/pr54855-12.c: Adjust.
> > >         * gcc.target/i386/pr54855-13.c: Likewise.
> > > ---
> > >  gcc/testsuite/gcc.target/i386/pr54855-12.c |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr54855-13.c |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr88540.c    | 10 ++++++++++
> > >  gcc/tree-ssa-phiopt.cc                     | 21 ++++++++++++++++-----
> > >  4 files changed, 28 insertions(+), 7 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > > index 2f8af392c83..09e8ab8ae39 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-do compile } */
> > >  /* { dg-options "-O2 -mavx512fp16" } */
> > > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
> > >  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> > >  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > > index 87b4f459a5a..a4f25066f81 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-do compile } */
> > >  /* { dg-options "-O2 -mavx512fp16" } */
> > > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
> > >  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> > >  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr88540.c b/gcc/testsuite/gcc.target/i386/pr88540.c
> > > new file mode 100644
> > > index 00000000000..b927d0c57d5
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr88540.c
> > > @@ -0,0 +1,10 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -msse2" } */
> > > +
> > > +void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
> > > +{
> > > +  for (int n = 0; n < 2; ++n)
> > > +    d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler "minpd" } } */
> > > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > > index 467c9fd108a..13ee486831d 100644
> > > --- a/gcc/tree-ssa-phiopt.cc
> > > +++ b/gcc/tree-ssa-phiopt.cc
> > > @@ -1580,10 +1580,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> > >
> > >    tree type = TREE_TYPE (PHI_RESULT (phi));
> > >
> > > -  /* The optimization may be unsafe due to NaNs.  */
> > > -  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > > -    return false;
> > > -
> > >    gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> > >    enum tree_code cmp = gimple_cond_code (cond);
> > >    tree rhs = gimple_cond_rhs (cond);
> > > @@ -1770,6 +1766,9 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> > >        else
> > >         return false;
> > >      }
> > > +  else if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > > +    /* The optimization may be unsafe due to NaNs.  */
> > > +    return false;
> > >    else if (middle_bb != alt_middle_bb && threeway_p)
> > >      {
> > >        /* Recognize the following case:
> > > @@ -2103,7 +2102,19 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> > >    /* Emit the statement to compute min/max.  */
> > >    gimple_seq stmts = NULL;
> > >    tree phi_result = PHI_RESULT (phi);
> > > -  result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
> > > +
> > > +  /* When we can't use a MIN/MAX_EXPR still make sure the expression
> > > +     stays in a form to be recognized by ISA that map to IEEE x > y ? x : y
> > > +     semantics (that's not IEEE max semantics).  */
> > > +  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > > +    {
> > > +      result = gimple_build (&stmts, cmp, boolean_type_node,
> > > +                            gimple_cond_lhs (cond), rhs);
> > > +      result = gimple_build (&stmts, COND_EXPR, TREE_TYPE (phi_result),
> > > +                            result, arg_true, arg_false);
> > > +    }
> > > +  else
> > > +    result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
> > >
> > >    gsi = gsi_last_bb (cond_bb);
> > >    gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
> > > --
> > > 2.35.3
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math
@ 2023-07-13  9:53 Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2023-07-13  9:53 UTC (permalink / raw)
  To: gcc-patches

The following makes sure that FP x > y ? x : y style max/min operations
are if-converted at the GIMPLE level.  While we can neither match
it to MAX_EXPR nor .FMAX as both have different semantics with IEEE
than the ternary ?: operation we can make sure to maintain this form
as a COND_EXPR so backends have the chance to match this to instructions
their ISA offers.

The patch does this in phiopt where we recognize min/max and instead
of giving up when we have to honor NaNs we alter the generated code
to a COND_EXPR.

This resolves PR88540 and we can then SLP vectorize the min operation
for its testcase.  It also resolves part of the regressions observed
with the change matching bit-inserts of bit-field-refs to vec_perm.

Expansion from a COND_EXPR rather than from compare-and-branch
regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c
by producing extra moves while the corresponding min/max operations
are now already synthesized by RTL expansion, register selection
isn't optimal.  This can be also provoked without this change by
altering the operand order in the source.

It regresses gcc.target/i386/pr110170.c where we end up CSEing the
condition which makes RTL expansion no longer produce the min/max
directly and code generation is obfuscated enough to confuse
RTL if-conversion.

It also regresses gcc.target/i386/ssefp-[12].c where oddly one
variant isn't if-converted and ix86_expand_fp_movcc doesn't
match directly (the FP constants get expanded twice).  A fix
could be in emit_conditional_move where both prepare_cmp_insn
and emit_conditional_move_1 force the constants to (different)
registers.

Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu.

	PR tree-optimization/88540
	* tree-ssa-phiopt.cc (minmax_replacement): Do not give up
	with NaNs but handle the simple case by if-converting to a
	COND_EXPR.

	* gcc.target/i386/pr88540.c: New testcase.
	* gcc.target/i386/pr54855-12.c: Adjust.
	* gcc.target/i386/pr54855-13.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr54855-12.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr54855-13.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr88540.c    | 10 ++++++++++
 gcc/tree-ssa-phiopt.cc                     | 21 ++++++++++++++++-----
 4 files changed, 28 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c

diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
index 2f8af392c83..09e8ab8ae39 100644
--- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
+++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512fp16" } */
-/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
 /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
 /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c b/gcc/testsuite/gcc.target/i386/pr54855-13.c
index 87b4f459a5a..a4f25066f81 100644
--- a/gcc/testsuite/gcc.target/i386/pr54855-13.c
+++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512fp16" } */
-/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
 /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
 /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr88540.c b/gcc/testsuite/gcc.target/i386/pr88540.c
new file mode 100644
index 00000000000..b927d0c57d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88540.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
+{
+  for (int n = 0; n < 2; ++n)
+    d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
+}
+
+/* { dg-final { scan-assembler "minpd" } } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 467c9fd108a..13ee486831d 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1580,10 +1580,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
 
   tree type = TREE_TYPE (PHI_RESULT (phi));
 
-  /* The optimization may be unsafe due to NaNs.  */
-  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
-    return false;
-
   gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
   enum tree_code cmp = gimple_cond_code (cond);
   tree rhs = gimple_cond_rhs (cond);
@@ -1770,6 +1766,9 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
       else
 	return false;
     }
+  else if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
+    /* The optimization may be unsafe due to NaNs.  */
+    return false;
   else if (middle_bb != alt_middle_bb && threeway_p)
     {
       /* Recognize the following case:
@@ -2103,7 +2102,19 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
   /* Emit the statement to compute min/max.  */
   gimple_seq stmts = NULL;
   tree phi_result = PHI_RESULT (phi);
-  result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
+
+  /* When we can't use a MIN/MAX_EXPR still make sure the expression
+     stays in a form to be recognized by ISA that map to IEEE x > y ? x : y
+     semantics (that's not IEEE max semantics).  */
+  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
+    {
+      result = gimple_build (&stmts, cmp, boolean_type_node,
+			     gimple_cond_lhs (cond), rhs);
+      result = gimple_build (&stmts, COND_EXPR, TREE_TYPE (phi_result),
+			     result, arg_true, arg_false);
+    }
+  else
+    result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
 
   gsi = gsi_last_bb (cond_bb);
   gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
-- 
2.35.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-07-17 20:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20230713095401.6C7E53858C41@sourceware.org>
2023-07-14 20:14 ` [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math Andrew Pinski
2023-07-17  9:30   ` Richard Biener
2023-07-17 20:07     ` Andrew Pinski
2023-07-13  9:53 Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).