[PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
@ 2023-06-02  1:00 liuhongt
  2023-06-20  8:38 ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: liuhongt @ 2023-06-02  1:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools

We have already use intermidate type in case WIDEN, but not for NONE,
this patch extended that.

I didn't do that in pattern recog since we need to know whether the
stmt belongs to any slp_node to decide the vectype, the related optabs
are checked according to vectype_in and vectype_out. For non-slp case,
vec_pack/unpack are always used when lhs has different size from rhs,
for slp case, sometimes vec_pack/unpack is used, somethings
direct conversion is used.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	PR target/110018
	* tree-vect-stmts.cc (vectorizable_conversion): Use
	intermiediate integer type for float_expr/fix_trunc_expr when
	direct optab is not existed.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr110018-1.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
 gcc/tree-vect-stmts.cc                     | 56 ++++++++++++-
 2 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c

diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
new file mode 100644
index 00000000000..b1baffd7af1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
+/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
+/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
+
+void
+foo (double* __restrict a, char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo1 (float* __restrict a, char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo2 (_Float16* __restrict a, char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[4] = b[4];
+  a[5] = b[5];
+  a[6] = b[6];
+  a[7] = b[7];
+}
+
+void
+foo3 (double* __restrict a, short* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo4 (float* __restrict a, char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo5 (double* __restrict b, char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo6 (float* __restrict b, char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo7 (_Float16* __restrict b, char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[4] = b[4];
+  a[5] = b[5];
+  a[6] = b[6];
+  a[7] = b[7];
+}
+
+void
+foo8 (double* __restrict b, short* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo9 (float* __restrict b, char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index bd3b07a3aa1..1118c89686d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
 	return false;
       if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
 	break;
+      if ((code == FLOAT_EXPR
+	   && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
+	  || (code == FIX_TRUNC_EXPR
+	      && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
+	{
+	  bool float_expr_p = code == FLOAT_EXPR;
+	  scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
+	  fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
+	  code1 = float_expr_p ? code : NOP_EXPR;
+	  codecvt1 = float_expr_p ? NOP_EXPR : code;
+	  FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode)
+	    {
+	      imode = rhs_mode_iter.require ();
+	      if (GET_MODE_SIZE (imode) > fltsz)
+		break;
+
+	      cvt_type
+		= build_nonstandard_integer_type (GET_MODE_BITSIZE (imode),
+						  0);
+	      cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type,
+						      slp_node);
+	      /* This should only happened for SLP as long as loop vectorizer
+		 only supports same-sized vector.  */
+	      if (cvt_type == NULL_TREE
+		  || maybe_ne (TYPE_VECTOR_SUBPARTS (cvt_type), nunits_in)
+		  || !supportable_convert_operation (code1, vectype_out,
+						     cvt_type, &code1)
+		  || !supportable_convert_operation (codecvt1, cvt_type,
+						     vectype_in, &codecvt1))
+		continue;
+
+	      found_mode = true;
+	      break;
+	    }
+
+	  if (found_mode)
+	    {
+	      multi_step_cvt++;
+	      interm_types.safe_push (cvt_type);
+	      cvt_type = NULL_TREE;
+	      break;
+	    }
+	}
       /* FALLTHRU */
     unsupported:
       if (dump_enabled_p ())
@@ -5381,7 +5424,18 @@ vectorizable_conversion (vec_info *vinfo,
 	{
 	  /* Arguments are ready, create the new vector stmt.  */
 	  gcc_assert (TREE_CODE_LENGTH (code1) == unary_op);
-	  gassign *new_stmt = gimple_build_assign (vec_dest, code1, vop0);
+	  gassign* new_stmt;
+	  if (multi_step_cvt)
+	    {
+	      gcc_assert (multi_step_cvt == 1);
+	      new_stmt = gimple_build_assign (vec_dest, codecvt1, vop0);
+	      new_temp = make_ssa_name (vec_dest, new_stmt);
+	      gimple_assign_set_lhs (new_stmt, new_temp);
+	      vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+	      vop0 = new_temp;
+	      vec_dest = vec_dsts[0];
+	    }
+	  new_stmt = gimple_build_assign (vec_dest, code1, vop0);
 	  new_temp = make_ssa_name (vec_dest, new_stmt);
 	  gimple_assign_set_lhs (new_stmt, new_temp);
 	  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
-- 
2.39.1.388.g2fc9e9ca3c


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-02  1:00 [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed liuhongt
@ 2023-06-20  8:38 ` Richard Biener
  2023-06-20  9:09   ` Hongtao Liu
  2023-06-21  9:10   ` Richard Sandiford
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Biener @ 2023-06-20  8:38 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, crazylht, hjl.tools

On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> We have already use intermidate type in case WIDEN, but not for NONE,
> this patch extended that.
>
> I didn't do that in pattern recog since we need to know whether the
> stmt belongs to any slp_node to decide the vectype, the related optabs
> are checked according to vectype_in and vectype_out. For non-slp case,
> vec_pack/unpack are always used when lhs has different size from rhs,
> for slp case, sometimes vec_pack/unpack is used, somethings
> direct conversion is used.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         PR target/110018
>         * tree-vect-stmts.cc (vectorizable_conversion): Use
>         intermiediate integer type for float_expr/fix_trunc_expr when
>         direct optab is not existed.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr110018-1.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
>  gcc/tree-vect-stmts.cc                     | 56 ++++++++++++-
>  2 files changed, 149 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> new file mode 100644
> index 00000000000..b1baffd7af1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> @@ -0,0 +1,94 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
> +
> +void
> +foo (double* __restrict a, char* b)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +}
> +
> +void
> +foo1 (float* __restrict a, char* b)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +  a[2] = b[2];
> +  a[3] = b[3];
> +}
> +
> +void
> +foo2 (_Float16* __restrict a, char* b)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +  a[2] = b[2];
> +  a[3] = b[3];
> +  a[4] = b[4];
> +  a[5] = b[5];
> +  a[6] = b[6];
> +  a[7] = b[7];
> +}
> +
> +void
> +foo3 (double* __restrict a, short* b)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +}
> +
> +void
> +foo4 (float* __restrict a, char* b)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +  a[2] = b[2];
> +  a[3] = b[3];
> +}
> +
> +void
> +foo5 (double* __restrict b, char* a)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +}
> +
> +void
> +foo6 (float* __restrict b, char* a)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +  a[2] = b[2];
> +  a[3] = b[3];
> +}
> +
> +void
> +foo7 (_Float16* __restrict b, char* a)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +  a[2] = b[2];
> +  a[3] = b[3];
> +  a[4] = b[4];
> +  a[5] = b[5];
> +  a[6] = b[6];
> +  a[7] = b[7];
> +}
> +
> +void
> +foo8 (double* __restrict b, short* a)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +}
> +
> +void
> +foo9 (float* __restrict b, char* a)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +  a[2] = b[2];
> +  a[3] = b[3];
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index bd3b07a3aa1..1118c89686d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
>         return false;
>        if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
>         break;

A comment would be nice here.  Like

   /* For conversions between float and smaller integer types try whether we can
      use intermediate signed integer types to support the conversion.  */

> +      if ((code == FLOAT_EXPR
> +          && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
> +         || (code == FIX_TRUNC_EXPR
> +             && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
> +       {
> +         bool float_expr_p = code == FLOAT_EXPR;
> +         scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
> +         fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
> +         code1 = float_expr_p ? code : NOP_EXPR;
> +         codecvt1 = float_expr_p ? NOP_EXPR : code;
> +         FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode)
> +           {
> +             imode = rhs_mode_iter.require ();
> +             if (GET_MODE_SIZE (imode) > fltsz)
> +               break;
> +
> +             cvt_type
> +               = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode),
> +                                                 0);
> +             cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type,
> +                                                     slp_node);
> +             /* This should only happened for SLP as long as loop vectorizer
> +                only supports same-sized vector.  */
> +             if (cvt_type == NULL_TREE
> +                 || maybe_ne (TYPE_VECTOR_SUBPARTS (cvt_type), nunits_in)
> +                 || !supportable_convert_operation (code1, vectype_out,
> +                                                    cvt_type, &code1)
> +                 || !supportable_convert_operation (codecvt1, cvt_type,
> +                                                    vectype_in, &codecvt1))
> +               continue;
> +
> +             found_mode = true;
> +             break;
> +           }
> +
> +         if (found_mode)
> +           {
> +             multi_step_cvt++;
> +             interm_types.safe_push (cvt_type);
> +             cvt_type = NULL_TREE;
> +             break;
> +           }
> +       }
>        /* FALLTHRU */
>      unsupported:
>        if (dump_enabled_p ())
> @@ -5381,7 +5424,18 @@ vectorizable_conversion (vec_info *vinfo,
>         {
>           /* Arguments are ready, create the new vector stmt.  */
>           gcc_assert (TREE_CODE_LENGTH (code1) == unary_op);
> -         gassign *new_stmt = gimple_build_assign (vec_dest, code1, vop0);
> +         gassign* new_stmt;
> +         if (multi_step_cvt)
> +           {
> +             gcc_assert (multi_step_cvt == 1);
> +             new_stmt = gimple_build_assign (vec_dest, codecvt1, vop0);
> +             new_temp = make_ssa_name (vec_dest, new_stmt);

I wonder how you get away with using vec_dest both for the final and the
intermediate conversion and not involve interm_types[0]?

Otherwise looks good.

Thanks,
Richard.

> +             gimple_assign_set_lhs (new_stmt, new_temp);
> +             vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> +             vop0 = new_temp;
> +             vec_dest = vec_dsts[0];
> +           }
> +         new_stmt = gimple_build_assign (vec_dest, code1, vop0);
>           new_temp = make_ssa_name (vec_dest, new_stmt);
>           gimple_assign_set_lhs (new_stmt, new_temp);
>           vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> --
> 2.39.1.388.g2fc9e9ca3c
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-20  8:38 ` Richard Biener
@ 2023-06-20  9:09   ` Hongtao Liu
  2023-06-20  9:22     ` Richard Biener
  2023-06-21  9:10   ` Richard Sandiford
  1 sibling, 1 reply; 16+ messages in thread
From: Hongtao Liu @ 2023-06-20  9:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: liuhongt, gcc-patches, hjl.tools

On Tue, Jun 20, 2023 at 4:41 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > We have already use intermidate type in case WIDEN, but not for NONE,
> > this patch extended that.
> >
> > I didn't do that in pattern recog since we need to know whether the
> > stmt belongs to any slp_node to decide the vectype, the related optabs
> > are checked according to vectype_in and vectype_out. For non-slp case,
> > vec_pack/unpack are always used when lhs has different size from rhs,
> > for slp case, sometimes vec_pack/unpack is used, somethings
> > direct conversion is used.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         PR target/110018
> >         * tree-vect-stmts.cc (vectorizable_conversion): Use
> >         intermiediate integer type for float_expr/fix_trunc_expr when
> >         direct optab is not existed.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/pr110018-1.c: New test.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
> >  gcc/tree-vect-stmts.cc                     | 56 ++++++++++++-
> >  2 files changed, 149 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> > new file mode 100644
> > index 00000000000..b1baffd7af1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> > @@ -0,0 +1,94 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
> > +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
> > +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
> > +
> > +void
> > +foo (double* __restrict a, char* b)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +}
> > +
> > +void
> > +foo1 (float* __restrict a, char* b)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +  a[2] = b[2];
> > +  a[3] = b[3];
> > +}
> > +
> > +void
> > +foo2 (_Float16* __restrict a, char* b)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +  a[2] = b[2];
> > +  a[3] = b[3];
> > +  a[4] = b[4];
> > +  a[5] = b[5];
> > +  a[6] = b[6];
> > +  a[7] = b[7];
> > +}
> > +
> > +void
> > +foo3 (double* __restrict a, short* b)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +}
> > +
> > +void
> > +foo4 (float* __restrict a, char* b)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +  a[2] = b[2];
> > +  a[3] = b[3];
> > +}
> > +
> > +void
> > +foo5 (double* __restrict b, char* a)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +}
> > +
> > +void
> > +foo6 (float* __restrict b, char* a)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +  a[2] = b[2];
> > +  a[3] = b[3];
> > +}
> > +
> > +void
> > +foo7 (_Float16* __restrict b, char* a)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +  a[2] = b[2];
> > +  a[3] = b[3];
> > +  a[4] = b[4];
> > +  a[5] = b[5];
> > +  a[6] = b[6];
> > +  a[7] = b[7];
> > +}
> > +
> > +void
> > +foo8 (double* __restrict b, short* a)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +}
> > +
> > +void
> > +foo9 (float* __restrict b, char* a)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +  a[2] = b[2];
> > +  a[3] = b[3];
> > +}
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index bd3b07a3aa1..1118c89686d 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
> >         return false;
> >        if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
> >         break;
>
> A comment would be nice here.  Like
>
>    /* For conversions between float and smaller integer types try whether we can
>       use intermediate signed integer types to support the conversion.  */
>
> > +      if ((code == FLOAT_EXPR
> > +          && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
> > +         || (code == FIX_TRUNC_EXPR
> > +             && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
> > +       {
> > +         bool float_expr_p = code == FLOAT_EXPR;
> > +         scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
> > +         fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
> > +         code1 = float_expr_p ? code : NOP_EXPR;
> > +         codecvt1 = float_expr_p ? NOP_EXPR : code;
> > +         FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode)
> > +           {
> > +             imode = rhs_mode_iter.require ();
> > +             if (GET_MODE_SIZE (imode) > fltsz)
> > +               break;
> > +
> > +             cvt_type
> > +               = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode),
> > +                                                 0);
> > +             cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type,
> > +                                                     slp_node);
> > +             /* This should only happened for SLP as long as loop vectorizer
> > +                only supports same-sized vector.  */
> > +             if (cvt_type == NULL_TREE
> > +                 || maybe_ne (TYPE_VECTOR_SUBPARTS (cvt_type), nunits_in)
> > +                 || !supportable_convert_operation (code1, vectype_out,
> > +                                                    cvt_type, &code1)
> > +                 || !supportable_convert_operation (codecvt1, cvt_type,
> > +                                                    vectype_in, &codecvt1))
> > +               continue;
> > +
> > +             found_mode = true;
> > +             break;
> > +           }
> > +
> > +         if (found_mode)
> > +           {
> > +             multi_step_cvt++;
> > +             interm_types.safe_push (cvt_type);
> > +             cvt_type = NULL_TREE;
> > +             break;
> > +           }
> > +       }
> >        /* FALLTHRU */
> >      unsupported:
> >        if (dump_enabled_p ())
> > @@ -5381,7 +5424,18 @@ vectorizable_conversion (vec_info *vinfo,
> >         {
> >           /* Arguments are ready, create the new vector stmt.  */
> >           gcc_assert (TREE_CODE_LENGTH (code1) == unary_op);
> > -         gassign *new_stmt = gimple_build_assign (vec_dest, code1, vop0);
> > +         gassign* new_stmt;
> > +         if (multi_step_cvt)
> > +           {
> > +             gcc_assert (multi_step_cvt == 1);
> > +             new_stmt = gimple_build_assign (vec_dest, codecvt1, vop0);
> > +             new_temp = make_ssa_name (vec_dest, new_stmt);
>
> I wonder how you get away with using vec_dest both for the final and the
> intermediate conversion and not involve interm_types[0]?
interm_types[0] is assigned to vect_dest, the original
vect_dest(vectype_out) is pushed to vec_dsts.

  auto_vec<tree> vec_dsts (multi_step_cvt + 1);
  vec_dest = vect_create_destination_var (scalar_dest,
  (cvt_type && modifier == WIDEN)
  ? cvt_type : vectype_out);
  vec_dsts.quick_push (vec_dest);

  if (multi_step_cvt)
    {
      for (i = interm_types.length () - 1;
   interm_types.iterate (i, &intermediate_type); i--)
{
  vec_dest = vect_create_destination_var (scalar_dest,
  intermediate_type);
  vec_dsts.quick_push (vec_dest);
}
    }

>
> Otherwise looks good.
Thanks for review.
>
> Thanks,
> Richard.
>
> > +             gimple_assign_set_lhs (new_stmt, new_temp);
> > +             vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> > +             vop0 = new_temp;
> > +             vec_dest = vec_dsts[0];
And it's assigned back to vec_dest here.
> > +           }
> > +         new_stmt = gimple_build_assign (vec_dest, code1, vop0);
> >           new_temp = make_ssa_name (vec_dest, new_stmt);
> >           gimple_assign_set_lhs (new_stmt, new_temp);
> >           vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> > --
> > 2.39.1.388.g2fc9e9ca3c
> >



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-20  9:09   ` Hongtao Liu
@ 2023-06-20  9:22     ` Richard Biener
  2023-06-20 16:11       ` liuhongt
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2023-06-20  9:22 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, gcc-patches, hjl.tools

On Tue, Jun 20, 2023 at 11:02 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Jun 20, 2023 at 4:41 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > We have already use intermidate type in case WIDEN, but not for NONE,
> > > this patch extended that.
> > >
> > > I didn't do that in pattern recog since we need to know whether the
> > > stmt belongs to any slp_node to decide the vectype, the related optabs
> > > are checked according to vectype_in and vectype_out. For non-slp case,
> > > vec_pack/unpack are always used when lhs has different size from rhs,
> > > for slp case, sometimes vec_pack/unpack is used, somethings
> > > direct conversion is used.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >         PR target/110018
> > >         * tree-vect-stmts.cc (vectorizable_conversion): Use
> > >         intermiediate integer type for float_expr/fix_trunc_expr when
> > >         direct optab is not existed.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/i386/pr110018-1.c: New test.
> > > ---
> > >  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
> > >  gcc/tree-vect-stmts.cc                     | 56 ++++++++++++-
> > >  2 files changed, 149 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> > > new file mode 100644
> > > index 00000000000..b1baffd7af1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> > > @@ -0,0 +1,94 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
> > > +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
> > > +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
> > > +
> > > +void
> > > +foo (double* __restrict a, char* b)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +}
> > > +
> > > +void
> > > +foo1 (float* __restrict a, char* b)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +  a[2] = b[2];
> > > +  a[3] = b[3];
> > > +}
> > > +
> > > +void
> > > +foo2 (_Float16* __restrict a, char* b)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +  a[2] = b[2];
> > > +  a[3] = b[3];
> > > +  a[4] = b[4];
> > > +  a[5] = b[5];
> > > +  a[6] = b[6];
> > > +  a[7] = b[7];
> > > +}
> > > +
> > > +void
> > > +foo3 (double* __restrict a, short* b)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +}
> > > +
> > > +void
> > > +foo4 (float* __restrict a, char* b)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +  a[2] = b[2];
> > > +  a[3] = b[3];
> > > +}
> > > +
> > > +void
> > > +foo5 (double* __restrict b, char* a)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +}
> > > +
> > > +void
> > > +foo6 (float* __restrict b, char* a)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +  a[2] = b[2];
> > > +  a[3] = b[3];
> > > +}
> > > +
> > > +void
> > > +foo7 (_Float16* __restrict b, char* a)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +  a[2] = b[2];
> > > +  a[3] = b[3];
> > > +  a[4] = b[4];
> > > +  a[5] = b[5];
> > > +  a[6] = b[6];
> > > +  a[7] = b[7];
> > > +}
> > > +
> > > +void
> > > +foo8 (double* __restrict b, short* a)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +}
> > > +
> > > +void
> > > +foo9 (float* __restrict b, char* a)
> > > +{
> > > +  a[0] = b[0];
> > > +  a[1] = b[1];
> > > +  a[2] = b[2];
> > > +  a[3] = b[3];
> > > +}
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index bd3b07a3aa1..1118c89686d 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
> > >         return false;
> > >        if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
> > >         break;
> >
> > A comment would be nice here.  Like
> >
> >    /* For conversions between float and smaller integer types try whether we can
> >       use intermediate signed integer types to support the conversion.  */
> >
> > > +      if ((code == FLOAT_EXPR
> > > +          && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
> > > +         || (code == FIX_TRUNC_EXPR
> > > +             && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
> > > +       {
> > > +         bool float_expr_p = code == FLOAT_EXPR;
> > > +         scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
> > > +         fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
> > > +         code1 = float_expr_p ? code : NOP_EXPR;
> > > +         codecvt1 = float_expr_p ? NOP_EXPR : code;
> > > +         FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode)
> > > +           {
> > > +             imode = rhs_mode_iter.require ();
> > > +             if (GET_MODE_SIZE (imode) > fltsz)
> > > +               break;
> > > +
> > > +             cvt_type
> > > +               = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode),
> > > +                                                 0);
> > > +             cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type,
> > > +                                                     slp_node);
> > > +             /* This should only happened for SLP as long as loop vectorizer
> > > +                only supports same-sized vector.  */
> > > +             if (cvt_type == NULL_TREE
> > > +                 || maybe_ne (TYPE_VECTOR_SUBPARTS (cvt_type), nunits_in)
> > > +                 || !supportable_convert_operation (code1, vectype_out,
> > > +                                                    cvt_type, &code1)
> > > +                 || !supportable_convert_operation (codecvt1, cvt_type,
> > > +                                                    vectype_in, &codecvt1))
> > > +               continue;
> > > +
> > > +             found_mode = true;
> > > +             break;
> > > +           }
> > > +
> > > +         if (found_mode)
> > > +           {
> > > +             multi_step_cvt++;
> > > +             interm_types.safe_push (cvt_type);
> > > +             cvt_type = NULL_TREE;
> > > +             break;
> > > +           }
> > > +       }
> > >        /* FALLTHRU */
> > >      unsupported:
> > >        if (dump_enabled_p ())
> > > @@ -5381,7 +5424,18 @@ vectorizable_conversion (vec_info *vinfo,
> > >         {
> > >           /* Arguments are ready, create the new vector stmt.  */
> > >           gcc_assert (TREE_CODE_LENGTH (code1) == unary_op);
> > > -         gassign *new_stmt = gimple_build_assign (vec_dest, code1, vop0);
> > > +         gassign* new_stmt;
> > > +         if (multi_step_cvt)
> > > +           {
> > > +             gcc_assert (multi_step_cvt == 1);
> > > +             new_stmt = gimple_build_assign (vec_dest, codecvt1, vop0);
> > > +             new_temp = make_ssa_name (vec_dest, new_stmt);
> >
> > I wonder how you get away with using vec_dest both for the final and the
> > intermediate conversion and not involve interm_types[0]?
> interm_types[0] is assigned to vect_dest, the original
> vect_dest(vectype_out) is pushed to vec_dsts.
>
>   auto_vec<tree> vec_dsts (multi_step_cvt + 1);
>   vec_dest = vect_create_destination_var (scalar_dest,
>   (cvt_type && modifier == WIDEN)
>   ? cvt_type : vectype_out);
>   vec_dsts.quick_push (vec_dest);
>
>   if (multi_step_cvt)
>     {
>       for (i = interm_types.length () - 1;
>    interm_types.iterate (i, &intermediate_type); i--)
> {
>   vec_dest = vect_create_destination_var (scalar_dest,
>   intermediate_type);
>   vec_dsts.quick_push (vec_dest);
> }
>     }
>
> >
> > Otherwise looks good.
> Thanks for review.
> >
> > Thanks,
> > Richard.
> >
> > > +             gimple_assign_set_lhs (new_stmt, new_temp);
> > > +             vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> > > +             vop0 = new_temp;
> > > +             vec_dest = vec_dsts[0];
> And it's assigned back to vec_dest here.

Ah, subtle.

So the patch is OK with the added comment.

Thanks,
Richard.

> > > +           }
> > > +         new_stmt = gimple_build_assign (vec_dest, code1, vop0);
> > >           new_temp = make_ssa_name (vec_dest, new_stmt);
> > >           gimple_assign_set_lhs (new_stmt, new_temp);
> > >           vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> > > --
> > > 2.39.1.388.g2fc9e9ca3c
> > >
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-20  9:22     ` Richard Biener
@ 2023-06-20 16:11       ` liuhongt
  2023-06-21  7:49         ` Uros Bizjak
  2023-06-22 20:36         ` Thiago Jung Bauermann
  0 siblings, 2 replies; 16+ messages in thread
From: liuhongt @ 2023-06-20 16:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools

I notice there's some refactor in vectorizable_conversion
for code_helper,so I've adjusted my patch to that.
Here's the patch I'm going to commit.

We have already use intermidate type in case WIDEN, but not for NONE,
this patch extended that.

gcc/ChangeLog:

	PR target/110018
	* tree-vect-stmts.cc (vectorizable_conversion): Use
	intermiediate integer type for float_expr/fix_trunc_expr when
	direct optab is not existed.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr110018-1.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
 gcc/tree-vect-stmts.cc                     | 66 ++++++++++++++-
 2 files changed, 158 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c

diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
new file mode 100644
index 00000000000..b1baffd7af1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
+/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
+/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
+
+void
+foo (double* __restrict a, char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo1 (float* __restrict a, char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo2 (_Float16* __restrict a, char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[4] = b[4];
+  a[5] = b[5];
+  a[6] = b[6];
+  a[7] = b[7];
+}
+
+void
+foo3 (double* __restrict a, short* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo4 (float* __restrict a, char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo5 (double* __restrict b, char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo6 (float* __restrict b, char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo7 (_Float16* __restrict b, char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[4] = b[4];
+  a[5] = b[5];
+  a[6] = b[6];
+  a[7] = b[7];
+}
+
+void
+foo8 (double* __restrict b, short* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo9 (float* __restrict b, char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 056a0ecb2be..ae24f3e66e6 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5041,7 +5041,7 @@ vectorizable_conversion (vec_info *vinfo,
   tree scalar_dest;
   tree op0, op1 = NULL_TREE;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
-  tree_code tc1;
+  tree_code tc1, tc2;
   code_helper code, code1, code2;
   code_helper codecvt1 = ERROR_MARK, codecvt2 = ERROR_MARK;
   tree new_temp;
@@ -5249,6 +5249,57 @@ vectorizable_conversion (vec_info *vinfo,
 	code1 = tc1;
 	break;
       }
+
+      /* For conversions between float and smaller integer types try whether we
+	 can use intermediate signed integer types to support the
+	 conversion.  */
+      if ((code == FLOAT_EXPR
+	   && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
+	  || (code == FIX_TRUNC_EXPR
+	      && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
+	{
+	  bool float_expr_p = code == FLOAT_EXPR;
+	  scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
+	  fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
+	  code1 = float_expr_p ? code : NOP_EXPR;
+	  codecvt1 = float_expr_p ? NOP_EXPR : code;
+	  FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode)
+	    {
+	      imode = rhs_mode_iter.require ();
+	      if (GET_MODE_SIZE (imode) > fltsz)
+		break;
+
+	      cvt_type
+		= build_nonstandard_integer_type (GET_MODE_BITSIZE (imode),
+						  0);
+	      cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type,
+						      slp_node);
+	      /* This should only happened for SLP as long as loop vectorizer
+		 only supports same-sized vector.  */
+	      if (cvt_type == NULL_TREE
+		  || maybe_ne (TYPE_VECTOR_SUBPARTS (cvt_type), nunits_in)
+		  || !supportable_convert_operation ((tree_code) code1,
+						     vectype_out,
+						     cvt_type, &tc1)
+		  || !supportable_convert_operation ((tree_code) codecvt1,
+						     cvt_type,
+						     vectype_in, &tc2))
+		continue;
+
+	      found_mode = true;
+	      break;
+	    }
+
+	  if (found_mode)
+	    {
+	      multi_step_cvt++;
+	      interm_types.safe_push (cvt_type);
+	      cvt_type = NULL_TREE;
+	      code1 = tc1;
+	      codecvt1 = tc2;
+	      break;
+	    }
+	}
       /* FALLTHRU */
     unsupported:
       if (dump_enabled_p ())
@@ -5513,7 +5564,18 @@ vectorizable_conversion (vec_info *vinfo,
       FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
 	{
 	  /* Arguments are ready, create the new vector stmt.  */
-	  gimple *new_stmt = vect_gimple_build (vec_dest, code1, vop0);
+	  gimple* new_stmt;
+	  if (multi_step_cvt)
+	    {
+	      gcc_assert (multi_step_cvt == 1);
+	      new_stmt = vect_gimple_build (vec_dest, codecvt1, vop0);
+	      new_temp = make_ssa_name (vec_dest, new_stmt);
+	      gimple_assign_set_lhs (new_stmt, new_temp);
+	      vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+	      vop0 = new_temp;
+	      vec_dest = vec_dsts[0];
+	    }
+	  new_stmt = vect_gimple_build (vec_dest, code1, vop0);
 	  new_temp = make_ssa_name (vec_dest, new_stmt);
 	  gimple_set_lhs (new_stmt, new_temp);
 	  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
-- 
2.39.1.388.g2fc9e9ca3c


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-20 16:11       ` liuhongt
@ 2023-06-21  7:49         ` Uros Bizjak
  2023-06-21  8:19           ` Richard Biener
  2023-06-22 20:36         ` Thiago Jung Bauermann
  1 sibling, 1 reply; 16+ messages in thread
From: Uros Bizjak @ 2023-06-21  7:49 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, crazylht, hjl.tools

[-- Attachment #1: Type: text/plain, Size: 1124 bytes --]

On Tue, Jun 20, 2023 at 6:11 PM liuhongt via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> I notice there's some refactor in vectorizable_conversion
> for code_helper,so I've adjusted my patch to that.
> Here's the patch I'm going to commit.
>
> We have already use intermidate type in case WIDEN, but not for NONE,
> this patch extended that.
>
> gcc/ChangeLog:
>
>         PR target/110018
>         * tree-vect-stmts.cc (vectorizable_conversion): Use
>         intermiediate integer type for float_expr/fix_trunc_expr when
>         direct optab is not existed.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr110018-1.c: New test.

> +
> +      /* For conversions between float and smaller integer types try whether we
> +        can use intermediate signed integer types to support the
> +        conversion.  */

I'm trying to enhance testcase coverage with explicit signed/unsigned
types (patch attached), and I have noticed that zero-extension is used
for unsigned types. So, the above comment that mentions only signed
integer types is not entirely correct.

Uros.

[-- Attachment #2: t.diff.txt --]
[-- Type: text/plain, Size: 3611 bytes --]

diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
index b1baffd7af1..b6a3be7b7a2 100644
--- a/gcc/testsuite/gcc.target/i386/pr110018-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
@@ -4,14 +4,14 @@
 /* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
 
 void
-foo (double* __restrict a, char* b)
+foo (double* __restrict a, signed char* b)
 {
   a[0] = b[0];
   a[1] = b[1];
 }
 
 void
-foo1 (float* __restrict a, char* b)
+foo1 (float* __restrict a, signed char* b)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -20,7 +20,7 @@ foo1 (float* __restrict a, char* b)
 }
 
 void
-foo2 (_Float16* __restrict a, char* b)
+foo2 (_Float16* __restrict a, signed char* b)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -33,14 +33,14 @@ foo2 (_Float16* __restrict a, char* b)
 }
 
 void
-foo3 (double* __restrict a, short* b)
+foo3 (double* __restrict a, signed short* b)
 {
   a[0] = b[0];
   a[1] = b[1];
 }
 
 void
-foo4 (float* __restrict a, char* b)
+foo4 (float* __restrict a, signed char* b)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -49,14 +49,14 @@ foo4 (float* __restrict a, char* b)
 }
 
 void
-foo5 (double* __restrict b, char* a)
+foo5 (double* __restrict b, signed char* a)
 {
   a[0] = b[0];
   a[1] = b[1];
 }
 
 void
-foo6 (float* __restrict b, char* a)
+foo6 (float* __restrict b, signed char* a)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -65,7 +65,7 @@ foo6 (float* __restrict b, char* a)
 }
 
 void
-foo7 (_Float16* __restrict b, char* a)
+foo7 (_Float16* __restrict b, signed char* a)
 {
   a[0] = b[0];
   a[1] = b[1];
@@ -78,14 +78,14 @@ foo7 (_Float16* __restrict b, char* a)
 }
 
 void
-foo8 (double* __restrict b, short* a)
+foo8 (double* __restrict b, signed short* a)
 {
   a[0] = b[0];
   a[1] = b[1];
 }
 
 void
-foo9 (float* __restrict b, char* a)
+foo9 (float* __restrict b, signed char* a)
 {
   a[0] = b[0];
   a[1] = b[1];
diff --git a/gcc/testsuite/gcc.target/i386/pr110018-2.c b/gcc/testsuite/gcc.target/i386/pr110018-2.c
new file mode 100644
index 00000000000..a663e074698
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110018-2.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
+/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
+/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
+
+void
+foo (double* __restrict a, unsigned char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo1 (float* __restrict a, unsigned char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo2 (_Float16* __restrict a, unsigned char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[4] = b[4];
+  a[5] = b[5];
+  a[6] = b[6];
+  a[7] = b[7];
+}
+
+void
+foo3 (double* __restrict a, unsigned short* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo4 (float* __restrict a, unsigned char* b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo5 (double* __restrict b, unsigned char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo6 (float* __restrict b, unsigned char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}
+
+void
+foo7 (_Float16* __restrict b, unsigned char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[4] = b[4];
+  a[5] = b[5];
+  a[6] = b[6];
+  a[7] = b[7];
+}
+
+void
+foo8 (double* __restrict b, unsigned short* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+}
+
+void
+foo9 (float* __restrict b, unsigned char* a)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-21  7:49         ` Uros Bizjak
@ 2023-06-21  8:19           ` Richard Biener
  2023-06-21 14:37             ` Uros Bizjak
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2023-06-21  8:19 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: liuhongt, gcc-patches, crazylht, hjl.tools

On Wed, Jun 21, 2023 at 9:50 AM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, Jun 20, 2023 at 6:11 PM liuhongt via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > I notice there's some refactor in vectorizable_conversion
> > for code_helper,so I've adjusted my patch to that.
> > Here's the patch I'm going to commit.
> >
> > We have already use intermidate type in case WIDEN, but not for NONE,
> > this patch extended that.
> >
> > gcc/ChangeLog:
> >
> >         PR target/110018
> >         * tree-vect-stmts.cc (vectorizable_conversion): Use
> >         intermiediate integer type for float_expr/fix_trunc_expr when
> >         direct optab is not existed.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/pr110018-1.c: New test.
>
> > +
> > +      /* For conversions between float and smaller integer types try whether we
> > +        can use intermediate signed integer types to support the
> > +        conversion.  */
>
> I'm trying to enhance testcase coverage with explicit signed/unsigned
> types (patch attached), and I have noticed that zero-extension is used
> for unsigned types. So, the above comment that mentions only signed
> integer types is not entirely correct.

The comment says the intermediate sized vector types are always
signed (because float conversions to/from unsigned are always somewhat
awkward), but yes, if the original type was unsigned zero-extension is
used and if it was signed sign-extension.

The testcase adjustments / additions look good to me btw.

Thanks,
Richard.

>
> Uros.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-21  8:19           ` Richard Biener
@ 2023-06-21 14:37             ` Uros Bizjak
  0 siblings, 0 replies; 16+ messages in thread
From: Uros Bizjak @ 2023-06-21 14:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: liuhongt, gcc-patches, crazylht, hjl.tools

On Wed, Jun 21, 2023 at 10:22 AM Richard Biener
<richard.guenther@gmail.com> wrote:

> > > +      /* For conversions between float and smaller integer types try whether we
> > > +        can use intermediate signed integer types to support the
> > > +        conversion.  */
> >
> > I'm trying to enhance testcase coverage with explicit signed/unsigned
> > types (patch attached), and I have noticed that zero-extension is used
> > for unsigned types. So, the above comment that mentions only signed
> > integer types is not entirely correct.
>
> The comment says the intermediate sized vector types are always
> signed (because float conversions to/from unsigned are always somewhat
> awkward), but yes, if the original type was unsigned zero-extension is
> used and if it was signed sign-extension.
>
> The testcase adjustments / additions look good to me btw.

Thanks, pushed with the following ChangeLog:

vect: Add testcases for unsigned conversions [PR110018]

Also test conversions with unsigned types.

    PR target/110018

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr110018-1.c: Use explicit signed types.
    * gcc.target/i386/pr110018-2.c: New test.

Tested on x86_64-linux-gnu {,-m32}.

Uros.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-20 16:11       ` liuhongt
  2023-06-21  7:49         ` Uros Bizjak
@ 2023-06-22 20:36         ` Thiago Jung Bauermann
  1 sibling, 0 replies; 16+ messages in thread
From: Thiago Jung Bauermann @ 2023-06-22 20:36 UTC (permalink / raw)
  To: liuhongt; +Cc: crazylht, hjl.tools, gcc-patches


Hello,

liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> I notice there's some refactor in vectorizable_conversion
> for code_helper,so I've adjusted my patch to that.
> Here's the patch I'm going to commit.
>
> We have already use intermidate type in case WIDEN, but not for NONE,
> this patch extended that.
>
> gcc/ChangeLog:
>
> 	PR target/110018
> 	* tree-vect-stmts.cc (vectorizable_conversion): Use
> 	intermiediate integer type for float_expr/fix_trunc_expr when
> 	direct optab is not existed.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/i386/pr110018-1.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
>  gcc/tree-vect-stmts.cc                     | 66 ++++++++++++++-
>  2 files changed, 158 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c

Our CI detected regressions on aarch64-linux-gnu with this commit, in
some aarch64-sve and gfortran tests. I filed the following bug report
with the details:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371

Could you please check?

-- 
Thiago

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-20  8:38 ` Richard Biener
  2023-06-20  9:09   ` Hongtao Liu
@ 2023-06-21  9:10   ` Richard Sandiford
  2023-06-21  9:32     ` Richard Sandiford
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2023-06-21  9:10 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches
  Cc: liuhongt, Richard Biener, crazylht, hjl.tools

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> We have already use intermidate type in case WIDEN, but not for NONE,
>> this patch extended that.
>>
>> I didn't do that in pattern recog since we need to know whether the
>> stmt belongs to any slp_node to decide the vectype, the related optabs
>> are checked according to vectype_in and vectype_out. For non-slp case,
>> vec_pack/unpack are always used when lhs has different size from rhs,
>> for slp case, sometimes vec_pack/unpack is used, somethings
>> direct conversion is used.
>>
>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> Ok for trunk?
>>
>> gcc/ChangeLog:
>>
>>         PR target/110018
>>         * tree-vect-stmts.cc (vectorizable_conversion): Use
>>         intermiediate integer type for float_expr/fix_trunc_expr when
>>         direct optab is not existed.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/i386/pr110018-1.c: New test.
>> ---
>>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
>>  gcc/tree-vect-stmts.cc                     | 56 ++++++++++++-
>>  2 files changed, 149 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>> new file mode 100644
>> index 00000000000..b1baffd7af1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>> @@ -0,0 +1,94 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
>> +
>> +void
>> +foo (double* __restrict a, char* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +}
>> +
>> +void
>> +foo1 (float* __restrict a, char* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +}
>> +
>> +void
>> +foo2 (_Float16* __restrict a, char* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +  a[4] = b[4];
>> +  a[5] = b[5];
>> +  a[6] = b[6];
>> +  a[7] = b[7];
>> +}
>> +
>> +void
>> +foo3 (double* __restrict a, short* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +}
>> +
>> +void
>> +foo4 (float* __restrict a, char* b)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +}
>> +
>> +void
>> +foo5 (double* __restrict b, char* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +}
>> +
>> +void
>> +foo6 (float* __restrict b, char* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +}
>> +
>> +void
>> +foo7 (_Float16* __restrict b, char* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +  a[4] = b[4];
>> +  a[5] = b[5];
>> +  a[6] = b[6];
>> +  a[7] = b[7];
>> +}
>> +
>> +void
>> +foo8 (double* __restrict b, short* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +}
>> +
>> +void
>> +foo9 (float* __restrict b, char* a)
>> +{
>> +  a[0] = b[0];
>> +  a[1] = b[1];
>> +  a[2] = b[2];
>> +  a[3] = b[3];
>> +}
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index bd3b07a3aa1..1118c89686d 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
>>         return false;
>>        if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
>>         break;
>
> A comment would be nice here.  Like
>
>    /* For conversions between float and smaller integer types try whether we can
>       use intermediate signed integer types to support the conversion.  */
>
>> +      if ((code == FLOAT_EXPR
>> +          && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
>> +         || (code == FIX_TRUNC_EXPR
>> +             && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))

Is the FIX_TRUNC_EXPR case safe without some flag?

#include <stdint.h>
int32_t x = (int32_t)0x1.0p32;
int32_t y = (int32_t)(int64_t)0x1.0p32;

sets x to 2147483647 and y to 0.

Thanks,
Richard

>> +       {
>> +         bool float_expr_p = code == FLOAT_EXPR;
>> +         scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
>> +         fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
>> +         code1 = float_expr_p ? code : NOP_EXPR;
>> +         codecvt1 = float_expr_p ? NOP_EXPR : code;
>> +         FOR_EACH_2XWIDER_MODE (rhs_mode_iter, imode)
>> +           {
>> +             imode = rhs_mode_iter.require ();
>> +             if (GET_MODE_SIZE (imode) > fltsz)
>> +               break;
>> +
>> +             cvt_type
>> +               = build_nonstandard_integer_type (GET_MODE_BITSIZE (imode),
>> +                                                 0);
>> +             cvt_type = get_vectype_for_scalar_type (vinfo, cvt_type,
>> +                                                     slp_node);
>> +             /* This should only happened for SLP as long as loop vectorizer
>> +                only supports same-sized vector.  */
>> +             if (cvt_type == NULL_TREE
>> +                 || maybe_ne (TYPE_VECTOR_SUBPARTS (cvt_type), nunits_in)
>> +                 || !supportable_convert_operation (code1, vectype_out,
>> +                                                    cvt_type, &code1)
>> +                 || !supportable_convert_operation (codecvt1, cvt_type,
>> +                                                    vectype_in, &codecvt1))
>> +               continue;
>> +
>> +             found_mode = true;
>> +             break;
>> +           }
>> +
>> +         if (found_mode)
>> +           {
>> +             multi_step_cvt++;
>> +             interm_types.safe_push (cvt_type);
>> +             cvt_type = NULL_TREE;
>> +             break;
>> +           }
>> +       }
>>        /* FALLTHRU */
>>      unsupported:
>>        if (dump_enabled_p ())
>> @@ -5381,7 +5424,18 @@ vectorizable_conversion (vec_info *vinfo,
>>         {
>>           /* Arguments are ready, create the new vector stmt.  */
>>           gcc_assert (TREE_CODE_LENGTH (code1) == unary_op);
>> -         gassign *new_stmt = gimple_build_assign (vec_dest, code1, vop0);
>> +         gassign* new_stmt;
>> +         if (multi_step_cvt)
>> +           {
>> +             gcc_assert (multi_step_cvt == 1);
>> +             new_stmt = gimple_build_assign (vec_dest, codecvt1, vop0);
>> +             new_temp = make_ssa_name (vec_dest, new_stmt);
>
> I wonder how you get away with using vec_dest both for the final and the
> intermediate conversion and not involve interm_types[0]?
>
> Otherwise looks good.
>
> Thanks,
> Richard.
>
>> +             gimple_assign_set_lhs (new_stmt, new_temp);
>> +             vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
>> +             vop0 = new_temp;
>> +             vec_dest = vec_dsts[0];
>> +           }
>> +         new_stmt = gimple_build_assign (vec_dest, code1, vop0);
>>           new_temp = make_ssa_name (vec_dest, new_stmt);
>>           gimple_assign_set_lhs (new_stmt, new_temp);
>>           vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
>> --
>> 2.39.1.388.g2fc9e9ca3c
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-21  9:10   ` Richard Sandiford
@ 2023-06-21  9:32     ` Richard Sandiford
  2023-06-21 11:04       ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2023-06-21  9:32 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches
  Cc: liuhongt, Richard Biener, crazylht, hjl.tools

Richard Sandiford <richard.sandiford@arm.com> writes:
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>> We have already use intermidate type in case WIDEN, but not for NONE,
>>> this patch extended that.
>>>
>>> I didn't do that in pattern recog since we need to know whether the
>>> stmt belongs to any slp_node to decide the vectype, the related optabs
>>> are checked according to vectype_in and vectype_out. For non-slp case,
>>> vec_pack/unpack are always used when lhs has different size from rhs,
>>> for slp case, sometimes vec_pack/unpack is used, somethings
>>> direct conversion is used.
>>>
>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>> Ok for trunk?
>>>
>>> gcc/ChangeLog:
>>>
>>>         PR target/110018
>>>         * tree-vect-stmts.cc (vectorizable_conversion): Use
>>>         intermiediate integer type for float_expr/fix_trunc_expr when
>>>         direct optab is not existed.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>         * gcc.target/i386/pr110018-1.c: New test.
>>> ---
>>>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
>>>  gcc/tree-vect-stmts.cc                     | 56 ++++++++++++-
>>>  2 files changed, 149 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
>>>
>>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>>> new file mode 100644
>>> index 00000000000..b1baffd7af1
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>>> @@ -0,0 +1,94 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
>>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
>>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
>>> +
>>> +void
>>> +foo (double* __restrict a, char* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +}
>>> +
>>> +void
>>> +foo1 (float* __restrict a, char* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +}
>>> +
>>> +void
>>> +foo2 (_Float16* __restrict a, char* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +  a[4] = b[4];
>>> +  a[5] = b[5];
>>> +  a[6] = b[6];
>>> +  a[7] = b[7];
>>> +}
>>> +
>>> +void
>>> +foo3 (double* __restrict a, short* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +}
>>> +
>>> +void
>>> +foo4 (float* __restrict a, char* b)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +}
>>> +
>>> +void
>>> +foo5 (double* __restrict b, char* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +}
>>> +
>>> +void
>>> +foo6 (float* __restrict b, char* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +}
>>> +
>>> +void
>>> +foo7 (_Float16* __restrict b, char* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +  a[4] = b[4];
>>> +  a[5] = b[5];
>>> +  a[6] = b[6];
>>> +  a[7] = b[7];
>>> +}
>>> +
>>> +void
>>> +foo8 (double* __restrict b, short* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +}
>>> +
>>> +void
>>> +foo9 (float* __restrict b, char* a)
>>> +{
>>> +  a[0] = b[0];
>>> +  a[1] = b[1];
>>> +  a[2] = b[2];
>>> +  a[3] = b[3];
>>> +}
>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>>> index bd3b07a3aa1..1118c89686d 100644
>>> --- a/gcc/tree-vect-stmts.cc
>>> +++ b/gcc/tree-vect-stmts.cc
>>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
>>>         return false;
>>>        if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
>>>         break;
>>
>> A comment would be nice here.  Like
>>
>>    /* For conversions between float and smaller integer types try whether we can
>>       use intermediate signed integer types to support the conversion.  */
>>
>>> +      if ((code == FLOAT_EXPR
>>> +          && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
>>> +         || (code == FIX_TRUNC_EXPR
>>> +             && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
>
> Is the FIX_TRUNC_EXPR case safe without some flag?
>
> #include <stdint.h>
> int32_t x = (int32_t)0x1.0p32;
> int32_t y = (int32_t)(int64_t)0x1.0p32;
>
> sets x to 2147483647 and y to 0.

Also, I think multi_step_cvt should influence the costs, since at
the moment we cost one statement but generate two.  This makes a
difference for SVE with VECT_COMPARE_COSTS.  Would changing it to:

	  vect_model_simple_cost (vinfo, stmt_info,
				  ncopies * (multi_step_cvt + 1),
				  dt, ndts, slp_node,
				  cost_vec);

be OK?

There again, I wonder if we should handle this using patterns instead.
That makes both conversions explicit and therefore easier to cost.

E.g. for SVE, an integer extension is free if the source is a load,
and we do try to model that.  But it's difficult to handle if the
conversion is only implicit.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-21  9:32     ` Richard Sandiford
@ 2023-06-21 11:04       ` Richard Biener
  2023-06-21 11:20         ` Richard Sandiford
  2023-06-21 20:39         ` Joseph Myers
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Biener @ 2023-06-21 11:04 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches, liuhongt, Richard Biener,
	crazylht, hjl.tools, richard.sandiford

On Wed, Jun 21, 2023 at 11:32 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>> We have already use intermidate type in case WIDEN, but not for NONE,
> >>> this patch extended that.
> >>>
> >>> I didn't do that in pattern recog since we need to know whether the
> >>> stmt belongs to any slp_node to decide the vectype, the related optabs
> >>> are checked according to vectype_in and vectype_out. For non-slp case,
> >>> vec_pack/unpack are always used when lhs has different size from rhs,
> >>> for slp case, sometimes vec_pack/unpack is used, somethings
> >>> direct conversion is used.
> >>>
> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >>> Ok for trunk?
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>         PR target/110018
> >>>         * tree-vect-stmts.cc (vectorizable_conversion): Use
> >>>         intermiediate integer type for float_expr/fix_trunc_expr when
> >>>         direct optab is not existed.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>         * gcc.target/i386/pr110018-1.c: New test.
> >>> ---
> >>>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
> >>>  gcc/tree-vect-stmts.cc                     | 56 ++++++++++++-
> >>>  2 files changed, 149 insertions(+), 1 deletion(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
> >>>
> >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> >>> new file mode 100644
> >>> index 00000000000..b1baffd7af1
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
> >>> @@ -0,0 +1,94 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
> >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
> >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
> >>> +
> >>> +void
> >>> +foo (double* __restrict a, char* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +}
> >>> +
> >>> +void
> >>> +foo1 (float* __restrict a, char* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +}
> >>> +
> >>> +void
> >>> +foo2 (_Float16* __restrict a, char* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +  a[4] = b[4];
> >>> +  a[5] = b[5];
> >>> +  a[6] = b[6];
> >>> +  a[7] = b[7];
> >>> +}
> >>> +
> >>> +void
> >>> +foo3 (double* __restrict a, short* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +}
> >>> +
> >>> +void
> >>> +foo4 (float* __restrict a, char* b)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +}
> >>> +
> >>> +void
> >>> +foo5 (double* __restrict b, char* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +}
> >>> +
> >>> +void
> >>> +foo6 (float* __restrict b, char* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +}
> >>> +
> >>> +void
> >>> +foo7 (_Float16* __restrict b, char* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +  a[4] = b[4];
> >>> +  a[5] = b[5];
> >>> +  a[6] = b[6];
> >>> +  a[7] = b[7];
> >>> +}
> >>> +
> >>> +void
> >>> +foo8 (double* __restrict b, short* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +}
> >>> +
> >>> +void
> >>> +foo9 (float* __restrict b, char* a)
> >>> +{
> >>> +  a[0] = b[0];
> >>> +  a[1] = b[1];
> >>> +  a[2] = b[2];
> >>> +  a[3] = b[3];
> >>> +}
> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>> index bd3b07a3aa1..1118c89686d 100644
> >>> --- a/gcc/tree-vect-stmts.cc
> >>> +++ b/gcc/tree-vect-stmts.cc
> >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
> >>>         return false;
> >>>        if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
> >>>         break;
> >>
> >> A comment would be nice here.  Like
> >>
> >>    /* For conversions between float and smaller integer types try whether we can
> >>       use intermediate signed integer types to support the conversion.  */
> >>
> >>> +      if ((code == FLOAT_EXPR
> >>> +          && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
> >>> +         || (code == FIX_TRUNC_EXPR
> >>> +             && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
> >
> > Is the FIX_TRUNC_EXPR case safe without some flag?
> >
> > #include <stdint.h>
> > int32_t x = (int32_t)0x1.0p32;
> > int32_t y = (int32_t)(int64_t)0x1.0p32;
> >
> > sets x to 2147483647 and y to 0.

Hmm, good question.  GENERIC has a direct truncation to unsigned char
for example, the C standard generally says if the integral part cannot
be represented then the behavior is undefined.  So I think we should be
safe here (0x1.0p32 doesn't fit an int).

> Also, I think multi_step_cvt should influence the costs, since at
> the moment we cost one statement but generate two.  This makes a
> difference for SVE with VECT_COMPARE_COSTS.  Would changing it to:
>
>           vect_model_simple_cost (vinfo, stmt_info,
>                                   ncopies * (multi_step_cvt + 1),
>                                   dt, ndts, slp_node,
>                                   cost_vec);
>
> be OK?

Yeah, I guess so.

> There again, I wonder if we should handle this using patterns instead.
> That makes both conversions explicit and therefore easier to cost.

But we don't do this for the other multi-step conversions either ...
but sure, I also suggested that but the complaint was that with
BB SLP this would get us a vector type with off lanes?

> E.g. for SVE, an integer extension is free if the source is a load,
> and we do try to model that.  But it's difficult to handle if the
> conversion is only implicit.

In the distant future I hope that vectorizable_conversion will
upon analysis produce an SLP sub-graph for the suggested
code generation which we'd then cost.  My current idea is
to get this part live before switching the analysis to all-SLP
because it "seems" easier (fingers crossing).  Still I'm
struggling to even find time to start that effort.

Richard.

>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-21 11:04       ` Richard Biener
@ 2023-06-21 11:20         ` Richard Sandiford
  2023-06-21 20:39         ` Joseph Myers
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2023-06-21 11:20 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, liuhongt, crazylht, hjl.tools

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Jun 21, 2023 at 11:32 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
>> >> <gcc-patches@gcc.gnu.org> wrote:
>> >>>
>> >>> We have already use intermidate type in case WIDEN, but not for NONE,
>> >>> this patch extended that.
>> >>>
>> >>> I didn't do that in pattern recog since we need to know whether the
>> >>> stmt belongs to any slp_node to decide the vectype, the related optabs
>> >>> are checked according to vectype_in and vectype_out. For non-slp case,
>> >>> vec_pack/unpack are always used when lhs has different size from rhs,
>> >>> for slp case, sometimes vec_pack/unpack is used, somethings
>> >>> direct conversion is used.
>> >>>
>> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> >>> Ok for trunk?
>> >>>
>> >>> gcc/ChangeLog:
>> >>>
>> >>>         PR target/110018
>> >>>         * tree-vect-stmts.cc (vectorizable_conversion): Use
>> >>>         intermiediate integer type for float_expr/fix_trunc_expr when
>> >>>         direct optab is not existed.
>> >>>
>> >>> gcc/testsuite/ChangeLog:
>> >>>
>> >>>         * gcc.target/i386/pr110018-1.c: New test.
>> >>> ---
>> >>>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++
>> >>>  gcc/tree-vect-stmts.cc                     | 56 ++++++++++++-
>> >>>  2 files changed, 149 insertions(+), 1 deletion(-)
>> >>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c
>> >>>
>> >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>> >>> new file mode 100644
>> >>> index 00000000000..b1baffd7af1
>> >>> --- /dev/null
>> >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c
>> >>> @@ -0,0 +1,94 @@
>> >>> +/* { dg-do compile } */
>> >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */
>> >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */
>> >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */
>> >>> +
>> >>> +void
>> >>> +foo (double* __restrict a, char* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo1 (float* __restrict a, char* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo2 (_Float16* __restrict a, char* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +  a[4] = b[4];
>> >>> +  a[5] = b[5];
>> >>> +  a[6] = b[6];
>> >>> +  a[7] = b[7];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo3 (double* __restrict a, short* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo4 (float* __restrict a, char* b)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo5 (double* __restrict b, char* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo6 (float* __restrict b, char* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo7 (_Float16* __restrict b, char* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +  a[4] = b[4];
>> >>> +  a[5] = b[5];
>> >>> +  a[6] = b[6];
>> >>> +  a[7] = b[7];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo8 (double* __restrict b, short* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +}
>> >>> +
>> >>> +void
>> >>> +foo9 (float* __restrict b, char* a)
>> >>> +{
>> >>> +  a[0] = b[0];
>> >>> +  a[1] = b[1];
>> >>> +  a[2] = b[2];
>> >>> +  a[3] = b[3];
>> >>> +}
>> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> >>> index bd3b07a3aa1..1118c89686d 100644
>> >>> --- a/gcc/tree-vect-stmts.cc
>> >>> +++ b/gcc/tree-vect-stmts.cc
>> >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo,
>> >>>         return false;
>> >>>        if (supportable_convert_operation (code, vectype_out, vectype_in, &code1))
>> >>>         break;
>> >>
>> >> A comment would be nice here.  Like
>> >>
>> >>    /* For conversions between float and smaller integer types try whether we can
>> >>       use intermediate signed integer types to support the conversion.  */
>> >>
>> >>> +      if ((code == FLOAT_EXPR
>> >>> +          && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
>> >>> +         || (code == FIX_TRUNC_EXPR
>> >>> +             && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)))
>> >
>> > Is the FIX_TRUNC_EXPR case safe without some flag?
>> >
>> > #include <stdint.h>
>> > int32_t x = (int32_t)0x1.0p32;
>> > int32_t y = (int32_t)(int64_t)0x1.0p32;
>> >
>> > sets x to 2147483647 and y to 0.
>
> Hmm, good question.  GENERIC has a direct truncation to unsigned char
> for example, the C standard generally says if the integral part cannot
> be represented then the behavior is undefined.  So I think we should be
> safe here (0x1.0p32 doesn't fit an int).

OK.

>> Also, I think multi_step_cvt should influence the costs, since at
>> the moment we cost one statement but generate two.  This makes a
>> difference for SVE with VECT_COMPARE_COSTS.  Would changing it to:
>>
>>           vect_model_simple_cost (vinfo, stmt_info,
>>                                   ncopies * (multi_step_cvt + 1),
>>                                   dt, ndts, slp_node,
>>                                   cost_vec);
>>
>> be OK?
>
> Yeah, I guess so.

Thanks, will send a patch.

>> There again, I wonder if we should handle this using patterns instead.
>> That makes both conversions explicit and therefore easier to cost.
>
> But we don't do this for the other multi-step conversions either ...

Yeah, agree it's not a new problem.

> but sure, I also suggested that but the complaint was that with
> BB SLP this would get us a vector type with off lanes?

Ah, sorry, missed the previous discussion.  I only became interested
once some SVE tests started failing :)

>> E.g. for SVE, an integer extension is free if the source is a load,
>> and we do try to model that.  But it's difficult to handle if the
>> conversion is only implicit.
>
> In the distant future I hope that vectorizable_conversion will
> upon analysis produce an SLP sub-graph for the suggested
> code generation which we'd then cost.

Sounds good.  Agree we can live with the compound operation
until then.

> My current idea is
> to get this part live before switching the analysis to all-SLP
> because it "seems" easier (fingers crossing).  Still I'm
> struggling to even find time to start that effort.

Yeah, can sympthaise :/  Don't know how you find time for all
the reviews you currently do.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-21 11:04       ` Richard Biener
  2023-06-21 11:20         ` Richard Sandiford
@ 2023-06-21 20:39         ` Joseph Myers
  2023-06-22  7:32           ` Richard Biener
  1 sibling, 1 reply; 16+ messages in thread
From: Joseph Myers @ 2023-06-21 20:39 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, liuhongt, crazylht, hjl.tools,
	richard.sandiford

On Wed, 21 Jun 2023, Richard Biener via Gcc-patches wrote:

> > > int32_t x = (int32_t)0x1.0p32;
> > > int32_t y = (int32_t)(int64_t)0x1.0p32;
> > >
> > > sets x to 2147483647 and y to 0.
> 
> Hmm, good question.  GENERIC has a direct truncation to unsigned char
> for example, the C standard generally says if the integral part cannot
> be represented then the behavior is undefined.  So I think we should be
> safe here (0x1.0p32 doesn't fit an int).

We should be following Annex F (unspecified value plus "invalid" exception 
for out-of-range floating-to-integer conversions rather than undefined 
behavior).  But we don't achieve that very well at present (see bug 93806 
comments 27-29 for examples of how such conversions produce wobbly 
values).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-21 20:39         ` Joseph Myers
@ 2023-06-22  7:32           ` Richard Biener
  2023-07-12  9:19             ` Robin Dapp
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2023-06-22  7:32 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Biener via Gcc-patches, liuhongt, crazylht, hjl.tools,
	richard.sandiford

On Wed, Jun 21, 2023 at 10:39 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Wed, 21 Jun 2023, Richard Biener via Gcc-patches wrote:
>
> > > > int32_t x = (int32_t)0x1.0p32;
> > > > int32_t y = (int32_t)(int64_t)0x1.0p32;
> > > >
> > > > sets x to 2147483647 and y to 0.
> >
> > Hmm, good question.  GENERIC has a direct truncation to unsigned char
> > for example, the C standard generally says if the integral part cannot
> > be represented then the behavior is undefined.  So I think we should be
> > safe here (0x1.0p32 doesn't fit an int).
>
> We should be following Annex F (unspecified value plus "invalid" exception
> for out-of-range floating-to-integer conversions rather than undefined
> behavior).  But we don't achieve that very well at present (see bug 93806
> comments 27-29 for examples of how such conversions produce wobbly
> values).

That would mean guarding this with !flag_trapping_math would be the appropriate
thing to do.

Richard.

> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.
  2023-06-22  7:32           ` Richard Biener
@ 2023-07-12  9:19             ` Robin Dapp
  0 siblings, 0 replies; 16+ messages in thread
From: Robin Dapp @ 2023-07-12  9:19 UTC (permalink / raw)
  To: Richard Biener, Joseph Myers
  Cc: rdapp.gcc, Richard Biener via Gcc-patches, liuhongt, crazylht,
	hjl.tools, richard.sandiford

>>>>> int32_t x = (int32_t)0x1.0p32;
>>>>> int32_t y = (int32_t)(int64_t)0x1.0p32;
>>>>>
>>>>> sets x to 2147483647 and y to 0.
>>>
>>> Hmm, good question.  GENERIC has a direct truncation to unsigned char
>>> for example, the C standard generally says if the integral part cannot
>>> be represented then the behavior is undefined.  So I think we should be
>>> safe here (0x1.0p32 doesn't fit an int).
>>
>> We should be following Annex F (unspecified value plus "invalid" exception
>> for out-of-range floating-to-integer conversions rather than undefined
>> behavior).  But we don't achieve that very well at present (see bug 93806
>> comments 27-29 for examples of how such conversions produce wobbly
>> values).
> 
> That would mean guarding this with !flag_trapping_math would be the appropriate
> thing to do.

Follow-up on this:  When we do a NARROW_DST multiple-step conversion we
do not guard with !flag_trapping_math.  Is this intentional and if so, why
do we only require it for the NONE case?

I was thinking of implementing an expander for double -> int16 conversion
for RISC-V with multiple steps but that would just circumvent the
!flag_trapping_math check.  Then I wondered why we vectorize this
using multiple steps on x86 even with trapping math and it turns out
that the difference is the NARROW_DST modifier but we emit
VEC_PACK_FIX_TRUNC_EXPR anyway.  Is this "safe"?

Regards
 Robin

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-07-12  9:19 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02  1:00 [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed liuhongt
2023-06-20  8:38 ` Richard Biener
2023-06-20  9:09   ` Hongtao Liu
2023-06-20  9:22     ` Richard Biener
2023-06-20 16:11       ` liuhongt
2023-06-21  7:49         ` Uros Bizjak
2023-06-21  8:19           ` Richard Biener
2023-06-21 14:37             ` Uros Bizjak
2023-06-22 20:36         ` Thiago Jung Bauermann
2023-06-21  9:10   ` Richard Sandiford
2023-06-21  9:32     ` Richard Sandiford
2023-06-21 11:04       ` Richard Biener
2023-06-21 11:20         ` Richard Sandiford
2023-06-21 20:39         ` Joseph Myers
2023-06-22  7:32           ` Richard Biener
2023-07-12  9:19             ` Robin Dapp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).