public inbox for gcc-patches@gcc.gnu.org
* [PATCH] vect: more oversized bitmask fixups
@ 2024-03-21 14:22 Andrew Stubbs
  2024-03-21 15:18 ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Stubbs @ 2024-03-21 14:22 UTC (permalink / raw)
  To: gcc-patches

My previous patch to fix this problem with xor was rejected because we
want to fix these issues only at the point of use.  That patch produced
slightly better code, in this example, but this works too....

These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
configured to use V32 instead of V64 (I plan to do this for RDNA devices).

The problem was that a "not" operation on the mask inadvertently enabled
inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
when calling internal functions (in this case COND_MINUS), when doing masked
loads and stores, and when doing conditional jumps.
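
For illustration, here is a minimal standalone C sketch of the failure
mode (not part of the patch; the concrete mask value and the choice of a
64-bit scalar register holding a 32-lane mask are just assumptions for
the example):

  #include <inttypes.h>
  #include <stdio.h>

  int
  main (void)
  {
    const int nunits = 32;                 /* active lanes */
    uint64_t mask = 0x00000000ffff0000ULL; /* 32-lane mask; bits >= nunits are padding */

    /* A bitwise "not" of the whole register also flips the padding bits,
       so the inactive upper lanes appear enabled.  */
    uint64_t inverted = ~mask;

    /* The fix applied at each point of use: AND with (1 << nunits) - 1
       to clear the padding bits before the mask is consumed.  */
    uint64_t fixed = inverted & ((UINT64_C (1) << nunits) - 1);

    printf ("inverted = %016" PRIx64 "\n", inverted);
    printf ("fixed    = %016" PRIx64 "\n", fixed);
    return 0;
  }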

OK for mainline?

Andrew

gcc/ChangeLog:

	* dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
	bitmaps.
	* internal-fn.cc (expand_fn_using_insn): Likewise.
	(add_mask_and_len_args): Likewise.
---
 gcc/dojump.cc      | 16 ++++++++++++++++
 gcc/internal-fn.cc | 26 ++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 88600cb42d3..8df86957e83 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
 	    }
 	}
 
+      if (val
+	  && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+	  && SCALAR_INT_MODE_P (mode))
+	{
+	  auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
+	  if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+	    {
+	      op0 = expand_binop (mode, and_optab, op0,
+				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+				  NULL_RTX, true, OPTAB_WIDEN);
+	      op1 = expand_binop (mode, and_optab, op1,
+				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+				  NULL_RTX, true, OPTAB_WIDEN);
+	    }
+	}
+
       emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
 			       if_true_label, prob);
     }
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..5269f0ac528 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
 	       && SSA_NAME_IS_DEFAULT_DEF (rhs)
 	       && VAR_P (SSA_NAME_VAR (rhs)))
 	create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
+      else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
+	       && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
+	       && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
+			    TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
+	{
+	  /* Ensure that the vector bitmasks do not have excess bits.  */
+	  int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
+	  rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
+				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+				  NULL_RTX, true, OPTAB_WIDEN);
+	  create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type));
+	}
       else
 	create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
       opno += 1;
@@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
     {
       tree mask = gimple_call_arg (stmt, mask_index);
       rtx mask_rtx = expand_normal (mask);
+
+      tree mask_type = TREE_TYPE (mask);
+      if (VECTOR_BOOLEAN_TYPE_P (mask_type)
+	  && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
+	  && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
+		       TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
+	{
+	  /* Ensure that the vector bitmasks do not have excess bits.  */
+	  int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
+	  mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
+				   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+				   NULL_RTX, true, OPTAB_WIDEN);
+	}
+
       create_input_operand (&ops[opno++], mask_rtx,
 			    TYPE_MODE (TREE_TYPE (mask)));
     }
-- 
2.41.0



* Re: [PATCH] vect: more oversized bitmask fixups
  2024-03-21 14:22 [PATCH] vect: more oversized bitmask fixups Andrew Stubbs
@ 2024-03-21 15:18 ` Richard Biener
  2024-03-21 16:07   ` Andrew Stubbs
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2024-03-21 15:18 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs <ams@baylibre.com> wrote:
>
> My previous patch to fix this problem with xor was rejected because we
> want to fix these issues only at the point of use.  That patch produced
> slightly better code, in this example, but this works too....
>
> These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
> configured to use V32 instead of V64 (I plan to do this for RDNA devices).
>
> The problem was that a "not" operation on the mask inadvertently enabled
> inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
> when calling internal functions (in this case COND_MINUS), when doing masked
> loads and stores, and when doing conditional jumps.
>
> OK for mainline?
>
> Andrew
>
> gcc/ChangeLog:
>
>         * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
>         bitmaps.
>         * internal-fn.cc (expand_fn_using_insn): Likewise.
>         (add_mask_and_len_args): Likewise.
> ---
>  gcc/dojump.cc      | 16 ++++++++++++++++
>  gcc/internal-fn.cc | 26 ++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
>
> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> index 88600cb42d3..8df86957e83 100644
> --- a/gcc/dojump.cc
> +++ b/gcc/dojump.cc
> @@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
>             }
>         }
>
> +      if (val
> +         && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> +         && SCALAR_INT_MODE_P (mode))
> +       {
> +         auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
> +         if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> +           {
> +             op0 = expand_binop (mode, and_optab, op0,
> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +                                 NULL_RTX, true, OPTAB_WIDEN);
> +             op1 = expand_binop (mode, and_optab, op1,
> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +                                 NULL_RTX, true, OPTAB_WIDEN);
> +           }
> +       }
> +

Can we then remove the same code from do_compare_and_jump before the call
to do_compare_rtx_and_jump?  I'll note that we don't pass 'val' there, and
'val' is unfortunately not documented - what's it supposed to be?  I think
I placed the original fix in do_compare_and_jump because we have the full
info available there.  So what's the do_compare_rtx_and_jump caller that
needs fixing as well?  (IMHO keying on 'val' looks fragile.)

The other hunks below are OK.

Thanks,
Richard.

>        emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
>                                if_true_label, prob);
>      }
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index fcf47c7fa12..5269f0ac528 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>                && SSA_NAME_IS_DEFAULT_DEF (rhs)
>                && VAR_P (SSA_NAME_VAR (rhs)))
>         create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
> +      else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
> +              && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
> +              && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
> +                           TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
> +       {
> +         /* Ensure that the vector bitmasks do not have excess bits.  */
> +         int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
> +         rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +                                 NULL_RTX, true, OPTAB_WIDEN);
> +         create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type));
> +       }
>        else
>         create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>        opno += 1;
> @@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
>      {
>        tree mask = gimple_call_arg (stmt, mask_index);
>        rtx mask_rtx = expand_normal (mask);
> +
> +      tree mask_type = TREE_TYPE (mask);
> +      if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> +         && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
> +         && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
> +                      TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
> +       {
> +         /* Ensure that the vector bitmasks do not have excess bits.  */
> +         int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
> +         mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
> +                                  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +                                  NULL_RTX, true, OPTAB_WIDEN);
> +       }
> +
>        create_input_operand (&ops[opno++], mask_rtx,
>                             TYPE_MODE (TREE_TYPE (mask)));
>      }
> --
> 2.41.0
>


* Re: [PATCH] vect: more oversized bitmask fixups
  2024-03-21 15:18 ` Richard Biener
@ 2024-03-21 16:07   ` Andrew Stubbs
  2024-03-22  8:43     ` Richard Biener
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Stubbs @ 2024-03-21 16:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On 21/03/2024 15:18, Richard Biener wrote:
> On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs <ams@baylibre.com> wrote:
>>
>> My previous patch to fix this problem with xor was rejected because we
>> want to fix these issues only at the point of use.  That patch produced
>> slightly better code, in this example, but this works too....
>>
>> These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
>> configured to use V32 instead of V64 (I plan to do this for RDNA devices).
>>
>> The problem was that a "not" operation on the mask inadvertently enabled
>> inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
>> when calling internal functions (in this case COND_MINUS), when doing masked
>> loads and stores, and when doing conditional jumps.
>>
>> OK for mainline?
>>
>> Andrew
>>
>> gcc/ChangeLog:
>>
>>          * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
>>          bitmaps.
>>          * internal-fn.cc (expand_fn_using_insn): Likewise.
>>          (add_mask_and_len_args): Likewise.
>> ---
>>   gcc/dojump.cc      | 16 ++++++++++++++++
>>   gcc/internal-fn.cc | 26 ++++++++++++++++++++++++++
>>   2 files changed, 42 insertions(+)
>>
>> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
>> index 88600cb42d3..8df86957e83 100644
>> --- a/gcc/dojump.cc
>> +++ b/gcc/dojump.cc
>> @@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
>>              }
>>          }
>>
>> +      if (val
>> +         && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
>> +         && SCALAR_INT_MODE_P (mode))
>> +       {
>> +         auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
>> +         if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
>> +           {
>> +             op0 = expand_binop (mode, and_optab, op0,
>> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
>> +                                 NULL_RTX, true, OPTAB_WIDEN);
>> +             op1 = expand_binop (mode, and_optab, op1,
>> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
>> +                                 NULL_RTX, true, OPTAB_WIDEN);
>> +           }
>> +       }
>> +
> 
> Can we then remove the same code from do_compare_and_jump before the call to
> do_compare_rtx_and_jump?

It's called from do_jump.

>  I'll note that we don't pass 'val' there and
> 'val' is unfortunately
> not documented - what's it supposed to be?  I think I placed the original fix in
> do_compare_and_jump because we have the full info available there.  So
> what's the
> do_compare_rtx_and_jump caller that needs fixing as well?  (IMHO keying on 'val'
> looks fragile)

"val" is the tree expression from which the rtx op0 was expanded. It's 
optional, but it's used in emit_cmp_and_jump_insns to determine whether 
the target supports tbranch (according to a comment).

I think it would be safe to remove your code, as that path does pass 
"treeop0" to "val" (do_jump calls do_compare_and_jump, which forwards 
treeop0 as "val" to do_compare_rtx_and_jump).

WDYT?

> The other hunks below are OK.

Thanks.

Andrew

> Thanks,
> Richard.
> 
>>         emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
>>                                 if_true_label, prob);
>>       }
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index fcf47c7fa12..5269f0ac528 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>>                 && SSA_NAME_IS_DEFAULT_DEF (rhs)
>>                 && VAR_P (SSA_NAME_VAR (rhs)))
>>          create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
>> +      else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
>> +              && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
>> +              && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
>> +                           TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
>> +       {
>> +         /* Ensure that the vector bitmasks do not have excess bits.  */
>> +         int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
>> +         rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
>> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
>> +                                 NULL_RTX, true, OPTAB_WIDEN);
>> +         create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type));
>> +       }
>>         else
>>          create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>>         opno += 1;
>> @@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
>>       {
>>         tree mask = gimple_call_arg (stmt, mask_index);
>>         rtx mask_rtx = expand_normal (mask);
>> +
>> +      tree mask_type = TREE_TYPE (mask);
>> +      if (VECTOR_BOOLEAN_TYPE_P (mask_type)
>> +         && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
>> +         && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
>> +                      TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
>> +       {
>> +         /* Ensure that the vector bitmasks do not have excess bits.  */
>> +         int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
>> +         mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
>> +                                  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
>> +                                  NULL_RTX, true, OPTAB_WIDEN);
>> +       }
>> +
>>         create_input_operand (&ops[opno++], mask_rtx,
>>                              TYPE_MODE (TREE_TYPE (mask)));
>>       }
>> --
>> 2.41.0
>>



* Re: [PATCH] vect: more oversized bitmask fixups
  2024-03-21 16:07   ` Andrew Stubbs
@ 2024-03-22  8:43     ` Richard Biener
  2024-03-22 14:15       ` Andrew Stubbs
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2024-03-22  8:43 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Thu, Mar 21, 2024 at 5:07 PM Andrew Stubbs <ams@baylibre.com> wrote:
>
> On 21/03/2024 15:18, Richard Biener wrote:
> > On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs <ams@baylibre.com> wrote:
> >>
> >> My previous patch to fix this problem with xor was rejected because we
> >> want to fix these issues only at the point of use.  That patch produced
> >> slightly better code, in this example, but this works too....
> >>
> >> These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
> >> configured to use V32 instead of V64 (I plan to do this for RDNA devices).
> >>
> >> The problem was that a "not" operation on the mask inadvertently enabled
> >> inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
> >> when calling internal functions (in this case COND_MINUS), when doing masked
> >> loads and stores, and when doing conditional jumps.
> >>
> >> OK for mainline?
> >>
> >> Andrew
> >>
> >> gcc/ChangeLog:
> >>
> >>          * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
> >>          bitmaps.
> >>          * internal-fn.cc (expand_fn_using_insn): Likewise.
> >>          (add_mask_and_len_args): Likewise.
> >> ---
> >>   gcc/dojump.cc      | 16 ++++++++++++++++
> >>   gcc/internal-fn.cc | 26 ++++++++++++++++++++++++++
> >>   2 files changed, 42 insertions(+)
> >>
> >> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> >> index 88600cb42d3..8df86957e83 100644
> >> --- a/gcc/dojump.cc
> >> +++ b/gcc/dojump.cc
> >> @@ -1235,6 +1235,22 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
> >>              }
> >>          }
> >>
> >> +      if (val
> >> +         && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> >> +         && SCALAR_INT_MODE_P (mode))
> >> +       {
> >> +         auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
> >> +         if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> >> +           {
> >> +             op0 = expand_binop (mode, and_optab, op0,
> >> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> >> +                                 NULL_RTX, true, OPTAB_WIDEN);
> >> +             op1 = expand_binop (mode, and_optab, op1,
> >> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> >> +                                 NULL_RTX, true, OPTAB_WIDEN);
> >> +           }
> >> +       }
> >> +
> >
> > Can we then remove the same code from do_compare_and_jump before the call to
> > do_compare_rtx_and_jump?
>
> It's called from do_jump.
>
> >  I'll note that we don't pass 'val' there and
> > 'val' is unfortunately
> > not documented - what's it supposed to be?  I think I placed the original fix in
> > do_compare_and_jump because we have the full info available there.  So
> > what's the
> > do_compare_rtx_and_jump caller that needs fixing as well?  (IMHO keying on 'val'
> > looks fragile)
>
> "val" is the tree expression from which the rtx op0 was expanded. It's
> optional, but it's used in emit_cmp_and_jump_insns to determine whether
> the target supports tbranch (according to a comment).
>
> I think it would be safe to remove your code as that path does pass
> "treeop0" to "val".
>
> WDYT?

Looks like a bit of a mess, but yes, I think that sounds good.

Thanks,
Richard.

> > The other hunks below are OK.
>
> Thanks.
>
> Andrew
>
> > Thanks,
> > Richard.
> >
> >>         emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
> >>                                 if_true_label, prob);
> >>       }
> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> >> index fcf47c7fa12..5269f0ac528 100644
> >> --- a/gcc/internal-fn.cc
> >> +++ b/gcc/internal-fn.cc
> >> @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
> >>                 && SSA_NAME_IS_DEFAULT_DEF (rhs)
> >>                 && VAR_P (SSA_NAME_VAR (rhs)))
> >>          create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
> >> +      else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
> >> +              && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
> >> +              && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
> >> +                           TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
> >> +       {
> >> +         /* Ensure that the vector bitmasks do not have excess bits.  */
> >> +         int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
> >> +         rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
> >> +                                 GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> >> +                                 NULL_RTX, true, OPTAB_WIDEN);
> >> +         create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type));
> >> +       }
> >>         else
> >>          create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
> >>         opno += 1;
> >> @@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
> >>       {
> >>         tree mask = gimple_call_arg (stmt, mask_index);
> >>         rtx mask_rtx = expand_normal (mask);
> >> +
> >> +      tree mask_type = TREE_TYPE (mask);
> >> +      if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> >> +         && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
> >> +         && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
> >> +                      TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
> >> +       {
> >> +         /* Ensure that the vector bitmasks do not have excess bits.  */
> >> +         int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
> >> +         mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
> >> +                                  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> >> +                                  NULL_RTX, true, OPTAB_WIDEN);
> >> +       }
> >> +
> >>         create_input_operand (&ops[opno++], mask_rtx,
> >>                              TYPE_MODE (TREE_TYPE (mask)));
> >>       }
> >> --
> >> 2.41.0
> >>
>


* Re: [PATCH] vect: more oversized bitmask fixups
  2024-03-22  8:43     ` Richard Biener
@ 2024-03-22 14:15       ` Andrew Stubbs
  2024-03-27 12:56         ` Thomas Schwinge
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Stubbs @ 2024-03-22 14:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 826 bytes --]

On 22/03/2024 08:43, Richard Biener wrote:

>>>   I'll note that we don't pass 'val' there and
>>> 'val' is unfortunately
>>> not documented - what's it supposed to be?  I think I placed the original fix in
>>> do_compare_and_jump because we have the full info available there.  So
>>> what's the
>>> do_compare_rtx_and_jump caller that needs fixing as well?  (IMHO keying on 'val'
>>> looks fragile)
>>
>> "val" is the tree expression from which the rtx op0 was expanded. It's
>> optional, but it's used in emit_cmp_and_jump_insns to determine whether
>> the target supports tbranch (according to a comment).
>>
>> I think it would be safe to remove your code as that path does pass
>> "treeop0" to "val".
>>
>> WDYT?
> 
> Looks like a bit of a mess, but yes, I think that sounds good.

Thanks, here's what I pushed.

Andrew

[-- Attachment #2: gcc.patch --]
[-- Type: text/x-patch, Size: 4935 bytes --]

vect: more oversized bitmask fixups

These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
configured to use V32 instead of V64 (I plan to do this for RDNA devices).

The problem was that a "not" operation on the mask inadvertently enabled
inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
when calling internal functions (in this case COND_MINUS), when doing masked
loads and stores, and when doing conditional jumps (some cases were already
handled).

gcc/ChangeLog:

	* dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
	bitmasks.
	(do_compare_and_jump): Remove now-redundant similar code.
	* internal-fn.cc (expand_fn_using_insn): Clear excess bits in vector
	bitmasks.
	(add_mask_and_len_args): Likewise.

diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 88600cb42d3..5f74b696b41 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1235,6 +1235,24 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
 	    }
 	}
 
+      /* For boolean vectors with less than mode precision
+	 make sure to fill padding with consistent values.  */
+      if (val
+	  && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+	  && SCALAR_INT_MODE_P (mode))
+	{
+	  auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
+	  if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+	    {
+	      op0 = expand_binop (mode, and_optab, op0,
+				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+				  NULL_RTX, true, OPTAB_WIDEN);
+	      op1 = expand_binop (mode, and_optab, op1,
+				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+				  NULL_RTX, true, OPTAB_WIDEN);
+	    }
+	}
+
       emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
 			       if_true_label, prob);
     }
@@ -1266,7 +1284,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
   machine_mode mode;
   int unsignedp;
   enum rtx_code code;
-  unsigned HOST_WIDE_INT nunits;
 
   /* Don't crash if the comparison was erroneous.  */
   op0 = expand_normal (treeop0);
@@ -1309,21 +1326,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
       emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1));
       op1 = new_op1;
     }
-  /* For boolean vectors with less than mode precision
-     make sure to fill padding with consistent values.  */
-  else if (VECTOR_BOOLEAN_TYPE_P (type)
-	   && SCALAR_INT_MODE_P (mode)
-	   && TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits)
-	   && maybe_ne (GET_MODE_PRECISION (mode), nunits))
-    {
-      gcc_assert (code == EQ || code == NE);
-      op0 = expand_binop (mode, and_optab, op0,
-			  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
-			  true, OPTAB_WIDEN);
-      op1 = expand_binop (mode, and_optab, op1,
-			  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
-			  true, OPTAB_WIDEN);
-    }
 
   do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
 			   ((mode == BLKmode)
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..5269f0ac528 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
 	       && SSA_NAME_IS_DEFAULT_DEF (rhs)
 	       && VAR_P (SSA_NAME_VAR (rhs)))
 	create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
+      else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
+	       && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
+	       && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
+			    TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
+	{
+	  /* Ensure that the vector bitmasks do not have excess bits.  */
+	  int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
+	  rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
+				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+				  NULL_RTX, true, OPTAB_WIDEN);
+	  create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type));
+	}
       else
 	create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
       opno += 1;
@@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
     {
       tree mask = gimple_call_arg (stmt, mask_index);
       rtx mask_rtx = expand_normal (mask);
+
+      tree mask_type = TREE_TYPE (mask);
+      if (VECTOR_BOOLEAN_TYPE_P (mask_type)
+	  && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
+	  && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
+		       TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
+	{
+	  /* Ensure that the vector bitmasks do not have excess bits.  */
+	  int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
+	  mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
+				   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+				   NULL_RTX, true, OPTAB_WIDEN);
+	}
+
       create_input_operand (&ops[opno++], mask_rtx,
 			    TYPE_MODE (TREE_TYPE (mask)));
     }


* Re: [PATCH] vect: more oversized bitmask fixups
  2024-03-22 14:15       ` Andrew Stubbs
@ 2024-03-27 12:56         ` Thomas Schwinge
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Schwinge @ 2024-03-27 12:56 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Richard Biener, gcc-patches

Hi!

On 2024-03-22T14:15:36+0000, Andrew Stubbs <ams@baylibre.com> wrote:
> On 22/03/2024 08:43, Richard Biener wrote:
> Thanks, here's what I pushed.

> vect: more oversized bitmask fixups
>
> These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
> configured to use V32 instead of V64 (I plan to do this for RDNA devices).

Thanks, confirming that this "vect: more oversized bitmask fixups" does
fix the GCN target '-march=gfx1100' testing regression:

    PASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/tsvc/vect-tsvc-s278.c execution test
    XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"

    PASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/tsvc/vect-tsvc-s279.c execution test
    XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"

... that I saw introduced by "amdgcn: Prefer V32 on RDNA devices".

(The XPASSes are independent of that, pre-existing.)


Regards
 Thomas


> The problem was that a "not" operation on the mask inadvertently enabled
> inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
> when calling internal functions (in this case COND_MINUS), when doing masked
> loads and stores, and when doing conditional jumps (some cases were already
> handled).
>
> gcc/ChangeLog:
>
> 	* dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
> 	bitmasks.
> 	(do_compare_and_jump): Remove now-redundant similar code.
> 	* internal-fn.cc (expand_fn_using_insn): Clear excess bits in vector
> 	bitmasks.
> 	(add_mask_and_len_args): Likewise.
>
> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> index 88600cb42d3..5f74b696b41 100644
> --- a/gcc/dojump.cc
> +++ b/gcc/dojump.cc
> @@ -1235,6 +1235,24 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
>  	    }
>  	}
>  
> +      /* For boolean vectors with less than mode precision
> +	 make sure to fill padding with consistent values.  */
> +      if (val
> +	  && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> +	  && SCALAR_INT_MODE_P (mode))
> +	{
> +	  auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
> +	  if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> +	    {
> +	      op0 = expand_binop (mode, and_optab, op0,
> +				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +				  NULL_RTX, true, OPTAB_WIDEN);
> +	      op1 = expand_binop (mode, and_optab, op1,
> +				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +				  NULL_RTX, true, OPTAB_WIDEN);
> +	    }
> +	}
> +
>        emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
>  			       if_true_label, prob);
>      }
> @@ -1266,7 +1284,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
>    machine_mode mode;
>    int unsignedp;
>    enum rtx_code code;
> -  unsigned HOST_WIDE_INT nunits;
>  
>    /* Don't crash if the comparison was erroneous.  */
>    op0 = expand_normal (treeop0);
> @@ -1309,21 +1326,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
>        emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1));
>        op1 = new_op1;
>      }
> -  /* For boolean vectors with less than mode precision
> -     make sure to fill padding with consistent values.  */
> -  else if (VECTOR_BOOLEAN_TYPE_P (type)
> -	   && SCALAR_INT_MODE_P (mode)
> -	   && TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits)
> -	   && maybe_ne (GET_MODE_PRECISION (mode), nunits))
> -    {
> -      gcc_assert (code == EQ || code == NE);
> -      op0 = expand_binop (mode, and_optab, op0,
> -			  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
> -			  true, OPTAB_WIDEN);
> -      op1 = expand_binop (mode, and_optab, op1,
> -			  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
> -			  true, OPTAB_WIDEN);
> -    }
>  
>    do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
>  			   ((mode == BLKmode)
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index fcf47c7fa12..5269f0ac528 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>  	       && SSA_NAME_IS_DEFAULT_DEF (rhs)
>  	       && VAR_P (SSA_NAME_VAR (rhs)))
>  	create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
> +      else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
> +	       && SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
> +	       && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
> +			    TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
> +	{
> +	  /* Ensure that the vector bitmasks do not have excess bits.  */
> +	  int nunits = TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ();
> +	  rtx tmp = expand_binop (TYPE_MODE (rhs_type), and_optab, rhs_rtx,
> +				  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +				  NULL_RTX, true, OPTAB_WIDEN);
> +	  create_input_operand (&ops[opno], tmp, TYPE_MODE (rhs_type));
> +	}
>        else
>  	create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>        opno += 1;
> @@ -312,6 +324,20 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
>      {
>        tree mask = gimple_call_arg (stmt, mask_index);
>        rtx mask_rtx = expand_normal (mask);
> +
> +      tree mask_type = TREE_TYPE (mask);
> +      if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> +	  && SCALAR_INT_MODE_P (TYPE_MODE (mask_type))
> +	  && maybe_ne (GET_MODE_PRECISION (TYPE_MODE (mask_type)),
> +		       TYPE_VECTOR_SUBPARTS (mask_type).to_constant ()))
> +	{
> +	  /* Ensure that the vector bitmasks do not have excess bits.  */
> +	  int nunits = TYPE_VECTOR_SUBPARTS (mask_type).to_constant ();
> +	  mask_rtx = expand_binop (TYPE_MODE (mask_type), and_optab, mask_rtx,
> +				   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +				   NULL_RTX, true, OPTAB_WIDEN);
> +	}
> +
>        create_input_operand (&ops[opno++], mask_rtx,
>  			    TYPE_MODE (TREE_TYPE (mask)));
>      }

