public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] vect: Don't set excess bits in unform masks
@ 2023-10-20 15:48 Andrew Stubbs
  2023-10-23 10:43 ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Stubbs @ 2023-10-20 15:48 UTC (permalink / raw)
  To: gcc-patches, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 977 bytes --]

This patch fixes a wrong-code bug on amdgcn in which the excess "ones" 
in the mask enable extra lanes that were supposed to be unused and are 
therefore undefined.

Richi suggested an alternative approach involving narrower types and 
then a zero-extend to the actual mask type.  This solved the problem for 
the specific test case that I had, but I'm not sure if it would work 
with V2 and V4 modes (not that I've observed bad behaviour from them 
anyway, but still).  There were some other caveats involving "two-lane 
division" that I don't fully understand, so I went with the simpler 
implementation.

This patch does have the disadvantage of an additional "and" instruction 
in the non-constant case even for machines that don't need it. I'm not 
sure how to fix that without an additional target hook. (If GCC could 
use the 64-lane vectors more effectively without the assistance of 
artificially reduced sizes then this problem wouldn't exist.)

OK to commit?

Andrew

[-- Attachment #2: 231020-uniform-masks.patch --]
[-- Type: text/plain, Size: 1487 bytes --]

vect: Don't set excess bits in unform masks

AVX ignores any excess bits in the mask, but AMD GCN magically uses a larger
vector than was intended (the smaller sizes are "fake"), leading to wrong-code.

gcc/ChangeLog:

	* expr.cc (store_constructor): Add "and" operation to uniform mask
	generation.

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 4220cbd9f8f..fb4609f616e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7440,7 +7440,7 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
 	    break;
 	  }
 	/* Use sign-extension for uniform boolean vectors with
-	   integer modes.  */
+	   integer modes.  Effectively "vec_duplicate" for bitmasks.  */
 	if (!TREE_SIDE_EFFECTS (exp)
 	    && VECTOR_BOOLEAN_TYPE_P (type)
 	    && SCALAR_INT_MODE_P (mode)
@@ -7449,7 +7449,21 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
 	  {
 	    rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
 				 expand_normal (elt));
-	    convert_move (target, op0, 0);
+	    rtx tmp = gen_reg_rtx (mode);
+	    convert_move (tmp, op0, 0);
+
+	    if (known_ne (TYPE_VECTOR_SUBPARTS (type),
+			  GET_MODE_PRECISION (mode)))
+	      {
+		/* Ensure no excess bits are set.
+		   GCN needs this, AVX does not.  */
+		expand_binop (mode, and_optab, tmp,
+			      GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
+					      .to_constant())) - 1),
+			      target, true, OPTAB_DIRECT);
+	      }
+	    else
+	      emit_move_insn (target, tmp);
 	    break;
 	  }
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] vect: Don't set excess bits in unform masks
  2023-10-20 15:48 [PATCH] vect: Don't set excess bits in unform masks Andrew Stubbs
@ 2023-10-23 10:43 ` Richard Biener
  2023-11-10 11:45   ` Andrew Stubbs
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2023-10-23 10:43 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Fri, 20 Oct 2023, Andrew Stubbs wrote:

> This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the
> mask enable extra lanes that were supposed to be unused and are therefore
> undefined.
> 
> Richi suggested an alternative approach involving narrower types and then a
> zero-extend to the actual mask type.  This solved the problem for the specific
> test case that I had, but I'm not sure if it would work with V2 and V4 modes
> (not that I've observed bad behaviour from them anyway, but still).  There
> were some other caveats involving "two-lane division" that I don't fully
> understand, so I went with the simpler implementation.
> 
> This patch does have the disadvantage of an additional "and" instruction in
> the non-constant case even for machines that don't need it. I'm not sure how
> to fix that without an additional target hook. (If GCC could use the 64-lane
> vectors more effectively without the assistance of artificially reduced sizes
> then this problem wouldn't exist.)
> 
> OK to commit?

-           convert_move (target, op0, 0);
+           rtx tmp = gen_reg_rtx (mode);
+           convert_move (tmp, op0, 0);
+
+           if (known_ne (TYPE_VECTOR_SUBPARTS (type),
+                         GET_MODE_PRECISION (mode)))

Usually this would be maybe_ne, but then ...

+             {
+               /* Ensure no excess bits are set.
+                  GCN needs this, AVX does not.  */
+               expand_binop (mode, and_optab, tmp,
+                             GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
+                                             .to_constant())) - 1),
+                             target, true, OPTAB_DIRECT);

here you have .to_constant ().  I think with having an integer mode
we know subparts is constant so I'd prefer

            auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
            if (maybe_ne (GET_MODE_PRECISION (mode), nunits)
...

+             }
+           else
+             emit_move_insn (target, tmp);

note you need the emit_move_insn also for the expand_binop
path since it's not guaranteed that 'target' is used there.  Thus

  tmp = expand_binop (...)
  if (tmp != target)
    emit_move_insn (...)

Otherwise looks good to me.  The and is needed on x86 for
two and four bit masks, it would be more efficient to use
smaller modes for the sign-extension I guess.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] vect: Don't set excess bits in unform masks
  2023-10-23 10:43 ` Richard Biener
@ 2023-11-10 11:45   ` Andrew Stubbs
  2023-11-10 11:53     ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Stubbs @ 2023-11-10 11:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2635 bytes --]

On 23/10/2023 11:43, Richard Biener wrote:
> On Fri, 20 Oct 2023, Andrew Stubbs wrote:
> 
>> This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the
>> mask enable extra lanes that were supposed to be unused and are therefore
>> undefined.
>>
>> Richi suggested an alternative approach involving narrower types and then a
>> zero-extend to the actual mask type.  This solved the problem for the specific
>> test case that I had, but I'm not sure if it would work with V2 and V4 modes
>> (not that I've observed bad behaviour from them anyway, but still).  There
>> were some other caveats involving "two-lane division" that I don't fully
>> understand, so I went with the simpler implementation.
>>
>> This patch does have the disadvantage of an additional "and" instruction in
>> the non-constant case even for machines that don't need it. I'm not sure how
>> to fix that without an additional target hook. (If GCC could use the 64-lane
>> vectors more effectively without the assistance of artificially reduced sizes
>> then this problem wouldn't exist.)
>>
>> OK to commit?
> 
> -           convert_move (target, op0, 0);
> +           rtx tmp = gen_reg_rtx (mode);
> +           convert_move (tmp, op0, 0);
> +
> +           if (known_ne (TYPE_VECTOR_SUBPARTS (type),
> +                         GET_MODE_PRECISION (mode)))
> 
> Usually this would be maybe_ne, but then ...
> 
> +             {
> +               /* Ensure no excess bits are set.
> +                  GCN needs this, AVX does not.  */
> +               expand_binop (mode, and_optab, tmp,
> +                             GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
> +                                             .to_constant())) - 1),
> +                             target, true, OPTAB_DIRECT);
> 
> here you have .to_constant ().  I think with having an integer mode
> we know subparts is constant so I'd prefer
> 
>              auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
>              if (maybe_ne (GET_MODE_PRECISION (mode), nunits)
> ...
> 
> +             }
> +           else
> +             emit_move_insn (target, tmp);
> 
> note you need the emit_move_insn also for the expand_binop
> path since it's not guaranteed that 'target' is used there.  Thus
> 
>    tmp = expand_binop (...)
>    if (tmp != target)
>      emit_move_insn (...)
> 
> Otherwise looks good to me.  The and is needed on x86 for
> two and four bit masks, it would be more efficient to use
> smaller modes for the sign-extension I guess.

I think this patch addresses these issues. I've confirmed it does the 
right thing on amdgcn.

OK?

Andrew

[-- Attachment #2: 231110-uniform-masks.patch --]
[-- Type: text/plain, Size: 1720 bytes --]

vect: Don't set excess bits in unform masks

AVX ignores any excess bits in the mask (at least for vector sizes >=8), but
AMD GCN magically uses a larger vector than was intended (the smaller sizes are
"fake"), leading to wrong-code.

This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c,
gfortran.dg/c-interop/contiguous-1.f90,
gfortran.dg/c-interop/ff-descriptor-7.f90, and others.

gcc/ChangeLog:

	* expr.cc (store_constructor): Add "and" operation to uniform mask
	generation.

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ed4dbb13d89..3e2a678710d 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7470,7 +7470,7 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
 	    break;
 	  }
 	/* Use sign-extension for uniform boolean vectors with
-	   integer modes.  */
+	   integer modes.  Effectively "vec_duplicate" for bitmasks.  */
 	if (!TREE_SIDE_EFFECTS (exp)
 	    && VECTOR_BOOLEAN_TYPE_P (type)
 	    && SCALAR_INT_MODE_P (mode)
@@ -7479,7 +7479,19 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
 	  {
 	    rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
 				 expand_normal (elt));
-	    convert_move (target, op0, 0);
+	    rtx tmp = gen_reg_rtx (mode);
+	    convert_move (tmp, op0, 0);
+
+	    /* Ensure no excess bits are set.
+	       GCN needs this for nunits < 64.
+	       x86 needs this for nunits < 8.  */
+	    auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+	    if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
+	      tmp = expand_binop (mode, and_optab, tmp,
+				  GEN_INT ((1 << nunits) - 1), target,
+				  true, OPTAB_DIRECT);
+	    if (tmp != target)
+	      emit_move_insn (target, tmp);
 	    break;
 	  }
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] vect: Don't set excess bits in unform masks
  2023-11-10 11:45   ` Andrew Stubbs
@ 2023-11-10 11:53     ` Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2023-11-10 11:53 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Fri, 10 Nov 2023, Andrew Stubbs wrote:

> On 23/10/2023 11:43, Richard Biener wrote:
> > On Fri, 20 Oct 2023, Andrew Stubbs wrote:
> > 
> >> This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in
> >> the
> >> mask enable extra lanes that were supposed to be unused and are therefore
> >> undefined.
> >>
> >> Richi suggested an alternative approach involving narrower types and then a
> >> zero-extend to the actual mask type.  This solved the problem for the
> >> specific
> >> test case that I had, but I'm not sure if it would work with V2 and V4
> >> modes
> >> (not that I've observed bad behaviour from them anyway, but still).  There
> >> were some other caveats involving "two-lane division" that I don't fully
> >> understand, so I went with the simpler implementation.
> >>
> >> This patch does have the disadvantage of an additional "and" instruction in
> >> the non-constant case even for machines that don't need it. I'm not sure
> >> how
> >> to fix that without an additional target hook. (If GCC could use the
> >> 64-lane
> >> vectors more effectively without the assistance of artificially reduced
> >> sizes
> >> then this problem wouldn't exist.)
> >>
> >> OK to commit?
> > 
> > -           convert_move (target, op0, 0);
> > +           rtx tmp = gen_reg_rtx (mode);
> > +           convert_move (tmp, op0, 0);
> > +
> > +           if (known_ne (TYPE_VECTOR_SUBPARTS (type),
> > +                         GET_MODE_PRECISION (mode)))
> > 
> > Usually this would be maybe_ne, but then ...
> > 
> > +             {
> > +               /* Ensure no excess bits are set.
> > +                  GCN needs this, AVX does not.  */
> > +               expand_binop (mode, and_optab, tmp,
> > +                             GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type)
> > +                                             .to_constant())) - 1),
> > +                             target, true, OPTAB_DIRECT);
> > 
> > here you have .to_constant ().  I think with having an integer mode
> > we know subparts is constant so I'd prefer
> > 
> >              auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >              if (maybe_ne (GET_MODE_PRECISION (mode), nunits)
> > ...
> > 
> > +             }
> > +           else
> > +             emit_move_insn (target, tmp);
> > 
> > note you need the emit_move_insn also for the expand_binop
> > path since it's not guaranteed that 'target' is used there.  Thus
> > 
> >    tmp = expand_binop (...)
> >    if (tmp != target)
> >      emit_move_insn (...)
> > 
> > Otherwise looks good to me.  The and is needed on x86 for
> > two and four bit masks, it would be more efficient to use
> > smaller modes for the sign-extension I guess.
> 
> I think this patch addresses these issues. I've confirmed it does the right
> thing on amdgcn.
> 
> OK?

OK.

thanks,
Richard.

> Andrew
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-11-10 11:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-20 15:48 [PATCH] vect: Don't set excess bits in unform masks Andrew Stubbs
2023-10-23 10:43 ` Richard Biener
2023-11-10 11:45   ` Andrew Stubbs
2023-11-10 11:53     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).