public inbox for gcc-patches@gcc.gnu.org
* [PATCH] middle-end/113576 - avoid out-of-bound vector element access
@ 2024-02-06 11:37 Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2024-02-06 11:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford

The following avoids accessing out-of-bound vector elements when
native encoding a boolean vector with sub-BITS_PER_UNIT precision
elements.  The error was basing the number of elements to extract
on the rounded up total byte size involved and the patch bases
everything on the total number of elements to extract instead.

As a side-effect this now consistently results in zeros in the
padding of the last encoded byte which also avoids the failure
mode seen in PR113576.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

	PR middle-end/113576
	* fold-const.cc (native_encode_vector_part): Avoid accessing
	out-of-bound elements.
---
 gcc/fold-const.cc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 80e211e18c0..8638757312b 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8057,13 +8057,13 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
 	off = 0;
 
       /* Zero the buffer and then set bits later where necessary.  */
-      int extract_bytes = MIN (len, total_bytes - off);
+      unsigned elts_per_byte = BITS_PER_UNIT / elt_bits;
+      unsigned first_elt = off * elts_per_byte;
+      unsigned extract_elts = MIN (len * elts_per_byte, count - first_elt);
+      unsigned extract_bytes = CEIL (elt_bits * extract_elts, BITS_PER_UNIT);
       if (ptr)
 	memset (ptr, 0, extract_bytes);
 
-      unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits;
-      unsigned int first_elt = off * elts_per_byte;
-      unsigned int extract_elts = extract_bytes * elts_per_byte;
       for (unsigned int i = 0; i < extract_elts; ++i)
 	{
 	  tree elt = VECTOR_CST_ELT (expr, first_elt + i);
-- 
2.35.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
       [not found] <20240206113813.26DCA3858D37@sourceware.org>
@ 2024-03-05 14:39 ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2024-03-05 14:39 UTC (permalink / raw)
  To: Richard Biener, gcc-patches; +Cc: richard.sandiford



On 2/6/24 04:37, Richard Biener wrote:
> The following avoids accessing out-of-bound vector elements when
> native encoding a boolean vector with sub-BITS_PER_UNIT precision
> elements.  The error was basing the number of elements to extract
> on the rounded up total byte size involved and the patch bases
> everything on the total number of elements to extract instead.
> 
> As a side-effect this now consistently results in zeros in the
> padding of the last encoded byte which also avoids the failure
> mode seen in PR113576.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
> 	PR middle-end/113576
> 	* fold-const.cc (native_encode_vector_part): Avoid accessing
> 	out-of-bound elements.
OK.
jeff



* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
  2024-02-15  9:46         ` Richard Sandiford
@ 2024-02-15 10:12           ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2024-02-15 10:12 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

On Thu, 15 Feb 2024, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener <rguenther@suse.de> writes:
> >> >> > The following avoids accessing out-of-bound vector elements when
> >> >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
> >> >> > elements.  The error was basing the number of elements to extract
> >> >> > on the rounded up total byte size involved and the patch bases
> >> >> > everything on the total number of elements to extract instead.
> >> >> 
> >> >> It's too long ago to be certain, but I think this was a deliberate choice.
> >> >> The point of the new vector constant encoding is that it can give an
> >> >> allegedly sensible value for any given index, even out-of-range ones.
> >> >> 
> >> >> Since the padding bits are undefined, we should in principle have a free
> >> >> choice of what to use.  And for VLA, it's often better to continue the
> >> >> existing pattern rather than force to zero.
> >> >> 
> >> >> I don't strongly object to changing it.  I think we should be careful
> >> >> about relying on zeroing for correctness though.  The bits are in principle
> >> >> undefined and we can't rely on reading zeros from equivalent memory or
> >> >> register values.
> >> >
> >> > The main motivation for a change here is to allow catching out-of-bound
> >> > indices again for VECTOR_CST_ELT, at least for constant nunits because
> >> > it might be a programming error like fat-fingering the index.  I do
> >> > think it's a regression that we no longer catch those.
> >> >
> >> > It's probably also a bit non-obvious how an encoding continues, and
> >> > there might be DImode masks that can be represented by a
> >> > zero-extended QImode immediate but that, when "continued", would
> >> > require a larger immediate.
> >> >
> >> > The change also effectively only changes something for 1 byte
> >> > encodings since nunits is a power of two and so is the element
> >> > size in bits.
> >> 
> >> Yeah, but even there, there's an argument that all-1s (0xff) is a more
> >> obvious value for an all-1s mask.
> >> 
> >> > A patch restoring the VECTOR_CST_ELT checking might be the
> >> > following
> >> >
> >> > diff --git a/gcc/tree.cc b/gcc/tree.cc
> >> > index 046a558d1b0..4c9b05167fd 100644
> >> > --- a/gcc/tree.cc
> >> > +++ b/gcc/tree.cc
> >> > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
> >> >    if (i < encoded_nelts)
> >> >      return VECTOR_CST_ENCODED_ELT (t, i);
> >> >  
> >> > +  /* Catch out-of-bound element accesses.  */
> >> > +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
> >> > +
> >> >    /* If there are no steps, the final encoded value is the right one.  */
> >> >    if (!VECTOR_CST_STEPPED_P (t))
> >> >      {
> >> >
> >> > but it triggers quite a bit via const_binop for, for example
> >> >
> >> > #2  0x00000000011c1506 in const_binop (code=PLUS_EXPR, 
> >> >     arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>)
> >> > (gdb) p debug_generic_expr (arg1)
> >> > { 12, 13, 14, 15 }
> >> > $5 = void
> >> > (gdb) p debug_generic_expr (arg2)
> >> > { -2, -2, -2, -3 }
> >> > (gdb) p count
> >> > $4 = 6
> >> > (gdb) l
> >> > 1711          if (!elts.new_binary_operation (type, arg1, arg2, 
> >> > step_ok_p))
> >> > 1712            return NULL_TREE;
> >> > 1713          unsigned int count = elts.encoded_nelts ();
> >> > 1714          for (unsigned int i = 0; i < count; ++i)
> >> > 1715            {
> >> > 1716              tree elem1 = VECTOR_CST_ELT (arg1, i);
> >> > 1717              tree elem2 = VECTOR_CST_ELT (arg2, i);
> >> > 1718
> >> > 1719              tree elt = const_binop (code, elem1, elem2);
> >> >
> >> > this seems like an error to me - why would we, for fixed-size
> >> > vectors and for PLUS ever create a vector encoding with 6 elements?!
> >> > That seems at least inefficient to me?
> >> 
> >> It's a case of picking your poison.  On the other side, operating
> >> individually on each element of a V64QI is inefficient when the
> >> representation says up-front that all elements are equal.
> >
> > True, though I wonder why for VLS vectors new_binary_operation
> > doesn't cap the number of encoded elts on the fixed vector size,
> > like doing
> >
> >   encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);
> >
> > or if there's no good way to write it applying for both VLA and VLS
> > do it only when TYPE_VECTOR_SUBPARTS is constant.
> 
> ordered_min can't be used because there's no guarantee that encoded_elts
> and TYPE_VECTOR_SUBPARTS are well-ordered for the VLA case.  E.g.
> for a stepped (3-element) encoding and a length of 2+2X, the stepped
> encoding is longer for X==0 and the vector is longer for X>0.
> 
> But yeah, in general, trying to enforce this for VLS would probably
> lead to a proliferation of more "if VLA do one thing, if VLS do some
> other thing".  The aim was to avoid that where it didn't seem strictly
> necessary.
> 
> >> Fundamentally, operations on VLA vectors are treated as functions
> >> that map patterns to patterns.  The number of elements that are
> >> consumed isn't really relevant to the function itself.  The VLA
> >> folders therefore rely on being able to read an element from a pattern
> >> even if the index is outside TYPE_VECTOR_SUBPARTS.
> >> 
> >> There were two reasons for using VLA paths for VLS vectors.
> >> One I mentioned above: it saves time when all elements are equal,
> >> or have a similarly compact representation.  The other is that it
> >> makes VLA less special and ensures that the code gets more testing.
> >> 
> >> Maybe one compromise between that and the assert would be:
> >> 
> >> (1) enforce the assert only for VLS and
> >
> > that's what I did by using maybe_gt?
> >
> >> (2) add new checks to ensure that a VLA-friendly operation will never
> >>     read out-of-bounds for VLS vectors
> >> 
> >> But I think this would be awkward.  E.g. we now build reversed vectors
> >> as a 3-element series N-1, N-2, N-3.  It would be nice not to have
> >> to special-case N==2 by suppressing N-3.  And the condition for (2)
> >> might not always be obvious.
> >
> > If we know only N <= 2 is problematic then having must_eq ()
> > special cases there makes sense IMO.
> 
> I'm not sure it's just that.  E.g. a zip is { 0, N, 1, N+1, 2, N+2, ... },
> which would need to be curtailed for N==2 or N==4.
> 
> >> Another option would be to have special accessors that are allowed
> >> to read out of bounds, and add the assert (for both VLA & VLS) to
> >> VECTOR_CST_ELT.  It might take a while to root out all the places that
> >> need to change though.
> >
> > Yes, that's another possibility.
> >
> > The point is (and we're seeing it now with the behavior of
> > encoding QImode bitmasks) that the behavior isn't entirely
> > obvious when VLS vectors are involved.  There's a memset zero
> > of the QImode so anything other than zero-padding is odd.
> 
> Which memset do you mean?  The one in:
> 
>       /* Zero the buffer and then set bits later where necessary.  */
>       int extract_bytes = MIN (len, total_bytes - off);
>       if (ptr)
> 	memset (ptr, 0, extract_bytes);
> 
> ?  If so, that's there for elt_bits>1.  It means that the canonical
> form of a 2-bit active mask element is 01 rather than 11.  (And that's
> something that might end up needing to be target-configurable.  The current
> behaviour is guided by SVE, although it shouldn't matter much in practice
> there.)
> 
> > It's also not clear to me we'd get reliable sign extension
> > (we probably don't?).
> 
> For which case?  The idea isn't that we always sign-extend or always
> zero-extend, but that we repeat the pattern.  So an all-1s mask would
> give an all-1s QImode.  A 1010 mask would give a 10101010 QI, etc.

But wouldn't 0 1 2 get you 0 1 2 3 4 5?  (not for bools of course)

Anyway, for QImode the efficiency argument is likely moot unless
some targets only have, say, 4-bit immediates, and even those
probably always sign- or zero-extend rather than pad the 4 bits
out with repeats of those 4 bits ... so "repeating" isn't the
natural thing to do here; zero- or sign-extending into the
padding is.

Richard.


* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
  2024-02-15  8:12       ` Richard Biener
@ 2024-02-15  9:46         ` Richard Sandiford
  2024-02-15 10:12           ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2024-02-15  9:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Wed, 14 Feb 2024, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
>> >
>> >> Richard Biener <rguenther@suse.de> writes:
>> >> > The following avoids accessing out-of-bound vector elements when
>> >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
>> >> > elements.  The error was basing the number of elements to extract
>> >> > on the rounded up total byte size involved and the patch bases
>> >> > everything on the total number of elements to extract instead.
>> >> 
>> >> It's too long ago to be certain, but I think this was a deliberate choice.
>> >> The point of the new vector constant encoding is that it can give an
>> >> allegedly sensible value for any given index, even out-of-range ones.
>> >> 
>> >> Since the padding bits are undefined, we should in principle have a free
>> >> choice of what to use.  And for VLA, it's often better to continue the
>> >> existing pattern rather than force to zero.
>> >> 
>> >> I don't strongly object to changing it.  I think we should be careful
>> >> about relying on zeroing for correctness though.  The bits are in principle
>> >> undefined and we can't rely on reading zeros from equivalent memory or
>> >> register values.
>> >
>> > The main motivation for a change here is to allow catching out-of-bound
>> > indices again for VECTOR_CST_ELT, at least for constant nunits because
>> > it might be a programming error like fat-fingering the index.  I do
>> > think it's a regression that we no longer catch those.
>> >
>> > It's probably also a bit non-obvious how an encoding continues, and
>> > there might be DImode masks that can be represented by a
>> > zero-extended QImode immediate but that, when "continued", would
>> > require a larger immediate.
>> >
>> > The change also effectively only changes something for 1 byte
>> > encodings since nunits is a power of two and so is the element
>> > size in bits.
>> 
>> Yeah, but even there, there's an argument that all-1s (0xff) is a more
>> obvious value for an all-1s mask.
>> 
>> > A patch restoring the VECTOR_CST_ELT checking might be the
>> > following
>> >
>> > diff --git a/gcc/tree.cc b/gcc/tree.cc
>> > index 046a558d1b0..4c9b05167fd 100644
>> > --- a/gcc/tree.cc
>> > +++ b/gcc/tree.cc
>> > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
>> >    if (i < encoded_nelts)
>> >      return VECTOR_CST_ENCODED_ELT (t, i);
>> >  
>> > +  /* Catch out-of-bound element accesses.  */
>> > +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
>> > +
>> >    /* If there are no steps, the final encoded value is the right one.  */
>> >    if (!VECTOR_CST_STEPPED_P (t))
>> >      {
>> >
>> > but it triggers quite a bit via const_binop for, for example
>> >
>> > #2  0x00000000011c1506 in const_binop (code=PLUS_EXPR, 
>> >     arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>)
>> > (gdb) p debug_generic_expr (arg1)
>> > { 12, 13, 14, 15 }
>> > $5 = void
>> > (gdb) p debug_generic_expr (arg2)
>> > { -2, -2, -2, -3 }
>> > (gdb) p count
>> > $4 = 6
>> > (gdb) l
>> > 1711          if (!elts.new_binary_operation (type, arg1, arg2, 
>> > step_ok_p))
>> > 1712            return NULL_TREE;
>> > 1713          unsigned int count = elts.encoded_nelts ();
>> > 1714          for (unsigned int i = 0; i < count; ++i)
>> > 1715            {
>> > 1716              tree elem1 = VECTOR_CST_ELT (arg1, i);
>> > 1717              tree elem2 = VECTOR_CST_ELT (arg2, i);
>> > 1718
>> > 1719              tree elt = const_binop (code, elem1, elem2);
>> >
>> > this seems like an error to me - why would we, for fixed-size
>> > vectors and for PLUS ever create a vector encoding with 6 elements?!
>> > That seems at least inefficient to me?
>> 
>> It's a case of picking your poison.  On the other side, operating
>> individually on each element of a V64QI is inefficient when the
>> representation says up-front that all elements are equal.
>
> True, though I wonder why for VLS vectors new_binary_operation
> doesn't cap the number of encoded elts on the fixed vector size,
> like doing
>
>   encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);
>
> or if there's no good way to write it applying for both VLA and VLS
> do it only when TYPE_VECTOR_SUBPARTS is constant.

ordered_min can't be used because there's no guarantee that encoded_elts
and TYPE_VECTOR_SUBPARTS are well-ordered for the VLA case.  E.g.
for a stepped (3-element) encoding and a length of 2+2X, the stepped
encoding is longer for X==0 and the vector is longer for X>0.

But yeah, in general, trying to enforce this for VLS would probably
lead to a proliferation of more "if VLA do one thing, if VLS do some
other thing".  The aim was to avoid that where it didn't seem strictly
necessary.

>> Fundamentally, operations on VLA vectors are treated as functions
>> that map patterns to patterns.  The number of elements that are
>> consumed isn't really relevant to the function itself.  The VLA
>> folders therefore rely on being able to read an element from a pattern
>> even if the index is outside TYPE_VECTOR_SUBPARTS.
>> 
>> There were two reasons for using VLA paths for VLS vectors.
>> One I mentioned above: it saves time when all elements are equal,
>> or have a similarly compact representation.  The other is that it
>> makes VLA less special and ensures that the code gets more testing.
>> 
>> Maybe one compromise between that and the assert would be:
>> 
>> (1) enforce the assert only for VLS and
>
> that's what I did by using maybe_gt?
>
>> (2) add new checks to ensure that a VLA-friendly operation will never
>>     read out-of-bounds for VLS vectors
>> 
>> But I think this would be awkward.  E.g. we now build reversed vectors
>> as a 3-element series N-1, N-2, N-3.  It would be nice not to have
>> to special-case N==2 by suppressing N-3.  And the condition for (2)
>> might not always be obvious.
>
> If we know only N <= 2 is problematic then having must_eq ()
> special cases there makes sense IMO.

I'm not sure it's just that.  E.g. a zip is { 0, N, 1, N+1, 2, N+2, ... },
which would need to be curtailed for N==2 or N==4.

>> Another option would be to have special accessors that are allowed
>> to read out of bounds, and add the assert (for both VLA & VLS) to
>> VECTOR_CST_ELT.  It might take a while to root out all the places that
>> need to change though.
>
> Yes, that's another possibility.
>
> The point is (and we're seeing it now with the behavior of
> encoding QImode bitmasks) that the behavior isn't entirely
> obvious when VLS vectors are involved.  There's a memset zero
> of the QImode so anything other than zero-padding is odd.

Which memset do you mean?  The one in:

      /* Zero the buffer and then set bits later where necessary.  */
      int extract_bytes = MIN (len, total_bytes - off);
      if (ptr)
	memset (ptr, 0, extract_bytes);

?  If so, that's there for elt_bits>1.  It means that the canonical
form of a 2-bit active mask element is 01 rather than 11.  (And that's
something that might end up needing to be target-configurable.  The current
behaviour is guided by SVE, although it shouldn't matter much in practice
there.)

> It's also not clear to me we'd get reliable sign extension
> (we probably don't?).

For which case?  The idea isn't that we always sign-extend or always
zero-extend, but that we repeat the pattern.  So an all-1s mask would
give an all-1s QImode.  A 1010 mask would give a 10101010 QI, etc.

Thanks,
Richard


* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
  2024-02-14 15:42     ` Richard Sandiford
@ 2024-02-15  8:12       ` Richard Biener
  2024-02-15  9:46         ` Richard Sandiford
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-02-15  8:12 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

On Wed, 14 Feb 2024, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > The following avoids accessing out-of-bound vector elements when
> >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
> >> > elements.  The error was basing the number of elements to extract
> >> > on the rounded up total byte size involved and the patch bases
> >> > everything on the total number of elements to extract instead.
> >> 
> >> It's too long ago to be certain, but I think this was a deliberate choice.
> >> The point of the new vector constant encoding is that it can give an
> >> allegedly sensible value for any given index, even out-of-range ones.
> >> 
> >> Since the padding bits are undefined, we should in principle have a free
> >> choice of what to use.  And for VLA, it's often better to continue the
> >> existing pattern rather than force to zero.
> >> 
> >> I don't strongly object to changing it.  I think we should be careful
> >> about relying on zeroing for correctness though.  The bits are in principle
> >> undefined and we can't rely on reading zeros from equivalent memory or
> >> register values.
> >
> > The main motivation for a change here is to allow catching out-of-bound
> > indices again for VECTOR_CST_ELT, at least for constant nunits because
> > it might be a programming error like fat-fingering the index.  I do
> > think it's a regression that we no longer catch those.
> >
> > It's probably also a bit non-obvious how an encoding continues, and
> > there might be DImode masks that can be represented by a
> > zero-extended QImode immediate but that, when "continued", would
> > require a larger immediate.
> >
> > The change also effectively only changes something for 1 byte
> > encodings since nunits is a power of two and so is the element
> > size in bits.
> 
> Yeah, but even there, there's an argument that all-1s (0xff) is a more
> obvious value for an all-1s mask.
> 
> > A patch restoring the VECTOR_CST_ELT checking might be the
> > following
> >
> > diff --git a/gcc/tree.cc b/gcc/tree.cc
> > index 046a558d1b0..4c9b05167fd 100644
> > --- a/gcc/tree.cc
> > +++ b/gcc/tree.cc
> > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
> >    if (i < encoded_nelts)
> >      return VECTOR_CST_ENCODED_ELT (t, i);
> >  
> > +  /* Catch out-of-bound element accesses.  */
> > +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
> > +
> >    /* If there are no steps, the final encoded value is the right one.  */
> >    if (!VECTOR_CST_STEPPED_P (t))
> >      {
> >
> > but it triggers quite a bit via const_binop for, for example
> >
> > #2  0x00000000011c1506 in const_binop (code=PLUS_EXPR, 
> >     arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>)
> > (gdb) p debug_generic_expr (arg1)
> > { 12, 13, 14, 15 }
> > $5 = void
> > (gdb) p debug_generic_expr (arg2)
> > { -2, -2, -2, -3 }
> > (gdb) p count
> > $4 = 6
> > (gdb) l
> > 1711          if (!elts.new_binary_operation (type, arg1, arg2, 
> > step_ok_p))
> > 1712            return NULL_TREE;
> > 1713          unsigned int count = elts.encoded_nelts ();
> > 1714          for (unsigned int i = 0; i < count; ++i)
> > 1715            {
> > 1716              tree elem1 = VECTOR_CST_ELT (arg1, i);
> > 1717              tree elem2 = VECTOR_CST_ELT (arg2, i);
> > 1718
> > 1719              tree elt = const_binop (code, elem1, elem2);
> >
> > this seems like an error to me - why would we, for fixed-size
> > vectors and for PLUS ever create a vector encoding with 6 elements?!
> > That seems at least inefficient to me?
> 
> It's a case of picking your poison.  On the other side, operating
> individually on each element of a V64QI is inefficient when the
> representation says up-front that all elements are equal.

True, though I wonder why for VLS vectors new_binary_operation
doesn't cap the number of encoded elts on the fixed vector size,
like doing

  encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);

or if there's no good way to write it applying for both VLA and VLS
do it only when TYPE_VECTOR_SUBPARTS is constant.

> Fundamentally, operations on VLA vectors are treated as functions
> that map patterns to patterns.  The number of elements that are
> consumed isn't really relevant to the function itself.  The VLA
> folders therefore rely on being able to read an element from a pattern
> even if the index is outside TYPE_VECTOR_SUBPARTS.
> 
> There were two reasons for using VLA paths for VLS vectors.
> One I mentioned above: it saves time when all elements are equal,
> or have a similarly compact representation.  The other is that it
> makes VLA less special and ensures that the code gets more testing.
> 
> Maybe one compromise between that and the assert would be:
> 
> (1) enforce the assert only for VLS and

that's what I did by using maybe_gt?

> (2) add new checks to ensure that a VLA-friendly operation will never
>     read out-of-bounds for VLS vectors
> 
> But I think this would be awkward.  E.g. we now build reversed vectors
> as a 3-element series N-1, N-2, N-3.  It would be nice not to have
> to special-case N==2 by suppressing N-3.  And the condition for (2)
> might not always be obvious.

If we know only N <= 2 is problematic then having must_eq ()
special cases there makes sense IMO.

> Another option would be to have special accessors that are allowed
> to read out of bounds, and add the assert (for both VLA & VLS) to
> VECTOR_CST_ELT.  It might take a while to root out all the places that
> need to change though.

Yes, that's another possibility.

The point is (and we're seeing it now with the behavior of
encoding QImode bitmasks) that the behavior isn't entirely
obvious when VLS vectors are involved.  There's a memset zero
of the QImode so anything other than zero-padding is odd.
It's also not clear to me we'd get reliable sign extension
(we probably don't?).

Richard.


* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
  2024-02-14 12:34   ` Richard Biener
@ 2024-02-14 15:42     ` Richard Sandiford
  2024-02-15  8:12       ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2024-02-14 15:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Wed, 14 Feb 2024, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > The following avoids accessing out-of-bound vector elements when
>> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
>> > elements.  The error was basing the number of elements to extract
>> > on the rounded up total byte size involved and the patch bases
>> > everything on the total number of elements to extract instead.
>> 
>> It's too long ago to be certain, but I think this was a deliberate choice.
>> The point of the new vector constant encoding is that it can give an
>> allegedly sensible value for any given index, even out-of-range ones.
>> 
>> Since the padding bits are undefined, we should in principle have a free
>> choice of what to use.  And for VLA, it's often better to continue the
>> existing pattern rather than force to zero.
>> 
>> I don't strongly object to changing it.  I think we should be careful
>> about relying on zeroing for correctness though.  The bits are in principle
>> undefined and we can't rely on reading zeros from equivalent memory or
>> register values.
>
> The main motivation for a change here is to allow catching out-of-bound
> indices again for VECTOR_CST_ELT, at least for constant nunits because
> it might be a programming error like fat-fingering the index.  I do
> think it's a regression that we no longer catch those.
>
> It's probably also a bit non-obvious how an encoding continues, and
> there might be DImode masks that can be represented by a
> zero-extended QImode immediate but that, when "continued", would
> require a larger immediate.
>
> The change also effectively only changes something for 1 byte
> encodings since nunits is a power of two and so is the element
> size in bits.

Yeah, but even there, there's an argument that all-1s (0xff) is a more
obvious value for an all-1s mask.

> A patch restoring the VECTOR_CST_ELT checking might be the
> following
>
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 046a558d1b0..4c9b05167fd 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
>    if (i < encoded_nelts)
>      return VECTOR_CST_ENCODED_ELT (t, i);
>  
> +  /* Catch out-of-bound element accesses.  */
> +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
> +
>    /* If there are no steps, the final encoded value is the right one.  */
>    if (!VECTOR_CST_STEPPED_P (t))
>      {
>
> but it triggers quite a bit via const_binop for, for example
>
> #2  0x00000000011c1506 in const_binop (code=PLUS_EXPR, 
>     arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>)
> (gdb) p debug_generic_expr (arg1)
> { 12, 13, 14, 15 }
> $5 = void
> (gdb) p debug_generic_expr (arg2)
> { -2, -2, -2, -3 }
> (gdb) p count
> $4 = 6
> (gdb) l
> 1711          if (!elts.new_binary_operation (type, arg1, arg2, 
> step_ok_p))
> 1712            return NULL_TREE;
> 1713          unsigned int count = elts.encoded_nelts ();
> 1714          for (unsigned int i = 0; i < count; ++i)
> 1715            {
> 1716              tree elem1 = VECTOR_CST_ELT (arg1, i);
> 1717              tree elem2 = VECTOR_CST_ELT (arg2, i);
> 1718
> 1719              tree elt = const_binop (code, elem1, elem2);
>
> this seems like an error to me - why would we, for fixed-size
> vectors and for PLUS ever create a vector encoding with 6 elements?!
> That seems at least inefficient to me?

It's a case of picking your poison.  On the other side, operating
individually on each element of a V64QI is inefficient when the
representation says up-front that all elements are equal.

Fundamentally, operations on VLA vectors are treated as functions
that map patterns to patterns.  The number of elements that are
consumed isn't really relevant to the function itself.  The VLA
folders therefore rely on being able to read an element from a pattern
even if the index is outside TYPE_VECTOR_SUBPARTS.

There were two reasons for using VLA paths for VLS vectors.
One I mentioned above: it saves time when all elements are equal,
or have a similarly compact representation.  The other is that it
makes VLA less special and ensures that the code gets more testing.

Maybe one compromise between that and the assert would be:

(1) enforce the assert only for VLS and
(2) add new checks to ensure that a VLA-friendly operation will never
    read out-of-bounds for VLS vectors

But I think this would be awkward.  E.g. we now build reversed vectors
as a 3-element series N-1, N-2, N-3.  It would be nice not to have
to special-case N==2 by suppressing N-3.  And the condition for (2)
might not always be obvious.

Another option would be to have special accessors that are allowed
to read out of bounds, and add the assert (for both VLA & VLS) to
VECTOR_CST_ELT.  It might take a while to root out all the places that
need to change though.

Thanks,
Richard


* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
  2024-02-14 11:31 ` Richard Sandiford
@ 2024-02-14 12:34   ` Richard Biener
  2024-02-14 15:42     ` Richard Sandiford
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-02-14 12:34 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

On Wed, 14 Feb 2024, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > The following avoids accessing out-of-bound vector elements when
> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
> > elements.  The error was basing the number of elements to extract
> > on the rounded up total byte size involved and the patch bases
> > everything on the total number of elements to extract instead.
> 
> It's too long ago to be certain, but I think this was a deliberate choice.
> The point of the new vector constant encoding is that it can give an
> allegedly sensible value for any given index, even out-of-range ones.
> 
> Since the padding bits are undefined, we should in principle have a free
> choice of what to use.  And for VLA, it's often better to continue the
> existing pattern rather than force to zero.
> 
> I don't strongly object to changing it.  I think we should be careful
> about relying on zeroing for correctness though.  The bits are in principle
> undefined and we can't rely on reading zeros from equivalent memory or
> register values.

The main motivation for a change here is to allow catching out-of-bound
indices again for VECTOR_CST_ELT, at least for constant nunits because
it might be a programming error like fat-fingering the index.  I do
think it's a regression that we no longer catch those.

It's probably also a bit non-obvious how an encoding continues, and
there might be DImode masks that can be represented by a
zero-extended QImode immediate but that, when "continued", would
require a larger immediate.

The change also effectively only makes a difference for single-byte
encodings, since nunits is a power of two and so is the element
size in bits.

A patch restoring the VECTOR_CST_ELT checking might be the
following

diff --git a/gcc/tree.cc b/gcc/tree.cc
index 046a558d1b0..4c9b05167fd 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
   if (i < encoded_nelts)
     return VECTOR_CST_ENCODED_ELT (t, i);
 
+  /* Catch out-of-bound element accesses.  */
+  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
+
   /* If there are no steps, the final encoded value is the right one.  */
   if (!VECTOR_CST_STEPPED_P (t))
     {

but it triggers quite a bit via const_binop for, for example

#2  0x00000000011c1506 in const_binop (code=PLUS_EXPR, 
    arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>)
(gdb) p debug_generic_expr (arg1)
{ 12, 13, 14, 15 }
$5 = void
(gdb) p debug_generic_expr (arg2)
{ -2, -2, -2, -3 }
(gdb) p count
$4 = 6
(gdb) l
1711          if (!elts.new_binary_operation (type, arg1, arg2, step_ok_p))
1712            return NULL_TREE;
1713          unsigned int count = elts.encoded_nelts ();
1714          for (unsigned int i = 0; i < count; ++i)
1715            {
1716              tree elem1 = VECTOR_CST_ELT (arg1, i);
1717              tree elem2 = VECTOR_CST_ELT (arg2, i);
1718
1719              tree elt = const_binop (code, elem1, elem2);

this seems like an error to me - why would we, for fixed-size
vectors and for PLUS ever create a vector encoding with 6 elements?!
That seems at least inefficient to me?

Richard.

> Thanks,
> Richard
> >
> > As a side-effect this now consistently results in zeros in the
> > padding of the last encoded byte which also avoids the failure
> > mode seen in PR113576.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> >
> > Thanks,
> > Richard.
> >
> > 	PR middle-end/113576
> > 	* fold-const.cc (native_encode_vector_part): Avoid accessing
> > 	out-of-bound elements.
> > ---
> >  gcc/fold-const.cc | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 80e211e18c0..8638757312b 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -8057,13 +8057,13 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
> >  	off = 0;
> >  
> >        /* Zero the buffer and then set bits later where necessary.  */
> > -      int extract_bytes = MIN (len, total_bytes - off);
> > +      unsigned elts_per_byte = BITS_PER_UNIT / elt_bits;
> > +      unsigned first_elt = off * elts_per_byte;
> > +      unsigned extract_elts = MIN (len * elts_per_byte, count - first_elt);
> > +      unsigned extract_bytes = CEIL (elt_bits * extract_elts, BITS_PER_UNIT);
> >        if (ptr)
> >  	memset (ptr, 0, extract_bytes);
> >  
> > -      unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits;
> > -      unsigned int first_elt = off * elts_per_byte;
> > -      unsigned int extract_elts = extract_bytes * elts_per_byte;
> >        for (unsigned int i = 0; i < extract_elts; ++i)
> >  	{
> >  	  tree elt = VECTOR_CST_ELT (expr, first_elt + i);
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
       [not found] <20240206113819.2335F3858402@sourceware.org>
@ 2024-02-14 11:31 ` Richard Sandiford
  2024-02-14 12:34   ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2024-02-14 11:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> The following avoids accessing out-of-bound vector elements when
> native encoding a boolean vector with sub-BITS_PER_UNIT precision
> elements.  The error was basing the number of elements to extract
> on the rounded up total byte size involved and the patch bases
> everything on the total number of elements to extract instead.

It's too long ago to be certain, but I think this was a deliberate choice.
The point of the new vector constant encoding is that it can give an
allegedly sensible value for any given index, even out-of-range ones.

Since the padding bits are undefined, we should in principle have a free
choice of what to use.  And for VLA, it's often better to continue the
existing pattern rather than force to zero.

I don't strongly object to changing it.  I think we should be careful
about relying on zeroing for correctness though.  The bits are in principle
undefined and we can't rely on reading zeros from equivalent memory or
register values.

Thanks,
Richard
>
> As a side-effect this now consistently results in zeros in the
> padding of the last encoded byte which also avoids the failure
> mode seen in PR113576.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
> 	PR middle-end/113576
> 	* fold-const.cc (native_encode_vector_part): Avoid accessing
> 	out-of-bound elements.
> ---
>  gcc/fold-const.cc | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 80e211e18c0..8638757312b 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8057,13 +8057,13 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
>  	off = 0;
>  
>        /* Zero the buffer and then set bits later where necessary.  */
> -      int extract_bytes = MIN (len, total_bytes - off);
> +      unsigned elts_per_byte = BITS_PER_UNIT / elt_bits;
> +      unsigned first_elt = off * elts_per_byte;
> +      unsigned extract_elts = MIN (len * elts_per_byte, count - first_elt);
> +      unsigned extract_bytes = CEIL (elt_bits * extract_elts, BITS_PER_UNIT);
>        if (ptr)
>  	memset (ptr, 0, extract_bytes);
>  
> -      unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits;
> -      unsigned int first_elt = off * elts_per_byte;
> -      unsigned int extract_elts = extract_bytes * elts_per_byte;
>        for (unsigned int i = 0; i < extract_elts; ++i)
>  	{
>  	  tree elt = VECTOR_CST_ELT (expr, first_elt + i);


end of thread, other threads:[~2024-03-05 14:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
2024-02-06 11:37 [PATCH] middle-end/113576 - avoid out-of-bound vector element access Richard Biener
     [not found] <20240206113819.2335F3858402@sourceware.org>
2024-02-14 11:31 ` Richard Sandiford
2024-02-14 12:34   ` Richard Biener
2024-02-14 15:42     ` Richard Sandiford
2024-02-15  8:12       ` Richard Biener
2024-02-15  9:46         ` Richard Sandiford
2024-02-15 10:12           ` Richard Biener
     [not found] <20240206113813.26DCA3858D37@sourceware.org>
2024-03-05 14:39 ` Jeff Law
