* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
From: Richard Sandiford @ 2024-02-14 11:31 UTC
To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> The following avoids accessing out-of-bound vector elements when
> native encoding a boolean vector with sub-BITS_PER_UNIT precision
> elements.  The error was basing the number of elements to extract
> on the rounded up total byte size involved and the patch bases
> everything on the total number of elements to extract instead.

It's too long ago to be certain, but I think this was a deliberate choice.
The point of the new vector constant encoding is that it can give an
allegedly sensible value for any given index, even out-of-range ones.

Since the padding bits are undefined, we should in principle have a free
choice of what to use.  And for VLA, it's often better to continue the
existing pattern rather than force to zero.

I don't strongly object to changing it.  I think we should be careful
about relying on zeroing for correctness though.  The bits are in
principle undefined and we can't rely on reading zeros from equivalent
memory or register values.

Thanks,
Richard

>
> As a side-effect this now consistently results in zeros in the
> padding of the last encoded byte which also avoids the failure
> mode seen in PR113576.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
>         PR middle-end/113576
>         * fold-const.cc (native_encode_vector_part): Avoid accessing
>         out-of-bound elements.
> ---
>  gcc/fold-const.cc | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 80e211e18c0..8638757312b 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8057,13 +8057,13 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
>        off = 0;
>
>        /* Zero the buffer and then set bits later where necessary.  */
> -      int extract_bytes = MIN (len, total_bytes - off);
> +      unsigned elts_per_byte = BITS_PER_UNIT / elt_bits;
> +      unsigned first_elt = off * elts_per_byte;
> +      unsigned extract_elts = MIN (len * elts_per_byte, count - first_elt);
> +      unsigned extract_bytes = CEIL (elt_bits * extract_elts, BITS_PER_UNIT);
>        if (ptr)
>          memset (ptr, 0, extract_bytes);
>
> -      unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits;
> -      unsigned int first_elt = off * elts_per_byte;
> -      unsigned int extract_elts = extract_bytes * elts_per_byte;
>        for (unsigned int i = 0; i < extract_elts; ++i)
>          {
>            tree elt = VECTOR_CST_ELT (expr, first_elt + i);
* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
From: Richard Biener @ 2024-02-14 12:34 UTC
To: Richard Sandiford; +Cc: gcc-patches

On Wed, 14 Feb 2024, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
>> The following avoids accessing out-of-bound vector elements when
>> native encoding a boolean vector with sub-BITS_PER_UNIT precision
>> elements.  The error was basing the number of elements to extract
>> on the rounded up total byte size involved and the patch bases
>> everything on the total number of elements to extract instead.
>
> It's too long ago to be certain, but I think this was a deliberate choice.
> The point of the new vector constant encoding is that it can give an
> allegedly sensible value for any given index, even out-of-range ones.
>
> Since the padding bits are undefined, we should in principle have a free
> choice of what to use.  And for VLA, it's often better to continue the
> existing pattern rather than force to zero.
>
> I don't strongly object to changing it.  I think we should be careful
> about relying on zeroing for correctness though.  The bits are in
> principle undefined and we can't rely on reading zeros from equivalent
> memory or register values.

The main motivation for a change here is to allow catching out-of-bound
indices again for VECTOR_CST_ELT, at least for constant nunits, because
it might be a programming error like fat-fingering the index.  I do
think it's a regression that we no longer catch those.

It's probably also a bit non-obvious how an encoding continues, and
there might be DImode masks that can be represented by a zero-extended
QImode immediate but that, "continued", would require a larger
immediate.

The change also effectively only changes something for 1-byte
encodings, since nunits is a power of two and so is the element size
in bits.

A patch restoring the VECTOR_CST_ELT checking might be the following

diff --git a/gcc/tree.cc b/gcc/tree.cc
index 046a558d1b0..4c9b05167fd 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
   if (i < encoded_nelts)
     return VECTOR_CST_ENCODED_ELT (t, i);

+  /* Catch out-of-bound element accesses.  */
+  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
+
   /* If there are no steps, the final encoded value is the right one.  */
   if (!VECTOR_CST_STEPPED_P (t))
     {

but it triggers quite a bit via const_binop for, for example

#2  0x00000000011c1506 in const_binop (code=PLUS_EXPR,
    arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>)
(gdb) p debug_generic_expr (arg1)
{ 12, 13, 14, 15 }
$5 = void
(gdb) p debug_generic_expr (arg2)
{ -2, -2, -2, -3 }
(gdb) p count
$4 = 6
(gdb) l
1711          if (!elts.new_binary_operation (type, arg1, arg2, step_ok_p))
1712            return NULL_TREE;
1713          unsigned int count = elts.encoded_nelts ();
1714          for (unsigned int i = 0; i < count; ++i)
1715            {
1716              tree elem1 = VECTOR_CST_ELT (arg1, i);
1717              tree elem2 = VECTOR_CST_ELT (arg2, i);
1718
1719              tree elt = const_binop (code, elem1, elem2);

this seems like an error to me - why would we, for fixed-size vectors
and for PLUS, ever create a vector encoding with 6 elements?!  That
seems at least inefficient to me?

Richard.

> Thanks,
> Richard
>
>> [...]

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich;
(HRB 36809, AG Nuernberg)
* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
From: Richard Sandiford @ 2024-02-14 15:42 UTC
To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Wed, 14 Feb 2024, Richard Sandiford wrote:
>> [...]
>> I don't strongly object to changing it.  I think we should be careful
>> about relying on zeroing for correctness though.  The bits are in
>> principle undefined and we can't rely on reading zeros from equivalent
>> memory or register values.
>
> The main motivation for a change here is to allow catching out-of-bound
> indices again for VECTOR_CST_ELT, at least for constant nunits, because
> it might be a programming error like fat-fingering the index.  I do
> think it's a regression that we no longer catch those.
>
> It's probably also a bit non-obvious how an encoding continues, and
> there might be DImode masks that can be represented by a zero-extended
> QImode immediate but that, "continued", would require a larger
> immediate.
>
> The change also effectively only changes something for 1-byte
> encodings, since nunits is a power of two and so is the element size
> in bits.

Yeah, but even there, there's an argument that all-1s (0xff) is a more
obvious value for an all-1s mask.

> A patch restoring the VECTOR_CST_ELT checking might be the
> following
>
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 046a558d1b0..4c9b05167fd 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
>    if (i < encoded_nelts)
>      return VECTOR_CST_ENCODED_ELT (t, i);
>
> +  /* Catch out-of-bound element accesses.  */
> +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
> +
>    /* If there are no steps, the final encoded value is the right one.  */
>    if (!VECTOR_CST_STEPPED_P (t))
>      {
>
> but it triggers quite a bit via const_binop for, for example
>
> #2  0x00000000011c1506 in const_binop (code=PLUS_EXPR,
>     arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>)
> (gdb) p debug_generic_expr (arg1)
> { 12, 13, 14, 15 }
> $5 = void
> (gdb) p debug_generic_expr (arg2)
> { -2, -2, -2, -3 }
> (gdb) p count
> $4 = 6
>
> this seems like an error to me - why would we, for fixed-size vectors
> and for PLUS, ever create a vector encoding with 6 elements?!  That
> seems at least inefficient to me?

It's a case of picking your poison.  On the other side, operating
individually on each element of a V64QI is inefficient when the
representation says up-front that all elements are equal.

Fundamentally, operations on VLA vectors are treated as functions
that map patterns to patterns.  The number of elements that are
consumed isn't really relevant to the function itself.  The VLA
folders therefore rely on being able to read an element from a pattern
even if the index is outside TYPE_VECTOR_SUBPARTS.

There were two reasons for using VLA paths for VLS vectors.
One I mentioned above: it saves time when all elements are equal,
or have a similarly compact representation.  The other is that it
makes VLA less special and ensures that the code gets more testing.

Maybe one compromise between that and the assert would be:

(1) enforce the assert only for VLS and
(2) add new checks to ensure that a VLA-friendly operation will never
    read out-of-bounds for VLS vectors

But I think this would be awkward.  E.g. we now build reversed vectors
as a 3-element series N-1, N-2, N-3.  It would be nice not to have
to special-case N==2 by suppressing N-3.  And the condition for (2)
might not always be obvious.

Another option would be to have special accessors that are allowed
to read out of bounds, and add the assert (for both VLA & VLS) to
VECTOR_CST_ELT.  It might take a while to root out all the places that
need to change though.

Thanks,
Richard
* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
From: Richard Biener @ 2024-02-15 8:12 UTC
To: Richard Sandiford; +Cc: gcc-patches

On Wed, 14 Feb 2024, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
>> [...]
>> this seems like an error to me - why would we, for fixed-size vectors
>> and for PLUS, ever create a vector encoding with 6 elements?!  That
>> seems at least inefficient to me?
>
> It's a case of picking your poison.  On the other side, operating
> individually on each element of a V64QI is inefficient when the
> representation says up-front that all elements are equal.

True, though I wonder why for VLS vectors new_binary_operation doesn't
cap the number of encoded elts on the fixed vector size, like doing

  encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);

or, if there's no good way to write it so that it applies to both VLA
and VLS, do it only when TYPE_VECTOR_SUBPARTS is constant.

> Fundamentally, operations on VLA vectors are treated as functions
> that map patterns to patterns.  The number of elements that are
> consumed isn't really relevant to the function itself.  The VLA
> folders therefore rely on being able to read an element from a pattern
> even if the index is outside TYPE_VECTOR_SUBPARTS.
>
> There were two reasons for using VLA paths for VLS vectors.
> One I mentioned above: it saves time when all elements are equal,
> or have a similarly compact representation.  The other is that it
> makes VLA less special and ensures that the code gets more testing.
>
> Maybe one compromise between that and the assert would be:
>
> (1) enforce the assert only for VLS and

that's what I did by using maybe_gt?

> (2) add new checks to ensure that a VLA-friendly operation will never
>     read out-of-bounds for VLS vectors
>
> But I think this would be awkward.  E.g. we now build reversed vectors
> as a 3-element series N-1, N-2, N-3.  It would be nice not to have
> to special-case N==2 by suppressing N-3.  And the condition for (2)
> might not always be obvious.

If we know only N <= 2 is problematic then having must_eq ()
special cases there makes sense IMO.

> Another option would be to have special accessors that are allowed
> to read out of bounds, and add the assert (for both VLA & VLS) to
> VECTOR_CST_ELT.  It might take a while to root out all the places that
> need to change though.

Yes, that's another possibility.

The point is (and we're seeing it now with the behavior of encoding
QImode bitmasks) that the behavior isn't entirely obvious when VLS
vectors are involved.  There's a memset zero of the QImode, so anything
other than zero-padding is odd.  It's also not clear to me we'd get
reliable sign extension (we probably don't?).

Richard.
* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
From: Richard Sandiford @ 2024-02-15 9:46 UTC
To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Wed, 14 Feb 2024, Richard Sandiford wrote:
>> [...]
>> It's a case of picking your poison.  On the other side, operating
>> individually on each element of a V64QI is inefficient when the
>> representation says up-front that all elements are equal.
>
> True, though I wonder why for VLS vectors new_binary_operation doesn't
> cap the number of encoded elts on the fixed vector size, like doing
>
>   encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);
>
> or, if there's no good way to write it so that it applies to both VLA
> and VLS, do it only when TYPE_VECTOR_SUBPARTS is constant.

ordered_min can't be used because there's no guarantee that encoded_elts
and TYPE_VECTOR_SUBPARTS are well-ordered for the VLA case.  E.g. for a
stepped (3-element) encoding and a length of 2+2X, the stepped encoding
is longer for X==0 and the vector is longer for X>0.

But yeah, in general, trying to enforce this for VLS would probably
lead to a proliferation of more "if VLA do one thing, if VLS do some
other thing".  The aim was to avoid that where it didn't seem strictly
necessary.

>> [...]
>> Maybe one compromise between that and the assert would be:
>>
>> (1) enforce the assert only for VLS and
>
> that's what I did by using maybe_gt?
>
>> (2) add new checks to ensure that a VLA-friendly operation will never
>>     read out-of-bounds for VLS vectors
>>
>> But I think this would be awkward.  E.g. we now build reversed vectors
>> as a 3-element series N-1, N-2, N-3.  It would be nice not to have
>> to special-case N==2 by suppressing N-3.  And the condition for (2)
>> might not always be obvious.
>
> If we know only N <= 2 is problematic then having must_eq ()
> special cases there makes sense IMO.

I'm not sure it's just that.  E.g. a zip is { 0, N, 1, N+1, 2, N+2, ... },
which would need to be curtailed for N==2 or N==4.

>> Another option would be to have special accessors that are allowed
>> to read out of bounds, and add the assert (for both VLA & VLS) to
>> VECTOR_CST_ELT.  It might take a while to root out all the places that
>> need to change though.
>
> Yes, that's another possibility.
>
> The point is (and we're seeing it now with the behavior of encoding
> QImode bitmasks) that the behavior isn't entirely obvious when VLS
> vectors are involved.  There's a memset zero of the QImode, so anything
> other than zero-padding is odd.

Which memset do you mean?  The one in:

  /* Zero the buffer and then set bits later where necessary.  */
  int extract_bytes = MIN (len, total_bytes - off);
  if (ptr)
    memset (ptr, 0, extract_bytes);

?  If so, that's there for elt_bits>1.  It means that the canonical
form of a 2-bit active mask element is 01 rather than 11.  (And that's
something that might end up needing to be target-configurable.  The
current behaviour is guided by SVE, although it shouldn't matter much
in practice there.)

> It's also not clear to me we'd get reliable sign extension
> (we probably don't?).

For which case?  The idea isn't that we always sign-extend or always
zero-extend, but that we repeat the pattern.  So an all-1s mask would
give an all-1s QImode.  A 1010 mask would give a 10101010 QI, etc.

Thanks,
Richard
* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access 2024-02-15 9:46 ` Richard Sandiford @ 2024-02-15 10:12 ` Richard Biener 0 siblings, 0 replies; 8+ messages in thread From: Richard Biener @ 2024-02-15 10:12 UTC (permalink / raw) To: Richard Sandiford; +Cc: gcc-patches On Thu, 15 Feb 2024, Richard Sandiford wrote: > Richard Biener <rguenther@suse.de> writes: > > On Wed, 14 Feb 2024, Richard Sandiford wrote: > > > >> Richard Biener <rguenther@suse.de> writes: > >> > On Wed, 14 Feb 2024, Richard Sandiford wrote: > >> > > >> >> Richard Biener <rguenther@suse.de> writes: > >> >> > The following avoids accessing out-of-bound vector elements when > >> >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision > >> >> > elements. The error was basing the number of elements to extract > >> >> > on the rounded up total byte size involved and the patch bases > >> >> > everything on the total number of elements to extract instead. > >> >> > >> >> It's too long ago to be certain, but I think this was a deliberate choice. > >> >> The point of the new vector constant encoding is that it can give an > >> >> allegedly sensible value for any given index, even out-of-range ones. > >> >> > >> >> Since the padding bits are undefined, we should in principle have a free > >> >> choice of what to use. And for VLA, it's often better to continue the > >> >> existing pattern rather than force to zero. > >> >> > >> >> I don't strongly object to changing it. I think we should be careful > >> >> about relying on zeroing for correctness though. The bits are in principle > >> >> undefined and we can't rely on reading zeros from equivalent memory or > >> >> register values. > >> > > >> > The main motivation for a change here is to allow catching out-of-bound > >> > indices again for VECTOR_CST_ELT, at least for constant nunits because > >> > it might be a programming error like fat-fingering the index. 
> >> > I do
> >> > think it's a regression that we no longer catch those.
> >> >
> >> > It's probably also a bit non-obvious how an encoding continues and
> >> > there might be DImode masks that can be represented by a
> >> > zero-extended QImode immediate but "continued" it would require
> >> > a larger immediate.
> >> >
> >> > The change also effectively only changes something for 1 byte
> >> > encodings since nunits is a power of two and so is the element
> >> > size in bits.
> >>
> >> Yeah, but even there, there's an argument that all-1s (0xff) is a more
> >> obvious value for an all-1s mask.
> >>
> >> > A patch restoring the VECTOR_CST_ELT checking might be the
> >> > following
> >> >
> >> > diff --git a/gcc/tree.cc b/gcc/tree.cc
> >> > index 046a558d1b0..4c9b05167fd 100644
> >> > --- a/gcc/tree.cc
> >> > +++ b/gcc/tree.cc
> >> > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
> >> >    if (i < encoded_nelts)
> >> >      return VECTOR_CST_ENCODED_ELT (t, i);
> >> >
> >> > +  /* Catch out-of-bound element accesses.  */
> >> > +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
> >> > +
> >> >    /* If there are no steps, the final encoded value is the right one.  */
> >> >    if (!VECTOR_CST_STEPPED_P (t))
> >> >      {
> >> >
> >> > but it triggers quite a bit via const_binop for, for example
> >> >
> >> > #2  0x00000000011c1506 in const_binop (code=PLUS_EXPR,
> >> >     arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>)
> >> > (gdb) p debug_generic_expr (arg1)
> >> > { 12, 13, 14, 15 }
> >> > $5 = void
> >> > (gdb) p debug_generic_expr (arg2)
> >> > { -2, -2, -2, -3 }
> >> > (gdb) p count
> >> > $4 = 6
> >> > (gdb) l
> >> > 1711              if (!elts.new_binary_operation (type, arg1, arg2,
> >> > step_ok_p))
> >> > 1712                return NULL_TREE;
> >> > 1713              unsigned int count = elts.encoded_nelts ();
> >> > 1714              for (unsigned int i = 0; i < count; ++i)
> >> > 1715                {
> >> > 1716                  tree elem1 = VECTOR_CST_ELT (arg1, i);
> >> > 1717                  tree elem2 = VECTOR_CST_ELT (arg2, i);
> >> > 1718
> >> > 1719                  tree elt = const_binop (code, elem1, elem2);
> >> >
> >> > this seems like an error to me - why would we, for fixed-size
> >> > vectors and for PLUS, ever create a vector encoding with 6 elements?!
> >> > That seems at least inefficient to me?
> >>
> >> It's a case of picking your poison.  On the other side, operating
> >> individually on each element of a V64QI is inefficient when the
> >> representation says up-front that all elements are equal.
> >
> > True, though I wonder why for VLS vectors new_binary_operation
> > doesn't cap the number of encoded elts on the fixed vector size,
> > like doing
> >
> >   encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);
> >
> > or if there's no good way to write it applying for both VLA and VLS
> > do it only when TYPE_VECTOR_SUBPARTS is constant.
>
> ordered_min can't be used because there's no guarantee that encoded_elts
> and TYPE_VECTOR_SUBPARTS are well-ordered for the VLA case.  E.g.
> for a stepped (3-element) encoding and a length of 2+2X, the stepped
> encoding is longer for X==0 and the vector is longer for X>0.
>
> But yeah, in general, trying to enforce this for VLS would probably
> lead to a proliferation of more "if VLA do one thing, if VLS do some
> other thing".  The aim was to avoid that where it didn't seem strictly
> necessary.
>
> >> Fundamentally, operations on VLA vectors are treated as functions
> >> that map patterns to patterns.  The number of elements that are
> >> consumed isn't really relevant to the function itself.  The VLA
> >> folders therefore rely on being able to read an element from a pattern
> >> even if the index is outside TYPE_VECTOR_SUBPARTS.
> >>
> >> There were two reasons for using VLA paths for VLS vectors.
> >> One I mentioned above: it saves time when all elements are equal,
> >> or have a similarly compact representation.  The other is that it
> >> makes VLA less special and ensures that the code gets more testing.
> >>
> >> Maybe one compromise between that and the assert would be:
> >>
> >> (1) enforce the assert only for VLS and
> >
> > that's what I did by using maybe_gt?
> >
> >> (2) add new checks to ensure that a VLA-friendly operation will never
> >>     read out-of-bounds for VLS vectors
> >>
> >> But I think this would be awkward.  E.g. we now build reversed vectors
> >> as a 3-element series N-1, N-2, N-3.  It would be nice not to have
> >> to special-case N==2 by suppressing N-3.  And the condition for (2)
> >> might not always be obvious.
> >
> > If we know only N <= 2 is problematic then having must_eq ()
> > special cases there makes sense IMO.
>
> I'm not sure it's just that.  E.g. a zip is { 0, N, 1, N+1, 2, N+2, ... },
> which would need to be curtailed for N==2 or N==4.
>
> >> Another option would be to have special accessors that are allowed
> >> to read out of bounds, and add the assert (for both VLA & VLS) to
> >> VECTOR_CST_ELT.  It might take a while to root out all the places that
> >> need to change though.
> >
> > Yes, that's another possibility.
> >
> > The point is (and we're seeing it now with the behavior of
> > encoding QImode bitmasks) that the behavior isn't entirely
> > obvious when VLS vectors are involved.  There's a memset zero
> > of the QImode so anything other than zero-padding is odd.
>
> Which memset do you mean?  The one in:
>
>       /* Zero the buffer and then set bits later where necessary.  */
>       int extract_bytes = MIN (len, total_bytes - off);
>       if (ptr)
>         memset (ptr, 0, extract_bytes);
>
> ?  If so, that's there for elt_bits>1.  It means that the canonical
> form of a 2-bit active mask element is 01 rather than 11.  (And that's
> something that might end up needing to be target-configurable.  The current
> behaviour is guided by SVE, although it shouldn't matter much in practice
> there.)
>
> > It's also not clear to me we'd get reliable sign extension
> > (we probably don't?).
>
> For which case?  The idea isn't that we always sign-extend or always
> zero-extend, but that we repeat the pattern.  So an all-1s mask would
> give an all-1s QImode.  A 1010 mask would give a 10101010 QI, etc.

But wouldn't 0 1 2 get you 0 1 2 3 4 5?  (not for bools of course)

Anyway, for QImode the efficiency argument is likely moot unless some
targets only have, say, 4-bit immediates, but even then they probably
always sign- or zero-extend rather than pad the 4 bits out with a repeat
of the other 4 bits ... so "repeating" isn't the natural thing to do;
zero- or sign-extending into padding is.

Richard.

^ permalink raw reply	[flat|nested] 8+ messages in thread
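The pattern continuation the exchange above keeps returning to can be sketched with a small standalone model. This is not GCC's implementation; the struct and member names are invented, but the semantics follow the described VECTOR_CST encoding: npatterns interleaved patterns, each with up to three encoded values, where an index beyond the encoded elements continues its pattern's series rather than reading stored data. That is why "0 1 2" continues as "3 4 5", an all-1s mask stays all-1s, and a 1010 mask repeats as 10101010:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Toy model of a pattern-encoded vector constant: NPATTERNS interleaved
   patterns, each described by NELTS_PER_PATTERN encoded values
   (1 = duplicated value, 2 = leading value then a repeated value,
   3 = arithmetic series).  Indexing past the encoded elements continues
   the pattern, so even out-of-range indices yield "sensible" values.  */
struct encoded_vector
{
  unsigned npatterns;
  unsigned nelts_per_pattern;
  std::vector<std::int64_t> elts;  /* npatterns * nelts_per_pattern values.  */

  std::int64_t elt (unsigned i) const
  {
    unsigned encoded = npatterns * nelts_per_pattern;
    if (i < encoded)
      return elts[i];
    unsigned pattern = i % npatterns;
    if (nelts_per_pattern < 3)
      /* Non-stepped: the pattern's final encoded value repeats.  */
      return elts[(nelts_per_pattern - 1) * npatterns + pattern];
    /* Stepped: continue the series base1 + n * step.  */
    std::int64_t base1 = elts[npatterns + pattern];
    std::int64_t step = elts[2 * npatterns + pattern] - base1;
    unsigned n = i / npatterns - 1;
    return base1 + static_cast<std::int64_t> (n) * step;
  }
};
```

Under this model a fixed-length vector simply ignores the fact that the pattern would keep producing values past TYPE_VECTOR_SUBPARTS, which is exactly the out-of-bound access the assert in the quoted tree.cc patch would catch.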
[parent not found: <20240206113813.26DCA3858D37@sourceware.org>]
* Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
       [not found] <20240206113813.26DCA3858D37@sourceware.org>
@ 2024-03-05 14:39 ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2024-03-05 14:39 UTC (permalink / raw)
  To: Richard Biener, gcc-patches; +Cc: richard.sandiford

On 2/6/24 04:37, Richard Biener wrote:
> The following avoids accessing out-of-bound vector elements when
> native encoding a boolean vector with sub-BITS_PER_UNIT precision
> elements.  The error was basing the number of elements to extract
> on the rounded up total byte size involved and the patch bases
> everything on the total number of elements to extract instead.
>
> As a side-effect this now consistently results in zeros in the
> padding of the last encoded byte which also avoids the failure
> mode seen in PR113576.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
> 	PR middle-end/113576
> 	* fold-const.cc (native_encode_vector_part): Avoid accessing
> 	out-of-bound elements.
OK.
jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread
* [PATCH] middle-end/113576 - avoid out-of-bound vector element access
@ 2024-02-06 11:37 Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2024-02-06 11:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford

The following avoids accessing out-of-bound vector elements when
native encoding a boolean vector with sub-BITS_PER_UNIT precision
elements.  The error was basing the number of elements to extract
on the rounded up total byte size involved and the patch bases
everything on the total number of elements to extract instead.

As a side-effect this now consistently results in zeros in the
padding of the last encoded byte which also avoids the failure
mode seen in PR113576.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

	PR middle-end/113576
	* fold-const.cc (native_encode_vector_part): Avoid accessing
	out-of-bound elements.
---
 gcc/fold-const.cc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 80e211e18c0..8638757312b 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8057,13 +8057,13 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
     off = 0;

   /* Zero the buffer and then set bits later where necessary.  */
-  int extract_bytes = MIN (len, total_bytes - off);
+  unsigned elts_per_byte = BITS_PER_UNIT / elt_bits;
+  unsigned first_elt = off * elts_per_byte;
+  unsigned extract_elts = MIN (len * elts_per_byte, count - first_elt);
+  unsigned extract_bytes = CEIL (elt_bits * extract_elts, BITS_PER_UNIT);
   if (ptr)
     memset (ptr, 0, extract_bytes);

-  unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits;
-  unsigned int first_elt = off * elts_per_byte;
-  unsigned int extract_elts = extract_bytes * elts_per_byte;
   for (unsigned int i = 0; i < extract_elts; ++i)
     {
       tree elt = VECTOR_CST_ELT (expr, first_elt + i);
--
2.35.3

^ permalink raw reply	[flat|nested] 8+ messages in thread
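The arithmetic change in the patch can be illustrated with a detached sketch. This is plain C++, not the GCC code: BITS_PER_UNIT is fixed at 8, MIN/CEIL are spelled out, and the two helpers reproduce only the index computations before and after the patch. For four 1-bit boolean elements packed into one byte, the old code derives the element count from the rounded-up byte size and so indexes elements 4..7 past the end of the vector, while the new code caps the count at the elements that actually exist:

```cpp
#include <algorithm>
#include <cassert>

/* Element count to extract as computed before the patch: derived from
   the rounded-up byte size, so it can exceed COUNT when the last byte
   is only partially filled.  */
unsigned
old_extract_elts (unsigned count, unsigned elt_bits,
		  unsigned len, unsigned off)
{
  const unsigned bits_per_unit = 8;
  unsigned total_bytes
    = (count * elt_bits + bits_per_unit - 1) / bits_per_unit;
  unsigned extract_bytes = std::min (len, total_bytes - off);
  unsigned elts_per_byte = bits_per_unit / elt_bits;
  return extract_bytes * elts_per_byte;
}

/* Element count as computed after the patch: based on the number of
   elements, capped by how many remain from FIRST_ELT onwards.  */
unsigned
new_extract_elts (unsigned count, unsigned elt_bits,
		  unsigned len, unsigned off)
{
  const unsigned bits_per_unit = 8;
  unsigned elts_per_byte = bits_per_unit / elt_bits;
  unsigned first_elt = off * elts_per_byte;
  return std::min (len * elts_per_byte, count - first_elt);
}
```

The byte count then rounds back up from the element count, CEIL (elt_bits * extract_elts, BITS_PER_UNIT), so the whole final byte is still zeroed; only the element loop stays in bounds.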
end of thread, other threads:[~2024-03-05 14:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20240206113819.2335F3858402@sourceware.org>
2024-02-14 11:31 ` [PATCH] middle-end/113576 - avoid out-of-bound vector element access Richard Sandiford
2024-02-14 12:34   ` Richard Biener
2024-02-14 15:42     ` Richard Sandiford
2024-02-15  8:12       ` Richard Biener
2024-02-15  9:46         ` Richard Sandiford
2024-02-15 10:12           ` Richard Biener
     [not found] <20240206113813.26DCA3858D37@sourceware.org>
2024-03-05 14:39 ` Jeff Law
2024-02-06 11:37 Richard Biener