public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/107451] New: Segmentation fault with vectorized code.
@ 2022-10-28 19:07 bartoldeman at users dot sourceforge.net
  2022-10-28 19:19 ` [Bug tree-optimization/107451] " jakub at gcc dot gnu.org
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: bartoldeman at users dot sourceforge.net @ 2022-10-28 19:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

            Bug ID: 107451
           Summary: Segmentation fault with vectorized code.
           Product: gcc
           Version: 11.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

Created attachment 53785
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53785&action=edit
Test case

The following code:

double dot(int n, const double *x, int inc_x, const double *y)
{
        int i, ix;
        double dot[4] = { 0.0, 0.0, 0.0, 0.0 } ; 

        ix=0;
        for(i = 0; i < n; i++) {
                dot[0] += x[ix]   * y[ix]   ;
                dot[1] += x[ix+1] * y[ix+1] ;
                dot[2] += x[ix]   * y[ix+1] ;
                dot[3] += x[ix+1] * y[ix]   ;
                ix += inc_x ;
        }

        return dot[0] + dot[1] + dot[2] + dot[3];
}

int main(void)
{
        double x = 0, y = 0;
        return dot(1, &x, 4096*4096, &y);
}

crashes with (on Linux x86-64)

$ gcc -O2 -ftree-vectorize -march=haswell crash.c -o crash
$ ./crash
Segmentation fault

for GCC 11.3.0 and also the current prerelease (gcc version 11.3.1 20221021),
and also when patched with the patches from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212.

The loop code assembly is as follows:

  18:   c5 f9 10 1e             vmovupd (%rsi),%xmm3
  1c:   c5 f9 10 21             vmovupd (%rcx),%xmm4
  20:   ff c2                   inc    %edx
  22:   c4 e3 65 18 0c 06 01    vinsertf128 $0x1,(%rsi,%rax,1),%ymm3,%ymm1
  29:   c4 e3 5d 18 04 01 01    vinsertf128 $0x1,(%rcx,%rax,1),%ymm4,%ymm0
  30:   48 01 c6                add    %rax,%rsi
  33:   48 01 c1                add    %rax,%rcx
  36:   c4 e3 fd 01 c9 11       vpermpd $0x11,%ymm1,%ymm1
  3c:   c4 e3 fd 01 c0 14       vpermpd $0x14,%ymm0,%ymm0
  42:   c4 e2 f5 b8 d0          vfmadd231pd %ymm0,%ymm1,%ymm2
  47:   39 fa                   cmp    %edi,%edx
  49:   75 cd                   jne    18 <dot+0x18>

What happens here is that the vinsertf128 instructions load the elements that
belong to the next loop iteration and put them in the high halves of ymm0 and
ymm1.
The vpermpd instructions then throw away those high halves again, so e.g. they
turn 1,2,3,4 into 2,1,2,1 and 1,2,2,1 respectively.

So the result is correct, but the superfluous vinsertf128 instructions access
memory potentially past the end of x or y and thus produce a segfault.
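
A minimal C sketch of what the two vpermpd immediates select, assuming the
usual imm8 encoding of two selector bits per destination lane. permute4 is a
hypothetical helper name used only for illustration; it models the
instruction and is not taken from the test case.

```c
#include <assert.h>
#include <stddef.h>

/* Model of vpermpd with an 8-bit immediate: destination lane i takes the
   source lane selected by bits [2*i+1 : 2*i] of imm. */
void permute4(const double src[4], unsigned imm, double dst[4])
{
    for (size_t i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3u];
}
```

With src = {1, 2, 3, 4}, the immediates 0x11 and 0x14 only ever index lanes 0
and 1, so whatever vinsertf128 placed in lanes 2 and 3 never reaches the
result.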

Related issue (coming from OpenBLAS):
https://github.com/easybuilders/easybuild-easyconfigs/issues/16387
May also be related:
https://github.com/xianyi/OpenBLAS/issues/3740#issuecomment-1233899834
(that comment shows very similar code, but for GCC 12, which vectorizes by
default; OpenBLAS worked around this by disabling the tree vectorizer there,
but only on macOS and Windows).

* [Bug tree-optimization/107451] Segmentation fault with vectorized code.
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
@ 2022-10-28 19:19 ` jakub at gcc dot gnu.org
  2022-10-28 19:19 ` [Bug tree-optimization/107451] [11/12/13 Regression] " pinskia at gcc dot gnu.org
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-10-28 19:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The bug is in the testcase:
gcc -fsanitize=undefined,address -g -o /tmp/pr107451{,.c}; /tmp/pr107451
=================================================================
==2296364==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffca382d798 at pc 0x00000040148c bp 0x7ffca382d680 sp 0x7ffca382d678
READ of size 8 at 0x7ffca382d798 thread T0
    #0 0x40148b in dot /tmp/pr107451.c:9
    #1 0x4019f8 in main /tmp/pr107451.c:21
    #2 0x7f8c74de858f in __libc_start_call_main (/lib64/libc.so.6+0x2958f)
    #3 0x7f8c74de8648 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x29648)
    #4 0x4010f4 in _start (/tmp/pr107451+0x4010f4)

Address 0x7ffca382d798 is located in stack of thread T0 at offset 40 in frame
    #0 0x401922 in main /tmp/pr107451.c:19

  This frame has 2 object(s):
    [32, 40) 'x' (line 20) <== Memory access at offset 40 overflows this variable
    [64, 72) 'y' (line 20)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /tmp/pr107451.c:9 in dot
Shadow bytes around the buggy address:
  0x1000146fdaa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdab0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdad0: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
  0x1000146fdae0: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 f1 f1
=>0x1000146fdaf0: f1 f1 00[f2]f2 f2 00 f3 f3 f3 00 00 00 00 00 00
  0x1000146fdb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdb10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdb20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdb30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000146fdb40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==2296364==ABORTING

With ix == 0, x[ix+1] and y[ix+1] are out-of-bounds accesses, because the
pointers x and y in dot refer to the single doubles x and y in main.

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code.
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
  2022-10-28 19:19 ` [Bug tree-optimization/107451] " jakub at gcc dot gnu.org
@ 2022-10-28 19:19 ` pinskia at gcc dot gnu.org
  2022-10-29  0:12 ` bartoldeman at users dot sourceforge.net
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-10-28 19:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |10.4.0
             Status|RESOLVED                    |NEW
   Target Milestone|---                         |11.4
   Last reconfirmed|                            |2022-10-28
            Summary|Segmentation fault with     |[11/12/13 Regression]
                   |vectorized code.            |Segmentation fault with
                   |                            |vectorized code.
         Resolution|INVALID                     |---
     Ever confirmed|0                           |1
      Known to fail|                            |11.1.0

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this code is undefined, as x and y are effectively arrays of size 1
but you access one element past the end.

But here is the main which makes this well defined:
int main(void)
{
        double x[2] = {0,0}, y[2] = {0,0};
        return dot(1, &x[0], 4096*4096, &y[0]);
}

Still an issue on the trunk.
Confirmed.

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code.
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
  2022-10-28 19:19 ` [Bug tree-optimization/107451] " jakub at gcc dot gnu.org
  2022-10-28 19:19 ` [Bug tree-optimization/107451] [11/12/13 Regression] " pinskia at gcc dot gnu.org
@ 2022-10-29  0:12 ` bartoldeman at users dot sourceforge.net
  2022-10-31  3:29 ` crazylht at gmail dot com
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bartoldeman at users dot sourceforge.net @ 2022-10-29  0:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

bartoldeman at users dot sourceforge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #53785|0                           |1
        is obsolete|                            |

--- Comment #3 from bartoldeman at users dot sourceforge.net ---
Created attachment 53786
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53786&action=edit
Corrected test case

In my eagerness to make it as short as possible I made it too short indeed!

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code.
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (2 preceding siblings ...)
  2022-10-29  0:12 ` bartoldeman at users dot sourceforge.net
@ 2022-10-31  3:29 ` crazylht at gmail dot com
  2022-11-05 10:21 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2022-10-31  3:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to bartoldeman from comment #3)
> Created attachment 53786 [details]
> Corrected test case
> 
> In my eagerness to make it as short as possible I made it too short indeed!

  <bb 3> [local count: 105119324]:
  bnd.12_91 = (unsigned int) n_29(D);
  _90 = (long unsigned int) inc_x_33(D);
  _89 = _90 * 8;
  ivtmp.24_45 = (unsigned long) x_31(D);
  ivtmp.26_3 = (unsigned long) y_32(D);

  <bb 4> [local count: 955630225]:
  # vect_dot_3_55.16_71 = PHI <vect__20.17_70(4), { 0.0, 0.0, 0.0, 0.0 }(3)>
  # ivtmp.19_55 = PHI <ivtmp.19_92(4), 0(3)>
  # ivtmp.24_49 = PHI <ivtmp.24_46(4), ivtmp.24_45(3)>
  # ivtmp.26_1 = PHI <ivtmp.26_2(4), ivtmp.26_3(3)>
  _75 = (void *) ivtmp.24_49;
  _78 = MEM <vector(2) double> [(const double *)_75];
  _76 = MEM <vector(2) double> [(const double *)_75 + _89 * 1];
  vect_cst__74 = {_78, _76}; --------------- here
  vect__4.14_73 = VEC_PERM_EXPR <vect_cst__74, vect_cst__74, { 1, 0, 1, 0 }>;
  _5 = (void *) ivtmp.26_1;
  _86 = MEM <vector(2) double> [(const double *)_5];
  _84 = MEM <vector(2) double> [(const double *)_5 + _89 * 1];
  vect_cst__82 = {_86, _84};  -------------- here
  vect__6.13_81 = VEC_PERM_EXPR <vect_cst__82, vect_cst__82, { 0, 1, 1, 0 }>;
  vect__20.17_70 = .FMA (vect__4.14_73, vect__6.13_81, vect_dot_3_55.16_71);
  ivtmp.19_92 = ivtmp.19_55 + 1;
  ivtmp.24_46 = ivtmp.24_49 + _89;
  ivtmp.26_2 = ivtmp.26_1 + _89;
  if (bnd.12_91 != ivtmp.19_92)
    goto <bb 4>; [90.00%]
  else
    goto <bb 5>; [10.00%]

  <bb 5> [local count: 105119324]:
  _51 = .REDUC_PLUS (vect__20.17_70); [tail call]

It looks like it should be vect_cst__82 = {_84, _86} not {_86, _84}, similar
for vect_cst__74 = {_76, _78} not {_78, _76}.

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code.
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (3 preceding siblings ...)
  2022-10-31  3:29 ` crazylht at gmail dot com
@ 2022-11-05 10:21 ` rguenth at gcc dot gnu.org
  2022-11-05 10:29 ` [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434 jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-05 10:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look.

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (4 preceding siblings ...)
  2022-11-05 10:21 ` rguenth at gcc dot gnu.org
@ 2022-11-05 10:29 ` jakub at gcc dot gnu.org
  2022-11-17  9:06 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-11-05 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12/13 Regression]       |[11/12/13 Regression]
                   |Segmentation fault with     |Segmentation fault with
                   |vectorized code.            |vectorized code since
                   |                            |r11-6434
           Keywords|needs-bisection             |

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r11-6434-g8837f82e4bab1b5405cf034eab9b3e83afc563ad

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (5 preceding siblings ...)
  2022-11-05 10:29 ` [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434 jakub at gcc dot gnu.org
@ 2022-11-17  9:06 ` rguenth at gcc dot gnu.org
  2022-11-17 14:48 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-17  9:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Apart from the permute issue that may be there, the cause of the segfault is a
failure to generate the loads so that they match the SLP analysis.  We
generate loads as if we used a VF of 2 but then use only the lower part, and
DCE / simplification doesn't simplify

  _64 = MEM <vector(2) double> [(const double *)ivtmp_66];
  ivtmp_63 = ivtmp_66 + _75;
  _62 = MEM <vector(2) double> [(const double *)ivtmp_63];
  vect_cst__60 = {_64, _62};
  vect__4.12_59 = VEC_PERM_EXPR <vect_cst__60, vect_cst__60, { 1, 0, 1, 0 }>;

to, for example

  _64 = MEM <vector(2) double> [(const double *)ivtmp_66];
  ivtmp_63 = ivtmp_66 + _75;
  vect_cst__60 = {_64, _64};
  vect__4.12_59 = VEC_PERM_EXPR <vect_cst__60, vect_cst__60, { 1, 0, 1, 0 }>;

(we now also allow VEC_PERM of _64, _64 directly with GCC 13, but the targets
need to be ready for this)

That's probably a latent issue in other cases as well.  We'd either need to
disallow these kinds of load permutations or make sure we only reference
the actually loaded DR group when filling the input vectors.
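
To make the over-read concrete, here is a small C model of the highest
element index each version of the loop touches.  It is a sketch under
assumptions: both function names are hypothetical, n >= 1 and inc_x >= 0 are
assumed, and the second function simply encodes "each iteration also loads
the two-element group of the following iteration" as described above.

```c
#include <assert.h>

/* Highest element index read by the scalar loop: iteration i reads
   x[i*inc_x] and x[i*inc_x + 1].  Assumes n >= 1 and inc_x >= 0. */
long long scalar_max_index(long long n, long long inc_x)
{
    return (n - 1) * inc_x + 1;
}

/* Highest element index read when every iteration additionally loads the
   two-element group that belongs to the next iteration. */
long long overread_max_index(long long n, long long inc_x)
{
    return n * inc_x + 1;
}
```

For the test case (n = 1, inc_x = 4096*4096) the scalar loop stays within
x[0..1], while the extra load starts 16777216 doubles (128 MiB) further on.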

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (6 preceding siblings ...)
  2022-11-17  9:06 ` rguenth at gcc dot gnu.org
@ 2022-11-17 14:48 ` rguenth at gcc dot gnu.org
  2022-11-17 14:57 ` bartoldeman at users dot sourceforge.net
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-17 14:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Peeling for gaps also isn't a good fix here.  One could envision a case where
the load reaches even three iterations ahead, with

        for(i = 0; i < n; i++) {
                dot[0] += x[ix]   * y[ix]   ;
                dot[1] += x[ix] * y[ix] ;
                dot[2] += x[ix]   * y[ix] ;
                dot[3] += x[ix] * y[ix]   ;
                ix += inc_x ;
        }

or similar.  The root cause is how we generate code for VMAT_STRIDED_SLP
where we first generate loads to fill a contiguous output vector but only
then create the permute using the pieces that are actually necessary.

We could simply fail if 'nloads' is bigger than 'vf', or cap 'nloads' and
fail if we then cannot generate the permutation.

When we force VMAT_ELEMENTWISE the very same issue arises but later
optimization will eliminate the unnecessary loads, avoiding the problem:

  _62 = *ivtmp_64;
  _61 = MEM[(const double *)ivtmp_64 + 8B];
  ivtmp_60 = ivtmp_64 + _75;
  _59 = *ivtmp_60;
  _58 = MEM[(const double *)ivtmp_60 + 8B];
  ivtmp_57 = ivtmp_60 + _75;
  vect_cst__48 = {_62, _61, _59, _58};
  vect__4.12_47 = VEC_PERM_EXPR <vect_cst__48, vect_cst__48, { 1, 0, 1, 0 }>;

that just becomes

  _62 = MEM[(const double *)ivtmp_64];
  _61 = MEM[(const double *)ivtmp_64 + 8B];
  ivtmp_60 = ivtmp_64 + _75;
  vect__4.12_47 = {_61, _62, _61, _62};

with cost modeling and VMAT_ELEMENTWISE we fall back to SSE vectorization
which works fine.

I fear the proper fix is to integrate load emission with
vect_transform_slp_perm_load somehow, we shouldn't rely on followup
simplifications to fix what the vectorizer emits here.

Since we have no fallback, detecting the situation and avoiding it completely
would mean not vectorizing the code (with AVX).
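
The reason later folding can drop the extra scalar loads in the
VMAT_ELEMENTWISE case is that the permute never references their lanes.  A
minimal C sketch of that check, assuming both permute inputs are the same
vector (as in the dumps above); perm_used_lanes is a hypothetical helper, not
a GCC function.

```c
#include <assert.h>
#include <stddef.h>

/* For a permute whose two inputs are the same nelts-lane vector, selector
   value s picks lane s % nelts.  Returns a bitmask of the lanes that the
   selector references; lanes outside the mask are dead. */
unsigned perm_used_lanes(const unsigned *sel, size_t nelts)
{
    unsigned used = 0;
    for (size_t i = 0; i < nelts; i++)
        used |= 1u << (sel[i] % nelts);
    return used;
}
```

For the selector { 1, 0, 1, 0 } only lanes 0 and 1 are referenced, so the
loads feeding lanes 2 and 3 (_59 and _58 above) can be removed.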

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (7 preceding siblings ...)
  2022-11-17 14:48 ` rguenth at gcc dot gnu.org
@ 2022-11-17 14:57 ` bartoldeman at users dot sourceforge.net
  2022-11-17 14:58 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: bartoldeman at users dot sourceforge.net @ 2022-11-17 14:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #9 from bartoldeman at users dot sourceforge.net ---
I ended up using -mprefer-vector-width=128 as a workaround myself (via
__attribute__((target("prefer-vector-width=128")))), so there is still some AVX
vectorization.
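
A sketch of how that workaround might be applied to the function from the
report.  The PREFER_128 macro and its preprocessor guard are assumptions
added for illustration (so the code still compiles with other compilers or
targets); only the attribute string itself comes from the comment above, and
the local accumulator is renamed to acc for readability.

```c
/* Apply the workaround only for GCC targeting x86; elsewhere the
   (hypothetical) macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#define PREFER_128 __attribute__((target("prefer-vector-width=128")))
#else
#define PREFER_128
#endif

PREFER_128
double dot(int n, const double *x, int inc_x, const double *y)
{
    double acc[4] = { 0.0, 0.0, 0.0, 0.0 };
    int ix = 0;

    for (int i = 0; i < n; i++) {
        acc[0] += x[ix]     * y[ix];
        acc[1] += x[ix + 1] * y[ix + 1];
        acc[2] += x[ix]     * y[ix + 1];
        acc[3] += x[ix + 1] * y[ix];
        ix += inc_x;
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

The attribute limits the preferred vector width for this one function, so
the rest of the translation unit can still be built with -march=haswell.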

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (8 preceding siblings ...)
  2022-11-17 14:57 ` bartoldeman at users dot sourceforge.net
@ 2022-11-17 14:58 ` rguenth at gcc dot gnu.org
  2022-12-22 11:21 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-17 14:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Interestingly, the following variant of the testcase falls back to
VMAT_ELEMENTWISE.  It has the same problem, which is normally fixed up by
later folding, but it segfaults with -O2 -mavx2 -fno-vect-cost-model
-fdisable-tree-vrp2 -fdisable-tree-forwprop4, which keeps the bogus

  _62 = *ivtmp_64;
  _61 = MEM[(const double *)ivtmp_64 + 8B];
  ivtmp_60 = ivtmp_64 + _65;
  _59 = *ivtmp_60;
  _58 = MEM[(const double *)ivtmp_60 + 8B];
  ivtmp_57 = ivtmp_60 + _65;
  vect_cst__56 = {_62, _61, _59, _58};
  vect__4.7_55 = VEC_PERM_EXPR <vect_cst__56, vect_cst__56, { 0, 1, 0, 1 }>;

that problem should be present even before the r11-6434 change.  In fact
this segfaults on the GCC 10 branch with just -O2 -ftree-loop-vectorize -mavx2
generating the same load/permute as trunk for the reduction (so there's some
half-way "fix" on the later branches).  Also broken with GCC 9.5.

static void __attribute__((noipa))
setdot(int n, const double *x, int inc_x, const double *y,
       double * __restrict dot)
{
  int i, ix = 0;

  for(i = 0; i < n; i++) {
      dot[i*4+0] = x[ix]   * y[ix]   ;
      dot[i*4+1] = x[ix+1] * y[ix+1] ;
      dot[i*4+2] = x[ix]   * y[ix+1] ;
      dot[i*4+3] = x[ix+1] * y[ix]   ;
      ix += inc_x ;
  }
}

int main(void)
{
  double x[2] = {0, 0}, y[2] = {0, 0};
  double dot[4];
  setdot(1, x, 4096*4096, y, dot);
  return 0;
}

* [Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (9 preceding siblings ...)
  2022-11-17 14:58 ` rguenth at gcc dot gnu.org
@ 2022-12-22 11:21 ` cvs-commit at gcc dot gnu.org
  2022-12-22 11:21 ` [Bug tree-optimization/107451] [11/12 " rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-12-22 11:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:7b2cf5041460859ca4f58e5da1308b7ef9129d8b

commit r13-4843-g7b2cf5041460859ca4f58e5da1308b7ef9129d8b
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Dec 22 09:36:17 2022 +0100

    tree-optimization/107451 - SLP load vectorization issue

    When vectorizing SLP loads with permutations we can access excess
    elements when the load vector type is bigger than the group size
    and the vectorization factor covers less groups than necessary
    to fill it.  Since we know the code will only access up to
    group_size * VF elements in the unpermuted vector we can simply
    fill the rest of the vector with whatever we want.  For simplicity
    this patch chooses to repeat the last group.

            PR tree-optimization/107451
            * tree-vect-stmts.cc (vectorizable_load): Avoid loading
            SLP group members from group numbers in excess of the
            vectorization factor.

            * gcc.dg/torture/pr107451.c: New testcase.
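
The idea in the commit message ("repeat the last group") can be modelled in a
few lines of C.  This is only an illustrative sketch, not the actual change
to vectorizable_load: lane_offsets and its parameters are hypothetical names,
and offsets are counted in elements relative to the current pointer.

```c
#include <assert.h>
#include <stddef.h>

/* For each lane of the unpermuted vector, compute the element offset it is
   loaded from.  Lanes that would belong to a group number >= vf reuse the
   last valid group instead of reaching into a later iteration.
   Assumes vf >= 1 and group_size >= 1. */
void lane_offsets(size_t nlanes, size_t group_size, size_t vf,
                  long stride, long out[])
{
    for (size_t lane = 0; lane < nlanes; lane++) {
        size_t group = lane / group_size;
        if (group >= vf)
            group = vf - 1;   /* repeat the last group */
        out[lane] = (long)group * stride + (long)(lane % group_size);
    }
}
```

With four lanes, a group size of 2 and a VF of 1 this yields the offsets
{0, 1, 0, 1}, so nothing beyond the current group is read; with a VF of 2 the
second group at offset stride is loaded as before.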

* [Bug tree-optimization/107451] [11/12 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (10 preceding siblings ...)
  2022-12-22 11:21 ` cvs-commit at gcc dot gnu.org
@ 2022-12-22 11:21 ` rguenth at gcc dot gnu.org
  2023-03-15  9:47 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-12-22 11:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |13.0
            Summary|[11/12/13 Regression]       |[11/12 Regression]
                   |Segmentation fault with     |Segmentation fault with
                   |vectorized code since       |vectorized code since
                   |r11-6434                    |r11-6434
           Priority|P3                          |P2

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk so far.

* [Bug tree-optimization/107451] [11/12 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (11 preceding siblings ...)
  2022-12-22 11:21 ` [Bug tree-optimization/107451] [11/12 " rguenth at gcc dot gnu.org
@ 2023-03-15  9:47 ` cvs-commit at gcc dot gnu.org
  2023-05-02 12:03 ` [Bug tree-optimization/107451] [11 " cvs-commit at gcc dot gnu.org
  2023-05-02 12:05 ` rguenth at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-03-15  9:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:c722c6b061a5e909267eae53ffe5910fbe0a7d5e

commit r12-9255-gc722c6b061a5e909267eae53ffe5910fbe0a7d5e
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Dec 22 09:36:17 2022 +0100

    tree-optimization/107451 - SLP load vectorization issue

    When vectorizing SLP loads with permutations we can access excess
    elements when the load vector type is bigger than the group size
    and the vectorization factor covers less groups than necessary
    to fill it.  Since we know the code will only access up to
    group_size * VF elements in the unpermuted vector we can simply
    fill the rest of the vector with whatever we want.  For simplicity
    this patch chooses to repeat the last group.

            PR tree-optimization/107451
            * tree-vect-stmts.cc (vectorizable_load): Avoid loading
            SLP group members from group numbers in excess of the
            vectorization factor.

            * gcc.dg/torture/pr107451.c: New testcase.

    (cherry picked from commit 7b2cf5041460859ca4f58e5da1308b7ef9129d8b)

* [Bug tree-optimization/107451] [11 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (12 preceding siblings ...)
  2023-03-15  9:47 ` cvs-commit at gcc dot gnu.org
@ 2023-05-02 12:03 ` cvs-commit at gcc dot gnu.org
  2023-05-02 12:05 ` rguenth at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-05-02 12:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:e74518e28d98994c2059063f339533d344db85e0

commit r11-10673-ge74518e28d98994c2059063f339533d344db85e0
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Dec 22 09:36:17 2022 +0100

    tree-optimization/107451 - SLP load vectorization issue

    When vectorizing SLP loads with permutations we can access excess
    elements when the load vector type is bigger than the group size
    and the vectorization factor covers less groups than necessary
    to fill it.  Since we know the code will only access up to
    group_size * VF elements in the unpermuted vector we can simply
    fill the rest of the vector with whatever we want.  For simplicity
    this patch chooses to repeat the last group.

            PR tree-optimization/107451
            * tree-vect-stmts.c (vectorizable_load): Avoid loading
            SLP group members from group numbers in excess of the
            vectorization factor.

            * gcc.dg/torture/pr107451.c: New testcase.

    (cherry picked from commit 7b2cf5041460859ca4f58e5da1308b7ef9129d8b)

* [Bug tree-optimization/107451] [11 Regression] Segmentation fault with vectorized code since r11-6434
  2022-10-28 19:07 [Bug tree-optimization/107451] New: Segmentation fault with vectorized code bartoldeman at users dot sourceforge.net
                   ` (13 preceding siblings ...)
  2023-05-02 12:03 ` [Bug tree-optimization/107451] [11 " cvs-commit at gcc dot gnu.org
@ 2023-05-02 12:05 ` rguenth at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-02 12:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |11.3.1
         Resolution|---                         |FIXED
      Known to fail|                            |11.3.0
             Status|ASSIGNED                    |RESOLVED

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.
