public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/98535] New: [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
@ 2021-01-05 15:43 ktkachov at gcc dot gnu.org
  2021-01-05 15:43 ` [Bug tree-optimization/98535] " ktkachov at gcc dot gnu.org
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2021-01-05 15:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

            Bug ID: 98535
           Summary: [11 Regression] ICE in
                    operands_scanner::get_expr_operands(tree_node**, int)
                    building 538.imagick_r
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: ice-on-valid-code
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Building 538.imagick_r from SPEC2017 ICEs on aarch64 with -O3 -mcpu=neoverse-v1
(SVE-enabled)

A reduced testcase is:
typedef short a;

typedef struct {
  a b, c, d, e;
} f;

f *g;

long h;

void
i() {
  f j;
  for (; h; h++)
    *g++ = j;
}

during GIMPLE pass: vect
foo.c: In function 'i':
foo.c:12:1: internal compiler error: Segmentation fault
   12 | i() {
      | ^
0xd8f177 crash_signal
        $SRC/gcc/toplev.c:327
0xf7daa9 operands_scanner::get_expr_operands(tree_node**, int)
        $SRC/gcc/tree-ssa-operands.c:780
0xf7f823 operands_scanner::parse_ssa_operands()
        $SRC/gcc/tree-ssa-operands.c:998
0xf80eff operands_scanner::build_ssa_operands()
        $SRC/gcc/tree-ssa-operands.c:1013
0xf817b4 update_stmt_operands(function*, gimple*)
        $SRC/gcc/tree-ssa-operands.c:1155
0xa25ef9 update_stmt_if_modified
        $SRC/gcc/gimple-ssa.h:185
0xa25ef9 update_modified_stmt
        $SRC/gcc/gimple-iterator.c:44
0xa25ef9 gsi_insert_after(gimple_stmt_iterator*, gimple*, gsi_iterator_update)
        $SRC/gcc/gimple-iterator.c:540
0xa1a104 gimple_seq_add_stmt(gimple**, gimple*)
        $SRC/gcc/gimple.c:1282
0x10cd184 duplicate_and_interleave(vec_info*, gimple**, tree_node*, vec<tree_node*, va_heap, vl_ptr>, unsigned int, vec<tree_node*, va_heap, vl_ptr>&)
        $SRC/gcc/tree-vect-slp.c:4958
0x10cdc02 vect_create_constant_vectors
        $SRC/gcc/tree-vect-slp.c:5112
0x10cdc02 vect_schedule_slp_node
        $SRC/gcc/tree-vect-slp.c:5696
0x10dbff9 vect_schedule_scc
        $SRC/gcc/tree-vect-slp.c:5958
0x10dc2d2 vect_schedule_scc
        $SRC/gcc/tree-vect-slp.c:5975
0x10dcc0f vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap, vl_ptr>)
        $SRC/gcc/tree-vect-slp.c:6110
0x10b658d vect_transform_loop(_loop_vec_info*, gimple*)
        $SRC/gcc/tree-vect-loop.c:9468
0x10e6dd8 try_vectorize_loop_1
        $SRC/gcc/tree-vectorizer.c:1104
0x10e7458 try_vectorize_loop_1
        $SRC/gcc/tree-vectorizer.c:1141
0x10e7501 try_vectorize_loop
        $SRC/gcc/tree-vectorizer.c:1161
0x10e7847 vectorize_loops()
        $SRC/gcc/tree-vectorizer.c:1242
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
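
For context (a sketch, not part of the original report): each iteration of the reduced loop stores the same four-field struct, so the stored halfwords form the repeating sequence j.b, j.c, j.d, j.e, j.b, ... . A vectorized loop therefore needs a variable-length vector holding that repeating pattern, which is what duplicate_and_interleave in the backtrace is responsible for building. The C below only checks that layout on the scalar loop. The helper names `fill` and `is_repeating` and the field values are illustrative, and the `static_assert` encodes the assumption (true on common ABIs) that `f` has no padding.

```c
#include <assert.h>
#include <string.h>

typedef short a;
typedef struct { a b, c, d, e; } f;

/* Reading the stores back as consecutive shorts assumes f has no padding.  */
static_assert(sizeof(f) == 4 * sizeof(a), "unexpected padding in f");

/* Scalar equivalent of the reduced loop: store the same struct n times.  */
static void fill(f *g, long n, f j)
{
  for (long h = 0; h < n; h++)
    *g++ = j;
}

/* Return 1 if the n stored structs, read back as consecutive shorts, form
   the sequence j.b, j.c, j.d, j.e repeated n times; 0 otherwise.  */
static int is_repeating(const f *g, long n, f j)
{
  const a pattern[4] = { j.b, j.c, j.d, j.e };
  const char *bytes = (const char *) g;
  for (long k = 0; k < 4 * n; k++)
    {
      a v;
      memcpy(&v, bytes + k * sizeof(a), sizeof v);
      if (v != pattern[k % 4])
        return 0;
    }
  return 1;
}
```

For example, after `fill(buf, 6, j)` with `j = { 1, 2, 3, 4 }`, the 24 shorts in `buf` read 1, 2, 3, 4, 1, 2, 3, 4, ... .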

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: ktkachov at gcc dot gnu.org @ 2021-01-05 15:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
           Priority|P3                          |P1

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: ktkachov at gcc dot gnu.org @ 2021-01-05 17:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

--- Comment #1 from ktkachov at gcc dot gnu.org ---
This backtrace started with my commit 64432b680eab0bddbe9a4ad4798457cf6a14ad60
but before this it still ICEd with:
foo.c: In function 'i':
foo.c:12:1: error: type mismatch in 'vec_perm_expr'
   12 | i() {
      | ^
vector([4,4]) unsigned short
vector([4,4]) unsigned short
unsigned long
vector([4,4]) ssizetype
_112 = VEC_PERM_EXPR <_111, niters.19_72, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;
during GIMPLE pass: vect

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: ktkachov at gcc dot gnu.org @ 2021-01-05 18:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Bisection points to 6c3ce63b04b38f84c0357e4648383f0e3ab11cd9

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: marxin at gcc dot gnu.org @ 2021-01-05 20:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
g:6c3ce63b04b38f84c0357e4648383f0e3ab11cd9

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: rguenth at gcc dot gnu.org @ 2021-01-06  8:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |11.0
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
guess 'pieces' is accessed in uninitialized parts.  Using quick_grow_cleared
to init it might more reliably "crash" things rather than ending up with
strange 'niters' entry ;)

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: rguenth at gcc dot gnu.org @ 2021-01-19 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-01-19
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> guess 'pieces' is accessed in uninitialized parts.  Using quick_grow_cleared
> to init it might more reliably "crash" things rather than ending up with
> strange 'niters' entry ;)

Indeed.

> ./cc1 -quiet t.c -I include -O3 -mcpu=neoverse-v1
during GIMPLE pass: vect
t.c: In function 'i':
t.c:12:1: internal compiler error: in duplicate_and_interleave, at tree-vect-slp.c:5115
   12 | i() {
      | ^
0x1778d9b duplicate_and_interleave(vec_info*, gimple**, tree_node*, vec<tree_node*, va_heap, vl_ptr>, unsigned int, vec<tree_node*, va_heap, vl_ptr>&)
        ../../src/trunk/gcc/tree-vect-slp.c:5115

with

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 1787ad74268..f4b2b69a6eb 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5063,7 +5063,7 @@ duplicate_and_interleave (vec_info *vinfo, gimple_seq *seq, tree vector_type,

   tree_vector_builder partial_elts;
   auto_vec<tree, 32> pieces (nvectors * 2);
-  pieces.quick_grow (nvectors * 2);
+  pieces.quick_grow_cleared (nvectors * 2);
   for (unsigned int i = 0; i < nvectors; ++i)
     {
       /* (2) Replace ELTS[0:NELTS] with ELTS'[0:NELTS'], where each element of
@@ -5112,6 +5112,7 @@ duplicate_and_interleave (vec_info *vinfo, gimple_seq *seq, tree vector_type,
          tree output = make_ssa_name (new_vector_type);
          tree input1 = pieces[in_start + (i / 2)];
          tree input2 = pieces[in_start + (i / 2) + hi_start];
+         gcc_assert (input1 && input2);
          gassign *stmt = gimple_build_assign (output, VEC_PERM_EXPR,
                                               input1, input2,
                                               permutes[i & 1]);

I'll see what happens.
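
The bad read can be replayed outside GCC. The C sketch below is a simplified model, not GCC code: it replays only the pieces[] indexing of the pre-fix loop, built from the statements quoted in this comment and in comment #6 (hi_start = nvectors / 2, the noutputs_bound limit, the i & 1 copy), and reports the first slot that a row reads although the previous row never wrote it. The function name and the `ve_known` parameter, which stands in for the compile-time multiple_p test on the vector length, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAXV 32   /* largest nvectors this sketch supports */

/* Replay the pieces[] indexing of the pre-fix loop and return the first
   slot that a row reads although the previous row did not write it, or
   -1 if every read slot was written.  ve_known is the factor of the
   vector length known at compile time (hypothetical stand-in for
   multiple_p (TYPE_VECTOR_SUBPARTS (...), 2 * in_repeat)).  */
static int first_unwritten_read(int nvectors, int nresults, int ve_known)
{
  assert(nvectors <= MAXV);
  bool written[2 * MAXV];
  memset(written, 0, sizeof written);          /* cf. quick_grow_cleared */
  for (int i = 0; i < nvectors; ++i)
    written[i] = true;                         /* the duplicated inputs */

  int in_start = 0, out_start = nvectors, hi_start = nvectors / 2;
  int noutputs_bound = nvectors * nresults;
  for (int in_repeat = 1; in_repeat < nvectors; in_repeat *= 2)
    {
      noutputs_bound /= 2;
      int limit = noutputs_bound < nvectors ? noutputs_bound : nvectors;
      /* Slots left over from two rows back do not count as written.  */
      memset(written + out_start, 0, nvectors * sizeof *written);
      for (int i = 0; i < limit; ++i)
        {
          if ((i & 1) != 0 && ve_known % (2 * in_repeat) == 0)
            {
              written[out_start + i] = true;   /* copy of the low result */
              continue;
            }
          int in1 = in_start + i / 2;
          int in2 = in_start + i / 2 + hi_start;
          if (!written[in1])
            return in1;
          if (!written[in2])
            return in2;
          written[out_start + i] = true;
        }
      int tmp = in_start;                      /* std::swap (in_start, out_start) */
      in_start = out_start;
      out_start = tmp;
    }
  return -1;
}
```

For the reduced testcase (nvectors = 4, one result, vector([4,4]) so ve_known = 4) the model returns 6: the second row's only permute reads pieces[in_start + hi_start], which the bounded first row never filled. That is consistent with the stray niters.19_72 operand in comment #1 and with the new gcc_assert firing once the array is cleared. With four results, or for an 8-element sequence with ve_known = 2 and four results, it returns -1.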

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: rguenth at gcc dot gnu.org @ 2021-01-19 10:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the issue is that hi_start isn't halved I think or hi_start isn't
added when i&1 != 0.  So probably all non-single round duplicate&interleaves
are broken right now.

@@ -5091,13 +5091,14 @@ duplicate_and_interleave (vec_info *vinfo, gimple_seq *seq, tree vector_type,
      a multiple of N * 2, the HI result is the same as the LO.  */
   unsigned int in_start = 0;
   unsigned int out_start = nvectors;
-  unsigned int hi_start = nvectors / 2;
+  unsigned int hi_start = nvectors;
   /* A bound on the number of outputs needed to produce NRESULTS results
      in the final iteration.  */
   unsigned int noutputs_bound = nvectors * nresults;
   for (unsigned int in_repeat = 1; in_repeat < nvectors; in_repeat *= 2)
     {
       noutputs_bound /= 2;
+      hi_start /= 2;
       unsigned int limit = MIN (noutputs_bound, nvectors);
       for (unsigned int i = 0; i < limit; ++i)
        {

fixes the crash but the code generated seemingly lacks a permute anyway
since seq is initially

_99 = {j$b_2(D)};
_100 = VIEW_CONVERT_EXPR<unsigned short>(_99);
_101 = [vec_duplicate_expr] _100;
_102 = {j$c_10(D)};
_103 = VIEW_CONVERT_EXPR<unsigned short>(_102);
_104 = [vec_duplicate_expr] _103;
_105 = {j$d_11(D)};
_106 = VIEW_CONVERT_EXPR<unsigned short>(_105);
_107 = [vec_duplicate_expr] _106;
_108 = {j$e_12(D)};
_109 = VIEW_CONVERT_EXPR<unsigned short>(_108);
_110 = [vec_duplicate_expr] _109;

but we re-use the interleaving of _101 and _107 for the other part
due to

          if ((i & 1) != 0
              && multiple_p (TYPE_VECTOR_SUBPARTS (new_vector_type),
                             2 * in_repeat))
            {
              pieces[out_start + i] = pieces[out_start + i - 1];
              continue;
            }

that is, we end up with

  <bb 26> [local count: 73320728]:
  _99 = {j$b_2(D)};
  _100 = VIEW_CONVERT_EXPR<unsigned short>(_99);
  _101 = [vec_duplicate_expr] _100;
  _102 = {j$c_10(D)};
  _103 = VIEW_CONVERT_EXPR<unsigned short>(_102);
  _104 = [vec_duplicate_expr] _103;
  _105 = {j$d_11(D)};
  _106 = VIEW_CONVERT_EXPR<unsigned short>(_105);
  _107 = [vec_duplicate_expr] _106;
  _108 = {j$e_12(D)};
  _109 = VIEW_CONVERT_EXPR<unsigned short>(_108);
  _110 = [vec_duplicate_expr] _109;
  _111 = VEC_PERM_EXPR <_101, _107, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;
  _112 = VEC_PERM_EXPR <_111, _111, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;
  _113 = VIEW_CONVERT_EXPR<vector([4,4]) short int>(_112);

seemingly ignoring _104 and _108 entirely ...
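
A value-level sketch of the experiment above, under stated assumptions: vectors are modelled as plain int arrays with an assumed run-time length of 8 elements (VE_KNOWN = 4 mirrors the [4,4] in vector([4,4]) unsigned short), interleave() stands in for the two VEC_PERM_EXPR selectors, and the four inputs stand in for the duplicated j$b, j$c, j$d and j$e. This is a simplified model written for this thread, not GCC's code, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define VE 8          /* assumed run-time vector length (elements) */
#define VE_KNOWN 4    /* factor known at compile time, as in [4,4] */
#define MAXV 16

typedef struct { int e[VE]; } vec;

/* VEC_PERM_EXPR <a, b, sel> with the interleave-low selector
   { 0, VE, 1, VE + 1, ... } (hi == 0) or the interleave-high one (hi == 1).  */
static vec interleave(vec a, vec b, int hi)
{
  vec r;
  int base = hi ? VE / 2 : 0;
  for (int j = 0; j < VE / 2; ++j)
    {
      r.e[2 * j] = a.e[base + j];
      r.e[2 * j + 1] = b.e[base + j];
    }
  return r;
}

/* The bounded loop with the experimental change: hi_start starts at
   nvectors and is halved at the top of every row.  Returns result 0.  */
static vec halved_hi_start(const int *elts, int nvectors, int nresults)
{
  assert(nvectors <= MAXV);
  vec pieces[2 * MAXV];
  memset(pieces, 0, sizeof pieces);
  for (int i = 0; i < nvectors; ++i)
    for (int j = 0; j < VE; ++j)
      pieces[i].e[j] = elts[i];                /* [vec_duplicate_expr] */

  int in_start = 0, out_start = nvectors, hi_start = nvectors;
  int noutputs_bound = nvectors * nresults;
  for (int in_repeat = 1; in_repeat < nvectors; in_repeat *= 2)
    {
      noutputs_bound /= 2;
      hi_start /= 2;
      int limit = noutputs_bound < nvectors ? noutputs_bound : nvectors;
      for (int i = 0; i < limit; ++i)
        {
          if ((i & 1) != 0 && VE_KNOWN % (2 * in_repeat) == 0)
            {
              pieces[out_start + i] = pieces[out_start + i - 1];
              continue;
            }
          pieces[out_start + i]
            = interleave(pieces[in_start + i / 2],
                         pieces[in_start + i / 2 + hi_start], i & 1);
        }
      int tmp = in_start;
      in_start = out_start;
      out_start = tmp;
    }
  return pieces[in_start];
}
```

With inputs {10, 11, 12, 13} and one result, the model yields {10, 10, 12, 12, 10, 10, 12, 12}: the second row interleaves the first row's low result with its own copy, so the values standing in for j$c and j$e never reach the output, matching the observation that _104 and _108 are ignored.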

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: rguenth at gcc dot gnu.org @ 2021-01-19 10:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 50000
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50000&action=edit
patch

Can somebody test this?  As said I believe there's a wrong-code piece left
(but the testcase in this PR is too reduced).

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: rsandifo at gcc dot gnu.org @ 2021-01-19 11:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

--- Comment #8 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> that is, we end up with
> 
>   <bb 26> [local count: 73320728]:
>   _99 = {j$b_2(D)};
>   _100 = VIEW_CONVERT_EXPR<unsigned short>(_99);
>   _101 = [vec_duplicate_expr] _100;
>   _102 = {j$c_10(D)};
>   _103 = VIEW_CONVERT_EXPR<unsigned short>(_102);
>   _104 = [vec_duplicate_expr] _103;
>   _105 = {j$d_11(D)};
>   _106 = VIEW_CONVERT_EXPR<unsigned short>(_105);
>   _107 = [vec_duplicate_expr] _106;
>   _108 = {j$e_12(D)};
>   _109 = VIEW_CONVERT_EXPR<unsigned short>(_108);
>   _110 = [vec_duplicate_expr] _109;
>   _111 = VEC_PERM_EXPR <_101, _107, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;
>   _112 = VEC_PERM_EXPR <_111, _111, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>;
>   _113 = VIEW_CONVERT_EXPR<vector([4,4]) short int>(_112);
> 
> seemingly ignoring _104 and _108 entirely ...
Yeah, that does seem wrong.  I'll have a look.

Congrats on getting attachment 50000 btw ;-)

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: rsandifo at gcc dot gnu.org @ 2021-01-19 12:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org         |rsandifo at gcc dot gnu.org

--- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I think the problem is in the way that noutputs_bound is used.
If we force limit = nvectors to suppress the (attempted) DCE,
we get the correct output.

In other words, the optimisation is trying to make sure
we only generate the bare minimum vectors needed on each
iteration.  But in this case it's generating the wrong ones.
Perhaps it would be easier to get rid of that and do something
similar to the i&1 handling inside the loop.
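
A minimal sketch of that experiment in a toy model (assumptions: int elements, an assumed run-time vector length of 8, a compile-time factor of 4 as for vector([4,4]); the names are hypothetical and this is not GCC's code): the pre-fix loop with hi_start kept at nvectors / 2 and limit forced to nvectors. Every slot of every row is then filled, so the final vector is the correct repeating sequence, at the cost of keeping the duplicate outputs that the trimming was meant to avoid.

```c
#include <assert.h>
#include <string.h>

#define VE 8          /* assumed run-time vector length (elements) */
#define VE_KNOWN 4    /* factor known at compile time, as in [4,4] */
#define MAXV 16

typedef struct { int e[VE]; } vec;

/* Interleave-low (hi == 0) or interleave-high (hi == 1) permute.  */
static vec interleave(vec a, vec b, int hi)
{
  vec r;
  int base = hi ? VE / 2 : 0;
  for (int j = 0; j < VE / 2; ++j)
    {
      r.e[2 * j] = a.e[base + j];
      r.e[2 * j + 1] = b.e[base + j];
    }
  return r;
}

/* Pre-fix loop with limit = nvectors (no noutputs_bound trimming).
   Returns result 0 and stores the number of permutes in *npermutes.  */
static vec dup_and_interleave_unbounded(const int *elts, int nvectors,
                                        int *npermutes)
{
  assert(nvectors <= MAXV);
  vec pieces[2 * MAXV];
  memset(pieces, 0, sizeof pieces);
  for (int i = 0; i < nvectors; ++i)
    for (int j = 0; j < VE; ++j)
      pieces[i].e[j] = elts[i];

  int in_start = 0, out_start = nvectors, hi_start = nvectors / 2;
  *npermutes = 0;
  for (int in_repeat = 1; in_repeat < nvectors; in_repeat *= 2)
    {
      for (int i = 0; i < nvectors; ++i)
        {
          if ((i & 1) != 0 && VE_KNOWN % (2 * in_repeat) == 0)
            {
              pieces[out_start + i] = pieces[out_start + i - 1];
              continue;
            }
          pieces[out_start + i]
            = interleave(pieces[in_start + i / 2],
                         pieces[in_start + i / 2 + hi_start], i & 1);
          ++*npermutes;
        }
      int tmp = in_start;
      in_start = out_start;
      out_start = tmp;
    }
  return pieces[in_start];
}
```

For the reduced testcase (inputs {10, 11, 12, 13}) this gives {10, 11, 12, 13, 10, 11, 12, 13} using 4 permutes: 2 per row, with the other 2 outputs of each row copied from the low results.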

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: cvs-commit at gcc dot gnu.org @ 2021-01-20 13:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:ea74a3f548eb321429c371e327e778e63d9128a0

commit r11-6814-gea74a3f548eb321429c371e327e778e63d9128a0
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Wed Jan 20 13:16:30 2021 +0000

    vect: Fix VLA SLP invariant optimisation [PR98535]

    duplicate_and_interleave is the main fallback way of loading
    a repeating sequence of elements into variable-length vectors.
    The code handles cases in which the number of elements in the
    sequence is potentially several times greater than the number
    of elements in a vector.

    Let:

    - NE be the (compile-time) number of elements in the sequence
    - NR be the (compile-time) number of vector results and
    - VE be the (run-time) number of elements in each vector

    The basic approach is to duplicate each element into a
    separate vector, giving NE vectors in total, then use
    log2(NE) rows of NE permutes to generate NE results.

    In the worst case --- when VE has no known compile-time factor
    and NR >= NE --- all of these permutes are necessary.  However,
    if VE is known to be a multiple of 2**F, then each of the
    first F permute rows produces duplicate results; specifically,
    the high permute for a given pair is the same as the low permute.
    The code dealt with this by reusing the low result for the
    high result.  This part was OK.

    However, having duplicate results from one row meant that the
    next row did duplicate work.  The redundancies would be optimised
    away by later passes, but the code tried to avoid generating them
    in the first place.  This is the part that went wrong.

    Specifically, NR is typically less than NE when some permutes are
    redundant, so the code tried to use NR to reduce the amount of work
    performed.  The problem was that, although it correctly calculated
    a conservative bound on how many results were needed in each row,
    it chose the wrong results for anything other than the final row.

    This doesn't usually matter for fully-packed SVE vectors.  We first
    try to coalesce smaller elements into larger ones, so normally
    VE ends up being 2*VQ (where VQ is the number of 128-bit blocks
    in an SVE vector).  In that situation we'd only apply the faulty
    optimisation to the final row, i.e. the case it handled correctly.
    E.g. for things like:

      void
      f (long *x)
      {
        for (int i = 0; i < 100; i += 8)
          {
            x[i] += 1;
            x[i + 1] += 2;
            x[i + 2] += 3;
            x[i + 3] += 4;
            x[i + 4] += 5;
            x[i + 5] += 6;
            x[i + 6] += 7;
            x[i + 7] += 8;
          }
      }

    (already tested by the testsuite), we'd have 3 rows of permutes
    producing 4 vector results.  The scheme produced:

    1st row: 8 results from 4 permutes, highs duplicates of lows
    2nd row: 8 results from 8 permutes (half of which are actually redundant)
    3rd row: 4 results from 4 permutes

    However, coalescing elements is trickier for unpacked vectors,
    and at the moment we don't try to do it (see the GET_MODE_SIZE
    check in can_duplicate_and_interleave_p).  Unpacked vectors
    therefore stress the code in ways that packed vectors didn't.

    The patch fixes this by removing the redundancies from each row,
    rather than trying to work around them later.  This also removes
    the redundant work in the second row of the example above.

    gcc/
            PR tree-optimization/98535
            * tree-vect-slp.c (duplicate_and_interleave): Use quick_grow_cleared.
            If the high and low permutes are the same, remove the high permutes
            from the working set and only continue with the low ones.
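
To make the committed scheme concrete, here is a small C model of it written for this summary (not the code in tree-vect-slp.c): when a row's high permute is known to equal its low permute, the high one is dropped from the working set rather than copied, and the next row works on the smaller set with hi_start recomputed from its size. The function name, `ve` (an assumed run-time vector length) and `ve_known` (its compile-time-known factor) are assumptions. NE must be a power of two, and the sketch does not try to trim the final row to NR; it reuses the surviving vectors cyclically.

```c
#include <assert.h>
#include <string.h>

#define MAXV 16    /* largest NE this sketch supports */
#define MAXVE 64   /* largest run-time vector length VE */

/* out = interleave-low (hi == 0) or interleave-high (hi == 1) of a, b.  */
static void interleave(int *out, const int *a, const int *b, int ve, int hi)
{
  int base = hi ? ve / 2 : 0;
  for (int j = 0; j < ve / 2; ++j)
    {
      out[2 * j] = a[base + j];
      out[2 * j + 1] = b[base + j];
    }
}

/* Fill results[0..nresults-1] with consecutive ve-element vectors of the
   repeating sequence elts[0..ne-1].  Returns the number of permutes.  */
static int dup_and_interleave_fixed(const int *elts, int ne, int ve,
                                    int ve_known, int nresults,
                                    int results[][MAXVE])
{
  static int pieces[2 * MAXV][MAXVE];
  assert(ne <= MAXV && ve <= MAXVE && ve % 2 == 0 && ve % ve_known == 0);

  for (int i = 0; i < ne; ++i)                 /* duplicate each element */
    for (int j = 0; j < ve; ++j)
      pieces[i][j] = elts[i];

  int in_start = 0, out_start = ne, n = ne, npermutes = 0;
  for (int in_repeat = 1; in_repeat < ne; in_repeat *= 2)
    {
      int hi_start = n / 2, out_n = 0;
      for (int i = 0; i < n; ++i)
        {
          if ((i & 1) != 0 && ve_known % (2 * in_repeat) == 0)
            continue;                          /* high == low: drop it */
          interleave(pieces[out_start + out_n], pieces[in_start + i / 2],
                     pieces[in_start + i / 2 + hi_start], ve, i & 1);
          ++out_n;
          ++npermutes;
        }
      int tmp = in_start;
      in_start = out_start;
      out_start = tmp;
      n = out_n;                               /* shrunken working set */
    }

  for (int k = 0; k < nresults; ++k)           /* reuse survivors cyclically */
    memcpy(results[k], pieces[in_start + k % n], ve * sizeof(int));
  return npermutes;
}
```

For the 8-element example above with ve_known = 2 and four results, the model generates 4 + 4 + 4 = 12 permutes, compared with 4 + 8 + 4 in the old scheme, and for ve = 2, 4 or 8 result k holds elements (k * ve + j) % 8 of the sequence. For the reduced testcase (4 elements, ve_known = 4, one result) it needs 3 permutes.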

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: rsandifo at gcc dot gnu.org @ 2021-01-20 13:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

--- Comment #11 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Fixed on trunk, but it's latent on release branches and should be
fixed there too.

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: cvs-commit at gcc dot gnu.org @ 2021-01-22  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Richard Sandiford
<rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:51b23ba76f00610360a023de4cdae1641ca3b961

commit r10-9287-g51b23ba76f00610360a023de4cdae1641ca3b961
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Fri Jan 22 09:13:12 2021 +0000

    vect: Fix VLA SLP invariant optimisation [PR98535]

    duplicate_and_interleave is the main fallback way of loading
    a repeating sequence of elements into variable-length vectors.
    The code handles cases in which the number of elements in the
    sequence is potentially several times greater than the number
    of elements in a vector.

    Let:

    - NE be the (compile-time) number of elements in the sequence
    - NR be the (compile-time) number of vector results and
    - VE be the (run-time) number of elements in each vector

    The basic approach is to duplicate each element into a
    separate vector, giving NE vectors in total, then use
    log2(NE) rows of NE permutes to generate NE results.

    In the worst case --- when VE has no known compile-time factor
    and NR >= NE --- all of these permutes are necessary.  However,
    if VE is known to be a multiple of 2**F, then each of the
    first F permute rows produces duplicate results; specifically,
    the high permute for a given pair is the same as the low permute.
    The code dealt with this by reusing the low result for the
    high result.  This part was OK.

    However, having duplicate results from one row meant that the
    next row did duplicate work.  The redundancies would be optimised
    away by later passes, but the code tried to avoid generating them
    in the first place.  This is the part that went wrong.

    Specifically, NR is typically less than NE when some permutes are
    redundant, so the code tried to use NR to reduce the amount of work
    performed.  The problem was that, although it correctly calculated
    a conservative bound on how many results were needed in each row,
    it chose the wrong results for anything other than the final row.

    This doesn't usually matter for fully-packed SVE vectors.  We first
    try to coalesce smaller elements into larger ones, so normally
    VE ends up being 2*VQ (where VQ is the number of 128-bit blocks
    in an SVE vector).  In that situation we'd only apply the faulty
    optimisation to the final row, i.e. the case it handled correctly.
    E.g. for things like:

      void
      f (long *x)
      {
        for (int i = 0; i < 100; i += 8)
          {
            x[i] += 1;
            x[i + 1] += 2;
            x[i + 2] += 3;
            x[i + 3] += 4;
            x[i + 4] += 5;
            x[i + 5] += 6;
            x[i + 6] += 7;
            x[i + 7] += 8;
          }
      }

    (already tested by the testsuite), we'd have 3 rows of permutes
    producing 4 vector results.  The scheme produced:

    1st row: 8 results from 4 permutes, highs duplicates of lows
    2nd row: 8 results from 8 permutes (half of which are actually redundant)
    3rd row: 4 results from 4 permutes

    However, coalescing elements is trickier for unpacked vectors,
    and at the moment we don't try to do it (see the GET_MODE_SIZE
    check in can_duplicate_and_interleave_p).  Unpacked vectors
    therefore stress the code in ways that packed vectors didn't.

    The patch fixes this by removing the redundancies from each row,
    rather than trying to work around them later.  This also removes
    the redundant work in the second row of the example above.

    gcc/
            PR tree-optimization/98535
            * tree-vect-slp.c (duplicate_and_interleave): Use quick_grow_cleared.
            If the high and low permutes are the same, remove the high permutes
            from the working set and only continue with the low ones.

    (cherry picked from commit ea74a3f548eb321429c371e327e778e63d9128a0)

* [Bug tree-optimization/98535] [11 Regression] ICE in operands_scanner::get_expr_operands(tree_node**, int) building 538.imagick_r
From: rsandifo at gcc dot gnu.org @ 2021-01-22  9:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Fixed in GCC 10 and above.  Although the code goes back to GCC 8,
I'm not sure it would cause a visible failure on GCC 8 or 9.
