[Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
@ 2023-10-10  5:15 zsojka at seznam dot cz
  2023-10-10  9:25 ` [Bug middle-end/111754] " rguenth at gcc dot gnu.org
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: zsojka at seznam dot cz @ 2023-10-10  5:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

            Bug ID: 111754
           Summary: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: ice-on-valid-code
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zsojka at seznam dot cz
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: x86_64-pc-linux-gnu

Created attachment 56088
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56088&action=edit
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O testcase.c
during RTL pass: expand
testcase.c: In function 'foo':
testcase.c:14:10: internal compiler error: in decompose, at rtl.h:2313
   14 |   return bar ((F){9}, (F){});
      |          ^~~~~~~~~~~~~~~~~~~
0x7ea1ba wi::int_traits<std::pair<rtx_def*, machine_mode> >::decompose(long*,
unsigned int, std::pair<rtx_def*, machine_mode> const&)
        /repo/gcc-trunk/gcc/rtl.h:2313
0x7ea1ba wide_int_ref_storage<false,
false>::wide_int_ref_storage<std::pair<rtx_def*, machine_mode>
>(std::pair<rtx_def*, machine_mode> const&)
        /repo/gcc-trunk/gcc/wide-int.h:1030
0x7ea1ba generic_wide_int<wide_int_ref_storage<false, false>
>::generic_wide_int<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*,
machine_mode> const&)
        /repo/gcc-trunk/gcc/wide-int.h:788
0x7ea1ba poly_int<1u, generic_wide_int<wide_int_ref_storage<false, false> >
>::poly_int<std::pair<rtx_def*, machine_mode> >(poly_int_full,
std::pair<rtx_def*, machine_mode> const&)
        /repo/gcc-trunk/gcc/poly-int.h:453
0x7ea1ba poly_int<1u, generic_wide_int<wide_int_ref_storage<false, false> >
>::poly_int<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*,
machine_mode> const&)
        /repo/gcc-trunk/gcc/poly-int.h:439
0x7ea1ba wi::to_poly_wide(rtx_def const*, machine_mode)
        /repo/gcc-trunk/gcc/rtl.h:2382
0x7ea1ba rtx_vector_builder::step(rtx_def*, rtx_def*) const
        /repo/gcc-trunk/gcc/rtx-vector-builder.h:122
0x143d95b vector_builder<rtx_def*, machine_mode,
rtx_vector_builder>::elt(unsigned int) const
        /repo/gcc-trunk/gcc/vector-builder.h:254
0x143d841 rtx_vector_builder::build()
        /repo/gcc-trunk/gcc/rtx-vector-builder.cc:73
0x107c7a1 const_vector_from_tree
        /repo/gcc-trunk/gcc/expr.cc:13494
0x10856ce expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
        /repo/gcc-trunk/gcc/expr.cc:11066
0xf50792 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
        /repo/gcc-trunk/gcc/expr.h:310
0xf50792 expand_return
        /repo/gcc-trunk/gcc/cfgexpand.cc:3809
0xf50792 expand_gimple_stmt_1
        /repo/gcc-trunk/gcc/cfgexpand.cc:3918
0xf50792 expand_gimple_stmt
        /repo/gcc-trunk/gcc/cfgexpand.cc:4044
0xf51106 expand_gimple_basic_block
        /repo/gcc-trunk/gcc/cfgexpand.cc:6100
0xf5378e execute
        /repo/gcc-trunk/gcc/cfgexpand.cc:6835
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-4521-20231009151152-g08d0f840dc7-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu
--with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r14-4521-20231009151152-g08d0f840dc7-checking-yes-rtl-df-extra-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231009 (experimental) (GCC)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
@ 2023-10-10  9:25 ` rguenth at gcc dot gnu.org
  2023-10-10  9:38 ` prathamesh3492 at gcc dot gnu.org
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-10  9:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-10-10
          Component|rtl-optimization            |middle-end
     Ever confirmed|0                           |1
   Target Milestone|---                         |14.0
           Priority|P3                          |P1
           Keywords|                            |needs-bisection

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
(gdb) p debug_tree (exp)
 <vector_cst 0x7ffff6f25398
    type <vector_type 0x7ffff6eebf18 F
        type <real_type 0x7ffff6d672a0 float SF
            size <integer_cst 0x7ffff6d61180 constant 32>
            unit-size <integer_cst 0x7ffff6d61198 constant 4>
            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff6d672a0 precision:32
            pointer_to_this <pointer_type 0x7ffff6d67930>>
        sizes-gimplified V4SF
        size <integer_cst 0x7ffff6d40f78 constant 128>
        unit-size <integer_cst 0x7ffff6d40f90 constant 16>
        align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff6e99b28 nunits:4>
    constant npatterns:1 nelts-per-pattern:3
    elt:0:  <real_cst 0x7ffff6f13540 type <real_type 0x7ffff6d672a0 float>
        constant 0.0>
    elt:1:  <real_cst 0x7ffff6f13510 type <real_type 0x7ffff6d672a0 float>
        constant 9.0e+0> elt:2:  <real_cst 0x7ffff6f13540 0.0>>

Confirmed.  That's an odd vector mode 'F'!?  And we have three elements?

Happens after inlining bar () into foo ():

F bar (F a, F b)
{
  F c;

  <bb 2> [local count: 1073741824]:
  c_2 = VEC_PERM_EXPR <a_1(D), a_1(D), { 1, 0, 1, 2 }>; 
  __builtin_logbl (0.0);
  return c_2;

F foo ()
{
  F _3;

  <bb 2> [local count: 1073741824]:
  _3 = bar ({ 9.0e+0, 0.0, 0.0, 0.0 }, { 0.0, 0.0, 0.0, 0.0 });

and the constant folding of the VEC_PERM_EXPR causes this.

ICEs way earlier (during inlining) when you do -fdump-ipa-inline

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
  2023-10-10  9:25 ` [Bug middle-end/111754] " rguenth at gcc dot gnu.org
@ 2023-10-10  9:38 ` prathamesh3492 at gcc dot gnu.org
  2023-10-10 11:41 ` prathamesh3492 at gcc dot gnu.org
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: prathamesh3492 at gcc dot gnu.org @ 2023-10-10  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #2 from prathamesh3492 at gcc dot gnu.org ---
Hi,
Sorry for the breakage, will take a look.

Thanks,
Prathamesh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
  2023-10-10  9:25 ` [Bug middle-end/111754] " rguenth at gcc dot gnu.org
  2023-10-10  9:38 ` prathamesh3492 at gcc dot gnu.org
@ 2023-10-10 11:41 ` prathamesh3492 at gcc dot gnu.org
  2023-10-10 11:52 ` rguenther at suse dot de
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: prathamesh3492 at gcc dot gnu.org @ 2023-10-10 11:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #3 from prathamesh3492 at gcc dot gnu.org ---
The issue is that we only support integral vector types in fold_vec_perm_cst,
but fail to check for the same before calling it from fold_vec_perm.
The following tweak fixes the ICE:

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 4f8561509ff..a29a8af6d2f 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10801,7 +10801,8 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const
vec_perm_indices &sel)
     return NULL_TREE;

   if (TREE_CODE (arg0) == VECTOR_CST
-      && TREE_CODE (arg1) == VECTOR_CST)
+      && TREE_CODE (arg1) == VECTOR_CST
+      && INTEGRAL_TYPE_P (TREE_TYPE (type)))
     return fold_vec_perm_cst (type, arg0, arg1, sel);

   /* For fall back case, we want to ensure we have VLS vectors

and results in the following .optimized dump:
F bar (F a, F b)
{
  F c;

  <bb 2> [local count: 1073741824]:
  c_2 = VEC_PERM_EXPR <a_1(D), a_1(D), { 1, 0, 1, 2 }>;
  __builtin_logbl (0.0);
  return c_2;

}

F foo ()
{
  <bb 2> [local count: 1073741824]:
  __builtin_logbl (0.0);
  return { 0.0, 9.0e+0, 0.0, 0.0 };

}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (2 preceding siblings ...)
  2023-10-10 11:41 ` prathamesh3492 at gcc dot gnu.org
@ 2023-10-10 11:52 ` rguenther at suse dot de
  2023-10-10 13:38 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenther at suse dot de @ 2023-10-10 11:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 10 Oct 2023, prathamesh3492 at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754
> 
> --- Comment #3 from prathamesh3492 at gcc dot gnu.org ---
> The issue is that we only support integral vector types in fold_vec_perm_cst,
> but fail to check for the same before calling it from fold_vec_perm.
> The following tweak fixes the ICE:
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 4f8561509ff..a29a8af6d2f 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10801,7 +10801,8 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const
> vec_perm_indices &sel)
>      return NULL_TREE;
> 
>    if (TREE_CODE (arg0) == VECTOR_CST
> -      && TREE_CODE (arg1) == VECTOR_CST)
> +      && TREE_CODE (arg1) == VECTOR_CST
> +      && INTEGRAL_TYPE_P (TREE_TYPE (type)))
>      return fold_vec_perm_cst (type, arg0, arg1, sel);

Huh, that looks wrong.  I fail to see how the element type matters
at all.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (3 preceding siblings ...)
  2023-10-10 11:52 ` rguenther at suse dot de
@ 2023-10-10 13:38 ` rguenth at gcc dot gnu.org
  2023-10-10 14:07 ` prathamesh3492 at gcc dot gnu.org
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-10 13:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
It seems we have VECTOR_CST_NELTS_PER_PATTERN ({ 9.0e+0, 0.0, 0.0, 0.0 })
2 and VECTOR_CST_NPATTERNS == 1.  And the selector { 1, 0, 1, 2 } has
npatterns == 1 and nelts-per-pattern == 3.

  /* (1) If SEL is a suitable mask as determined by
     valid_mask_for_fold_vec_perm_cst_p, then:
     res_npatterns = max of npatterns between ARG0, ARG1, and SEL
     res_nelts_per_pattern = max of nelts_per_pattern between
                             ARG0, ARG1 and SEL.
     (2) If SEL is not a suitable mask, and TYPE is VLS then:
     res_npatterns = nelts in result vector.
     res_nelts_per_pattern = 1.
     This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
    {
      res_npatterns
        = std::max (VECTOR_CST_NPATTERNS (arg0),
                    std::max (VECTOR_CST_NPATTERNS (arg1),
                              sel.encoding ().npatterns ()));

      res_nelts_per_pattern
        = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
                    std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
                              sel.encoding ().nelts_per_pattern ()));

      res_nelts = res_npatterns * res_nelts_per_pattern;

this seems to be a case that doesn't fit, so the fix needs to be to
valid_mask_for_fold_vec_perm_cst_p which really looks a bit
unwieldly.

An assert that res_nelts is power-of-two would be nice to add.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (4 preceding siblings ...)
  2023-10-10 13:38 ` rguenth at gcc dot gnu.org
@ 2023-10-10 14:07 ` prathamesh3492 at gcc dot gnu.org
  2023-10-10 14:37 ` prathamesh3492 at gcc dot gnu.org
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: prathamesh3492 at gcc dot gnu.org @ 2023-10-10 14:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #6 from prathamesh3492 at gcc dot gnu.org ---
(In reply to rguenther@suse.de from comment #4)
> On Tue, 10 Oct 2023, prathamesh3492 at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754
> > 
> > --- Comment #3 from prathamesh3492 at gcc dot gnu.org ---
> > The issue is that we only support integral vector types in fold_vec_perm_cst,
> > but fail to check for the same before calling it from fold_vec_perm.
> > The following tweak fixes the ICE:
> > 
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 4f8561509ff..a29a8af6d2f 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -10801,7 +10801,8 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const
> > vec_perm_indices &sel)
> >      return NULL_TREE;
> > 
> >    if (TREE_CODE (arg0) == VECTOR_CST
> > -      && TREE_CODE (arg1) == VECTOR_CST)
> > +      && TREE_CODE (arg1) == VECTOR_CST
> > +      && INTEGRAL_TYPE_P (TREE_TYPE (type)))
> >      return fold_vec_perm_cst (type, arg0, arg1, sel);
> 
> Huh, that looks wrong.  I fail to see how the element type matters
> at all.

IIUC, the element type matters for VLA folding when sel has a stepped sequence
because in that case we need to derive elements from the encoding using
vector_cst_elt / vector_cst_int_elt, and it gets enforced for VLS vectors too
because they are handled in unified manner in fold_vec_perm_cst.

Another possible approach is to use "VLS exception" in fold_vec_perm_cst to
encode all the elements:
res_npatterns = res_nelts;
res_nelts_per_patterns = 1
just like we do if valid_mask_for_fold_vec_perm_cst_p returns false.

Does the following fix look OK instead ?

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 4f8561509ff..356eb052fbc 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10642,6 +10642,11 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree
arg1,
   if (sel_nelts_per_pattern < 3)
     return true;

+  /* If SEL contains stepped sequence, ensure that we are dealing with
+     integral vector_cst.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (arg0)))
+    return false;
+
   for (unsigned pattern = 0; pattern < sel_npatterns; pattern++)
     {
       poly_uint64 a1 = sel[pattern + sel_npatterns];

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (5 preceding siblings ...)
  2023-10-10 14:07 ` prathamesh3492 at gcc dot gnu.org
@ 2023-10-10 14:37 ` prathamesh3492 at gcc dot gnu.org
  2023-10-10 15:45 ` rguenther at suse dot de
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: prathamesh3492 at gcc dot gnu.org @ 2023-10-10 14:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #7 from prathamesh3492 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> It seems we have VECTOR_CST_NELTS_PER_PATTERN ({ 9.0e+0, 0.0, 0.0, 0.0 })
> 2 and VECTOR_CST_NPATTERNS == 1.  And the selector { 1, 0, 1, 2 } has
> npatterns == 1 and nelts-per-pattern == 3.
> 
>   /* (1) If SEL is a suitable mask as determined by
>      valid_mask_for_fold_vec_perm_cst_p, then:
>      res_npatterns = max of npatterns between ARG0, ARG1, and SEL
>      res_nelts_per_pattern = max of nelts_per_pattern between
>                              ARG0, ARG1 and SEL.
>      (2) If SEL is not a suitable mask, and TYPE is VLS then:
>      res_npatterns = nelts in result vector.
>      res_nelts_per_pattern = 1.
>      This exception is made so that VLS ARG0, ARG1 and SEL work as before. 
> */
>   if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
>     {
>       res_npatterns
>         = std::max (VECTOR_CST_NPATTERNS (arg0),
>                     std::max (VECTOR_CST_NPATTERNS (arg1),
>                               sel.encoding ().npatterns ()));
> 
>       res_nelts_per_pattern
>         = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
>                     std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
>                               sel.encoding ().nelts_per_pattern ()));
> 
>       res_nelts = res_npatterns * res_nelts_per_pattern;
> 
> this seems to be a case that doesn't fit, so the fix needs to be to
> valid_mask_for_fold_vec_perm_cst_p which really looks a bit
> unwieldly.
valid_mask_for_fold_vec_perm_cst_p returns incorrectly true here,
which is being addressed in PR111648 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631926.html

Even if the vectors had integral element type:
arg0 = arg1 = (v4si){ 9, 0, 0, 0 }  // encoded as {9, 0, ...}
and sel = { 1, 0, 1, 2 }  // encoded as {1, 0, 1, ...}

The pattern in sel {1, 0, 1, ...}
would choose elements from arg0, and
res would have incorrect encoding with step = -9:
res = { arg0[1], arg0[0], arg0[1], ... } 
    = { 0, 9, 0, ... }
And res[3] will be incorrectly computed as -9 instead of arg0[2].

However, for floating element types, even if encoding is correct,
I assume it will still ICE when trying to derive elements not present in
encoding since poly_int_cst can only deal with integral elements ?
> 
> An assert that res_nelts is power-of-two would be nice to add.
Sorry, I don't understand. res_nelts for VLA need not be power of 2,
since res_nelts_per_pattern can be 3. The encoding for res is chosen
to be max of npatterns and max of nelts_per_pattern between arg0, arg1, and
sel.

Thanks,
Prathamesh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (6 preceding siblings ...)
  2023-10-10 14:37 ` prathamesh3492 at gcc dot gnu.org
@ 2023-10-10 15:45 ` rguenther at suse dot de
  2023-11-05 22:00 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenther at suse dot de @ 2023-10-10 15:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
> Am 10.10.2023 um 16:07 schrieb prathamesh3492 at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org>:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754
> 
> --- Comment #6 from prathamesh3492 at gcc dot gnu.org ---
> (In reply to rguenther@suse.de from comment #4)
>>> On Tue, 10 Oct 2023, prathamesh3492 at gcc dot gnu.org wrote:
>>> 
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754
>>> 
>>> --- Comment #3 from prathamesh3492 at gcc dot gnu.org ---
>>> The issue is that we only support integral vector types in fold_vec_perm_cst,
>>> but fail to check for the same before calling it from fold_vec_perm.
>>> The following tweak fixes the ICE:
>>> 
>>> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>>> index 4f8561509ff..a29a8af6d2f 100644
>>> --- a/gcc/fold-const.cc
>>> +++ b/gcc/fold-const.cc
>>> @@ -10801,7 +10801,8 @@ fold_vec_perm (tree type, tree arg0, tree arg1, const
>>> vec_perm_indices &sel)
>>>     return NULL_TREE;
>>> 
>>>   if (TREE_CODE (arg0) == VECTOR_CST
>>> -      && TREE_CODE (arg1) == VECTOR_CST)
>>> +      && TREE_CODE (arg1) == VECTOR_CST
>>> +      && INTEGRAL_TYPE_P (TREE_TYPE (type)))
>>>     return fold_vec_perm_cst (type, arg0, arg1, sel);
>> 
>> Huh, that looks wrong.  I fail to see how the element type matters
>> at all.
> 
> IIUC, the element type matters for VLA folding when sel has a stepped sequence
> because in that case we need to derive elements from the encoding using
> vector_cst_elt / vector_cst_int_elt, and it gets enforced for VLS vectors too
> because they are handled in unified manner in fold_vec_perm_cst.
> 
> Another possible approach is to use "VLS exception" in fold_vec_perm_cst to
> encode all the elements:
> res_npatterns = res_nelts;
> res_nelts_per_patterns = 1
> just like we do if valid_mask_for_fold_vec_perm_cst_p returns false.
> 
> Does the following fix look OK instead ?

I think so, the important part is to fold for VLS types.  I defer to Richard S.
For whether we can do better for VLA here

> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 4f8561509ff..356eb052fbc 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10642,6 +10642,11 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree
> arg1,
>   if (sel_nelts_per_pattern < 3)
>     return true;
> 
> +  /* If SEL contains stepped sequence, ensure that we are dealing with
> +     integral vector_cst.  */
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (arg0)))
> +    return false;
> +
>   for (unsigned pattern = 0; pattern < sel_npatterns; pattern++)
>     {
>       poly_uint64 a1 = sel[pattern + sel_npatterns];
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (7 preceding siblings ...)
  2023-10-10 15:45 ` rguenther at suse dot de
@ 2023-11-05 22:00 ` pinskia at gcc dot gnu.org
  2023-11-05 22:19 ` rsandifo at gcc dot gnu.org
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-05 22:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The ICE seems to be gone, and the generated code looks correct.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (8 preceding siblings ...)
  2023-11-05 22:00 ` pinskia at gcc dot gnu.org
@ 2023-11-05 22:19 ` rsandifo at gcc dot gnu.org
  2023-11-11  1:37 ` sjames at gcc dot gnu.org
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-11-05 22:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #10 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
Yeah, the problem went latent after the fix for PR111648, but the underlying
problem is still there.  Prathamesh is working on a fix for that.  See the
thread starting at
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633784.html for
discussion.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (9 preceding siblings ...)
  2023-11-05 22:19 ` rsandifo at gcc dot gnu.org
@ 2023-11-11  1:37 ` sjames at gcc dot gnu.org
  2023-11-27 17:12 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-11  1:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sjames at gcc dot gnu.org
           Keywords|needs-bisection             |

--- Comment #11 from Sam James <sjames at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #10)
> Yeah, the problem went latent after the fix for PR111648, [...]

OK, removing needs-bisection then.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (10 preceding siblings ...)
  2023-11-11  1:37 ` sjames at gcc dot gnu.org
@ 2023-11-27 17:12 ` cvs-commit at gcc dot gnu.org
  2023-11-27 17:15 ` prathamesh3492 at gcc dot gnu.org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-27 17:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #12 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Prathamesh Kulkarni
<prathamesh3492@gcc.gnu.org>:

https://gcc.gnu.org/g:2065438db4ac13af7e0de2f934959413647f74a7

commit r14-5890-g2065438db4ac13af7e0de2f934959413647f74a7
Author: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Date:   Mon Nov 27 22:40:49 2023 +0530

    PR111754: Rework encoding of result for VEC_PERM_EXPR with constant input
vectors.

    gcc/ChangeLog:
            PR middle-end/111754
            * fold-const.cc (fold_vec_perm_cst): Set result's encoding to sel's
            encoding, and set res_nelts_per_pattern to 2 if sel contains
stepped
            sequence but input vectors do not.
            (test_nunits_min_2): New test Case 8.
            (test_nunits_min_4): New tests Case 8 and Case 9.

    gcc/testsuite/ChangeLog:
            PR middle-end/111754
            * gcc.target/aarch64/sve/slp_3.c: Adjust code-gen.
            * gcc.target/aarch64/sve/slp_4.c: Likewise.
            * gcc.dg/vect/pr111754.c: New test.

    Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (11 preceding siblings ...)
  2023-11-27 17:12 ` cvs-commit at gcc dot gnu.org
@ 2023-11-27 17:15 ` prathamesh3492 at gcc dot gnu.org
  2023-11-28  9:18 ` cvs-commit at gcc dot gnu.org
  2023-11-28 15:40 ` prathamesh3492 at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: prathamesh3492 at gcc dot gnu.org @ 2023-11-27 17:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

prathamesh3492 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from prathamesh3492 at gcc dot gnu.org ---
Fixed.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (12 preceding siblings ...)
  2023-11-27 17:15 ` prathamesh3492 at gcc dot gnu.org
@ 2023-11-28  9:18 ` cvs-commit at gcc dot gnu.org
  2023-11-28 15:40 ` prathamesh3492 at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-28  9:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:e6c01334ccfca8bc748c8de90ba2a636d1662490

commit r14-5902-ge6c01334ccfca8bc748c8de90ba2a636d1662490
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Nov 28 10:16:47 2023 +0100

    testsuite: Fix up pr111754.c test

    On Tue, Nov 28, 2023 at 03:56:47PM +0800, juzhe.zhong@rivai.ai wrote:
    > Hi, there is a regression in RISC-V caused by this patch:
    >
    > FAIL: gcc.dg/vect/pr111754.c -flto -ffat-lto-objects  scan-tree-dump
optimized "return { 0.0, 9.0e\\+0, 0.0, 0.0 }"
    > FAIL: gcc.dg/vect/pr111754.c scan-tree-dump optimized "return { 0.0,
9.0e\\+0, 0.0, 0.0 }"
    >
    > I have checked the dump is :
    > F foo (F a, F b)
    > {
    >   <bb 2> [local count: 1073741824]:
    >   <retval> = { 0.0, 9.0e+0, 0.0, 0.0 };
    >   return <retval>;
    >
    > }
    >
    > The dump IR seems reasonable to me.
    > I wonder whether we should walk around in RISC-V backend to generate the
same IR as ARM SVE ?
    > Or we should adjust the test ?

    Note, the test also FAILs on i686-linux (but not e.g. on x86_64-linux):
    /home/jakub/src/gcc/obj67/gcc/xgcc -B/home/jakub/src/gcc/obj67/gcc/
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c
-fdiagnostics-plain-output -O2 -fdump-tree-optimized -S
    +-o pr111754.s
    /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c: In function
'foo':
    /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:7:1: warning: SSE
vector return without SSE enabled changes the ABI [-Wpsabi]
    /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: note: the ABI
for passing parameters with 16-byte alignment has changed in GCC 4.6
    /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: warning: SSE
vector argument without SSE enabled changes the ABI [-Wpsabi]
    FAIL: gcc.dg/vect/pr111754.c (test for excess errors)
    Excess errors:
    /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:7:1: warning: SSE
vector return without SSE enabled changes the ABI [-Wpsabi]
    /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: warning: SSE
vector argument without SSE enabled changes the ABI [-Wpsabi]

    PASS: gcc.dg/vect/pr111754.c scan-tree-dump-not optimized "VEC_PERM_EXPR"
    FAIL: gcc.dg/vect/pr111754.c scan-tree-dump optimized "return { 0.0,
9.0e\\+0, 0.0, 0.0 }"

    So, I think it is wrong to specify
    /* { dg-options "-O2 -fdump-tree-optimized" } */
    in the test, should be dg-additional-options instead, so that it gets
    the implied vector compilation options e.g. for i686-linux (-msse2 in that
    case at least), question is if -Wno-psabi should be added as well or not,
    and certainly the scan-tree-dump needs to be guarded by appropriate
    vect_* effective target (but dunno which, one which asserts support for
    V4SFmode and returning it).
    Alternatively, perhaps don't check optimized dump but some earlier one
    before generic vector lowering, then hopefully it could match on all
    targets?  Maybe with the <retval> = ... vs. return ... variants.

    2023-11-28  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/111754
            * gcc.dg/vect/pr111754.c: Use dg-additional-options rather than
            dg-options, add -Wno-psabi and use -fdump-tree-forwprop1 rather
than
            -fdump-tree-optimized.  Scan forwprop1 dump rather than optimized
and
            scan for either direct return or setting of <retval> to the vector.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug middle-end/111754] [14 Regression] ICE: in decompose, at rtl.h:2313 at -O
  2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
                   ` (13 preceding siblings ...)
  2023-11-28  9:18 ` cvs-commit at gcc dot gnu.org
@ 2023-11-28 15:40 ` prathamesh3492 at gcc dot gnu.org
  14 siblings, 0 replies; 16+ messages in thread
From: prathamesh3492 at gcc dot gnu.org @ 2023-11-28 15:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #15 from prathamesh3492 at gcc dot gnu.org ---
Sorry for the regression, and thanks for the prompt fix!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-11-28 15:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-10  5:15 [Bug rtl-optimization/111754] New: [14 Regression] ICE: in decompose, at rtl.h:2313 at -O zsojka at seznam dot cz
2023-10-10  9:25 ` [Bug middle-end/111754] " rguenth at gcc dot gnu.org
2023-10-10  9:38 ` prathamesh3492 at gcc dot gnu.org
2023-10-10 11:41 ` prathamesh3492 at gcc dot gnu.org
2023-10-10 11:52 ` rguenther at suse dot de
2023-10-10 13:38 ` rguenth at gcc dot gnu.org
2023-10-10 14:07 ` prathamesh3492 at gcc dot gnu.org
2023-10-10 14:37 ` prathamesh3492 at gcc dot gnu.org
2023-10-10 15:45 ` rguenther at suse dot de
2023-11-05 22:00 ` pinskia at gcc dot gnu.org
2023-11-05 22:19 ` rsandifo at gcc dot gnu.org
2023-11-11  1:37 ` sjames at gcc dot gnu.org
2023-11-27 17:12 ` cvs-commit at gcc dot gnu.org
2023-11-27 17:15 ` prathamesh3492 at gcc dot gnu.org
2023-11-28  9:18 ` cvs-commit at gcc dot gnu.org
2023-11-28 15:40 ` prathamesh3492 at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).