public inbox for gcc-patches@gcc.gnu.org
* PR111754
@ 2023-11-28  7:56 juzhe.zhong
  2023-11-28  8:43 ` PR111754 Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: juzhe.zhong @ 2023-11-28  7:56 UTC
  To: gcc-patches; +Cc: richard.sandiford, prathamesh.kulkarni


Hi, this patch causes a regression on RISC-V:

FAIL: gcc.dg/vect/pr111754.c -flto -ffat-lto-objects  scan-tree-dump optimized "return { 0.0, 9.0e\\+0, 0.0, 0.0 }"
FAIL: gcc.dg/vect/pr111754.c scan-tree-dump optimized "return { 0.0, 9.0e\\+0, 0.0, 0.0 }"

I checked the optimized dump, which is:
F foo (F a, F b)
{
  <bb 2> [local count: 1073741824]:
  <retval> = { 0.0, 9.0e+0, 0.0, 0.0 };
  return <retval>;

}

The dumped IR seems reasonable to me.
I wonder whether we should work around this in the RISC-V backend so
that it generates the same IR as ARM SVE, or whether we should adjust
the test instead?
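
To illustrate why the scan fails even though the constant is folded,
here is a minimal sketch. It is not part of the testsuite: it assumes
std::regex (ECMAScript syntax) as a stand-in for the Tcl regexps that
DejaGnu uses, and strict_pattern, relaxed_pattern and dump_matches are
hypothetical names. The relaxed pattern is only one possible adjustment,
shown for comparison.

```cpp
#include <regex>
#include <string>

// The pattern from the test once the Tcl/DejaGnu quoting is stripped.
// Assumption: ECMAScript regex syntax behaves like Tcl's for this pattern.
static const std::regex strict_pattern (
    R"(return \{ 0.0, 9.0e\+0, 0.0, 0.0 \})");

// Hypothetical relaxed pattern: match only the vector constant.
static const std::regex relaxed_pattern (
    R"(\{ 0.0, 9.0e\+0, 0.0, 0.0 \})");

// True if RE matches anywhere in DUMP, which is how a scan directive
// is modelled here.
bool dump_matches (const std::string &dump, const std::regex &re)
{
  return std::regex_search (dump, re);
}

// The two statement forms quoted in this thread.
static const std::string direct_return_dump
  = "  return { 0.0, 9.0e+0, 0.0, 0.0 };\n";
static const std::string retval_dump
  = "  <retval> = { 0.0, 9.0e+0, 0.0, 0.0 };\n  return <retval>;\n";
```

In this model, dump_matches (retval_dump, strict_pattern) is false
because "return" is followed by "<retval>" rather than by the constant,
while the relaxed pattern matches both forms.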

Thanks.


juzhe.zhong@rivai.ai

* PR111754
@ 2023-10-20 17:25 Prathamesh Kulkarni
  2023-10-24 21:28 ` PR111754 Richard Sandiford
  0 siblings, 1 reply; 16+ messages in thread
From: Prathamesh Kulkarni @ 2023-10-20 17:25 UTC
  To: gcc Patches, Richard Sandiford, Richard Biener


Hi,
For the following test-case:

typedef float __attribute__((__vector_size__ (16))) F;
F foo (F a, F b)
{
  F v = (F) { 9 };
  return __builtin_shufflevector (v, v, 1, 0, 1, 2);
}

Compiling with -O2 results in the following ICE:
foo.c: In function ‘foo’:
foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
    6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0x7f3185 wi::int_traits<std::pair<rtx_def*, machine_mode>
>::decompose(long*, unsigned int, std::pair<rtx_def*, machine_mode>
const&)
        ../../gcc/gcc/rtl.h:2314
0x7f3185 wide_int_ref_storage<false,
false>::wide_int_ref_storage<std::pair<rtx_def*, machine_mode>
>(std::pair<rtx_def*, machine_mode> const&)
        ../../gcc/gcc/wide-int.h:1089
0x7f3185 generic_wide_int<wide_int_ref_storage<false, false>
>::generic_wide_int<std::pair<rtx_def*, machine_mode>
>(std::pair<rtx_def*, machine_mode> const&)
        ../../gcc/gcc/wide-int.h:847
0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false,
false> > >::poly_int<std::pair<rtx_def*, machine_mode>
>(poly_int_full, std::pair<rtx_def*, machine_mode> const&)
        ../../gcc/gcc/poly-int.h:467
0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false,
false> > >::poly_int<std::pair<rtx_def*, machine_mode>
>(std::pair<rtx_def*, machine_mode> const&)
        ../../gcc/gcc/poly-int.h:453
0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
        ../../gcc/gcc/rtl.h:2383
0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
        ../../gcc/gcc/rtx-vector-builder.h:122
0xfd4e1b vector_builder<rtx_def*, machine_mode,
rtx_vector_builder>::elt(unsigned int) const
        ../../gcc/gcc/vector-builder.h:253
0xfd4d11 rtx_vector_builder::build()
        ../../gcc/gcc/rtx-vector-builder.cc:73
0xc21d9c const_vector_from_tree
        ../../gcc/gcc/expr.cc:13487
0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
        ../../gcc/gcc/expr.cc:11059
0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
        ../../gcc/gcc/expr.h:310
0xaee682 expand_return
        ../../gcc/gcc/cfgexpand.cc:3809
0xaee682 expand_gimple_stmt_1
        ../../gcc/gcc/cfgexpand.cc:3918
0xaee682 expand_gimple_stmt
        ../../gcc/gcc/cfgexpand.cc:4044
0xaf28f0 expand_gimple_basic_block
        ../../gcc/gcc/cfgexpand.cc:6100
0xaf4996 execute
        ../../gcc/gcc/cfgexpand.cc:6835

IIUC, the issue is that fold_vec_perm returns a vector with a float
element type and res_nelts_per_pattern == 3. The compiler later ICEs
when it tries to derive element v[3], which is not present in the
encoding, while building the rtx vector in rtx_vector_builder::build():
  for (unsigned int i = 0; i < nelts; ++i)
    RTVEC_ELT (v, i) = elt (i);
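
A minimal sketch of the element derivation may help here. It is a
simplified stand-in, not GCC's vector_builder: the EncodedVector type is
hypothetical, and the exception only models the point where no step can
be computed for a non-integral element type. Any encoded values used
with it are illustrative assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Simplified stand-in for GCC's vector encoding; not the real vector_builder.
template <typename T>
struct EncodedVector
{
  unsigned npatterns;          // number of interleaved patterns
  unsigned nelts_per_pattern;  // 1 = dup, 2 = head + dup, 3 = stepped
  std::vector<T> encoded;      // npatterns * nelts_per_pattern elements

  // Return element I, deriving it from the encoding when it is not stored.
  T elt (unsigned i) const
  {
    if (i < encoded.size ())
      return encoded[i];

    unsigned pattern = i % npatterns;
    unsigned count = i / npatterns;
    std::size_t final_i = encoded.size () - npatterns + pattern;
    T final_elt = encoded[final_i];

    // Without a step, the last encoded element of the pattern repeats.
    if (nelts_per_pattern <= 2)
      return final_elt;

    if constexpr (std::is_integral_v<T>)
      {
        // Stepped pattern: extrapolate from the last two encoded elements.
        T prev = encoded[final_i - npatterns];
        return final_elt + static_cast<T> (count - 2) * (final_elt - prev);
      }
    else
      // Models the failure: no step is defined for non-integral elements.
      throw std::logic_error ("stepped encoding needs integral elements");
  }
};
```

With an integral selector encoded as one pattern {1, 0, 1}, elt (3)
extrapolates to 2. A float vector with the same shape of encoding has no
usable step, so elt (3) throws in this model, whereas a float vector
encoded with npatterns equal to its length and one element per pattern
never needs to derive anything.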

The attached patch tries to fix this by returning false from
valid_mask_for_fold_vec_perm_cst_p if sel has a stepped sequence and
the input vector has a non-integral element type. For VLA vectors, it
will then only build a result with a dup sequence (nelts_per_pattern < 3)
for non-integral element types.

For VLS vectors, this still works for a stepped sequence, since it
then uses the "VLS exception" in fold_vec_perm_cst and sets:
res_npattern = res_nelts and
res_nelts_per_pattern = 1

and fold the above case to:
F foo (F a, F b)
{
  <bb 2> [local count: 1073741824]:
  return { 0.0, 9.0e+0, 0.0, 0.0 };
}
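
For reference, the expected constant can be checked by hand with a small
sketch of the two-input shuffle semantics. The shuffle helper below is
hypothetical and uses std::array rather than GCC vector types; it
assumes every index is smaller than twice the vector length.

```cpp
#include <array>
#include <cstddef>

// Element I of the result is element idx[I] of the concatenation of A and B.
// Assumption: every index is in the range [0, 2 * N).
template <typename T, std::size_t N>
std::array<T, N> shuffle (const std::array<T, N> &a,
                          const std::array<T, N> &b,
                          const std::array<std::size_t, N> &idx)
{
  std::array<T, N> res{};
  for (std::size_t i = 0; i < N; ++i)
    res[i] = idx[i] < N ? a[idx[i]] : b[idx[i] - N];
  return res;
}
```

With v = { 9, 0, 0, 0 } (the remaining elements of (F) { 9 } are zero)
and indices 1, 0, 1, 2, the result is { v[1], v[0], v[1], v[2] }, that
is { 0, 9, 0, 0 }, which matches the folded return value.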

But I am not sure if this is entirely correct, since:
tree res = out_elts.build ();
will canonicalize the encoding and may result in a stepped sequence
(vector_builder::finalize() may reduce npatterns at the cost of increasing
nelts_per_pattern)?

PS: This issue is now latent after the PR111648 fix, since
valid_mask_for_fold_vec_perm_cst_p with sel = {1, 0, 1, ...} returns
false because the corresponding pattern in arg0 is not a natural
stepped sequence, and the fold is done correctly using the VLS exception.
However, I guess the underlying issue of dealing with non-integral
element types in fold_vec_perm_cst still remains?

The patch passes bootstrap+test with and without SVE on aarch64-linux-gnu,
and on x86_64-linux-gnu.

Thanks,
Prathamesh

[-- Attachment #2: gnu-964-3.txt --]
[-- Type: text/plain, Size: 1192 bytes --]

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 82299bb7f1d..cedfc9616e9 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10642,6 +10642,11 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
   if (sel_nelts_per_pattern < 3)
     return true;
 
+  /* If SEL contains stepped sequence, ensure that we are dealing with
+     integral vector_cst.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (arg0))))
+    return false;
+
   for (unsigned pattern = 0; pattern < sel_npatterns; pattern++)
     {
       poly_uint64 a1 = sel[pattern + sel_npatterns];
diff --git a/gcc/testsuite/gcc.dg/vect/pr111754.c b/gcc/testsuite/gcc.dg/vect/pr111754.c
new file mode 100644
index 00000000000..7c1c16875c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111754.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef float __attribute__((__vector_size__ (16))) F;
+
+F foo (F a, F b)
+{
+  F v = (F) { 9 };
+  return __builtin_shufflevector (v, v, 1, 0, 1, 2);
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump "return \{ 0.0, 9.0e\\+0, 0.0, 0.0 \}" "optimized" } } */


Thread overview: 16+ messages
2023-11-28  7:56 PR111754 juzhe.zhong
2023-11-28  8:43 ` PR111754 Jakub Jelinek
2023-11-28  8:57   ` [PATCH] testsuite: Fix up pr111754.c test Jakub Jelinek
2023-11-28  9:12     ` Richard Biener
  -- strict thread matches above, loose matches on Subject: below --
2023-10-20 17:25 PR111754 Prathamesh Kulkarni
2023-10-24 21:28 ` PR111754 Richard Sandiford
2023-10-25  8:42   ` PR111754 Richard Sandiford
2023-10-25 10:59   ` PR111754 Prathamesh Kulkarni
2023-10-25 22:38     ` PR111754 Richard Sandiford
2023-10-26  4:13       ` PR111754 Prathamesh Kulkarni
2023-11-08 16:27         ` PR111754 Prathamesh Kulkarni
2023-11-15 15:14           ` PR111754 Prathamesh Kulkarni
2023-11-23 11:40             ` PR111754 Prathamesh Kulkarni
2023-11-23 21:43           ` PR111754 Richard Sandiford
2023-11-27 15:13             ` PR111754 Prathamesh Kulkarni
2023-11-27 16:49               ` PR111754 Richard Sandiford
