public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/111156] New: [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
@ 2023-08-25 13:35 adhemerval.zanella at linaro dot org
  2023-08-25 18:02 ` [Bug middle-end/111156] " dcb314 at hotmail dot com
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-08-25 13:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

            Bug ID: 111156
           Summary: [14 Regression] aarch64
                    aarch64/sve/mask_struct_store_4.c failures
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: adhemerval.zanella at linaro dot org
  Target Milestone: ---

After
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a1558e9ad856938f165f838733955b331ebbec09,
I have noticed regressions on aarch64:

Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
FAIL: gcc.target/aarch64/sve/mask_struct_store_4.c (internal compiler error: in get_group_load_store_type, at tree-vect-stmts.cc:2121)
FAIL: gcc.target/aarch64/sve/mask_struct_store_4.c (test for excess errors)
UNRESOLVED: gcc.target/aarch64/sve/mask_struct_store_4.c scan-assembler-not \\tst2b\\t.z[0-9]
UNRESOLVED: gcc.target/aarch64/sve/mask_struct_store_4.c scan-assembler-not \\tst2d\\t.z[0-9]
UNRESOLVED: gcc.target/aarch64/sve/mask_struct_store_4.c scan-assembler-not \\tst2h\\t.z[0-9]
UNRESOLVED: gcc.target/aarch64/sve/mask_struct_store_4.c scan-assembler-not \\tst2w\\t.z[0-9]

(As indicated by Linaro CI
https://ci.linaro.org/job/tcwg_gnu_native_check_gcc--master-aarch64-build/570/artifact/artifacts/notify/mail-body.txt/*view*/)


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: dcb314 at hotmail dot com @ 2023-08-25 18:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

David Binderman <dcb314 at hotmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcb314 at hotmail dot com

--- Comment #1 from David Binderman <dcb314 at hotmail dot com> ---
I see this also, on x86_64, with -O2 -march=znver1.

I will reduce the code.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: dcb314 at hotmail dot com @ 2023-08-25 18:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #2 from David Binderman <dcb314 at hotmail dot com> ---
The bug first seems to appear sometime between g:93f803d53b5ccaab
and g:68f7cb6cf9e8b9f2, some 39 commits.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: dcb314 at hotmail dot com @ 2023-08-25 18:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #3 from David Binderman <dcb314 at hotmail dot com> ---
Reduced C code seems to be:

struct median_estimator {
  long median;
  long step
} median_diff_ts[];
median_estimator_update_data, median_estimator_update_diff,
    median_estimator_update_median, mm_profile_print_i;
median_estimator_update(struct median_estimator *me) {
  if (__builtin_expect(me->step, 0))
    me->median = median_estimator_update_data;
  if (median_estimator_update_diff)
    me->step = median_estimator_update_median;
}
mm_profile_print() {
  mm_profile_print_i = 1;
  for (; mm_profile_print_i; mm_profile_print_i++)
    median_estimator_update(&median_diff_ts[mm_profile_print_i]);
}


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: rguenth at gcc dot gnu.org @ 2023-08-28  7:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
   Target Milestone|---                         |14.0
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Dup.

*** This bug has been marked as a duplicate of bug 111136 ***


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: adhemerval.zanella at linaro dot org @ 2023-08-31 17:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|DUPLICATE                   |---
             Status|RESOLVED                    |UNCONFIRMED

--- Comment #5 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
Reopening since this is not a duplicate of bug 111136.  The issue is
mask_struct_store_4.c generates the very instructions that the test is
checking:

$ ./gcc/xgcc -Bgcc -march=armv8.2-a+sve -O2 -ftree-vectorize -ffast-math \
    [..]/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_4.c -S -o - | grep st2b
        st2b    {z30.b - z31.b}, p7, [x0, x5]
        st2b    {z28.b - z29.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z28.b - z29.b}, p7, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]
        st2b    {z26.b - z27.b}, p6, [x0, x5]


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: pinskia at gcc dot gnu.org @ 2023-11-24  0:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |pinskia at gcc dot gnu.org
   Last reconfirmed|                            |2023-11-24

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

I was going through all of the failures on aarch64 today and noticed this one
still fails.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: rguenth at gcc dot gnu.org @ 2024-01-15 13:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
           Priority|P3                          |P1


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: tnfchris at gcc dot gnu.org @ 2024-02-01 10:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Yeah, I know it started between g:4e27ba6e2dd85a5ad4751c35270dbd8f277302dd and
g:721f7e2c4e5eed645593258624dd91e6c39f3bd2 but the bisect is hard because some
of the commits produce an ICE instead.

The bisect lands at

commit a739bac402ea5a583e43dbd01c14ebaff317c885 (refs/bisect/bad)
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Aug 25 09:42:16 2023 +0200

    tree-optimization/111136 - STMT_VINFO_SLP_VECT_ONLY and stores

    vect_dissolve_slp_only_groups currently only expects loads, for stores
    we have to make sure to mark the dissolved "groups" strided.

            PR tree-optimization/111136
            * tree-vect-loop.cc (vect_dissolve_slp_only_groups): For
            stores force STMT_VINFO_STRIDED_P and also duplicate that
            to all elements.

but the previous commit seems to be an ICE, so I guess this one will have to be
done the hard way.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: pinskia at gcc dot gnu.org @ 2024-02-02  3:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57286
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57286&action=edit
Testcase that shows this is wrong code

I reduced the testcase into something which shows it is wrong code too.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: pinskia at gcc dot gnu.org @ 2024-02-02  3:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note I think GCC should be able to vectorize this loop but it goes wrong.


On SVE the "<= 7" part gets lost:

```
  vect__3.12_54 = .MASK_LOAD (_48, 16B, loop_mask_52);
  _32 = cond_17(D) + POLY_INT_CST [16, 16];
  _25 = &MEM <vector([8,8]) short unsigned int> [(uint16_t *)_32 + ivtmp_77 * 2];
  vect__3.13_56 = .MASK_LOAD (_25, 16B, loop_mask_53);
  _1 = &MEM <vector([16,16]) signed char> [(int8_t *)src_18(D) + ivtmp_77 * 1];
  vect_pretmp_29.16_60 = .MASK_LOAD (_1, 8B, loop_mask_59);
  mask__14.19_66 = vect__3.12_54 > { 2, ... };
  mask__14.19_67 = vect__3.13_56 > { 2, ... };
  mask_patt_4.20_68 = VEC_PACK_TRUNC_EXPR <mask__14.19_66, mask__14.19_67>;
  vect_array.23 ={v} {CLOBBER};
  vect_array.23[0] = vect_pretmp_29.16_60;
  vect_array.23[1] = vect_pretmp_29.16_60;
  vec_mask_and_74 = loop_mask_59 & mask_patt_4.20_68;
  _2 = ivtmp_77 * 2;
  _3 = &MEM <vector([16,16]) signed char[2]> [(int8_t *)dest_19(D) + _2 * 1];
```


But RISCV is able to vectorize it correctly:
```
  vect__3.12_52 = .MASK_LEN_LOAD (vectp_cond.10_13, 16B, { -1, ... }, _72, 0);
  vect_pretmp_29.15_56 = .MASK_LEN_LOAD (vectp_src.13_54, 8B, { -1, ... }, _72, 0);
  mask__27.16_58 = vect__3.12_52 <= { 7, ... };
  .MASK_LEN_SCATTER_STORE (vectp_dest.17_60, { 0, 2, 4, ... }, 1, vect_pretmp_29.15_56, mask__27.16_58, _72, 0);
  mask__14.19_64 = vect__3.12_52 > { 2, ... };
  .MASK_LEN_SCATTER_STORE (vectp_dest.20_67, { 0, 2, 4, ... }, 1, vect_pretmp_29.15_56, mask__14.19_64, _72, 0);
```

By using 2 stores and scatter here.
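
A minimal scalar sketch of the difference between the two dumps, assuming the source loop has the form implied by the masks (a "<= 7" condition guarding the even lane and a "> 2" condition guarding the odd lane). The function names are hypothetical, and the second function is only a model of how the SVE dump reads, not actual generated code:

```c
#include <stddef.h>
#include <stdint.h>

/* Reference semantics: each lane of the pair has its own condition,
   which is what the two RISC-V scatter stores implement.  */
void store_two_masks(int8_t *dest, const int8_t *src,
                     const uint16_t *cond, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (cond[i] <= 7)
            dest[i * 2] = src[i];
        if (cond[i] > 2)
            dest[i * 2 + 1] = src[i];
    }
}

/* Model of the SVE dump: a single store-lanes operation writes both
   lanes under the "> 2" mask only, so the "<= 7" test is dropped.  */
void store_one_mask(int8_t *dest, const int8_t *src,
                    const uint16_t *cond, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (cond[i] > 2) {
            dest[i * 2] = src[i];
            dest[i * 2 + 1] = src[i];
        }
    }
}
```

The two functions disagree whenever exactly one of the conditions holds, for example for cond[i] == 1 (only the even lane should be written) or cond[i] == 9 (only the odd lane should be written).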


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: pinskia at gcc dot gnu.org @ 2024-02-02 20:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note I think gcc.target/aarch64/sve/mask_struct_load_3_run.c is the runtime
failure I mentioned.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: pinskia at gcc dot gnu.org @ 2024-02-02 20:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #10)
> Note I think gcc.target/aarch64/sve/mask_struct_load_3_run.c is the runtime
> failure I mentioned.

And gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c (when tested with
-march=armv9-a which I added to my testing recently).


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: tnfchris at gcc dot gnu.org @ 2024-02-14 20:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #12 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
The commit that caused it is:

commit g:a1558e9ad856938f165f838733955b331ebbec09
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Aug 23 14:28:26 2023 +0200

    tree-optimization/111115 - SLP of masked stores

    The following adds the capability to do SLP on .MASK_STORE, I do not
    plan to add interleaving support.

specifically this change:

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 3e9a284666c..a2caf6cb1c7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3048,8 +3048,7 @@ can_group_stmts_p (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info,
         like those created by build_mask_conversion.  */
       tree mask1 = gimple_call_arg (call1, 2);
       tree mask2 = gimple_call_arg (call2, 2);
-      if (!operand_equal_p (mask1, mask2, 0)
-          && (ifn == IFN_MASK_STORE || !allow_slp_p))
+      if (!operand_equal_p (mask1, mask2, 0) && !allow_slp_p)
        {
          mask1 = strip_conversion (mask1);
          if (!mask1)

With the change it now incorrectly thinks that the two masks (a <= 7, a > 2) are
the same, which is why one of the masks goes missing.

Part of it is that the boolean is used in a weird way.  During
vect_analyze_data_ref_accesses, where this difference is important, we pass true
in the initial check, but the || before made it so that we still checked the
MASK_STOREs.  Now it means during analysis we never check.
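
A small sketch of just the boolean change in the hunk above, using hypothetical helper names. Each function returns whether the stricter mask comparison (the strip_conversion path) would run for a pair of statements, with is_mask_store standing in for ifn == IFN_MASK_STORE:

```c
#include <stdbool.h>

/* Condition before the change: unequal masks on a MASK_STORE were
   always examined, whatever allow_slp_p said.  */
bool checks_masks_before(bool masks_equal, bool is_mask_store,
                         bool allow_slp_p)
{
    return !masks_equal && (is_mask_store || !allow_slp_p);
}

/* Condition after the change: the store case is no longer special.  */
bool checks_masks_after(bool masks_equal, bool is_mask_store,
                        bool allow_slp_p)
{
    (void) is_mask_store;   /* no longer part of the condition */
    return !masks_equal && !allow_slp_p;
}
```

The only input where the two differ is unequal masks on a masked store with allow_slp_p set to true, which is the combination used by the initial check during analysis.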

Later on in the same method we check it again, but with false as the argument,
for determining STMT_VINFO_SLP_VECT_ONLY.
The debug statement there is weird btw, as it says:

          if (dump_enabled_p () && STMT_VINFO_SLP_VECT_ONLY (stmtinfo_a))
            dump_printf_loc (MSG_NOTE, vect_location,
                             "Load suitable for SLP vectorization only.\n");

but as far as I can see, stmtinfo_a can be a store too, based on the checks for
DR_IS_READ (dra) just slightly higher up.

The patch that added this check (g:997636716c5dde7d59d026726a6f58918069f122)
says it's because the vectorizer doesn't support SLP of masked loads, and I
can't tell if we do now.

If we do, the boolean should be dropped.  If we don't, we probably need the
check back to allow the check for stores.  It looks like this check is being
used to disable STMT_VINFO_SLP_VECT_ONLY for loads, which is a bit
counterintuitive and feels like a hack rather than just doing:

          STMT_VINFO_SLP_VECT_ONLY (stmtinfo_a)
-           = !can_group_stmts_p (stmtinfo_a, stmtinfo_b, false);
+           = !can_group_stmts_p (stmtinfo_a, stmtinfo_b)
+             && DR_IS_WRITE (dra);

So I think the boolean should be dropped and just reject loads for
STMT_VINFO_SLP_VECT_ONLY...
This also seems to give much better codegen... in any case, richi?


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: rguenth at gcc dot gnu.org @ 2024-02-15  8:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
I didn't add STMT_VINFO_SLP_VECT_ONLY, I'm quite sure we can now do both SLP of
masked loads and stores, so yes, STMT_VINFO_SLP_VECT_ONLY (when we formed
a DR group of stmts we cannot combine without SLP as the masks are not equal)
should be set for both loads and stores.

The can_group_stmts_p checks as present seem correct here (but the dump
should not say "Load" but maybe "Access")

So it looks like the issue is with "late" deciding we can't actually do the
masked SLP store (why?) and the odd "vect_dissolve_slp_only_groups" and
then somehow botching strided store code-gen which likely doesn't expect
masks or should have disabled fully masking?  I'll note that we don't
support single element interleaving for stores, so vect_analyze_group_access_1
would have fallen back to STMT_VINFO_STRIDED_P.  But as said, maybe that
somehow fails to disable loop masking then, during vect_analyze_loop_operations?

So what's the testcase comment#9 talks about?


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: tnfchris at gcc dot gnu.org @ 2024-02-15  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #14 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #13)
> I didn't add STMT_VINFO_SLP_VECT_ONLY, I'm quite sure we can now do both SLP
> of masked loads and stores, so yes, STMT_VINFO_SLP_VECT_ONLY (when we formed
> a DR group of stmts we cannot combine without SLP as the masks are not equal)
> should be set for both loads and stores.
> 
> The can_group_stmts_p checks as present seem correct here (but the dump
> should not say "Load" but maybe "Access")

I guess I'm wondering because of this usage:

          /* Check that the data-refs have same first location (except init)
             and they are both either store or load (not load and store,
             not masked loads or stores).  */
          if (DR_IS_READ (dra) != DR_IS_READ (drb)
              || data_ref_compare_tree (DR_BASE_ADDRESS (dra),
                                        DR_BASE_ADDRESS (drb)) != 0
              || data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)) != 0
              || !can_group_stmts_p (stmtinfo_a, stmtinfo_b, true))
            break;

We don't exit there now for non-SLP.

> 
> So what's the testcase comment#9 talks about?

You should be able to reproduce it with:

---
typedef __SIZE_TYPE__ size_t;
typedef signed char int8_t;
typedef unsigned short uint16_t ;

void __attribute__((noinline, noclone))
test_i8_i8_i16_2(int8_t *__restrict dest, int8_t *__restrict src,
                 uint16_t *__restrict cond, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (cond[i] < 8)
            dest[i * 2] = src[i];
        if (cond[i] > 2)
            dest[i * 2 + 1] = src[i];
    }
}
void __attribute__((noinline, noclone))
test_i8_i8_i16_2_1(volatile int8_t * dest, volatile int8_t * src,
                   volatile uint16_t * cond, size_t n) {
#pragma GCC novector
    for (size_t i = 0; i < n; ++i) {
        if (cond[i] < 8)
            dest[i * 2] = src[i];
        if (cond[i] > 2)
            dest[i * 2 + 1] = src[i];
    }
}

#define size 16

int8_t srcarray[size];
uint16_t maskarray[size];
int8_t destarray[size*2];
int8_t destarray1[size*2];

int main()
{
#pragma GCC novector
  for(int i = 0; i < size; i++)
  {
    maskarray[i] = i == 10 ? 0 : (i == 5 ? 9 : (21111*i) & 0xff);
    srcarray[i] = i;
  }
#pragma GCC novector
  for(int i = 0; i < size*2; i++)
  {
    destarray[i] = i;
    destarray1[i] = i;
  }
  test_i8_i8_i16_2(destarray, srcarray, maskarray, size);
  test_i8_i8_i16_2_1(destarray1, srcarray, maskarray, size);

#pragma GCC novector
  for(int i = 0; i < size*2; i++)
  {
    if (destarray[i] != destarray1[i])
      __builtin_abort();
  }
}

---

since really only one of the functions needs to vectorize.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: tnfchris at gcc dot gnu.org @ 2024-02-15  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #15 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
and just -O3 -march=armv8-a+sve


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: rguenth at gcc dot gnu.org @ 2024-02-15 10:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so the missed SLP is a known one:

t.c:8:26: note:   starting SLP discovery for node 0x5d42840
t.c:8:26: note:   Build SLP for _27 = _3 <= 7;
t.c:8:26: note:   precomputed vectype: vector([8,8]) <signed-boolean:2>
t.c:8:26: note:   nunits = [8,8]
t.c:8:26: note:   Build SLP for _14 = _3 > 2;
t.c:8:26: note:   precomputed vectype: vector([8,8]) <signed-boolean:2>
t.c:8:26: note:   nunits = [8,8]
t.c:8:26: missed:   Build SLP failed: different operation in stmt _14 = _3 > 2;
t.c:8:26: missed:   original stmt _27 = _3 <= 7;

I'm not sure we can do this with a single vector stmt but of course using
'two_operator' support might be possible here (do both > and <= and then
blend the result).
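
A scalar sketch of that blend idea, assuming the loop from the testcase (even lanes guarded by "<= 7", odd lanes by "> 2"). The function name is hypothetical and this only models the per-lane mask selection; it is not how the vectorizer's 'two_operator' support is implemented:

```c
#include <stddef.h>
#include <stdint.h>

/* Evaluate both comparisons for every element, then pick one per
   interleaved lane: even lanes take "<= 7", odd lanes take "> 2".
   The store itself becomes a single masked contiguous store.  */
void store_blended_mask(int8_t *dest, const int8_t *src,
                        const uint16_t *cond, size_t n)
{
    for (size_t lane = 0; lane < 2 * n; ++lane) {
        size_t i = lane / 2;
        int le = cond[i] <= 7;
        int gt = cond[i] > 2;
        int mask = (lane % 2 == 0) ? le : gt;   /* the blend step */
        if (mask)
            dest[lane] = src[i];
    }
}
```

Written this way, one mask vector covers the whole interleaved group, so the two stores no longer need separate operations.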

I see we end up using .MASK_STORE_LANES in the end but we're not using
load-lanes.

t.c:8:26: note:   ==> examining pattern statement: .MASK_STORE (_5, 8B, patt_12, pretmp_29);
t.c:8:26: note:   vect_is_simple_use: operand (<signed-boolean:1>) _27, type of def: internal
t.c:8:26: note:   vect_is_simple_use: vectype vector([16,16]) <signed-boolean:1>
t.c:8:26: note:   vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: note:   vect_is_simple_use: vectype vector([16,16]) signed char
t.c:8:26: missed:   cannot use vec_mask_len_store_lanes<VNx32QI><VNx16QI>
t.c:8:26: note:   can use vec_mask_store_lanes<VNx32QI><VNx16QI>
t.c:8:26: note:   vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: missed:   cannot use vec_mask_len_store_lanes<VNx32QI><VNx16QI>
t.c:8:26: note:   can use vec_mask_store_lanes<VNx32QI><VNx16QI>
...
t.c:8:26: note:   ==> examining pattern statement: .MASK_STORE (_9, 8B, patt_4, pretmp_29);
t.c:8:26: note:   vect_is_simple_use: operand (<signed-boolean:1>) _14, type of def: internal
t.c:8:26: note:   vect_is_simple_use: vectype vector([16,16]) <signed-boolean:1>
t.c:8:26: note:   vect_is_simple_use: operand *_28, type of def: internal
t.c:8:26: note:   vect_is_simple_use: vectype vector([16,16]) signed char
t.c:8:26: missed:   cannot use vec_mask_len_store_lanes<VNx32QI><VNx16QI>
t.c:8:26: note:   can use vec_mask_store_lanes<VNx32QI><VNx16QI>
t.c:8:26: missed:   cannot use vec_mask_len_store_lanes<VNx32QI><VNx16QI>
t.c:8:26: note:   can use vec_mask_store_lanes<VNx32QI><VNx16QI>

and somehow transform decides to put the two stores together again, probably
failing to verify the masks are the same.

I'll dig a bit more after lunch.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
From: rguenth at gcc dot gnu.org @ 2024-02-15 12:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the following fixes it; can you verify the runtime behavior? (The IL
looks sane, but it uses masked scatter stores.)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9e26b09504d..5a5865c42fc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2551,7 +2551,8 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
       gcc_assert (DR_REF (dr));
-      stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (DR_STMT (dr));
+      stmt_vec_info stmt_info
+       = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (DR_STMT (dr)));

       /* Check if the load is a part of an interleaving chain.  */
       if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
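
The one-line change above can be illustrated with a toy model. This is only a
sketch under assumptions: StmtInfo, stmt_to_vectorize, count_grouped and demo
are hypothetical stand-ins, not GCC code, and it assumes the group information
is recorded on the pattern statement rather than on the original one.

```cpp
#include <cassert>

// Toy stand-in for a vectorizer statement record; all names are hypothetical.
struct StmtInfo {
  bool in_pattern = false;      // original stmt was replaced by a pattern stmt
  StmtInfo *related = nullptr;  // the pattern stmt, valid when in_pattern is set
  bool grouped_access = false;  // member of an interleaving group
};

// Mirrors the idea of vect_stmt_to_vectorize: prefer the pattern stmt if any.
inline StmtInfo *stmt_to_vectorize(StmtInfo *s) {
  return (s->in_pattern && s->related) ? s->related : s;
}

// Counts the group members a walk over the data references would see,
// with or without redirecting to the pattern stmt first.
inline int count_grouped(StmtInfo *const *stmts, int n, bool follow_pattern) {
  int grouped = 0;
  for (int i = 0; i < n; ++i) {
    StmtInfo *s = follow_pattern ? stmt_to_vectorize(stmts[i]) : stmts[i];
    if (s->grouped_access)
      ++grouped;
  }
  return grouped;
}

// Two original stores whose pattern stmts carry the group flag (assumption).
inline void demo(int &without_redirect, int &with_redirect) {
  StmtInfo patt0, patt1, orig0, orig1;
  patt0.grouped_access = patt1.grouped_access = true;
  orig0.in_pattern = orig1.in_pattern = true;
  orig0.related = &patt0;
  orig1.related = &patt1;
  StmtInfo *stmts[] = {&orig0, &orig1};
  without_redirect = count_grouped(stmts, 2, false);
  with_redirect = count_grouped(stmts, 2, true);
}
```

In this model a walk that inspects only the original statements finds no group
to dissolve, while the redirected walk finds both members.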

* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
  2023-08-25 13:35 [Bug middle-end/111156] New: [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures adhemerval.zanella at linaro dot org
                   ` (17 preceding siblings ...)
  2024-02-15 12:41 ` rguenth at gcc dot gnu.org
@ 2024-02-15 13:02 ` rguenth at gcc dot gnu.org
  2024-02-15 14:38 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-15 13:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7cf9504398c..8deeecfd4aa 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
                    && rhs_code.is_tree_code ()
                    && (TREE_CODE_CLASS (tree_code (first_stmt_code))
                        == tcc_comparison)
-                   && (swap_tree_comparison (tree_code (first_stmt_code))
-                       == tree_code (rhs_code)))
+                   && ((swap_tree_comparison (tree_code (first_stmt_code))
+                        == tree_code (rhs_code))
+                       || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
+                            == tcc_comparison)
+                           && rhs_code == alt_stmt_code)))
               && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
                    && (first_stmt_code == ARRAY_REF
                        || first_stmt_code == BIT_FIELD_REF

should get you SLP but:

t.c:8:26: note:   === vect_slp_analyze_operations ===
t.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
t.c:8:26: missed:   unsupported load permutation
t.c:10:30: missed:   not vectorized: relevant stmt not supported: pretmp_29 =
*_28;

t.c:8:26: note:   op template: pretmp_29 = *_28;
t.c:8:26: note:         stmt 0 pretmp_29 = *_28;
t.c:8:26: note:         stmt 1 pretmp_29 = *_28;
t.c:8:26: note:         load permutation { 0 0 }
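
For readers unfamiliar with the check touched by the diff in this comment, here
is a minimal sketch of the idea. Cmp, swap_comparison and lane_compatible are
hypothetical names and a heavy simplification; the real condition in
vect_build_slp_tree_1 involves many more cases.

```cpp
#include <cassert>

enum class Cmp { LT, LE, GT, GE, EQ, NE };

// Code obtained when the two operands are exchanged (a < b  <=>  b > a).
inline Cmp swap_comparison(Cmp c) {
  switch (c) {
    case Cmp::LT: return Cmp::GT;
    case Cmp::LE: return Cmp::GE;
    case Cmp::GT: return Cmp::LT;
    case Cmp::GE: return Cmp::LE;
    default:      return c;  // EQ and NE are symmetric
  }
}

// Hypothetical simplification of the patched test: a lane is accepted if its
// comparison matches the first lane's code, the operand-swapped form of that
// code, or an already recorded alternate comparison code.
inline bool lane_compatible(Cmp first, bool have_alt, Cmp alt, Cmp rhs) {
  return rhs == first
         || swap_comparison(first) == rhs
         || (have_alt && rhs == alt);
}
```

The third disjunct corresponds to the lines the patch adds; without a recorded
alternate code, only the first lane's code or its swapped form is accepted.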

* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
  2023-08-25 13:35 [Bug middle-end/111156] New: [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures adhemerval.zanella at linaro dot org
                   ` (18 preceding siblings ...)
  2024-02-15 13:02 ` rguenth at gcc dot gnu.org
@ 2024-02-15 14:38 ` cvs-commit at gcc dot gnu.org
  2024-02-15 14:39 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-02-15 14:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #19 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:b312cf21afd62b43fbc5034703e2796b0c3c416d

commit r14-9011-gb312cf21afd62b43fbc5034703e2796b0c3c416d
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Feb 15 13:41:25 2024 +0100

    tree-optimization/111156 - properly dissolve SLP only groups

    The following fixes the omission of failing to look at pattern
    stmts when we need to dissolve SLP only groups.

            PR tree-optimization/111156
            * tree-vect-loop.cc (vect_dissolve_slp_only_groups): Look
            at the pattern stmt if any.

* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
  2023-08-25 13:35 [Bug middle-end/111156] New: [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures adhemerval.zanella at linaro dot org
                   ` (19 preceding siblings ...)
  2024-02-15 14:38 ` cvs-commit at gcc dot gnu.org
@ 2024-02-15 14:39 ` rguenth at gcc dot gnu.org
  2024-02-15 18:53 ` tnfchris at gcc dot gnu.org
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-15 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
fixed.

* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
  2023-08-25 13:35 [Bug middle-end/111156] New: [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures adhemerval.zanella at linaro dot org
                   ` (20 preceding siblings ...)
  2024-02-15 14:39 ` rguenth at gcc dot gnu.org
@ 2024-02-15 18:53 ` tnfchris at gcc dot gnu.org
  2024-02-15 21:05 ` rguenther at suse dot de
  2024-02-16 13:19 ` rguenth at gcc dot gnu.org
  23 siblings, 0 replies; 25+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-02-15 18:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #21 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #18)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7cf9504398c..8deeecfd4aa 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char
> *swap,
>                     && rhs_code.is_tree_code ()
>                     && (TREE_CODE_CLASS (tree_code (first_stmt_code))
>                         == tcc_comparison)
> -                   && (swap_tree_comparison (tree_code (first_stmt_code))
> -                       == tree_code (rhs_code)))
> +                   && ((swap_tree_comparison (tree_code (first_stmt_code))
> +                        == tree_code (rhs_code))
> +                       || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
> +                            == tcc_comparison)
> +                           && rhs_code == alt_stmt_code)))
>                && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
>                     && (first_stmt_code == ARRAY_REF
>                         || first_stmt_code == BIT_FIELD_REF
> 
> should get you SLP but:
> 
> t.c:8:26: note:   === vect_slp_analyze_operations ===
> t.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
> t.c:8:26: missed:   unsupported load permutation
> t.c:10:30: missed:   not vectorized: relevant stmt not supported: pretmp_29
> = *_28;
> 
> t.c:8:26: note:   op template: pretmp_29 = *_28;
> t.c:8:26: note:         stmt 0 pretmp_29 = *_28;
> t.c:8:26: note:         stmt 1 pretmp_29 = *_28;
> t.c:8:26: note:         load permutation { 0 0 }

hmm with that applied I get:

sve-mis.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
sve-mis.c:8:26: note:   Vectorizing an unaligned access.
sve-mis.c:8:26: note:   vect_model_load_cost: unaligned supported by hardware.
sve-mis.c:8:26: note:   vect_model_load_cost: inside_cost = 1, prologue_cost =
0 .

but it bails out at:

sve-mis.c:8:26: missed:   Not using elementwise accesses due to variable
vectorization factor.
sve-mis.c:10:25: missed:   not vectorized: relevant stmt not supported:
.MASK_STORE (_5, 8B, _27, pretmp_29);
sve-mis.c:8:26: missed:  bad operation or unsupported loop bound.

for me

* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
  2023-08-25 13:35 [Bug middle-end/111156] New: [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures adhemerval.zanella at linaro dot org
                   ` (21 preceding siblings ...)
  2024-02-15 18:53 ` tnfchris at gcc dot gnu.org
@ 2024-02-15 21:05 ` rguenther at suse dot de
  2024-02-16 13:19 ` rguenth at gcc dot gnu.org
  23 siblings, 0 replies; 25+ messages in thread
From: rguenther at suse dot de @ 2024-02-15 21:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #22 from rguenther at suse dot de <rguenther at suse dot de> ---
> Am 15.02.2024 um 19:53 schrieb tnfchris at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org>:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156
> 
> --- Comment #21 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #18)
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index 7cf9504398c..8deeecfd4aa 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char
>> *swap,
>>                    && rhs_code.is_tree_code ()
>>                    && (TREE_CODE_CLASS (tree_code (first_stmt_code))
>>                        == tcc_comparison)
>> -                   && (swap_tree_comparison (tree_code (first_stmt_code))
>> -                       == tree_code (rhs_code)))
>> +                   && ((swap_tree_comparison (tree_code (first_stmt_code))
>> +                        == tree_code (rhs_code))
>> +                       || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
>> +                            == tcc_comparison)
>> +                           && rhs_code == alt_stmt_code)))
>>               && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
>>                    && (first_stmt_code == ARRAY_REF
>>                        || first_stmt_code == BIT_FIELD_REF
>> 
>> should get you SLP but:
>> 
>> t.c:8:26: note:   === vect_slp_analyze_operations ===
>> t.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
>> t.c:8:26: missed:   unsupported load permutation
>> t.c:10:30: missed:   not vectorized: relevant stmt not supported: pretmp_29
>> = *_28;
>> 
>> t.c:8:26: note:   op template: pretmp_29 = *_28;
>> t.c:8:26: note:         stmt 0 pretmp_29 = *_28;
>> t.c:8:26: note:         stmt 1 pretmp_29 = *_28;
>> t.c:8:26: note:         load permutation { 0 0 }
> 
> hmm with that applied I get:
> 
> sve-mis.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
> sve-mis.c:8:26: note:   Vectorizing an unaligned access.
> sve-mis.c:8:26: note:   vect_model_load_cost: unaligned supported by hardware.
> sve-mis.c:8:26: note:   vect_model_load_cost: inside_cost = 1, prologue_cost =
> 0 .
> 
> but it bails out at:
> 
> sve-mis.c:8:26: missed:   Not using elementwise accesses due to variable
> vectorization factor.
> sve-mis.c:10:25: missed:   not vectorized: relevant stmt not supported:
> .MASK_STORE (_5, 8B, _27, pretmp_29);
> sve-mis.c:8:26: missed:  bad operation or unsupported loop bound.
> 
> for me

I’ve used -fno-vect-cost-model and looked at the SVE variant only.


* [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
  2023-08-25 13:35 [Bug middle-end/111156] New: [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures adhemerval.zanella at linaro dot org
                   ` (22 preceding siblings ...)
  2024-02-15 21:05 ` rguenther at suse dot de
@ 2024-02-16 13:19 ` rguenth at gcc dot gnu.org
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-16 13:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #18)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7cf9504398c..8deeecfd4aa 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1280,8 +1280,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char
> *swap,
>                     && rhs_code.is_tree_code ()
>                     && (TREE_CODE_CLASS (tree_code (first_stmt_code))
>                         == tcc_comparison)
> -                   && (swap_tree_comparison (tree_code (first_stmt_code))
> -                       == tree_code (rhs_code)))
> +                   && ((swap_tree_comparison (tree_code (first_stmt_code))
> +                        == tree_code (rhs_code))
> +                       || ((TREE_CODE_CLASS (tree_code (alt_stmt_code))
> +                            == tcc_comparison)
> +                           && rhs_code == alt_stmt_code)))
>                && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
>                     && (first_stmt_code == ARRAY_REF
>                         || first_stmt_code == BIT_FIELD_REF
> 


diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7cf9504398c..e35eeeea3fa 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1519,7 +1522,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   if (alt_stmt_code != ERROR_MARK
       && (!alt_stmt_code.is_tree_code ()
          || (TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_reference
-             && TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_comparison)))
+             && (TREE_CODE_CLASS (tree_code (alt_stmt_code)) != tcc_comparison
+                 || (swap_tree_comparison (tree_code (first_stmt_code))
+                     != tree_code (alt_stmt_code))))))
     {
       *two_operators = true;
     }

is also needed btw. to avoid wrong-code.  I see

t.c:8:26: note:   ==> examining statement: pretmp_29 = *_28;
t.c:8:26: missed:   unsupported load permutation
t.c:10:30: missed:   not vectorized: relevant stmt not supported: pretmp_29 =
*_28;
t.c:8:26: note:   removing SLP instance operations starting from: .MASK_STORE
(_5, 8B, patt_12, pretmp_29);

using -O3 -march=armv8.3-a+sve - it then does

t.c:8:26: missed:  unsupported SLP instances
t.c:8:26: note:  re-trying with SLP disabled

and _that_ fails then with

t.c:8:26: missed:   Not using elementwise accesses due to variable
vectorization factor.
t.c:6:1: missed:   not vectorized: relevant stmt not supported: .MASK_STORE
(_5, 8B, patt_12, pretmp_29);

but the interesting bit is why it fails to handle the SLP case.

That's possibly because the load isn't a grouped access: we get
dr_group_size == 1 and group_size == 2, and nunits is {16, 16}
(!repeating_p), and so we hit

      /* We need to construct a separate mask for each vector statement.  */
      unsigned HOST_WIDE_INT const_nunits, const_vf;
      if (!nunits.is_constant (&const_nunits)
          || !vf.is_constant (&const_vf))
        return false;

I'm not sure what that comment means, but presumably we simply fail to handle
another special case here that we could handle?  Possibly dr_group_size == 1?
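
To make the early return above concrete, here is a toy model of a
length-agnostic element count. PolyCount and can_build_per_vector_masks are
hypothetical names; GCC's poly_int is more general, and this only sketches why
nunits {16, 16} fails an is_constant test.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: value = base + step * x, where x >= 0 is only known at run time.
// {16, 16} therefore stands for 16, 32, 48, ... elements depending on the
// hardware vector length.
struct PolyCount {
  uint64_t base;
  uint64_t step;

  // Succeeds only when there is no run-time-dependent part.
  bool is_constant(uint64_t *out) const {
    if (step != 0)
      return false;
    *out = base;
    return true;
  }
};

// Same shape as the quoted check: bail out unless both values are
// compile-time constants.
inline bool can_build_per_vector_masks(PolyCount nunits, PolyCount vf) {
  uint64_t const_nunits, const_vf;
  if (!nunits.is_constant(&const_nunits) || !vf.is_constant(&const_vf))
    return false;
  return true;
}
```

With a fixed-width vector such as PolyCount{16, 0} the check passes, whereas
the SVE-style PolyCount{16, 16} takes the early return.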

end of thread, other threads:[~2024-02-16 13:19 UTC | newest]

Thread overview: 25+ messages
2023-08-25 13:35 [Bug middle-end/111156] New: [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures adhemerval.zanella at linaro dot org
2023-08-25 18:02 ` [Bug middle-end/111156] " dcb314 at hotmail dot com
2023-08-25 18:12 ` dcb314 at hotmail dot com
2023-08-25 18:15 ` dcb314 at hotmail dot com
2023-08-28  7:13 ` rguenth at gcc dot gnu.org
2023-08-31 17:25 ` adhemerval.zanella at linaro dot org
2023-11-24  0:45 ` pinskia at gcc dot gnu.org
2024-01-15 13:52 ` rguenth at gcc dot gnu.org
2024-02-01 10:15 ` tnfchris at gcc dot gnu.org
2024-02-02  3:31 ` pinskia at gcc dot gnu.org
2024-02-02  3:51 ` pinskia at gcc dot gnu.org
2024-02-02 20:10 ` pinskia at gcc dot gnu.org
2024-02-02 20:26 ` pinskia at gcc dot gnu.org
2024-02-14 20:11 ` tnfchris at gcc dot gnu.org
2024-02-15  8:34 ` rguenth at gcc dot gnu.org
2024-02-15  8:39 ` tnfchris at gcc dot gnu.org
2024-02-15  8:41 ` tnfchris at gcc dot gnu.org
2024-02-15 10:57 ` rguenth at gcc dot gnu.org
2024-02-15 12:41 ` rguenth at gcc dot gnu.org
2024-02-15 13:02 ` rguenth at gcc dot gnu.org
2024-02-15 14:38 ` cvs-commit at gcc dot gnu.org
2024-02-15 14:39 ` rguenth at gcc dot gnu.org
2024-02-15 18:53 ` tnfchris at gcc dot gnu.org
2024-02-15 21:05 ` rguenther at suse dot de
2024-02-16 13:19 ` rguenth at gcc dot gnu.org
