[Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513
@ 2022-01-13 15:52 clyon at gcc dot gnu.org
  2022-01-13 15:56 ` [Bug tree-optimization/104010] " pinskia at gcc dot gnu.org
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: clyon at gcc dot gnu.org @ 2022-01-13 15:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

            Bug ID: 104010
           Summary: [12 regression] short loop no longer vectorized with
                    Neon after r12-6513
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

This short loop:
#include <stdint.h>
void test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i=0; i<4; i++) {
    dest[i] = a[i] == b[i];
  }
}

used to be vectorized as:
test_vcmpeq_s32x2:
        vld1.32 {d16}, [r1]
        vmov.i32        d17, #0x1  @ v2si
        vld1.32 {d19}, [r2]
        vmov.i32        d18, #0  @ v2si
        vceq.i32        d16, d16, d19
        vbsl    d16, d17, d18
        vst1.32 {d16}, [r0]
        bx      lr

After r12-6513, we get:
test_vcmpeq_s32x2:
        ldr     ip, [r1]
        ldr     r3, [r1, #4]
        str     lr, [sp, #-4]!
        ldr     lr, [r2]
        ldr     r2, [r2, #4]
        sub     ip, ip, lr
        clz     ip, ip
        sub     r3, r3, r2
        lsr     ip, ip, #5
        clz     r3, r3
        lsr     r3, r3, #5
        str     ip, [r0]
        str     r3, [r0, #4]
        ldr     pc, [sp], #4

when compiling for arm-none-linux-gnueabihf with -mcpu=cortex-a9 -mfpu=neon
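
For readers comparing the two listings: the scalar sequence computes each
equality as sub, clz, lsr #5, while the Neon sequence uses vceq to build a
lane mask and vbsl to select 1 or 0.  A minimal C sketch of both idioms
follows.  The helper names (clz32, eq_via_clz, eq_via_mask) are made up for
illustration and are not part of the report.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value.  Returns 32 for zero, which is
   what the Arm CLZ instruction does.  Hypothetical helper.  */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Scalar idiom: sub, clz, lsr #5.  The difference is zero only when the
   inputs are equal, and only then does clz yield 32, so bit 5 is set.  */
int32_t eq_via_clz(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;
    return (int32_t)(clz32(diff) >> 5);
}

/* One lane of the vector idiom: vceq.i32 builds an all-ones or all-zeros
   mask, vbsl then picks bits from the #1 vector or the #0 vector.  */
int32_t eq_via_mask(int32_t a, int32_t b)
{
    uint32_t mask = (a == b) ? 0xFFFFFFFFu : 0u;
    uint32_t ones = 1u, zeros = 0u;
    return (int32_t)((ones & mask) | (zeros & ~mask));
}
```

Both helpers return the same value as the C expression a == b, which is why
the two assembly sequences are interchangeable apart from cost.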

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-6513
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
@ 2022-01-13 15:56 ` pinskia at gcc dot gnu.org
  2022-01-13 16:00 ` clyon at gcc dot gnu.org
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-01-13 15:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think you have the wrong revision in there as that commit only adds a
testcase.

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-6513
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
  2022-01-13 15:56 ` [Bug tree-optimization/104010] " pinskia at gcc dot gnu.org
@ 2022-01-13 16:00 ` clyon at gcc dot gnu.org
  2022-01-14  7:58 ` [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362 rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: clyon at gcc dot gnu.org @ 2022-01-13 16:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #2 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Ha right, git gcc-descr with no argument didn't do what I expected (i.e. git
gcc-descr HEAD after a bisect...)

So I meant r12-3362 g:a3fb781d4b341c0d50ef1b92cd3e8734e673ef18

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
  2022-01-13 15:56 ` [Bug tree-optimization/104010] " pinskia at gcc dot gnu.org
  2022-01-13 16:00 ` clyon at gcc dot gnu.org
@ 2022-01-14  7:58 ` rguenth at gcc dot gnu.org
  2022-01-14  8:19 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-14  7:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
           Keywords|                            |missed-optimization

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-01-14  7:58 ` [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362 rguenth at gcc dot gnu.org
@ 2022-01-14  8:19 ` rguenth at gcc dot gnu.org
  2022-01-14  8:35 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-14  8:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-01-14
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |WAITING

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Not sure if I can parse the assembly.  The rev quoted changes costing, so I
assume the rest is the same.  I see

t.c:5:20: missed:   not vectorized: relevant stmt not supported: _24 = _27 ==
_25;
t.c:5:13: note:   Building vector operands of 0x3411680 from scalars instead
t.c:5:13: note:   ==> examining statement: _22 = (int) _24;
t.c:5:13: missed:   type conversion to/from bit-precision unsupported.
t.c:5:20: missed:   not vectorized: relevant stmt not supported: _22 = (int)
_24;
t.c:5:13: note:   Building vector operands of 0x34115f8 from scalars instead

and so we end up with

t.c:5:13: note: ***** Analysis succeeded with vector mode V8QI
t.c:5:13: note: SLPing BB part
t.c:5:13: note: Costing subgraph:
t.c:5:13: note: node 0x3411570 (max_nunits=4, refcnt=1)
t.c:5:13: note: op template: *dest_15(D) = _22;
t.c:5:13: note:         stmt 0 *dest_15(D) = _22;
t.c:5:13: note:         stmt 1 *_45 = _46;
t.c:5:13: note:         stmt 2 *_60 = _61;
t.c:5:13: note:         stmt 3 *_8 = _9;
t.c:5:13: note:         children 0x34115f8
t.c:5:13: note: node (external) 0x34115f8 (max_nunits=4, refcnt=1)
t.c:5:13: note:         stmt 0 _22 = (int) _24;
t.c:5:13: note:         stmt 1 _46 = (int) _44;
t.c:5:13: note:         stmt 2 _61 = (int) _59;
t.c:5:13: note:         stmt 3 _9 = (int) _7;
t.c:5:13: note:         children 0x3411680
t.c:5:13: note: node (external) 0x3411680 (max_nunits=4, refcnt=1)
t.c:5:13: note:         stmt 0 _24 = _27 == _25;
t.c:5:13: note:         stmt 1 _44 = _41 == _43;
t.c:5:13: note:         stmt 2 _59 = _56 == _58;
t.c:5:13: note:         stmt 3 _7 = _4 == _6;
t.c:5:13: note:         children 0x3411708 0x3411790
t.c:5:13: note: node 0x3411708 (max_nunits=2, refcnt=1)
t.c:5:13: note: op template: _27 = *a_13(D);
t.c:5:13: note:         stmt 0 _27 = *a_13(D);
t.c:5:13: note:         stmt 1 _41 = *_40;
t.c:5:13: note:         stmt 2 _56 = *_55;
t.c:5:13: note:         stmt 3 _4 = *_3;
t.c:5:13: note: node 0x3411790 (max_nunits=2, refcnt=1)
t.c:5:13: note: op template: _25 = *b_14(D);
t.c:5:13: note:         stmt 0 _25 = *b_14(D);
t.c:5:13: note:         stmt 1 _43 = *_42;
t.c:5:13: note:         stmt 2 _58 = *_57;
t.c:5:13: note:         stmt 3 _6 = *_5;
t.c:5:13: note: Cost model analysis:
_22 1 times scalar_store costs 1 in body
_46 1 times scalar_store costs 1 in body
_61 1 times scalar_store costs 1 in body
_9 1 times scalar_store costs 1 in body
_22 2 times unaligned_store (misalign -1) costs 2 in body
<unknown> 1 times vec_construct costs 2 in prologue
<unknown> 1 times vec_construct costs 2 in prologue
t.c:5:13: note: Cost model analysis for part in loop 0:
  Vector cost: 6
  Scalar cost: 4
t.c:5:13: missed: not vectorized: vectorization is not profitable.

but maybe I'm doing something wrong, since your assembler has the compare vectorized.

I'm doing, with a cc1 cross configured as

./src/trunk/configure --target=arm-none-linux-gnueabihf --with-float=hard
--with-cpu=cortex-a9 --with-fpu=neon-fp1

> ./cc1 -quiet t.c -I include -mcpu=cortex-a9 -mfpu=neon -O3

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2022-01-14  8:19 ` rguenth at gcc dot gnu.org
@ 2022-01-14  8:35 ` rguenth at gcc dot gnu.org
  2022-01-14  9:11 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-14  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, you show V4SI vectorization but the analysis with this mode has the loads
unsupported for me:

t.c:5:13: note:   ==> examining statement: _27 = *a_13(D);
t.c:5:13: missed:   Aligned load, but unsupported type.
t.c:5:16: missed:   not vectorized: relevant stmt not supported: _27 =
*a_13(D);
t.c:5:13: note:   Building vector operands of 0x3411790 from scalars instead

and

t.c:5:13: note:   ==> examining statement: *dest_15(D) = _22;
t.c:5:13: note:   vect_is_simple_use: operand (int) _44, type of def: internal
t.c:5:13: note:   vect_is_simple_use: operand (int) _59, type of def: internal
t.c:5:13: note:   vect_is_simple_use: operand (int) _7, type of def: internal
t.c:5:13: missed:   unsupported unaligned access
t.c:5:13: missed:   not vectorized: relevant stmt not supported: *dest_15(D) =
_22;

Please make sure to post _exact_ instructions on how to configure & invoke cc1;
the arm family is a mess and it's wasting my time each and every time I have to
dig into this kind of bug :/

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2022-01-14  8:35 ` rguenth at gcc dot gnu.org
@ 2022-01-14  9:11 ` rguenth at gcc dot gnu.org
  2022-04-13 10:42 ` rearnsha at gcc dot gnu.org
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-14  9:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Adding -mfloat-abi=hard helps, but then the loop (or the unrolled loop with
-fno-tree-loop-vectorize) is vectorized as expected.

So I can't reproduce this.

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2022-01-14  9:11 ` rguenth at gcc dot gnu.org
@ 2022-04-13 10:42 ` rearnsha at gcc dot gnu.org
  2022-04-13 10:43 ` rearnsha at gcc dot gnu.org
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2022-04-13 10:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #6 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
This wasn't reproducible because there is a typo in the testcase: the loop
iteration count should be 2, not 4.  Clues are in the function name and the
assembly code generated, which both show 2 iterations of the loop.

Changing the test to:

#include <stdint.h>
void test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i=0; i<2; i++) {
    dest[i] = a[i] == b[i];
  }
}

Does indeed show a regression between gcc-11 and trunk.  With gcc-11 the
costing shows:

vect.c:5:13: note: Cost model analysis: 
0x2f0a780 _28 1 times scalar_store costs 1 in body
0x2f0a780 _41 1 times scalar_store costs 1 in body
0x2f0a780 (int) _26 1 times scalar_stmt costs 1 in body
0x2f0a780 (int) _39 1 times scalar_stmt costs 1 in body
0x2f0a780 _23 == _25 1 times scalar_stmt costs 1 in body
0x2f0a780 _36 == _38 1 times scalar_stmt costs 1 in body
0x2f0a780 *a_13(D) 1 times scalar_load costs 1 in body
0x2f0a780 MEM[(int *)a_13(D) + 4B] 1 times scalar_load costs 1 in body
0x2f0a780 *b_14(D) 1 times scalar_load costs 1 in body
0x2f0a780 MEM[(int *)b_14(D) + 4B] 1 times scalar_load costs 1 in body
0x2f0a780 *a_13(D) 1 times unaligned_load (misalign -1) costs 1 in body
0x2f0a780 *b_14(D) 1 times unaligned_load (misalign -1) costs 1 in body
0x2f0a780 _23 == _25 1 times vector_stmt costs 1 in body
0x2f0a780 _26 ? 1 : 0 1 times vector_stmt costs 1 in body
0x2f0a780 <unknown> 1 times vector_load costs 1 in prologue
0x2f0a780 <unknown> 1 times vector_load costs 1 in prologue
0x2f0a780 _28 1 times unaligned_store (misalign -1) costs 1 in body
vect.c:5:13: note: Cost model analysis for part in loop 0:
  Vector cost: 7
  Scalar cost: 10

While trunk shows:

vect.c:5:13: note: Cost model analysis: 
_28 1 times scalar_store costs 1 in body
_41 1 times scalar_store costs 1 in body
(int) _26 1 times scalar_stmt costs 1 in body
(int) _39 1 times scalar_stmt costs 1 in body
*a_13(D) 1 times unaligned_load (misalign -1) costs 1 in body
*b_14(D) 1 times unaligned_load (misalign -1) costs 1 in body
_23 == _25 1 times vector_stmt costs 1 in body
_26 ? 1 : 0 1 times vector_stmt costs 1 in body
node 0x3bc5078 1 times vector_load costs 1 in prologue
node 0x3bc5100 1 times vector_load costs 1 in prologue
_28 1 times unaligned_store (misalign -1) costs 1 in body
vect.c:5:13: note: Cost model analysis for part in loop 0:
  Vector cost: 7
  Scalar cost: 4
vect.c:5:13: missed: not vectorized: vectorization is not profitable.

Now the question is why the scalar cost has been so dramatically reduced.
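
The two listings can be tallied by hand: gcc-11 counts stores, conversions,
compares and loads on the scalar side, while trunk counts only the stores and
conversions.  The C sketch below redoes that arithmetic.  It assumes the
simplified rule that BB vectorization goes ahead only when the vector cost is
lower than the scalar cost, which agrees with both verdicts quoted here.  The
struct and function names are illustrative only, not GCC internals.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

/* One summarized line of the cost dump: how many times a kind of
   statement appears and what each occurrence costs.  */
struct cost_entry {
    const char *kind;
    int count;
    int unit_cost;
};

/* Scalar side as dumped by gcc-11.  */
const struct cost_entry gcc11_scalar[] = {
    { "scalar_store",           2, 1 },
    { "scalar_stmt (int) conv", 2, 1 },
    { "scalar_stmt ==",         2, 1 },
    { "scalar_load",            4, 1 },
};

/* Scalar side as dumped by trunk: compares and loads are missing.  */
const struct cost_entry trunk_scalar[] = {
    { "scalar_store",           2, 1 },
    { "scalar_stmt (int) conv", 2, 1 },
};

/* Vector side, identical in both dumps.  */
const struct cost_entry vector_side[] = {
    { "unaligned_load",         2, 1 },
    { "vector_stmt",            2, 1 },
    { "vector_load (prologue)", 2, 1 },
    { "unaligned_store",        1, 1 },
};

int total_cost(const struct cost_entry *e, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += e[i].count * e[i].unit_cost;
    return sum;
}

/* Assumed, simplified profitability rule.  */
int is_profitable(int vector_cost, int scalar_cost)
{
    return vector_cost < scalar_cost;
}
```

With these tables the vector side stays at 7 in both compilers, so the whole
difference in outcome comes from the six scalar statements that trunk no
longer counts.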

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2022-04-13 10:42 ` rearnsha at gcc dot gnu.org
@ 2022-04-13 10:43 ` rearnsha at gcc dot gnu.org
  2022-04-13 11:36 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2022-04-13 10:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #7 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Options to reproduce on arm-none-eabi:

-O3 -mcpu=cortex-a9 -mfpu=neon-fp16 -mfloat-abi=hard -o - vect.c -S
-fdump-tree-all-details

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2022-04-13 10:43 ` rearnsha at gcc dot gnu.org
@ 2022-04-13 11:36 ` rguenth at gcc dot gnu.org
  2022-04-13 11:37 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-13 11:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is likely because of a similar issue as PR103941 and related to pattern
recognition.  We're trying to be more precise now but with patterns there are
still issues and we are now erring on the "safe" side (not vectorizing).

The way we compute the "scalar cover" of the vectorized stmts is currently
not correct and it probably needs another rewrite.  Likely the fix for
PR102176 regressed this.

I do have a small fix for this particular testcase but I think it will then
not cost pattern stmts with multiple scalar pieces correctly - basically
the whole vect_bb_slp_scalar_cost needs a rewrite.

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2022-04-13 11:36 ` rguenth at gcc dot gnu.org
@ 2022-04-13 11:37 ` rguenth at gcc dot gnu.org
  2022-04-13 11:38 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-13 11:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 52798
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52798&action=edit
candidate patch

Candidate.

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2022-04-13 11:37 ` rguenth at gcc dot gnu.org
@ 2022-04-13 11:38 ` rguenth at gcc dot gnu.org
  2022-04-13 11:52 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-13 11:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Earnshaw from comment #7)
> Options to reproduce on arm-none-eabi:
> 
> -O3 -mcpu=cortex-a9 -mfpu=neon-fp16 -mfloat-abi=hard -o - vect.c -S
> -fdump-tree-all-details

Can you create a testcase suitable to be put in gcc.target/arm/ with
the required dg- stuff?  Alternatively it would fit in
gcc.dg/vect/costmodel/ but there's no arm specific directory there.
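
A rough sketch of what such a testcase might look like is below.  It is not
the test that was eventually used.  The choice of directives (arm_neon_ok,
dg-add-options arm_neon) and the scan-assembler pattern are assumptions based
on common gcc.target/arm conventions and would need checking against the
testsuite before use.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-add-options arm_neon } */
/* { dg-additional-options "-O3" } */

#include <stdint.h>

/* Two-iteration loop from comment #6; small enough that it is handled
   by BB SLP after unrolling.  */
void
test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i = 0; i < 2; i++)
    dest[i] = a[i] == b[i];
}

/* Assumed check: one 64-bit vector compare when the loop is vectorized.  */
/* { dg-final { scan-assembler-times {vceq\.i32} 1 } } */
```

The dg- lines are ordinary C comments, so the file also compiles outside the
testsuite harness.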

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2022-04-13 11:38 ` rguenth at gcc dot gnu.org
@ 2022-04-13 11:52 ` rguenth at gcc dot gnu.org
  2022-04-13 13:12 ` clyon at gcc dot gnu.org
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-13 11:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
The testcases from PR103941 are also fixed - I fear this might cause quite some
extra BB vectorization, so not sure if it is good to do right now.  OTOH it's
probably the last chance to get benchmark coverage from autotesters.

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2022-04-13 11:52 ` rguenth at gcc dot gnu.org
@ 2022-04-13 13:12 ` clyon at gcc dot gnu.org
  2022-04-14 11:30 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: clyon at gcc dot gnu.org @ 2022-04-13 13:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #12 from Christophe Lyon <clyon at gcc dot gnu.org> ---
The test in arm/simd/neon-vcmp.c currently fails, and passes with your
candidate patch, thanks.

(I wrote that test before the regression, and noticed it had regressed while
working on my MVE/VCMP patches.)

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2022-04-13 13:12 ` clyon at gcc dot gnu.org
@ 2022-04-14 11:30 ` rguenth at gcc dot gnu.org
  2022-04-14 11:31 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-14 11:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #52798|0                           |1
        is obsolete|                            |

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 52809
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52809&action=edit
patch

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2022-04-14 11:30 ` rguenth at gcc dot gnu.org
@ 2022-04-14 11:31 ` rguenth at gcc dot gnu.org
  2022-04-19 14:42 ` cvs-commit at gcc dot gnu.org
  2022-04-19 14:43 ` rguenth at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-14 11:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Patch posted for review.

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2022-04-14 11:31 ` rguenth at gcc dot gnu.org
@ 2022-04-19 14:42 ` cvs-commit at gcc dot gnu.org
  2022-04-19 14:43 ` rguenth at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-04-19 14:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:353434b65ef7972172597d232ae17022d9a57244

commit r12-8195-g353434b65ef7972172597d232ae17022d9a57244
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Apr 13 13:49:45 2022 +0200

    tree-optimization/104010 - fix SLP scalar costing with patterns

    When doing BB vectorization the scalar cost compute is derailed
    by patterns, causing lanes to be considered live and thus not
    costed on the scalar side.  For the testcase in PR104010 this
    prevents vectorization which was done by GCC 11.  PR103941
    shows similar cases of missed optimizations that are fixed by
    this patch.

    2022-04-13  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/104010
            PR tree-optimization/103941
            * tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
            we run into stmts in patterns continue walking those
            for uses outside of the vectorized region instead of
            marking the lane live.

            * gcc.target/i386/pr103941-1.c: New testcase.
            * gcc.target/i386/pr103941-2.c: Likewise.
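
The behaviour change described in the commit message can be pictured with a
toy model.  The C sketch below is not the code in tree-vect-slp.cc: the
toy_stmt type, its fields and the walk_patterns switch are all hypothetical.
It only illustrates the idea that a use inside a pattern used to make the
lane live outright, whereas the fix keeps walking through the pattern
statement to see whether anything outside the vectorized region really uses
the value.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_USERS 4

/* Toy stand-in for a scalar statement (hypothetical, not GCC's types).  */
struct toy_stmt {
    bool vectorized;   /* statement is covered by the SLP graph */
    bool in_pattern;   /* statement was replaced by a pattern statement */
    size_t n_users;    /* number of statements using this one's result */
    const struct toy_stmt *users[MAX_USERS];
};

/* Return true if the scalar value defined by S is still needed, so that
   its lane must not be counted as saved scalar work.
   walk_patterns == false models the behaviour before the fix,
   walk_patterns == true models the behaviour after it.
   Assumes the use chains are acyclic.  */
bool lane_is_live(const struct toy_stmt *s, bool walk_patterns)
{
    for (size_t i = 0; i < s->n_users; i++) {
        const struct toy_stmt *u = s->users[i];
        if (u->vectorized)
            continue;                 /* use disappears with vectorization */
        if (u->in_pattern && walk_patterns) {
            if (lane_is_live(u, walk_patterns))
                return true;          /* a real outside use further down */
            continue;
        }
        return true;                  /* outside use, or pattern pre-fix */
    }
    return false;
}
```

In this model a compare whose only user is a pattern-replaced conversion
feeding a vectorized store is live before the fix and not live after it,
which matches the compares and loads that were missing from the scalar cost
in comment #6.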

* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
  2022-01-13 15:52 [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513 clyon at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2022-04-19 14:42 ` cvs-commit at gcc dot gnu.org
@ 2022-04-19 14:43 ` rguenth at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-19 14:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.
