public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513
@ 2022-01-13 15:52 clyon at gcc dot gnu.org
2022-01-13 15:56 ` [Bug tree-optimization/104010] " pinskia at gcc dot gnu.org
` (16 more replies)
0 siblings, 17 replies; 18+ messages in thread
From: clyon at gcc dot gnu.org @ 2022-01-13 15:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Bug ID: 104010
Summary: [12 regression] short loop no longer vectorized with
Neon after r12-6513
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: clyon at gcc dot gnu.org
Target Milestone: ---
This short loop:
void test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i=0; i<4; i++) {
    dest[i] = a[i] == b[i];
  }
}
used to be vectorized as:
test_vcmpeq_s32x2:
        vld1.32 {d16}, [r1]
        vmov.i32 d17, #0x1 @ v2si
        vld1.32 {d19}, [r2]
        vmov.i32 d18, #0 @ v2si
        vceq.i32 d16, d16, d19
        vbsl d16, d17, d18
        vst1.32 {d16}, [r0]
        bx lr
After r12-6513, we get:
test_vcmpeq_s32x2:
        ldr ip, [r1]
        ldr r3, [r1, #4]
        str lr, [sp, #-4]!
        ldr lr, [r2]
        ldr r2, [r2, #4]
        sub ip, ip, lr
        clz ip, ip
        sub r3, r3, r2
        lsr ip, ip, #5
        clz r3, r3
        lsr r3, r3, #5
        str ip, [r0]
        str r3, [r0, #4]
        ldr pc, [sp], #4
when compiling for arm-none-linux-gnueabihf with -mcpu=cortex-a9 -mfpu=neon
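For readers comparing the two listings: both compute the same per-lane result, just by different routes. The sketch below models one lane of each in portable C. It is an illustration only, not anything taken from GCC, and the helper names clz32, eq_via_mask_select and eq_via_clz are hypothetical. The Neon form builds an all-ones or all-zeros mask with vceq and lets vbsl select between the constants 1 and 0; the scalar form relies on the ARM clz instruction returning 32 for a zero input, so that clz(a - b) >> 5 is 1 exactly when a == b.

```c
#include <stdint.h>

/* Count leading zeros, returning 32 for x == 0 as the ARM clz instruction
   does.  (__builtin_clz is undefined for 0, so this is spelled out.)  */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* One lane of the vectorized form: vceq.i32 yields an all-ones or all-zeros
   mask, and vbsl takes bits from 1 where the mask is set, else from 0.  */
int32_t eq_via_mask_select(int32_t a, int32_t b)
{
    uint32_t mask = (a == b) ? 0xFFFFFFFFu : 0u;
    uint32_t if_true = 1u, if_false = 0u;
    return (int32_t)((if_true & mask) | (if_false & ~mask));
}

/* One lane of the scalar form: sub, clz, lsr #5.  */
int32_t eq_via_clz(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;
    return (int32_t)(clz32(diff) >> 5);
}
```

Both helpers return 1 for equal inputs and 0 otherwise, which is why the scalar listing is a correct, if slower, replacement for the vector one.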
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-6513
From: pinskia at gcc dot gnu.org @ 2022-01-13 15:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think you have the wrong revision in there as that commit only adds a
testcase.
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-6513
From: clyon at gcc dot gnu.org @ 2022-01-13 16:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #2 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Ha right, git gcc-descr with no argument didn't do what I expected (i.e. git
gcc-descr HEAD after a bisect...)
So I meant r12-3362 g:a3fb781d4b341c0d50ef1b92cd3e8734e673ef18
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-01-14 7:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
           Keywords|                            |missed-optimization
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-01-14 8:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-01-14
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |WAITING
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Not sure if I can parse the assembly. The rev quoted changes costing, so I
assume the rest is the same. I see
t.c:5:20: missed: not vectorized: relevant stmt not supported: _24 = _27 == _25;
t.c:5:13: note: Building vector operands of 0x3411680 from scalars instead
t.c:5:13: note: ==> examining statement: _22 = (int) _24;
t.c:5:13: missed: type conversion to/from bit-precision unsupported.
t.c:5:20: missed: not vectorized: relevant stmt not supported: _22 = (int) _24;
t.c:5:13: note: Building vector operands of 0x34115f8 from scalars instead
and so we end up with
t.c:5:13: note: ***** Analysis succeeded with vector mode V8QI
t.c:5:13: note: SLPing BB part
t.c:5:13: note: Costing subgraph:
t.c:5:13: note: node 0x3411570 (max_nunits=4, refcnt=1)
t.c:5:13: note: op template: *dest_15(D) = _22;
t.c:5:13: note: stmt 0 *dest_15(D) = _22;
t.c:5:13: note: stmt 1 *_45 = _46;
t.c:5:13: note: stmt 2 *_60 = _61;
t.c:5:13: note: stmt 3 *_8 = _9;
t.c:5:13: note: children 0x34115f8
t.c:5:13: note: node (external) 0x34115f8 (max_nunits=4, refcnt=1)
t.c:5:13: note: stmt 0 _22 = (int) _24;
t.c:5:13: note: stmt 1 _46 = (int) _44;
t.c:5:13: note: stmt 2 _61 = (int) _59;
t.c:5:13: note: stmt 3 _9 = (int) _7;
t.c:5:13: note: children 0x3411680
t.c:5:13: note: node (external) 0x3411680 (max_nunits=4, refcnt=1)
t.c:5:13: note: stmt 0 _24 = _27 == _25;
t.c:5:13: note: stmt 1 _44 = _41 == _43;
t.c:5:13: note: stmt 2 _59 = _56 == _58;
t.c:5:13: note: stmt 3 _7 = _4 == _6;
t.c:5:13: note: children 0x3411708 0x3411790
t.c:5:13: note: node 0x3411708 (max_nunits=2, refcnt=1)
t.c:5:13: note: op template: _27 = *a_13(D);
t.c:5:13: note: stmt 0 _27 = *a_13(D);
t.c:5:13: note: stmt 1 _41 = *_40;
t.c:5:13: note: stmt 2 _56 = *_55;
t.c:5:13: note: stmt 3 _4 = *_3;
t.c:5:13: note: node 0x3411790 (max_nunits=2, refcnt=1)
t.c:5:13: note: op template: _25 = *b_14(D);
t.c:5:13: note: stmt 0 _25 = *b_14(D);
t.c:5:13: note: stmt 1 _43 = *_42;
t.c:5:13: note: stmt 2 _58 = *_57;
t.c:5:13: note: stmt 3 _6 = *_5;
t.c:5:13: note: Cost model analysis:
_22 1 times scalar_store costs 1 in body
_46 1 times scalar_store costs 1 in body
_61 1 times scalar_store costs 1 in body
_9 1 times scalar_store costs 1 in body
_22 2 times unaligned_store (misalign -1) costs 2 in body
<unknown> 1 times vec_construct costs 2 in prologue
<unknown> 1 times vec_construct costs 2 in prologue
t.c:5:13: note: Cost model analysis for part in loop 0:
Vector cost: 6
Scalar cost: 4
t.c:5:13: missed: not vectorized: vectorization is not profitable.
but maybe I'm doing something wrong, since your assembly has the compare
vectorized. I'm running, with a cc1 cross configured as
./src/trunk/configure --target=arm-none-linux-gnueabihf --with-float=hard
--with-cpu=cortex-a9 --with-fpu=neon-fp1
> ./cc1 -quiet t.c -I include -mcpu=cortex-a9 -mfpu=neon -O3
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-01-14 8:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, you show V4SI vectorization but the analysis with this mode has the loads
unsupported for me:
t.c:5:13: note: ==> examining statement: _27 = *a_13(D);
t.c:5:13: missed: Aligned load, but unsupported type.
t.c:5:16: missed: not vectorized: relevant stmt not supported: _27 = *a_13(D);
t.c:5:13: note: Building vector operands of 0x3411790 from scalars instead
and
t.c:5:13: note: ==> examining statement: *dest_15(D) = _22;
t.c:5:13: note: vect_is_simple_use: operand (int) _44, type of def: internal
t.c:5:13: note: vect_is_simple_use: operand (int) _59, type of def: internal
t.c:5:13: note: vect_is_simple_use: operand (int) _7, type of def: internal
t.c:5:13: missed: unsupported unaligned access
t.c:5:13: missed: not vectorized: relevant stmt not supported: *dest_15(D) = _22;
Please make sure to post _exact_ instructions on how to configure & invoke cc1;
the arm family is a mess and it's wasting my time each and every time I have to
dig into this kind of bug :/
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-01-14 9:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org
                 CC|                            |rguenth at gcc dot gnu.org
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
adding -mfloat-abi=hard helps, but that vectorizes the loop (or the unrolled
loop with -fno-tree-loop-vectorize) as expected.
So I can't reproduce this.
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rearnsha at gcc dot gnu.org @ 2022-04-13 10:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
--- Comment #6 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
The reason this wasn't reproducible is that there is a typo in the testcase:
the loop iteration count should be 2, not 4. Clues are in the function name
and the assembly code generated, which both show 2 iterations of the loop.
Changing the test to:
void test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i=0; i<2; i++) {
    dest[i] = a[i] == b[i];
  }
}
Does indeed show a regression between gcc-11 and trunk. With gcc-11 the
costing shows:
vect.c:5:13: note: Cost model analysis:
0x2f0a780 _28 1 times scalar_store costs 1 in body
0x2f0a780 _41 1 times scalar_store costs 1 in body
0x2f0a780 (int) _26 1 times scalar_stmt costs 1 in body
0x2f0a780 (int) _39 1 times scalar_stmt costs 1 in body
0x2f0a780 _23 == _25 1 times scalar_stmt costs 1 in body
0x2f0a780 _36 == _38 1 times scalar_stmt costs 1 in body
0x2f0a780 *a_13(D) 1 times scalar_load costs 1 in body
0x2f0a780 MEM[(int *)a_13(D) + 4B] 1 times scalar_load costs 1 in body
0x2f0a780 *b_14(D) 1 times scalar_load costs 1 in body
0x2f0a780 MEM[(int *)b_14(D) + 4B] 1 times scalar_load costs 1 in body
0x2f0a780 *a_13(D) 1 times unaligned_load (misalign -1) costs 1 in body
0x2f0a780 *b_14(D) 1 times unaligned_load (misalign -1) costs 1 in body
0x2f0a780 _23 == _25 1 times vector_stmt costs 1 in body
0x2f0a780 _26 ? 1 : 0 1 times vector_stmt costs 1 in body
0x2f0a780 <unknown> 1 times vector_load costs 1 in prologue
0x2f0a780 <unknown> 1 times vector_load costs 1 in prologue
0x2f0a780 _28 1 times unaligned_store (misalign -1) costs 1 in body
vect.c:5:13: note: Cost model analysis for part in loop 0:
Vector cost: 7
Scalar cost: 10
While trunk shows:
vect.c:5:13: note: Cost model analysis:
_28 1 times scalar_store costs 1 in body
_41 1 times scalar_store costs 1 in body
(int) _26 1 times scalar_stmt costs 1 in body
(int) _39 1 times scalar_stmt costs 1 in body
*a_13(D) 1 times unaligned_load (misalign -1) costs 1 in body
*b_14(D) 1 times unaligned_load (misalign -1) costs 1 in body
_23 == _25 1 times vector_stmt costs 1 in body
_26 ? 1 : 0 1 times vector_stmt costs 1 in body
node 0x3bc5078 1 times vector_load costs 1 in prologue
node 0x3bc5100 1 times vector_load costs 1 in prologue
_28 1 times unaligned_store (misalign -1) costs 1 in body
vect.c:5:13: note: Cost model analysis for part in loop 0:
Vector cost: 7
Scalar cost: 4
vect.c:5:13: missed: not vectorized: vectorization is not profitable.
Now the question is: why has the scalar cost been so dramatically reduced?
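The totals in the two dumps can be re-derived by adding up the listed entries. The sketch below does that arithmetic in C. The decision rule in slp_profitable (reject when the vector cost exceeds the scalar cost) is an assumption made for illustration, and all names are hypothetical; GCC's actual check may differ in detail, although either a strict or non-strict comparison gives the same answer for these numbers.

```c
#include <stddef.h>

/* Sum a list of per-statement costs as printed in the dump.  */
int sum_costs(const int *costs, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += costs[i];
    return total;
}

/* Assumed rule: vectorize unless the vector cost exceeds the scalar cost.  */
int slp_profitable(int vector_cost, int scalar_cost)
{
    return vector_cost <= scalar_cost;
}

/* gcc-11 scalar side: 2 stores, 2 conversions, 2 compares, 4 loads.  */
const int gcc11_scalar[] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };

/* trunk scalar side: only the 2 stores and 2 conversions are counted.  */
const int trunk_scalar[] = { 1, 1, 1, 1 };

/* Vector side in both dumps: 2 unaligned loads, compare, select,
   2 prologue vector loads, 1 unaligned store.  */
const int vector_stmts[] = { 1, 1, 1, 1, 1, 1, 1 };
```

With these inputs the vector cost is 7 in both cases, while the scalar cost drops from 10 to 4 because the compares and loads are no longer counted. That difference alone flips the decision.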
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rearnsha at gcc dot gnu.org @ 2022-04-13 10:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #7 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Options to reproduce on arm-none-eabi:
-O3 -mcpu=cortex-a9 -mfpu=neon-fp16 -mfloat-abi=hard -o - vect.c -S
-fdump-tree-all-details
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-04-13 11:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is likely because of a similar issue as PR103941 and related to pattern
recognition. We're trying to be more precise now but with patterns there are
still issues and we are now erring on the "safe" side (not vectorizing).
The way we compute the "scalar cover" of the vectorized stmts is currently
not correct and it probably needs another rewrite. Likely the fix for
PR102176 regressed this.
I do have a small fix for this particular testcase but I think it will then
not cost pattern stmts with multiple scalar pieces correctly - basically
the whole vect_bb_slp_scalar_cost needs a rewrite.
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-04-13 11:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 52798
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52798&action=edit
candidate patch
Candidate.
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-04-13 11:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Earnshaw from comment #7)
> Options to reproduce on arm-none-eabi:
>
> -O3 -mcpu=cortex-a9 -mfpu=neon-fp16 -mfloat-abi=hard -o - vect.c -S
> -fdump-tree-all-details
Can you create a testcase suitable to be put in gcc.target/arm/ with
the required dg- stuff? Alternatively it would fit in
gcc.dg/vect/costmodel/ but there's no arm specific directory there.
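A testcase along the requested lines might look like the sketch below. It assumes the usual DejaGnu directives from GCC's testsuite (dg-do, dg-require-effective-target arm_neon_ok, dg-add-options arm_neon, dg-additional-options, and dg-final with scan-assembler). The choice of options and the scan pattern are guesses for illustration and would need checking against a real arm build before use.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-add-options arm_neon } */
/* { dg-additional-options "-O3" } */

#include <stdint.h>

void test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i=0; i<2; i++) {
    dest[i] = a[i] == b[i];
  }
}

/* Expect a vector compare rather than the scalar sub/clz/lsr sequence.  */
/* { dg-final { scan-assembler {vceq\.i32} } } */
```

The directives are plain C comments, so the file also compiles on a host compiler, where the function simply stores 1 or 0 per element.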
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-04-13 11:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
The testcases from PR103941 are also fixed - I fear this might cause quite some
extra BB vectorization, so not sure if it is good to do right now. OTOH it's
probably the last chance to get benchmark coverage from autotesters.
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: clyon at gcc dot gnu.org @ 2022-04-13 13:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #12 from Christophe Lyon <clyon at gcc dot gnu.org> ---
The test in arm/simd/neon-vcmp.c currently fails, and passes with your
candidate patch, thanks.
(I wrote that test before the regression, and noticed it had regressed while
working on my MVE/VCMP patches)
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-04-14 11:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #52798|0                           |1
        is obsolete|                            |
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 52809
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52809&action=edit
patch
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-04-14 11:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Patch posted for review.
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: cvs-commit at gcc dot gnu.org @ 2022-04-19 14:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:353434b65ef7972172597d232ae17022d9a57244
commit r12-8195-g353434b65ef7972172597d232ae17022d9a57244
Author: Richard Biener <rguenther@suse.de>
Date: Wed Apr 13 13:49:45 2022 +0200
tree-optimization/104010 - fix SLP scalar costing with patterns
When doing BB vectorization the scalar cost compute is derailed
by patterns, causing lanes to be considered live and thus not
costed on the scalar side. For the testcase in PR104010 this
prevents vectorization which was done by GCC 11. PR103941
shows similar cases of missed optimizations that are fixed by
this patch.
2022-04-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/104010
PR tree-optimization/103941
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
we run into stmts in patterns continue walking those
for uses outside of the vectorized region instead of
marking the lane live.
* gcc.target/i386/pr103941-1.c: New testcase.
* gcc.target/i386/pr103941-2.c: Likewise.
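The effect described in the commit message can be illustrated with a toy model. The sketch below is not GCC code: struct toy_stmt, toy_scalar_cost and toy_lane_cost are hypothetical names, and the model assumes that a statement considered live contributes nothing to the scalar cost and that liveness propagates to the statements feeding it. Under those assumptions, treating a pattern user as making the lane live reproduces the scalar cost of 4 seen on trunk, while looking through the pattern reproduces the gcc-11 value of 10.

```c
#include <stdbool.h>

/* Hypothetical toy statement; not GCC's stmt_vec_info.  */
struct toy_stmt {
    int cost;                  /* scalar cost of this statement */
    bool used_by_pattern_stmt; /* its user is part of a recognized pattern */
    bool used_outside_region;  /* it has a use outside the vectorized region */
    int operands[2];           /* indices of operand statements, -1 if none */
};

/* Sum scalar costs from statement idx down through its operands.  A live
   statement must stay in the scalar code, so it and everything feeding it
   add nothing to the scalar cost that vectorization would save.  */
int toy_scalar_cost(const struct toy_stmt *stmts, int idx, bool live,
                    bool look_through_patterns)
{
    if (idx < 0)
        return 0;
    const struct toy_stmt *s = &stmts[idx];
    bool stmt_live = live || s->used_outside_region;
    /* Old behaviour in this model: a pattern user marks the lane live.  */
    if (!look_through_patterns && s->used_by_pattern_stmt)
        stmt_live = true;
    int total = stmt_live ? 0 : s->cost;
    for (int i = 0; i < 2; i++)
        total += toy_scalar_cost(stmts, s->operands[i], stmt_live,
                                 look_through_patterns);
    return total;
}

/* One lane of the testcase: store <- (int) conversion <- compare <- loads.  */
int toy_lane_cost(bool look_through_patterns)
{
    const struct toy_stmt lane[] = {
        { 1, false, false, {  1, -1 } }, /* 0: dest[i] = _28 */
        { 1, false, false, {  2, -1 } }, /* 1: _28 = (int) _26 */
        { 1, true,  false, {  3,  4 } }, /* 2: _26 = _23 == _25 */
        { 1, false, false, { -1, -1 } }, /* 3: load a[i] */
        { 1, false, false, { -1, -1 } }, /* 4: load b[i] */
    };
    return toy_scalar_cost(lane, 0, false, look_through_patterns);
}
```

For the two lanes of the testcase, 2 * toy_lane_cost(false) gives 4 and 2 * toy_lane_cost(true) gives 10, matching the scalar costs in the trunk and gcc-11 dumps quoted in comment #6.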
* [Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362
From: rguenth at gcc dot gnu.org @ 2022-04-19 14:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.