public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/106343] New: Addition with constants is not vectorized by SLP when it includes zero
From: manolis.tsamis at vrull dot eu @ 2022-07-18 15:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
Bug ID: 106343
Summary: Addition with constants is not vectorized by SLP when
it includes zero
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: manolis.tsamis at vrull dot eu
Target Milestone: ---
Target: aarch64
Created attachment 53316
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53316&action=edit
Does not vectorize
The following test case:
#include <stdint.h>

void foo (uint32_t dst[8], uint8_t src1[8], uint8_t src2[8])
{
  uint16_t diff_e0 = src1[0] - src2[0];
  uint16_t diff_e1 = src1[1] - src2[1];
  uint16_t diff_e2 = src1[2] - src2[2];
  uint16_t diff_e3 = src1[3] - src2[3];
  uint16_t diff_e4 = src1[4] - src2[4];
  uint16_t diff_e5 = src1[5] - src2[5];
  uint16_t diff_e6 = src1[6] - src2[6];
  uint16_t diff_e7 = src1[7] - src2[7];

  uint32_t a0 = diff_e0 + 1;
  uint32_t a1 = diff_e1 + 3;
  uint32_t a2 = diff_e2 + 4;
  uint32_t a3 = diff_e3 + 2;
  uint32_t a4 = diff_e4 + 12;
  uint32_t a5 = diff_e5 + 11;
  uint32_t a6 = diff_e6 + 9;
  uint32_t a7 = diff_e7 + 3;

  dst[0] = a0;
  dst[1] = a1;
  dst[2] = a2;
  dst[3] = a3;
  dst[4] = a4;
  dst[5] = a5;
  dst[6] = a6;
  dst[7] = a7;
}
Produces nice vectorized code on aarch64:
        ldr     d2, [x2]
        adrp    x3, .LC0
        ldr     d0, [x1]
        ldr     q1, [x3, #:lo12:.LC0]
        usubl   v0.8h, v0.8b, v2.8b
        uaddl   v2.4s, v0.4h, v1.4h
        uaddl2  v0.4s, v0.8h, v1.8h
        stp     q2, q0, [x0]
        ret
But if any of the constants is replaced with zero, scalar code is produced
instead:
        ldrb    w4, [x2, 1]
        ldrb    w8, [x1, 1]
        ldrb    w3, [x2, 3]
        ldrb    w7, [x1, 3]
        sub     w8, w8, w4
        ldrb    w6, [x1, 4]
        and     w8, w8, 65535
        ldrb    w4, [x2, 4]
        sub     w7, w7, w3
        ldrb    w5, [x1, 5]
        and     w7, w7, 65535
        ldrb    w3, [x2, 5]
        sub     w6, w6, w4
        ldrb    w9, [x2, 6]
        and     w6, w6, 65535
        ldrb    w4, [x1, 6]
        sub     w5, w5, w3
        ldrb    w10, [x2, 7]
        and     w5, w5, 65535
        ldrb    w3, [x1, 7]
        sub     w4, w4, w9
        ldrb    w11, [x2]
        and     w4, w4, 65535
        ldrb    w9, [x1]
        sub     w3, w3, w10
        ldrb    w2, [x2, 2]
        add     w8, w8, 3
        ldrb    w10, [x1, 2]
        sub     w9, w9, w11
        and     w1, w3, 65535
        and     w9, w9, 65535
        sub     w10, w10, w2
        add     w3, w5, 11
        add     w2, w4, 9
        add     w7, w7, 2
        add     w6, w6, 12
        add     w1, w1, 3
        add     w4, w9, 1
        and     w5, w10, 65535
        stp     w4, w8, [x0]
        stp     w5, w7, [x0, 8]
        stp     w6, w3, [x0, 16]
        stp     w2, w1, [x0, 24]
        ret
The same vectorized code as above could be produced here, with a zero in the
corresponding lane of the constant vector. If I understand correctly, the SLP
vectorizer does not take the identity element of addition into account, and
this could be improved.
The same happens with subtraction.
I can reproduce this with any recent version of GCC (e.g. >= 10).
Vectorized case: https://godbolt.org/z/5sbb1an89
Scalar case: https://godbolt.org/z/v8jPT9jEe
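For reference, a sketch of what the scalar-case input presumably looks like.
The scalar listing above has no add for the value stored to dst[2], so this
assumes the constant on a2 is the one that was replaced with zero. The name
foo_zero is made up for illustration, and this is not a copy of the source
behind the godbolt link.

```c
#include <stdint.h>

/* Hypothetical variant of foo: identical except that a2 adds 0.
   Which constant was zeroed is inferred from the scalar listing,
   not stated in the report.  */
void foo_zero (uint32_t dst[8], uint8_t src1[8], uint8_t src2[8])
{
  uint16_t diff_e0 = src1[0] - src2[0];
  uint16_t diff_e1 = src1[1] - src2[1];
  uint16_t diff_e2 = src1[2] - src2[2];
  uint16_t diff_e3 = src1[3] - src2[3];
  uint16_t diff_e4 = src1[4] - src2[4];
  uint16_t diff_e5 = src1[5] - src2[5];
  uint16_t diff_e6 = src1[6] - src2[6];
  uint16_t diff_e7 = src1[7] - src2[7];

  uint32_t a0 = diff_e0 + 1;
  uint32_t a1 = diff_e1 + 3;
  uint32_t a2 = diff_e2 + 0;   /* identity element: this lane has no real add */
  uint32_t a3 = diff_e3 + 2;
  uint32_t a4 = diff_e4 + 12;
  uint32_t a5 = diff_e5 + 11;
  uint32_t a6 = diff_e6 + 9;
  uint32_t a7 = diff_e7 + 3;

  dst[0] = a0;
  dst[1] = a1;
  dst[2] = a2;
  dst[3] = a3;
  dst[4] = a4;
  dst[5] = a5;
  dst[6] = a6;
  dst[7] = a7;
}
```

Seven of the eight lanes still perform an addition with a constant; only lane
2 degenerates to a plain widening of diff_e2.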
* [Bug tree-optimization/106343] Addition with constants is not vectorized by SLP when it includes zero
From: ktkachov at gcc dot gnu.org @ 2022-07-18 16:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
ktkachov at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-07-18
                 CC|                            |ktkachov at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Target|aarch64                     |aarch64, x86_64
--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed, it's quite odd. x86_64 is also affected:
https://godbolt.org/z/q46z3hh9Y
* [Bug tree-optimization/106343] SLP does not support no-op case
From: pinskia at gcc dot gnu.org @ 2022-07-18 16:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |UNCONFIRMED
     Ever confirmed|1                           |0
           Severity|normal                      |enhancement
   Last reconfirmed|2022-07-18 00:00:00         |
             Target|aarch64, x86_64             |aarch64
            Summary|Addition with constants is  |SLP does not support no-op
                   |not vectorized by SLP when  |case
                   |it includes zero            |
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Also an issue with multiply:
void foo (unsigned *__restrict dst, unsigned *__restrict src1)
{
  dst[0] = src1[0] * 1;
  dst[1] = src1[1] * 2;
  dst[2] = src1[2] * 3;
  dst[3] = src1[3] * 4;
  dst[4] = src1[4] * 5;
  dst[5] = src1[5] * 6;
  dst[6] = src1[6] * 7;
  dst[7] = src1[7] * 8;
}
* [Bug tree-optimization/106343] SLP does not support no-op case
From: pinskia at gcc dot gnu.org @ 2022-07-18 16:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-07-18
* [Bug tree-optimization/106343] SLP does not support no-op case
From: pinskia at gcc dot gnu.org @ 2022-07-18 16:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I should note that I noticed LLVM does not handle this either.
Basically the following operators and values can be used:
For integer:
+ 0
- 0
* 1
/ 1
| 0
& -1 (all ones)
^ 0
For floating point (only with -ffast-math, I think sub can be used with 0 and
add with -0.0 without but I am not 100% sure):
+ 0
- 0
* 1
/ 1
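The lists above can be checked directly. The following is a small sketch, not
part of the report, that assumes the default rounding mode and a build without
-ffast-math; all function names are invented here. It supports the guess about
floating point: x - 0.0 and x + -0.0 leave signed zeros intact, while x + 0.0
turns -0.0 into +0.0. NaN inputs are not covered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if every integer identity from the list leaves x unchanged.  */
int int_identities_hold (uint32_t x)
{
  return (x + 0) == x && (x - 0) == x && (x * 1) == x && (x / 1) == x
         && (x | 0) == x && (x & UINT32_MAX) == x && (x ^ 0) == x;
}

/* Returns 1 if a and b have the same bit pattern, so that -0.0 and
   +0.0 compare as different.  */
int same_bits (double a, double b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}

/* Returns 1 if x + 0.0 reproduces x bit for bit.  This fails for
   x == -0.0, which is why plain "+ 0" needs -ffast-math style flags.  */
int plus_zero_is_identity (double x)
{
  return same_bits (x + 0.0, x);
}

/* Asserts the floating-point cases that should hold without
   -ffast-math (assumption: round-to-nearest, x is not a NaN).  */
void check_fp (double x)
{
  assert (same_bits (x - 0.0, x));    /* sub 0 keeps the sign of zero */
  assert (same_bits (x + -0.0, x));   /* add -0.0 keeps it as well */
  assert (same_bits (x * 1.0, x));
  assert (same_bits (x / 1.0, x));
}
```

Calling check_fp (-0.0) passes, whereas plus_zero_is_identity (-0.0) returns
0, which is the one case that separates "+ 0" from the others.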
* [Bug tree-optimization/106343] SLP does not support no-op case
From: pinskia at gcc dot gnu.org @ 2022-07-18 16:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> I should note that I noticed LLVM does not handle this either.
>
> Basically the following operators and values can be used:
> For integer:
> + 0
> - 0
> * 1
> / 1
> | 0
> & -1 (all ones)
> ^ 0
>
> For floating point (only with -ffast-math, I think sub can be used with 0
> and add with -0.0 without but I am not 100% sure):
> + 0
> - 0
> * 1
> / 1
Note that the following operators could also cover lanes where a constant was
stored instead of a calculation (this might be harder, and may be a different
bug):
op   cst   rhs
*     0     0
|    -1    -1
&     0     0
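As I read the table, a lane that simply stores the constant in the rhs column
can be rewritten as the listed operator applied to any value with the constant
in the cst column. A minimal sketch of that reading follows; the function
names and the 4-lane group are invented for illustration.

```c
#include <stdint.h>

/* Returns 1 if each row of the table holds for x: applying the
   operator with cst always yields rhs, whatever x is.  */
int absorbing_cases_hold (uint32_t x)
{
  return (x * 0) == 0
         && (x | UINT32_MAX) == UINT32_MAX   /* | -1 gives -1 */
         && (x & 0) == 0;
}

/* Hypothetical group in which lane 2 is really the constant 0.
   Written as src[2] * 0, every lane is a multiply by a constant.  */
void mul_group (uint32_t dst[4], const uint32_t src[4])
{
  dst[0] = src[0] * 5;
  dst[1] = src[1] * 6;
  dst[2] = src[2] * 0;   /* same value as dst[2] = 0 */
  dst[3] = src[3] * 7;
}
```

Unlike the identity cases, this requires the vectorizer to invent an operand
for the constant lane, which may be why it is described as harder.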
* [Bug tree-optimization/106343] SLP does not support no-op case
From: eochoa at gcc dot gnu.org @ 2022-07-18 17:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
Erick Ochoa <eochoa at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eochoa at gcc dot gnu.org
--- Comment #5 from Erick Ochoa <eochoa at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> I should note that I noticed LLVM does not handle this either.
>
> Basically the following operators and values can be used:
> For integer:
> + 0
> - 0
> * 1
> / 1
> | 0
> & -1 (all ones)
> ^ 0
>
> For floating point (only with -ffast-math, I think sub can be used with 0
> and add with -0.0 without but I am not 100% sure):
> + 0
> - 0
> * 1
> / 1
I think it should be possible to also consider the bit-shift operations:
>> 0
<< 0
* [Bug tree-optimization/106343] SLP does not support no-op case
From: pinskia at gcc dot gnu.org @ 2022-07-18 17:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Erick Ochoa from comment #5)
> I think it should be possible to also consider the bit-shift operations:
>
> >> 0
> << 0
Yes, and rotates too.
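A short sketch of the shift and rotate cases, with helper names invented here.
C has no rotate operator, so the rotates are written in the usual
shift-and-or form, with the counts masked so that a rotate by 0 is well
defined.

```c
#include <stdint.h>

/* Rotate left by n bits.  Masking keeps both shift counts in 0..31,
   so n == 0 never shifts by the full width.  */
uint32_t rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate right by n bits, with the same masking.  */
uint32_t rotr32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Returns 1 if shifting or rotating x by 0 leaves it unchanged.  */
int shift_identities_hold (uint32_t x)
{
  return (x >> 0) == x && (x << 0) == x
         && rotl32 (x, 0) == x && rotr32 (x, 0) == x;
}
```

So a lane with no shift or rotate could in principle be treated as a shift or
rotate by 0 alongside lanes that use a nonzero count.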
* [Bug tree-optimization/106343] SLP does not support no-op case
From: rguenth at gcc dot gnu.org @ 2022-07-19 6:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
SLP discovery doesn't support this (and there's for sure some duplicate bug
about this). Note that SLP discovery currently does a greedy search from the
stores and commits to the first "working" graph (where "working" differs
between loop and non-loop operation), so opening up more "fixup" possibilities
will also make it easier for the search to derail.
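To make the idea of a "fixup" concrete, here is a toy model in C. It is only
an illustration under loud assumptions: the types and functions are invented
for this sketch and do not correspond to GCC's SLP data structures or to its
discovery algorithm. It models a group of lanes that must all use the same
operation, and a fixup that rewrites plain copies as that operation applied to
its identity constant.

```c
#include <stddef.h>

/* Toy model only; nothing here is GCC code.  */
enum lane_op { LANE_COPY, LANE_ADD, LANE_MUL };

struct lane
{
  enum lane_op op;
  int cst;        /* constant operand; unused for LANE_COPY */
};

/* Strict matching: returns 1 if every lane uses the same operation.  */
int lanes_match (const struct lane *l, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (l[i].op != l[0].op)
      return 0;
  return 1;
}

/* Fixup: rewrite each plain copy as "op identity", using the first
   non-copy operation found in the group.  Returns 1 if the group is
   uniform afterwards.  */
int fixup_copies (struct lane *l, size_t n)
{
  enum lane_op want = LANE_COPY;

  for (size_t i = 0; i < n; i++)
    if (l[i].op != LANE_COPY)
      {
        want = l[i].op;
        break;
      }

  if (want != LANE_COPY)
    for (size_t i = 0; i < n; i++)
      if (l[i].op == LANE_COPY)
        {
          l[i].op = want;
          l[i].cst = (want == LANE_MUL) ? 1 : 0;   /* x * 1, x + 0 */
        }

  return lanes_match (l, n);
}
```

The sketch does not model the trade-off raised in the comment: with more
rewrites available, a greedy search that commits to the first working graph
has more ways to settle on a poor one.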
* [Bug tree-optimization/106343] SLP does not support no-op case
From: pinskia at gcc dot gnu.org @ 2023-11-17 6:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 112579 has been marked as a duplicate of this bug. ***