public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/106343] New: Addition with constants is not vectorized by SLP when it includes zero
@ 2022-07-18 15:51 manolis.tsamis at vrull dot eu
From: manolis.tsamis at vrull dot eu @ 2022-07-18 15:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

            Bug ID: 106343
           Summary: Addition with constants is not vectorized by SLP when
                    it includes zero
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: manolis.tsamis at vrull dot eu
  Target Milestone: ---
            Target: aarch64

Created attachment 53316
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53316&action=edit
Does not vectorize

The following test case:

  #include <stdint.h>

  void foo (uint32_t dst[8], uint8_t src1[8], uint8_t src2[8])
  {
    uint16_t diff_e0 = src1[0] - src2[0];
    uint16_t diff_e1 = src1[1] - src2[1];
    uint16_t diff_e2 = src1[2] - src2[2];
    uint16_t diff_e3 = src1[3] - src2[3];
    uint16_t diff_e4 = src1[4] - src2[4];
    uint16_t diff_e5 = src1[5] - src2[5];
    uint16_t diff_e6 = src1[6] - src2[6];
    uint16_t diff_e7 = src1[7] - src2[7];

    uint32_t a0 = diff_e0 + 1;
    uint32_t a1 = diff_e1 + 3;
    uint32_t a2 = diff_e2 + 4;
    uint32_t a3 = diff_e3 + 2;
    uint32_t a4 = diff_e4 + 12;
    uint32_t a5 = diff_e5 + 11;
    uint32_t a6 = diff_e6 + 9;
    uint32_t a7 = diff_e7 + 3;

    dst[0] = a0;
    dst[1] = a1;
    dst[2] = a2;
    dst[3] = a3;
    dst[4] = a4;
    dst[5] = a5;
    dst[6] = a6;
    dst[7] = a7;
  }

Produces nice vectorized code on aarch64:

  ldr     d2, [x2]
  adrp    x3, .LC0
  ldr     d0, [x1]
  ldr     q1, [x3, #:lo12:.LC0]
  usubl   v0.8h, v0.8b, v2.8b
  uaddl   v2.4s, v0.4h, v1.4h
  uaddl2  v0.4s, v0.8h, v1.8h
  stp     q2, q0, [x0]
  ret

But if any of the constants is replaced with zero, scalar code is produced
instead:

  ldrb    w4, [x2, 1]
  ldrb    w8, [x1, 1]
  ldrb    w3, [x2, 3]
  ldrb    w7, [x1, 3]
  sub     w8, w8, w4
  ldrb    w6, [x1, 4]
  and     w8, w8, 65535
  ldrb    w4, [x2, 4]
  sub     w7, w7, w3
  ldrb    w5, [x1, 5]
  and     w7, w7, 65535
  ldrb    w3, [x2, 5]
  sub     w6, w6, w4
  ldrb    w9, [x2, 6]
  and     w6, w6, 65535
  ldrb    w4, [x1, 6]
  sub     w5, w5, w3
  ldrb    w10, [x2, 7]
  and     w5, w5, 65535
  ldrb    w3, [x1, 7]
  sub     w4, w4, w9
  ldrb    w11, [x2]
  and     w4, w4, 65535
  ldrb    w9, [x1]
  sub     w3, w3, w10
  ldrb    w2, [x2, 2]
  add     w8, w8, 3
  ldrb    w10, [x1, 2]
  sub     w9, w9, w11
  and     w1, w3, 65535
  and     w9, w9, 65535
  sub     w10, w10, w2
  add     w3, w5, 11
  add     w2, w4, 9
  add     w7, w7, 2
  add     w6, w6, 12
  add     w1, w1, 3
  add     w4, w9, 1
  and     w5, w10, 65535
  stp     w4, w8, [x0]
  stp     w5, w7, [x0, 8]
  stp     w6, w3, [x0, 16]
  stp     w2, w1, [x0, 24]
  ret

It would be possible to produce the same vectorized code as above, with a zero
in the constant vector. If I understand correctly, the SLP vectorizer does not
take the identity element of addition (x + 0 == x) into account, which could be
improved. The same happens with subtraction.

I can reproduce this with any recent GCC version (10 and later).

Vectorized case: https://godbolt.org/z/5sbb1an89
Scalar case:     https://godbolt.org/z/v8jPT9jEe

* [Bug tree-optimization/106343] Addition with constants is not vectorized by SLP when it includes zero
@ 2022-07-18 16:07 ` ktkachov at gcc dot gnu.org
From: ktkachov at gcc dot gnu.org @ 2022-07-18 16:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-07-18
                 CC|                            |ktkachov at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Target|aarch64                     |aarch64, x86_64

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed, it's quite odd. x86_64 is also affected:
https://godbolt.org/z/q46z3hh9Y

* [Bug tree-optimization/106343] SLP does not support no-op case
@ 2022-07-18 16:12 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2022-07-18 16:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |UNCONFIRMED
     Ever confirmed|1                           |0
           Severity|normal                      |enhancement
   Last reconfirmed|2022-07-18 00:00:00         |
             Target|aarch64, x86_64             |aarch64
            Summary|Addition with constants is  |SLP does not support no-op
                   |not vectorized by SLP when  |case
                   |it includes zero            |

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is also an issue with multiplication (note the * 1 in the first lane):

void foo (unsigned *__restrict dst, unsigned *__restrict src1)
{
    dst[0] = src1[0] * 1;
    dst[1] = src1[1] * 2;
    dst[2] = src1[2] * 3;
    dst[3] = src1[3] * 4;
    dst[4] = src1[4] * 5;
    dst[5] = src1[5] * 6;
    dst[6] = src1[6] * 7;
    dst[7] = src1[7] * 8;
}
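
Here the folded lane 0 is simply dst[0] = src1[0], since x * 1 is simplified to x before vectorization, while lanes 1 to 7 are multiplications, so the lanes no longer share one operation. The toy model below is purely illustrative (hypothetical types and function names, not GCC's internal SLP data structures): it flags that mismatch and applies the kind of fixup this report asks for, re-expressing a copy lane as a multiplication by the identity 1 when the other lanes multiply.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-lane description: how the stored value is computed. */
enum lane_op { LANE_COPY, LANE_MUL };

struct lane
{
  enum lane_op op;
  unsigned cst;  /* multiplier; ignored for LANE_COPY */
};

/* A group is treated as isomorphic when all lanes use the same operation. */
bool group_isomorphic (const struct lane *lanes, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (lanes[i].op != lanes[0].op)
      return false;
  return true;
}

/* If at least one lane multiplies, rewrite every copy lane x as x * 1. */
void fixup_copies_as_mul (struct lane *lanes, size_t n)
{
  bool any_mul = false;
  for (size_t i = 0; i < n; i++)
    if (lanes[i].op == LANE_MUL)
      any_mul = true;
  if (!any_mul)
    return;
  for (size_t i = 0; i < n; i++)
    if (lanes[i].op == LANE_COPY)
      {
        lanes[i].op = LANE_MUL;
        lanes[i].cst = 1;  /* multiplicative identity */
      }
}
```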

* [Bug tree-optimization/106343] SLP does not support no-op case
@ 2022-07-18 16:12 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2022-07-18 16:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-07-18

* [Bug tree-optimization/106343] SLP does not support no-op case
@ 2022-07-18 16:20 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2022-07-18 16:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I should note that LLVM does not handle this either.

Basically, the following operator/constant pairs are no-ops (x op c == x) and
could be used:
For integer:
+ 0
- 0
* 1
/ 1
| 0
& -1 (all ones)
^ 0
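
The integer pairs above are right identities (x op c == x), which is what filling a lane needs; for - and / the constant only works on the right-hand side. A minimal sketch of such a lookup, assuming 32-bit unsigned arithmetic so that wrap-around is well defined (the enum and function names are hypothetical):

```c
#include <stdint.h>

/* Hypothetical operation codes for the integer cases listed above. */
enum int_op { OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_IOR, OP_AND, OP_XOR };

/* Constant c such that x op c == x for every x (a right identity). */
uint32_t right_identity (enum int_op op)
{
  switch (op)
    {
    case OP_ADD: case OP_SUB: case OP_IOR: case OP_XOR:
      return 0;
    case OP_MUL: case OP_DIV:
      return 1;
    case OP_AND:
      return UINT32_MAX;  /* -1, all ones */
    }
  return 0;  /* not reached for a valid op */
}

/* Evaluate x op c with unsigned 32-bit semantics. */
uint32_t apply_op (enum int_op op, uint32_t x, uint32_t c)
{
  switch (op)
    {
    case OP_ADD: return x + c;
    case OP_SUB: return x - c;
    case OP_MUL: return x * c;
    case OP_DIV: return x / c;  /* c must be nonzero */
    case OP_IOR: return x | c;
    case OP_AND: return x & c;
    case OP_XOR: return x ^ c;
    }
  return x;
}
```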

For floating point (only with -ffast-math; I think subtraction of 0 and
addition of -0.0 can be used without it, but I am not 100% sure):
+ 0
- 0
* 1
/ 1
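
On the floating-point uncertainty: under IEEE 754 with the default round-to-nearest mode (which GCC assumes unless -frounding-math is given), x - 0.0 and x + (-0.0) return x for every non-NaN x, including both signed zeros, whereas x + 0.0 turns -0.0 into +0.0; multiplying or dividing by 1.0 is exact. The sketch below (hypothetical helper names; NaNs are not covered) checks this on signed zeros.

```c
#include <math.h>
#include <stdbool.h>

/* True if x and y compare equal and have the same sign bit, so that
   +0.0 and -0.0 are told apart (NaNs are not handled). */
bool same_value_and_sign (double x, double y)
{
  return x == y && (signbit (x) != 0) == (signbit (y) != 0);
}

/* Candidate no-op lanes, assuming default rounding and non-NaN input. */
double add_pos_zero (double x) { return x + 0.0; }   /* NOT a no-op: -0.0 -> +0.0 */
double add_neg_zero (double x) { return x + -0.0; }  /* no-op */
double sub_zero (double x)     { return x - 0.0; }   /* no-op */
double mul_one (double x)      { return x * 1.0; }   /* no-op */
double div_one (double x)      { return x / 1.0; }   /* no-op */
```

Under round-toward-negative, +0.0 - 0.0 yields -0.0, so even the subtraction case depends on the rounding mode.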

* [Bug tree-optimization/106343] SLP does not support no-op case
@ 2022-07-18 16:24 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2022-07-18 16:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> I should note that I noticed LLVM does not handle this either.
> 
> Basically the following operators and values can be used:
> For integer:
> + 0
> - 0
> * 1
> / 1
> | 0
> & -1 (all ones)
> ^ 0
> 
> For floating point (only with -ffast-math, I think sub can be used with 0
> and add with -0.0 without but I am not 100% sure):
> + 0
> - 0
> * 1
> / 1

Note that the following operators can also cover a lane where a constant is
stored instead of a calculation, using the operator's absorbing element (this
might be harder and maybe belongs in a different bug):
op   cst   stored value
*     0     0
|    -1    -1
&     0     0
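
To make the absorbing-element idea concrete, here is a small sketch using &: a group in which one lane stores a plain 0 can be rewritten so that every lane masks, since x & 0 == 0 for every x. The function names and mask values are hypothetical; * with 0 and | with -1 (all ones) work the same way, but each only produces that one absorbing value, not arbitrary constants.

```c
#include <stdint.h>

/* Mixed group: lane 2 stores a constant and performs no operation. */
void and_mixed (uint32_t dst[4], const uint32_t src[4])
{
  dst[0] = src[0] & 0xffu;
  dst[1] = src[1] & 0xff00u;
  dst[2] = 0;
  dst[3] = src[3] & 0xfu;
}

/* Uniform group: lane 2 masks with 0, the absorbing element of &. */
static const uint32_t and_masks[4] = {0xffu, 0xff00u, 0u, 0xfu};

void and_uniform (uint32_t dst[4], const uint32_t src[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = src[i] & and_masks[i];
}
```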

* [Bug tree-optimization/106343] SLP does not support no-op case
@ 2022-07-18 17:42 ` eochoa at gcc dot gnu.org
From: eochoa at gcc dot gnu.org @ 2022-07-18 17:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

Erick Ochoa <eochoa at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eochoa at gcc dot gnu.org

--- Comment #5 from Erick Ochoa <eochoa at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> I should note that I noticed LLVM does not handle this either.
> 
> Basically the following operators and values can be used:
> For integer:
> + 0
> - 0
> * 1
> / 1
> | 0
> & -1 (all ones)
> ^ 0
> 
> For floating point (only with -ffast-math, I think sub can be used with 0
> and add with -0.0 without but I am not 100% sure):
> + 0
> - 0
> * 1
> / 1

I think it should also be possible to consider bit-shift operations:

>> 0
<< 0

* [Bug tree-optimization/106343] SLP does not support no-op case
@ 2022-07-18 17:48 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2022-07-18 17:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Erick Ochoa from comment #5)
> I think it should be possible to also consider the bit-shifts operations:
> 
> >> 0
> << 0

Yes, and rotates too.
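
Shifts and rotates by zero are right identities as well. A sketch in C (hypothetical function names): in C, shifting a 32-bit value by 32 is undefined, so the rotate helpers mask both shift counts to the range 0 to 31; a count of 0 then stays well defined and returns x unchanged. The shift kernel treats a shift-by-0 lane exactly like the others.

```c
#include <stdint.h>

/* Rotate left/right by r; both shift counts are masked to 0..31, so r == 0
   avoids the undefined shift by 32 and simply returns x. */
uint32_t rotl32 (uint32_t x, unsigned r)
{
  return (x << (r & 31)) | (x >> (-r & 31));
}

uint32_t rotr32 (uint32_t x, unsigned r)
{
  return (x >> (r & 31)) | (x << (-r & 31));
}

/* Per-lane shift counts; lane 0 shifts by 0 (a no-op). */
static const unsigned shl_counts[4] = {0, 1, 2, 3};

void shl_uniform (uint32_t dst[4], const uint32_t src[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = src[i] << shl_counts[i];
}
```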

* [Bug tree-optimization/106343] SLP does not support no-op case
@ 2022-07-19  6:44 ` rguenth at gcc dot gnu.org
From: rguenth at gcc dot gnu.org @ 2022-07-19  6:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
SLP discovery doesn't support this (and there is surely a duplicate bug about
it).  Note that SLP discovery currently does a greedy search starting from the
stores and commits to the first "working" graph (where what counts as "working"
differs between loop and non-loop vectorization). Because of this, opening up
more "fixup" possibilities also makes it easier for discovery to derail.

* [Bug tree-optimization/106343] SLP does not support no-op case
@ 2023-11-17  6:19 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-11-17  6:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106343

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 112579 has been marked as a duplicate of this bug. ***
