public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/115534] New: intermediate stack use not eliminated
@ 2024-06-18  7:48 tnfchris at gcc dot gnu.org
  2024-06-18 12:36 ` [Bug tree-optimization/115534] " pinskia at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-06-18  7:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534

            Bug ID: 115534
           Summary: intermediate stack use not eliminated
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

Consider the following example:

#include <stdint.h>

typedef struct _pixel_t
{
  double red, green, blue, opacity;
} pixel_t;

typedef struct _PixelPacket
{
  unsigned short blue, green, red, opacity;
} PixelPacket;

pixel_t f (unsigned height, unsigned width, unsigned virt_width,
           uint8_t *restrict k, const PixelPacket *restrict k_pixels)
{
    pixel_t result = {};
    for (unsigned u=0; u < (width & -4); u++, k--) {
        result.red     += (*k)*k_pixels[u].red;
        result.green   += (*k)*k_pixels[u].green;
        result.blue    += (*k)*k_pixels[u].blue;
        result.opacity += (*k)*k_pixels[u].opacity;
        k_pixels += virt_width;
    }
    return result;
}

---

Compiled with -O3, this vectorizes well, but the epilogue code is very
inefficient:

        fadd    v29.2d, v29.2d, v30.2d
        fadd    v28.2d, v28.2d, v31.2d
        cmp     w5, w1
        bhi     .L3
        mov     v31.16b, v28.16b
        ins     v31.d[1], v29.d[1]
        ins     v29.d[1], v28.d[1]
        stp     q31, q29, [sp, 32]
        ldp     d0, d1, [sp, 32]
        ldp     d2, d3, [sp, 48]
        add     sp, sp, 64
        ret
.L4:
        movi    v29.2d, 0
        mov     v31.16b, v29.16b
        stp     q31, q29, [sp, 32]
        ldp     d0, d1, [sp, 32]
        ldp     d2, d3, [sp, 48]
        add     sp, sp, 64
        ret

That is, it goes through the stack to create the return registers.  It looks
like the store is still present at the GIMPLE level:

  <bb 5> [local count: 105119324]:
  _33 = VEC_PERM_EXPR <vect__10.16_41, vect__10.16_42, { 0, 3 }>;
  _31 = VEC_PERM_EXPR <vect__10.16_42, vect__10.16_41, { 0, 3 }>;

  <bb 6> [local count: 118111600]:
  # vect_result_red_64.18_28 = PHI <_33(5), { 0.0, 0.0 }(2)>
  # vect_result_red_64.18_105 = PHI <_31(5), { 0.0, 0.0 }(2)>
  MEM <vector(2) double> [(double *)&D.4535] = vect_result_red_64.18_28;
  MEM <vector(2) double> [(double *)&D.4535 + 16B] = vect_result_red_64.18_105;
  return D.4535;
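
For readers unfamiliar with the selector notation, the two permutes above can
be modelled in plain C. This is only a sketch: the `v2df` type and the
`vec_perm2` helper are hypothetical names for illustration, not GCC
internals, and the selector indices are assumed to be in the range 0..3.

```c
#include <stddef.h>

/* Hypothetical stand-in for GIMPLE's "vector(2) double".  */
typedef struct { double lane[2]; } v2df;

/* Model of VEC_PERM_EXPR <a, b, sel> for two-lane vectors: each selector
   index picks one element from the concatenation { a[0], a[1], b[0], b[1] }.
   Assumes every index is in the range 0..3.  */
static v2df vec_perm2 (v2df a, v2df b, const unsigned sel[2])
{
  v2df r;
  for (size_t i = 0; i < 2; i++)
    r.lane[i] = sel[i] < 2 ? a.lane[sel[i]] : b.lane[sel[i] - 2];
  return r;
}
```

With the selector { 0, 3 }, the result is the low lane of the first operand
paired with the high lane of the second, so _33 and _31 are the two
accumulators with their high lanes exchanged.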

clang is able to generate much better code here:

        fadd    v0.2d, v0.2d, v1.2d
        fadd    v2.2d, v2.2d, v3.2d
        b.ne    .LBB0_2
.LBB0_3:
        mov     d1, v2.d[1]
        mov     d3, v0.d[1]
        ret

The vectorized code gets reg-alloc'ed so that d0 and d2 are already in the right
registers at the end of the vector loop, and the epilogue only has to split the
registers up to get d1 and d3.

I think we would generate the same if we were to elide the intermediate stack
store.
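
As a rough illustration of what the clang epilogue computes, here is a plain C
model of the lane split. It is a sketch under assumptions: `v2df` and
`split_lanes` are hypothetical names, and the field-to-register mapping relies
on AAPCS64 returning a struct of four doubles in d0-d3 in member order.

```c
typedef struct { double red, green, blue, opacity; } pixel_t;

/* Hypothetical stand-in for a 128-bit register holding two doubles;
   lane[0] is the low half (the d register view), lane[1] the high half.  */
typedef struct { double lane[2]; } v2df;

/* Mirrors the two "mov dN, vM.d[1]" instructions: the low lanes of v0 and
   v2 already sit in d0 and d2, only the high lanes need extracting.  */
static pixel_t split_lanes (v2df v0, v2df v2)
{
  pixel_t r;
  r.red     = v0.lane[0];  /* d0: already in place */
  r.green   = v2.lane[1];  /* d1 = v2.d[1] */
  r.blue    = v2.lane[0];  /* d2: already in place */
  r.opacity = v0.lane[1];  /* d3 = v0.d[1] */
  return r;
}
```

No memory is involved in this model, which is the point of the comparison:
the GCC epilogue computes the same four values but routes them through a
stack slot first.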

See https://godbolt.org/z/ocqchWWs5

* [Bug tree-optimization/115534] intermediate stack use not eliminated
From: pinskia at gcc dot gnu.org @ 2024-06-18 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect there is a dup of this already. See the bug which I made this one
blocking for a list of related bugs.

* [Bug middle-end/115534] intermediate stack use not eliminated
From: pinskia at gcc dot gnu.org @ 2024-06-18 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end
           Severity|normal                      |enhancement

* [Bug middle-end/115534] intermediate stack use not eliminated
From: tnfchris at gcc dot gnu.org @ 2024-06-18 12:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> I suspect there is a dup of this already. See the bug which I made this one
> blocking for a list of related bugs.

Most of the other bugs relate to the argument expansions. This one, however,
shouldn't need the intermediate stack regardless of the expansion itself.

I think there are various other ways the operation could have been kept in a
gimple register.

* [Bug middle-end/115534] intermediate stack use not eliminated
From: pinskia at gcc dot gnu.org @ 2024-06-18 17:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-06-18
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

Note that even with the vectorizer turned off we get really bad code for the
return when expanding (it just happens that the RTL optimizers can remove the
loads/stores but not the stack location):
```

;; return D.4535;

(insn 60 59 61 (set (reg:DF 156)
        (mem/c:DF (plus:DI (reg/f:DI 95 virtual-stack-vars)
                (const_int -32 [0xffffffffffffffe0])) [4 D.4535+0 S8 A128])) "/app/example.cpp":24:12 -1
     (nil))

(insn 61 60 62 (set (reg:DF 157)
        (mem/c:DF (plus:DI (reg/f:DI 95 virtual-stack-vars)
                (const_int -24 [0xffffffffffffffe8])) [4 D.4535+8 S8 A64])) "/app/example.cpp":24:12 -1
     (nil))

(insn 62 61 63 (set (reg:DF 158)
        (mem/c:DF (plus:DI (reg/f:DI 95 virtual-stack-vars)
                (const_int -16 [0xfffffffffffffff0])) [4 D.4535+16 S8 A128])) "/app/example.cpp":24:12 -1
     (nil))

(insn 63 62 64 (set (reg:DF 159)
        (mem/c:DF (plus:DI (reg/f:DI 95 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [4 D.4535+24 S8 A64])) "/app/example.cpp":24:12 -1
     (nil))

(insn 64 63 65 (set (reg:DF 132 [ <retval> ])
        (reg:DF 156)) "/app/example.cpp":24:12 -1
     (nil))

(insn 65 64 66 (set (reg:DF 133 [ <retval>+8 ])
        (reg:DF 157)) "/app/example.cpp":24:12 -1
     (nil))

(insn 66 65 67 (set (reg:DF 134 [ <retval>+16 ])
        (reg:DF 158)) "/app/example.cpp":24:12 -1
     (nil))

(insn 67 66 68 (set (reg:DF 135 [ <retval>+24 ])
        (reg:DF 159)) "/app/example.cpp":24:12 -1
     (nil))

(jump_insn 68 67 69 (set (pc)
        (label_ref 0)) "/app/example.cpp":24:12 -1
     (nil))
```

As for the stack location, we get:
```
        sub     sp, sp, #64
```

This is why I said there are a few duplicates there ...

* [Bug middle-end/115534] intermediate stack use not eliminated
From: pinskia at gcc dot gnu.org @ 2024-06-18 17:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This might be improved by
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html . Or it might
be the case the vectorizer case needs to be improved afterwards. But I think
that is the infrastructure for fixing this issue.

* [Bug middle-end/115534] intermediate stack use not eliminated
From: tnfchris at gcc dot gnu.org @ 2024-06-18 18:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534

--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> This might be improved by
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html . Or it
> might be the case the vectorizer case needs to be improved afterwards. But I
> think that is the infrastructure for fixing this issue.

Yeah Richard pointed me to this today as well. The vectorizer case is a bit
unique because the vectorizer has packed scalar values in two vector registers.

So yeah, I think it's likely some work will be needed afterwards, but we will
see after the fsra patch lands :)
