public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct
@ 2024-03-01  9:16 matteo at mitalia dot net
  2024-03-01  9:25 ` [Bug target/114187] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: matteo at mitalia dot net @ 2024-03-01  9:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

            Bug ID: 114187
           Summary: [14 regression] bizarre register dance on x86_64 for
                    pass-by-value struct
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matteo at mitalia dot net
  Target Milestone: ---

Sample code (+ godbolt link https://godbolt.org/z/zf6e16Wcq )

```
struct P2d {
    double x, y;
};

double sumxy(double x, double y) {
    return x + y;
}

double sumxy_p(P2d p) {
    return p.x + p.y;
}

double sumxy_p_ref(const P2d& p) {
    return p.x + p.y;
}
```

with g++ 13.2 -O3 generates a perfectly reasonable

```
sumxy(double, double):
        addsd   xmm0, xmm1
        ret
sumxy_p(P2d):
        addsd   xmm0, xmm1
        ret
sumxy_p_ref(P2d const&):
        movsd   xmm0, QWORD PTR [rdi]
        addsd   xmm0, QWORD PTR [rdi+8]
        ret
```

instead with g++ 14 (g++
(Compiler-Explorer-Build-gcc-b05f474c8f7768dad50a99a2d676660ee4db09c6-binutils-2.40)
14.0.1 20240301 (experimental)) we get

```
sumxy(double, double):
        addsd   xmm0, xmm1
        ret
sumxy_p(P2d):
        movq    rax, xmm1
        movq    rdx, xmm0
        xchg    rdx, rax
        movq    xmm0, rax
        movq    xmm2, rdx
        addsd   xmm0, xmm2
        ret
sumxy_p_ref(P2d const&):
        movsd   xmm0, QWORD PTR [rdi]
        addsd   xmm0, QWORD PTR [rdi+8]
        ret
```

Notice the bizarre registers dance for sumxy_p(P2d) (p.x goes through xmm0 →
rdx → rax → xmm0; p.y in turn xmm1 → rax → rdx → xmm2; then they finally get
summed); sumxy(double, double) which, register-wise, should be the same, is
unaffected.

This exact same code (both for gcc 13 and gcc 14) is generated at all
optimization levels I tested (-Og, -O1, -O2, -O3) except -O0 of course, so it
doesn't seem to depend from particular optimization passes enabled only at high
optimization levels. Also (as reasonable) it doesn't seem to depend on the C++
frontend, as compiling this with plain gcc (adding a typedef for the struct and
changing the reference to a pointer) yields the exact same results.

Most importantly, it seems something target-specific, as ARM64 builds don't
exhibit particular problems, and produce pretty much the same (reasonable) code
both on 14.0 and 13.2

```
sumxy(double, double):
        fadd    d0, d0, d1
        ret
sumxy_p(P2d):
        fadd    d0, d0, d1
        ret
sumxy_p_ref(P2d const&):
        ldp     d0, d31, [x0]
        fadd    d0, d0, d31
        ret
```

(gcc 13.2 generates slightly different code for sumxy_p_ref, but in a very
minor way)

Fiddling around, with -march=nocona (that leaves gcc 13.2 unaffected) I get a
more compact but still absurd dance:

```
sumxy_p(P2d):
        movsd   QWORD PTR [rsp-8], xmm1
        mov     rdx, QWORD PTR [rsp-8]
        movq    xmm2, rdx
        addsd   xmm0, xmm2
        ret
```

here p.x is left in xmm0 where it should, but xmm1 goes through the stack (!),
a GP register (rdx) and finally to xmm2. It feels like in general it wants to
launder xmm1 through a 64 bit GP register before summing it, a bit like a light
version of -ffloat-store.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct
  2024-03-01  9:16 [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct matteo at mitalia dot net
@ 2024-03-01  9:25 ` pinskia at gcc dot gnu.org
  2024-03-01  9:27 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-01  9:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
                 CC|                            |roger at nextmovesoftware dot com
          Component|rtl-optimization            |target
             Target|                            |X86_64

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct
  2024-03-01  9:16 [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct matteo at mitalia dot net
  2024-03-01  9:25 ` [Bug target/114187] " pinskia at gcc dot gnu.org
@ 2024-03-01  9:27 ` pinskia at gcc dot gnu.org
  2024-03-01 12:53 ` [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526 jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-01  9:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Most likely related to the TImode changes...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526
  2024-03-01  9:16 [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct matteo at mitalia dot net
  2024-03-01  9:25 ` [Bug target/114187] " pinskia at gcc dot gnu.org
  2024-03-01  9:27 ` pinskia at gcc dot gnu.org
@ 2024-03-01 12:53 ` jakub at gcc dot gnu.org
  2024-03-01 16:08 ` roger at nextmovesoftware dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-01 12:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 regression] bizarre     |[14 regression] bizarre
                   |register dance on x86_64    |register dance on x86_64
                   |for pass-by-value struct    |for pass-by-value struct
                   |                            |since r14-2526
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Indeed, r14-2526-g8911879415d6c2a7baad88235554a912887a1c5c

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526
  2024-03-01  9:16 [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct matteo at mitalia dot net
                   ` (2 preceding siblings ...)
  2024-03-01 12:53 ` [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526 jakub at gcc dot gnu.org
@ 2024-03-01 16:08 ` roger at nextmovesoftware dot com
  2024-03-01 19:15 ` roger at nextmovesoftware dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-03-01 16:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-03-01
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #3 from Roger Sayle <roger at nextmovesoftware dot com> ---
There's a missing simplification in combine:

Trying 6 -> 11:
    6: r102:TI=zero_extend(r109:DF#0)<<0x40|zero_extend(r108:DF#0)
      REG_DEAD r108:DF
      REG_DEAD r109:DF
   11: r105:DF=r102:TI#0+r102:TI#8
      REG_DEAD r102:TI
Failed to match this instruction:
(set (reg:DF 105 [ _4 ])
    (plus:DF (subreg:DF (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF
109) 0))
                    (const_int 64 [0x40]))
                (zero_extend:TI (subreg:DI (reg:DF 108) 0))) 8)
        (reg:DF 108)))

where the lowpart is getting simplified to reg:DF 108, but the highpart isn't
getting simplified to reg:DF 109.  i.e.

(subreg:DF (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 109) 0))
                              (const_int 64 [0x40]))
                              (zero_extend:TI (subreg:DI (reg:DF 108) 0))) 8)
can be simplified to just (reg:DF 109).

I'm looking into why this isn't happening.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526
  2024-03-01  9:16 [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct matteo at mitalia dot net
                   ` (3 preceding siblings ...)
  2024-03-01 16:08 ` roger at nextmovesoftware dot com
@ 2024-03-01 19:15 ` roger at nextmovesoftware dot com
  2024-03-04  0:51 ` cvs-commit at gcc dot gnu.org
  2024-03-04 13:19 ` rguenth at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-03-01 19:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com

--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
Created attachment 57587
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57587&action=edit
proposed patch

Proposed fix attached.  Currently bootstrapping and regression testing.  The
problematic code (from March 2023) has an interesting history.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526
  2024-03-01  9:16 [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct matteo at mitalia dot net
                   ` (4 preceding siblings ...)
  2024-03-01 19:15 ` roger at nextmovesoftware dot com
@ 2024-03-04  0:51 ` cvs-commit at gcc dot gnu.org
  2024-03-04 13:19 ` rguenth at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-03-04  0:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:d35b5b0e0a0727cfdaba5f859e44116c33648639

commit r14-9287-gd35b5b0e0a0727cfdaba5f859e44116c33648639
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon Mar 4 00:47:08 2024 +0000

    PR target/114187: Fix ?Fmode SUBREG simplification in simplify_subreg.

    This patch fixes PR target/114187 a typo/missed-optimization in
simplify-rtx
    that's exposed by (my) changes to x86_64's parameter passing.  The context
    is that construction of double word (TImode) values now uses the idiom:

    (ior:TI (ashift:TI (zero_extend:TI (reg:DI x)) (const_int 64 [0x40]))
            (zero_extend:TI (reg:DI y)))

    Extracting the DImode highpart and lowpart halves of this complex
expression
    is supported by simplications in simplify_subreg.  The problem is when the
    doubleword TImode value represents two DFmode fields, there isn't a direct
    simplification to extract the highpart or lowpart SUBREGs, but instead GCC
    uses two steps, extract the DImode {high,low} part and then cast the result
    back to a floating point mode, DFmode.

    The (buggy) code to do this is:

      /* If the outer mode is not integral, try taking a subreg with the
equivalent
         integer outer mode and then bitcasting the result.
         Other simplifications rely on integer to integer subregs and we'd
         potentially miss out on optimizations otherwise.  */
      if (known_gt (GET_MODE_SIZE (innermode),
                    GET_MODE_SIZE (outermode))
          && SCALAR_INT_MODE_P (innermode)
          && !SCALAR_INT_MODE_P (outermode)
          && int_mode_for_size (GET_MODE_BITSIZE (outermode),
                                0).exists (&int_outermode))
        {
          rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
          if (tem)
            return simplify_gen_subreg (outermode, tem, int_outermode, byte);
        }

    The issue/mistake is that the second call, to simplify_subreg, shouldn't
    use "byte" as the final argument; the offset has already been handled by
    the first call, to simplify_subreg, and this second call is just a type
    conversion from an integer mode to floating point (from DImode to DFmode).

    Interestingly, this mistake was already spotted by Richard Sandiford when
    the optimization was originally contributed in January 2023.
    https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610920.html
    >> Richard Sandiford writes:
    >> Also, the final line should pass 0 rather than byte.

    Unfortunately a miscommunication/misunderstanding in a later thread
    https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612898.html
    resulted in this correction being undone.  Using lowpart_subreg should
    avoid/reduce confusion in future.

    2024-03-03  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/114187
            * simplify-rtx.cc (simplify_context::simplify_subreg): Call
            lowpart_subreg to perform type conversion, to avoid confusion
            over the offset to use in the call to simplify_reg_subreg.

    gcc/testsuite/ChangeLog
            PR target/114187
            * g++.target/i386/pr114187.C: New test case.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526
  2024-03-01  9:16 [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct matteo at mitalia dot net
                   ` (5 preceding siblings ...)
  2024-03-04  0:51 ` cvs-commit at gcc dot gnu.org
@ 2024-03-04 13:19 ` rguenth at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-04 13:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed I assume.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-03-04 13:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-01  9:16 [Bug rtl-optimization/114187] New: [14 regression] bizarre register dance on x86_64 for pass-by-value struct matteo at mitalia dot net
2024-03-01  9:25 ` [Bug target/114187] " pinskia at gcc dot gnu.org
2024-03-01  9:27 ` pinskia at gcc dot gnu.org
2024-03-01 12:53 ` [Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526 jakub at gcc dot gnu.org
2024-03-01 16:08 ` roger at nextmovesoftware dot com
2024-03-01 19:15 ` roger at nextmovesoftware dot com
2024-03-04  0:51 ` cvs-commit at gcc dot gnu.org
2024-03-04 13:19 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).