public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies
@ 2024-05-08 17:25 wilco at gcc dot gnu.org
  2024-05-08 17:25 ` [Bug target/114991] " wilco at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-05-08 17:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991

            Bug ID: 114991
           Summary: [14/15 Regression] AArch64: LDP pass does not handle
                    some structure copies
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wilco at gcc dot gnu.org
  Target Milestone: ---

The following example no longer emits LDP/STP since GCC14:

#include <string.h>

typedef struct { int arr[20]; } S;

void g (S *);
void h (S);
void f(int x)
{
  S s;
  g (&s);
  h (s);
}

f:
        stp     x29, x30, [sp, -176]!
        add     x1, sp, 96
        mov     x29, sp
        add     x0, sp, 16
        ldp     q29, q31, [x1]
        ldr     q30, [x1, 32]
        str     q29, [sp, 16]
        ldr     q29, [x1, 48]
        str     q31, [x0, 16]
        ldr     q31, [x1, 64]
        stp     q30, q29, [x0, 32]
        str     q31, [x0, 64]
        bl      h
        ldp     x29, x30, [sp], 176
        ret

The expansions for memcpy/move/memset no longer emit LDP directly in RTL and
now rely on the new LDP pass. Stack based loads/stores seem to confuse its
alias checks and it gives up.

Using -fno-schedule-insns fixes this example, but not all cases.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies
  2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
@ 2024-05-08 17:25 ` wilco at gcc dot gnu.org
  2024-05-09 13:49 ` acoplan at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-05-08 17:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64-*-*
   Target Milestone|---                         |15.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies
  2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
  2024-05-08 17:25 ` [Bug target/114991] " wilco at gcc dot gnu.org
@ 2024-05-09 13:49 ` acoplan at gcc dot gnu.org
  2024-05-09 15:55 ` acoplan at gcc dot gnu.org
  2024-05-09 15:58 ` acoplan at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-05-09 13:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991

Alex Coplan <acoplan at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-05-09
             Status|UNCONFIRMED                 |NEW
                 CC|                            |acoplan at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization, ra

--- Comment #1 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Confirmed.  There is a lot to unpack here.  Of course, the include isn't needed
in this testcase and the problem can be seen more clearly with a slightly
smaller array size:

typedef struct { int arr[16]; } S;

void g (S *);
void h (S);
void f(int x)
{
  S s;
  g (&s);
  h (s);
}

In this case sizeof(S) = 64 so we should be able to do the copy with 2 LDPs + 2
STPs.

So just for clarity, the missed ldp/stp started when we turned off the early
ldp/stp formation in memcpy expansion, i.e. with
r14-9373-g19b23bf3c32df3cbb96b3d898a1d7142f7bea4a0 .

However, things already started to regress earlier for this testcase with
r14-4944-gf55cdce3f8dd8503e080e35be59c5f5390f6d95e i.e.

commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Thu Oct 26 14:50:40 2023

    [RA]: Modfify cost calculation for dealing with equivalences

before that RA change we get:

f:
        stp     x29, x30, [sp, -144]!
        mov     x29, sp
        add     x0, sp, 80
        bl      g
        ldp     q29, q28, [sp, 80]
        add     x0, sp, 16
        ldp     q31, q30, [sp, 112]
        stp     q29, q28, [sp, 16]
        stp     q31, q30, [sp, 48]
        bl      h
        ldp     x29, x30, [sp], 144
        ret

and afterwards we get:

f:
        stp     x29, x30, [sp, -160]!
        mov     x29, sp
        str     x19, [sp, 16]
        add     x19, sp, 96
        mov     x0, x19
        bl      g
        add     x0, sp, 32
        ldp     q29, q28, [x19]
        ldp     q31, q30, [x19, 32]
        stp     q29, q28, [x0]
        stp     q31, q30, [x0, 32]
        bl      h
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 160
        ret

which is really not great as now we have a save/restore of x19 and the accesses
end up using different (non-sp) registers which I suspect doesn't help with the
ldp/stp formation (on trunk).

I will try to give a detailed analysis on what goes wrong with the ldp/stp
formation at the RTL level shortly (there are a lot of different issues), but I
think that RA change is a contributing factor.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies
  2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
  2024-05-08 17:25 ` [Bug target/114991] " wilco at gcc dot gnu.org
  2024-05-09 13:49 ` acoplan at gcc dot gnu.org
@ 2024-05-09 15:55 ` acoplan at gcc dot gnu.org
  2024-05-09 15:58 ` acoplan at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-05-09 15:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991

--- Comment #2 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Here is some analysis on why we miss some of these opportunities in ldp_fusion.
So initially in 267r.vregs we have some very clean RTL:

    6: r101:DI=sfp:DI-0x40
    7: x0:DI=r101:DI
    8: call [`g'] argc:0
      REG_CALL_DECL `g'
    9: r102:DI=sfp:DI-0x80
   10: r103:DI=sfp:DI-0x40
   11: r104:V4SI=[r103:DI]
   13: r105:V4SI=[r103:DI+0x10]
   15: r106:V4SI=[r103:DI+0x20]
   17: r107:V4SI=[r103:DI+0x30]
   12: [r102:DI]=r104:V4SI
   14: [r102:DI+0x10]=r105:V4SI
   16: [r102:DI+0x20]=r106:V4SI
   18: [r102:DI+0x30]=r107:V4SI

if were to run the ldp/stp pass on this it should form the pairs without a
problem.  Of course things go downhill from here.  The first slightly strange
thing is that fwprop propagates the sfp into the first of each group of
accesses (i.e. with offset 0), but not the others:

    9: r102:DI=sfp:DI-0x80
   11: r104:V4SI=[sfp:DI-0x40]
   13: r105:V4SI=[r101:DI+0x10]
   15: r106:V4SI=[r101:DI+0x20]
   17: r107:V4SI=[r101:DI+0x30]
      REG_DEAD r103:DI
   12: [sfp:DI-0x80]=r104:V4SI
   14: [r102:DI+0x10]=r105:V4SI
      REG_DEAD r105:V4SI
   16: [r102:DI+0x20]=r106:V4SI
      REG_DEAD r106:V4SI
   18: [r102:DI+0x30]=r107:V4SI

the RTL then stays mostly unchanged until sched1, where things really start to
go downhill:

   11: r104:V4SI=[sfp:DI-0x40]
    9: r102:DI=sfp:DI-0x80
   13: r105:V4SI=[r101:DI+0x10]
   20: x0:DI=r102:DI
      REG_DEAD r102:DI
      REG_EQUAL sfp:DI-0x80
   15: r106:V4SI=[r101:DI+0x20]
   12: [sfp:DI-0x80]=r104:V4SI
      REG_DEAD r104:V4SI
   17: r107:V4SI=[r101:DI+0x30]
      REG_DEAD r101:DI
   14: [r102:DI+0x10]=r105:V4SI
      REG_DEAD r105:V4SI
   16: [r102:DI+0x20]=r106:V4SI
      REG_DEAD r106:V4SI
   18: [r102:DI+0x30]=r107:V4SI

here the first of the stores (i12) has been moved up between the last pair of
loads (i15, i17).  Now the interesting thing is how sched1 knows that it is
safe to perform this transformation.  In the ldp_fusion1 pass we miss this pair
because we think that the loads may alias with i12:

cannot form pair (15,17) due to alias conflicts (12,12)

so it would be good to look into how our alias analysis differs from what
sched1 is doing.  It's worth further noting that while the loads have MEM_EXPR
information (they point to the var_decl for s) the stores do not.  Presumably
this is because the copy of s mandated by the ABI doesn't necessarily have a
tree decl representation that the MEM_EXPRs could point to.

Separately to the aliasing issue, because:
 - there is no MEM_EXPR information for the stores, and
 - forwprop1 substituted the sfp in for the first store
ldp_fusion fails to discover the (i12,i14) store pair opportunity.  As a result
we unfortunately end up forming an stp in the middle.

Interestingly if I turn off fwprop1 then we still fail to form the
(12,14) pair due to aliasing.

So it seems the main thing to investigate is how sched1 does its alias
analysis and how that differs from what we're doing in ldp_fusion.

I have some WIP patches that should improve the pair discovery and could
potentially be extended to help with the case of the (12,14) pair here.
Another thing that could help with that is if we populated the MEM_EXPR for the
stores of the structure copy.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies
  2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-05-09 15:55 ` acoplan at gcc dot gnu.org
@ 2024-05-09 15:58 ` acoplan at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-05-09 15:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991

Alex Coplan <acoplan at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |acoplan at gcc dot gnu.org

--- Comment #3 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Mine for the aliasing issues/investigation, might be worth splitting off the RA
problem to track that separately.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-05-09 15:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
2024-05-08 17:25 ` [Bug target/114991] " wilco at gcc dot gnu.org
2024-05-09 13:49 ` acoplan at gcc dot gnu.org
2024-05-09 15:55 ` acoplan at gcc dot gnu.org
2024-05-09 15:58 ` acoplan at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).