public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
@ 2024-04-10  7:41 dizhao at os dot amperecomputing.com
  2024-04-10  7:43 ` [Bug rtl-optimization/114674] " dizhao at os dot amperecomputing.com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2024-04-10  7:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674

            Bug ID: 114674
           Summary: [aarch64] ldp_fusion fails to merge 2 strs due to
                    imprecise alignment info
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dizhao at os dot amperecomputing.com
  Target Milestone: ---

For the case below:

        typedef struct {
                unsigned int f1;
                unsigned int f2;
        } test_struct;

        static test_struct ts = {
                123, 456
        };

        void foo(void)
        {
                ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
                ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
        }

When compiled with "-O3 --param=aarch64-stp-policy=aligned", gcc failed to fuse
the memory access instructions into stp/ldp, the dump file 312r.ldp_fusion1
says "ldp/stp policy says no". However, the accessing to ts.f1 is aligned to 64
bit:

(insn 26 23 33 2 (set (mem/c:SI (lo_sum:DI (reg/f:DI 113)
                (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [1 ts.f1+0 S4
A64])
...

So it looks like the 2 str instructions should be fused with
aarch64-stp-policy=aligned.

After debugged this a bit, it seems the problem is ldp_bb_info::fuse_pair
changed the alignment info when calling adjust_address_nv, to rewrite the base
of ts.f1.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
  2024-04-10  7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
@ 2024-04-10  7:43 ` dizhao at os dot amperecomputing.com
  2024-04-10  8:34 ` acoplan at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2024-04-10  7:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674

--- Comment #1 from Di Zhao <dizhao at os dot amperecomputing.com> ---
Here's a quick fix I tried, that works on the small test case above:

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 365dcf48b22..43478ede72b 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1735,6 +1735,9 @@ ldp_bb_info::fuse_pair (bool load_p,
       rtx new_mem = adjust_address_nv (effective_base,
                                       mode_for_mem,
                                       adjust_amt);
+      // Keep original alignment info for ldp/stp policy.
+      set_mem_align (new_mem, MEM_ALIGN (change_mem));
+
       rtx new_set = load_p
        ? gen_rtx_SET (change_reg, new_mem)
        : gen_rtx_SET (new_mem, change_reg);


Is is OK?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
  2024-04-10  7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
  2024-04-10  7:43 ` [Bug rtl-optimization/114674] " dizhao at os dot amperecomputing.com
@ 2024-04-10  8:34 ` acoplan at gcc dot gnu.org
  2024-04-10 13:27 ` acoplan at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-04-10  8:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674

Alex Coplan <acoplan at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |acoplan at gcc dot gnu.org
           Keywords|                            |missed-optimization

--- Comment #2 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Thanks for the report (and patch), I'll look into this.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
  2024-04-10  7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
  2024-04-10  7:43 ` [Bug rtl-optimization/114674] " dizhao at os dot amperecomputing.com
  2024-04-10  8:34 ` acoplan at gcc dot gnu.org
@ 2024-04-10 13:27 ` acoplan at gcc dot gnu.org
  2024-04-10 14:00 ` acoplan at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-04-10 13:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674

Alex Coplan <acoplan at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-04-10

--- Comment #3 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Confirmed.

I think it might be best to take the maximum MEM_ALIGN between the adjusted mem
(new_mem) and the original mem (change_mem).  In this case it happens that the
original mem (change_mem) has a stronger alignment guarantee, but in general it
could be the case that the adjusted mem gives a better alignment guarantee. 
Since both locations are known to point to the same address, it seems best to
me to take the largest alignment of the two.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
  2024-04-10  7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
                   ` (2 preceding siblings ...)
  2024-04-10 13:27 ` acoplan at gcc dot gnu.org
@ 2024-04-10 14:00 ` acoplan at gcc dot gnu.org
  2024-04-11  6:40 ` dizhao at os dot amperecomputing.com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-04-10 14:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674

Alex Coplan <acoplan at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |acoplan at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #4 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Discussing offline with Richard S an alternative approach would be to use
replace_equiv_address[_nv] instead of adjust_address[_nv]; that way we preserve
most properties of the original mem and just take the address from the other.

In fact this is what aarch64_check_consecutive_mems already does so I think we
should follow that.

I'll try a patch along those lines for stage 1.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
  2024-04-10  7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
                   ` (3 preceding siblings ...)
  2024-04-10 14:00 ` acoplan at gcc dot gnu.org
@ 2024-04-11  6:40 ` dizhao at os dot amperecomputing.com
  2024-05-07 13:44 ` cvs-commit at gcc dot gnu.org
  2024-05-07 13:46 ` acoplan at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2024-04-11  6:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674

--- Comment #5 from Di Zhao <dizhao at os dot amperecomputing.com> ---
Yes it's better to take a maximum MEM_ALIGN. I've missed that. Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
  2024-04-10  7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
                   ` (4 preceding siblings ...)
  2024-04-11  6:40 ` dizhao at os dot amperecomputing.com
@ 2024-05-07 13:44 ` cvs-commit at gcc dot gnu.org
  2024-05-07 13:46 ` acoplan at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-07 13:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alex Coplan <acoplan@gcc.gnu.org>:

https://gcc.gnu.org/g:74690ff96b263b8609639b7fbc5d6afd3f19cb98

commit r15-282-g74690ff96b263b8609639b7fbc5d6afd3f19cb98
Author: Alex Coplan <alex.coplan@arm.com>
Date:   Wed Apr 10 16:30:36 2024 +0100

    aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

    The ldp/stp fusion pass can change the base of an access so that the two
    accesses end up using a common base register.  So far we have been using
    adjust_address_nv to do this, but this means that we don't preserve
    other properties of the mem we're replacing.  It seems better to use
    replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
    mem whose address we're changing.

    The PR shows that by adjusting the other mem we lose alignment
    information about the original access and therefore end up rejecting an
    otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
    This patch fixes that by using replace_equiv_address_nv instead.

    Notably this is the same approach as taken by
    aarch64_check_consecutive_mems when a change of base is required, so
    this at least makes things more consistent between the ldp fusion pass
    and the peepholes.

    gcc/ChangeLog:

            PR target/114674
            * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
            Use replace_equiv_address_nv on a change of base instead of
            adjust_address_nv on the other access.

    gcc/testsuite/ChangeLog:

            PR target/114674
            * gcc.target/aarch64/pr114674.c: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
  2024-04-10  7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
                   ` (5 preceding siblings ...)
  2024-05-07 13:44 ` cvs-commit at gcc dot gnu.org
@ 2024-05-07 13:46 ` acoplan at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-05-07 13:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674

Alex Coplan <acoplan at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Fixed for GCC 15, thanks for the report.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-05-07 13:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-10  7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
2024-04-10  7:43 ` [Bug rtl-optimization/114674] " dizhao at os dot amperecomputing.com
2024-04-10  8:34 ` acoplan at gcc dot gnu.org
2024-04-10 13:27 ` acoplan at gcc dot gnu.org
2024-04-10 14:00 ` acoplan at gcc dot gnu.org
2024-04-11  6:40 ` dizhao at os dot amperecomputing.com
2024-05-07 13:44 ` cvs-commit at gcc dot gnu.org
2024-05-07 13:46 ` acoplan at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).