public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
@ 2024-04-10 7:41 dizhao at os dot amperecomputing.com
2024-04-10 7:43 ` [Bug rtl-optimization/114674] " dizhao at os dot amperecomputing.com
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2024-04-10 7:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674
Bug ID: 114674
Summary: [aarch64] ldp_fusion fails to merge 2 strs due to
imprecise alignment info
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: dizhao at os dot amperecomputing.com
Target Milestone: ---
For the case below:
typedef struct {
unsigned int f1;
unsigned int f2;
} test_struct;
static test_struct ts = {
123, 456
};
void foo(void)
{
ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
}
When compiled with "-O3 --param=aarch64-stp-policy=aligned", gcc failed to fuse
the memory access instructions into stp/ldp, the dump file 312r.ldp_fusion1
says "ldp/stp policy says no". However, the accessing to ts.f1 is aligned to 64
bit:
(insn 26 23 33 2 (set (mem/c:SI (lo_sum:DI (reg/f:DI 113)
(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [1 ts.f1+0 S4
A64])
...
So it looks like the 2 str instructions should be fused with
aarch64-stp-policy=aligned.
After debugged this a bit, it seems the problem is ldp_bb_info::fuse_pair
changed the alignment info when calling adjust_address_nv, to rewrite the base
of ts.f1.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
2024-04-10 7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
@ 2024-04-10 7:43 ` dizhao at os dot amperecomputing.com
2024-04-10 8:34 ` acoplan at gcc dot gnu.org
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2024-04-10 7:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674
--- Comment #1 from Di Zhao <dizhao at os dot amperecomputing.com> ---
Here's a quick fix I tried, that works on the small test case above:
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 365dcf48b22..43478ede72b 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1735,6 +1735,9 @@ ldp_bb_info::fuse_pair (bool load_p,
rtx new_mem = adjust_address_nv (effective_base,
mode_for_mem,
adjust_amt);
+ // Keep original alignment info for ldp/stp policy.
+ set_mem_align (new_mem, MEM_ALIGN (change_mem));
+
rtx new_set = load_p
? gen_rtx_SET (change_reg, new_mem)
: gen_rtx_SET (new_mem, change_reg);
Is is OK?
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
2024-04-10 7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
2024-04-10 7:43 ` [Bug rtl-optimization/114674] " dizhao at os dot amperecomputing.com
@ 2024-04-10 8:34 ` acoplan at gcc dot gnu.org
2024-04-10 13:27 ` acoplan at gcc dot gnu.org
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-04-10 8:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |acoplan at gcc dot gnu.org
Keywords| |missed-optimization
--- Comment #2 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Thanks for the report (and patch), I'll look into this.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
2024-04-10 7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
2024-04-10 7:43 ` [Bug rtl-optimization/114674] " dizhao at os dot amperecomputing.com
2024-04-10 8:34 ` acoplan at gcc dot gnu.org
@ 2024-04-10 13:27 ` acoplan at gcc dot gnu.org
2024-04-10 14:00 ` acoplan at gcc dot gnu.org
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-04-10 13:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Last reconfirmed| |2024-04-10
--- Comment #3 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Confirmed.
I think it might be best to take the maximum MEM_ALIGN between the adjusted mem
(new_mem) and the original mem (change_mem). In this case it happens that the
original mem (change_mem) has a stronger alignment guarantee, but in general it
could be the case that the adjusted mem gives a better alignment guarantee.
Since both locations are known to point to the same address, it seems best to
me to take the largest alignment of the two.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
2024-04-10 7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
` (2 preceding siblings ...)
2024-04-10 13:27 ` acoplan at gcc dot gnu.org
@ 2024-04-10 14:00 ` acoplan at gcc dot gnu.org
2024-04-11 6:40 ` dizhao at os dot amperecomputing.com
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-04-10 14:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |acoplan at gcc dot gnu.org
Status|NEW |ASSIGNED
--- Comment #4 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Discussing offline with Richard S an alternative approach would be to use
replace_equiv_address[_nv] instead of adjust_address[_nv]; that way we preserve
most properties of the original mem and just take the address from the other.
In fact this is what aarch64_check_consecutive_mems already does so I think we
should follow that.
I'll try a patch along those lines for stage 1.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
2024-04-10 7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
` (3 preceding siblings ...)
2024-04-10 14:00 ` acoplan at gcc dot gnu.org
@ 2024-04-11 6:40 ` dizhao at os dot amperecomputing.com
2024-05-07 13:44 ` cvs-commit at gcc dot gnu.org
2024-05-07 13:46 ` acoplan at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2024-04-11 6:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674
--- Comment #5 from Di Zhao <dizhao at os dot amperecomputing.com> ---
Yes it's better to take a maximum MEM_ALIGN. I've missed that. Thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
2024-04-10 7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
` (4 preceding siblings ...)
2024-04-11 6:40 ` dizhao at os dot amperecomputing.com
@ 2024-05-07 13:44 ` cvs-commit at gcc dot gnu.org
2024-05-07 13:46 ` acoplan at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-07 13:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alex Coplan <acoplan@gcc.gnu.org>:
https://gcc.gnu.org/g:74690ff96b263b8609639b7fbc5d6afd3f19cb98
commit r15-282-g74690ff96b263b8609639b7fbc5d6afd3f19cb98
Author: Alex Coplan <alex.coplan@arm.com>
Date: Wed Apr 10 16:30:36 2024 +0100
aarch64: Preserve mem info on change of base for ldp/stp [PR114674]
The ldp/stp fusion pass can change the base of an access so that the two
accesses end up using a common base register. So far we have been using
adjust_address_nv to do this, but this means that we don't preserve
other properties of the mem we're replacing. It seems better to use
replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
mem whose address we're changing.
The PR shows that by adjusting the other mem we lose alignment
information about the original access and therefore end up rejecting an
otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
This patch fixes that by using replace_equiv_address_nv instead.
Notably this is the same approach as taken by
aarch64_check_consecutive_mems when a change of base is required, so
this at least makes things more consistent between the ldp fusion pass
and the peepholes.
gcc/ChangeLog:
PR target/114674
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
Use replace_equiv_address_nv on a change of base instead of
adjust_address_nv on the other access.
gcc/testsuite/ChangeLog:
PR target/114674
* gcc.target/aarch64/pr114674.c: New test.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
2024-04-10 7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
` (5 preceding siblings ...)
2024-05-07 13:44 ` cvs-commit at gcc dot gnu.org
@ 2024-05-07 13:46 ` acoplan at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-05-07 13:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #7 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Fixed for GCC 15, thanks for the report.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-05-07 13:46 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-10 7:41 [Bug rtl-optimization/114674] New: [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info dizhao at os dot amperecomputing.com
2024-04-10 7:43 ` [Bug rtl-optimization/114674] " dizhao at os dot amperecomputing.com
2024-04-10 8:34 ` acoplan at gcc dot gnu.org
2024-04-10 13:27 ` acoplan at gcc dot gnu.org
2024-04-10 14:00 ` acoplan at gcc dot gnu.org
2024-04-11 6:40 ` dizhao at os dot amperecomputing.com
2024-05-07 13:44 ` cvs-commit at gcc dot gnu.org
2024-05-07 13:46 ` acoplan at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).