public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/110215] New: RA fails to allocate register when loop invariant lives through EH region
@ 2023-06-12 6:25 wwwhhhyyy333 at gmail dot com
2023-06-12 6:37 ` [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls pinskia at gcc dot gnu.org
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2023-06-12 6:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215
Bug ID: 110215
Summary: RA fails to allocate register when loop invariant
lives through EH region
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wwwhhhyyy333 at gmail dot com
Target Milestone: ---
Created attachment 55305
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55305&action=edit
A Testcase
When compiled with -Ofast, GCC generates the following innermost loop:
.L41:
movups (%rax), %xmm3
movaps (%rsp), %xmm0
addq $16, %rax
subps %xmm3, %xmm0
andps %xmm2, %xmm0
movups %xmm0, -16(%rax)
addps %xmm0, %xmm1
cmpq %rax, %rdx
jne .L41
whereas Clang produces:
.LBB0_14: # Parent Loop BB0_3 Depth=1
movups (%rbp,%rax), %xmm1
movaps %xmm3, %xmm2
subps %xmm1, %xmm2
andps %xmm4, %xmm2
movups %xmm2, (%rbp,%rax)
addps %xmm2, %xmm0
addq $16, %rax
cmpq %rax, %r12
jne .LBB0_14
GCC spills the loop invariant `base` to the stack and reloads it on every
iteration (movaps (%rsp), %xmm0), whereas Clang keeps it in an SSE register
(%xmm3).
Godbolt: https://godbolt.org/z/TTvG8M6E8
* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls
From: pinskia at gcc dot gnu.org @ 2023-06-12 6:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|RA fails to allocate |RA fails to allocate
|register when loop |register when loop
|invariant lives through EH |invariant lives across
|region |calls
Keywords|ra |
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This also happens on aarch64:
```
.L41:
ldr q31, [x0]
ldr q29, [sp, 112]
fabd v31.4s, v29.4s, v31.4s
fadd v30.4s, v30.4s, v31.4s
str q31, [x0], 16
cmp x1, x0
bne .L41
```
Gimple level looks like:
<bb 19> [local count: 372044713]:
# vect_sum_lsm.128_11.134_88 = PHI <vect__7.142_39(19), { 0.0, 0.0, 0.0, 0.0 }(18)>
# ivtmp.155_176 = PHI <ivtmp.155_175(19), ivtmp.155_174(18)>
_173 = (void *) ivtmp.155_176;
vect__4.137_85 = MEM <vector(4) float> [(value_type &)_173];
vect__5.138_74 = vect_cst__84 - vect__4.137_85;
vect__38.139_68 = ABS_EXPR <vect__5.138_74>;
MEM <vector(4) float> [(value_type &)_173] = vect__38.139_68;
vect__7.142_39 = vect__38.139_68 + vect_sum_lsm.128_11.134_88;
ivtmp.155_175 = ivtmp.155_176 + 16;
if (_6 != ivtmp.155_175)
goto <bb 19>; [83.33%]
else
goto <bb 20>; [16.67%]
The spilled value would be vect_cst__84.
So I think the spilling is not related to EH at all, but rather to a call.
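For readers who do not read GIMPLE, the loop body above corresponds roughly to the scalar loop below. This is a hypothetical reconstruction from the dump only (the attached testcase is not reproduced in this thread); the function and parameter names are made up, and `base` stands for the invariant that becomes vect_cst__84 after vectorization.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical scalar form of the vectorized loop in <bb 19>.
// Not the attached testcase; names are illustrative only.
float abs_diff_accumulate(float* data, std::size_t n, float base) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        // vect_cst__84 - load, followed by ABS_EXPR
        float d = std::fabs(base - data[i]);
        data[i] = d;   // store back to the same location
        sum += d;      // reduction (vect_sum_lsm)
    }
    return sum;
}
```

The only loop-invariant input is `base`, which is why a reload of it inside the loop stands out in the generated code.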
* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: pinskia at gcc dot gnu.org @ 2023-06-12 6:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Summary|RA fails to allocate |RA fails to allocate
|register when loop |register when loop
|invariant lives across |invariant lives across
|calls |calls and eh
Ever confirmed|0 |1
Keywords| |ra
Last reconfirmed| |2023-06-12
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reduced testcase for both x86_64 and aarch64:
```
#define vec __attribute__((vector_size(4*sizeof(float))))
struct s1
{
s1();
~s1();
};
void g();
void g(float);
void f(float a, float b, vec float **c, int n, int j)
{
s1 t2;
float t = a/b;
vec float d = {t, t, t, t};
for (int l = 0; l < j; l++)
{
vec float s = {};
for(int i =0;i<n;i++)
{
c[l][i]+=d;
s+=c[l][i];
}
float sum = s[0]+s[1]+s[2]+s[3];
g(sum);
}
g();
}
```
* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: rguenth at gcc dot gnu.org @ 2023-06-12 8:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vmakarov at gcc dot gnu.org
Keywords|EH |
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we fail to sink
d_29 = {t_28, t_28, t_28, t_28};
we compute a good place in select_best_block but then since it is at the
same loop depth as the original place we apply
/* If BEST_BB is at the same nesting level, then require it to have
significantly lower execution frequency to avoid gratuitous movement. */
if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
/* If result of comparsion is unknown, prefer EARLY_BB.
Thus use !(...>=..) rather than (...<...) */
&& !(best_bb->count * 100 >= early_bb->count * threshold))
return best_bb;
and fail to sink. I'm not exactly sure why we do the above. We probably
should do so when best_bb post-dominates early_bb, and also if the sunk stmt
possibly (or provably) enlarges the lifetime of its uses (but that is
hard to guess, since we process sinking of the defs of the uses only
afterwards). In this case we have a single use and a single def, so
sinking shouldn't make things worse. We could also weigh in the
spill class of a reg here.
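To make the quoted condition concrete, here is a toy model of just that same-depth frequency check. It is a sketch and not GCC code: `Block` and `passes_frequency_check` are made-up names, and plain integers stand in for GCC's profile counts (which can also compare as "unknown"). The threshold of 75 used in the comments is, as far as I know, the documented default of --param sink-frequency-threshold; treat it as an assumption.

```cpp
#include <cstdint>

// Toy stand-in for a basic block: only what the quoted check looks at.
struct Block {
    int loop_depth;
    std::uint64_t count;  // execution count from the profile
};

// Mirrors the quoted condition: at the same loop depth, best_bb is accepted
// only if its count is below threshold percent of early_bb's count.
bool passes_frequency_check(const Block& early_bb, const Block& best_bb,
                            unsigned threshold) {
    return best_bb.loop_depth == early_bb.loop_depth
        && !(best_bb.count * 100 >= early_bb.count * threshold);
}
```

With an assumed threshold of 75, a candidate block with count 50 against an original block with count 100 passes the check. A candidate whose count is higher than that of the block dominating it, as in the corrupt profile described in this comment, cannot pass at that threshold.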
In our case we have the dominated block with a higher(!) count than
the dominating block which means the profile is corrupt.
With --param sink-frequency-threshold we sink the ctor and the feeding
division but still get
.L5:
movq (%rbx), %rax
pxor %xmm1, %xmm1
leaq 0(%rbp,%rax), %rdx
.p2align 4,,10
.p2align 3
.L4:
movaps (%rsp), %xmm0
addps (%rax), %xmm0
addq $16, %rax
movaps %xmm0, -16(%rax)
addps %xmm0, %xmm1
cmpq %rax, %rdx
jne .L4
movaps %xmm1, %xmm0
movhlps %xmm1, %xmm0
addps %xmm0, %xmm1
movaps %xmm1, %xmm0
shufps $85, %xmm1, %xmm0
addps %xmm1, %xmm0
.LEHB1:
call _Z1gf
addq $8, %rbx
cmpq %rbx, %r12
jne .L5
because we (rightfully so) refuse to sink into the outer loop. What we
fail to do is hoist the reload out of the inner loop (I suppose
clang does exactly that).
We don't have any pass after reload that would perform loop invariant motion.
I'm not sure how this situation is handled in general in RA: is a post-RA
pass optimizing the spill/reload placement "globally" usually done?
* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: vmakarov at gcc dot gnu.org @ 2023-06-14 13:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215
--- Comment #4 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
>
>
> We don't have any pass after reload that would perform loop invariant motion,
> I'm not sure how this situation is handled in general in RA - is a post-RA
> pass optimizing the spill/reload placement "globally" usually done?
LRA does not do placement of reload insns. Global RA is supposed to do this
when it forms regions for the allocation.
I've been working on this issue. I hope the fix will be ready this week.
* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: cvs-commit at gcc dot gnu.org @ 2023-06-16 15:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>:
https://gcc.gnu.org/g:154c69039571c66b3a6d16ecfa9e6ff22942f59f
commit r14-1891-g154c69039571c66b3a6d16ecfa9e6ff22942f59f
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date: Fri Jun 16 11:12:32 2023 -0400
RA: Ignore conflicts for some pseudos from insns throwing a final exception
IRA adds conflicts to the pseudos from insns that can throw exceptions
internally even if the exception code is final for the function and
the pseudo value is not used in the exception code. This results in
spilling a pseudo in a loop (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215).
The following patch fixes the problem.
PR rtl-optimization/110215
gcc/ChangeLog:
* ira-lives.cc: Include except.h.
(process_bb_node_lives): Ignore conflicts from cleanup exceptions
when the pseudo does not live at the exception landing pad.
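The rule described in this commit message can be read as the small decision function below. This is only a toy model based on the wording of the commit message, not the code in ira-lives.cc: `ThrowSite` and `records_eh_conflict` are hypothetical names, and the real implementation works on RTL insns, EH regions and liveness bitmaps.

```cpp
// Toy model of the rule in the commit message; not GCC code.
struct ThrowSite {
    bool can_throw_internal;            // insn may throw to a landing pad in this function
    bool landing_pad_is_final_cleanup;  // pad only runs cleanups, then leaves the function
};

// Returns true if, in this model, the throwing insn should still add
// conflicts for a pseudo that is live across it.
bool records_eh_conflict(const ThrowSite& site, bool pseudo_live_at_landing_pad) {
    if (!site.can_throw_internal)
        return false;  // no EH edge, nothing to record
    if (site.landing_pad_is_final_cleanup && !pseudo_live_at_landing_pad)
        return false;  // the case the patch is described as ignoring
    return true;
}
```

In the reduced testcase, the call to g(float) can throw into the cleanup that runs ~s1(), and the invariant vector is not used there, which under this reading is the case in which the conflict is dropped.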
* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: wwwhhhyyy333 at gmail dot com @ 2023-06-27 6:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215
--- Comment #6 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
Thanks for the fix. For the attached test, the main loop no longer contains
any load.
There is a remaining issue: the loop epilogue still contains loads from the
stack and from the constant pool:
.L9:
movslq %edx, %rax
movss 72(%rsp), %xmm5
salq $2, %rax
leaq (%rbx,%rax), %rcx
movaps %xmm5, %xmm1
subss (%rcx), %xmm1
andps .LC4(%rip), %xmm1
movss %xmm1, (%rcx)
leal 1(%rdx), %ecx
addss %xmm1, %xmm0
cmpl %ecx, %r12d
jle .L8
The IRA dump shows that the pseudos have no conflicts, yet they still fail to
be allocated a register. This issue does not exist on aarch64.
* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: cvs-commit at gcc dot gnu.org @ 2023-11-09 18:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>:
https://gcc.gnu.org/g:a99f6bb142bc4506dcb8aa2b7722310ad92e4528
commit r14-5294-ga99f6bb142bc4506dcb8aa2b7722310ad92e4528
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date: Thu Nov 9 08:51:15 2023 -0500
[IRA]: Fixing conflict calculation from region landing pads.
The following patch fixes conflict calculation from exception landing
pads. The previous patch processed only one newly created landing pad.
Besides being wrong, it also resulted in large memory consumption by IRA.
gcc/ChangeLog:
PR rtl-optimization/110215
* ira-lives.cc: (add_conflict_from_region_landing_pads): New
function.
(process_bb_node_lives): Use it.
Thread overview: 8+ messages
2023-06-12 6:25 [Bug rtl-optimization/110215] New: RA fails to allocate register when loop invariant lives through EH region wwwhhhyyy333 at gmail dot com
2023-06-12 6:37 ` [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls pinskia at gcc dot gnu.org
2023-06-12 6:52 ` [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh pinskia at gcc dot gnu.org
2023-06-12 8:59 ` rguenth at gcc dot gnu.org
2023-06-14 13:20 ` vmakarov at gcc dot gnu.org
2023-06-16 15:17 ` cvs-commit at gcc dot gnu.org
2023-06-27 6:00 ` wwwhhhyyy333 at gmail dot com
2023-11-09 18:25 ` cvs-commit at gcc dot gnu.org