public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/106533] New: loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: vineetg at rivosinc dot com @ 2022-08-05 7:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
Bug ID: 106533
Summary: loop distribution not distributing inner loop (to
memcpy) when perfect loop nest
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vineetg at rivosinc dot com
Target Milestone: ---
Created attachment 53415
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53415&action=edit
test case
While tinkering with a slightly modified version of the STREAM benchmark [1],
I observed that loop distribution is not distributing a nested copy loop into
"0 loops and 1 libcall (memcpy)".
This is with the test built with -O2, using mainline gcc as of June 14, 2022
(commit 6abe341558ab).
The actual test is attached, but the loops look like the following. Loop 7
(copy) is distributed to memcpy in the general case, but not if the benchmark
is built with #define COPYONLY (which elides loops 8, 9 and 10 from compilation).
-->8---
for (j=0; j<10000000; j++) { // 1
    a[j] = 1.0;
    b[j] = 2.0;
    c[j] = 0.0;
}
for (j = 0; j < 10000000; j++) // 2
    a[j] = 2.0E0 * a[j];
for (k=0; k<10; k++) // 3
{
    for (j=0; j<10000000; j++) c[j] = a[j]; // 7
#ifndef COPYONLY
    for (j=0; j<10000000; j++) b[j] = scalar*c[j]; // 8
    for (j=0; j<10000000; j++) c[j] = a[j]+b[j]; // 9
    for (j=0; j<10000000; j++) a[j] = b[j]+scalar*c[j]; // 10
#endif
}
for (k=1; k<10; k++)
    for (j=0; j<4; j++) // 6
        avgtime[j] = avgtime[j] + times[j][k];
..
for (j=0; j<4; j++) // 5
    avgtime[j] = avgtime[j]/(double)(NTIMES-1);
..
-->8---
From: vineetg at rivosinc dot com @ 2022-08-05 7:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
Vineet Gupta <vineetg at rivosinc dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vineetg at rivosinc dot com
--- Comment #1 from Vineet Gupta <vineetg at rivosinc dot com> ---
I'm not familiar with the actual loop distribution algorithm, but I debugged it
and found the point of divergence.
loop_distribution::execute() iterates over loops_list (cfun, LI_ONLY_INNERMOST).
The copy loop 7 is processed in both builds, but prepare_perfect_loop_nest()
returns different values.
For the single-copy-loop source, it deduces "perfect nesting" and returns outer
loop 3. This essentially skips any further distribution of loop 7.
For the multi-loop source, prepare_perfect_loop_nest() exits early because the
outer->inner == loop check fails (outer loop 3 has inner pointing to loop 10,
the last loop inside it, not loop 7, which is first). The subsequent logic then
distributes loop 7 to 0 loops and a memcpy.
I'm not sure whether this is a bug or intended, hence this report.
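Editor's sketch: the outer->inner == loop observation above can be reproduced with a toy loop tree. This is not GCC's data structure; the toy_* names are hypothetical, and the sketch assumes that a newly registered child loop is pushed at the head of its parent's sibling list, which would explain why inner ends up pointing at the loop added last (loop 10) rather than at loop 7.

```c
#include <stddef.h>

/* Toy stand-in for a loop-tree node; NOT GCC's real structure. */
struct toy_loop {
    int num;                 /* loop number as used in the report */
    struct toy_loop *outer;  /* enclosing loop, or NULL */
    struct toy_loop *inner;  /* head of the list of child loops */
    struct toy_loop *next;   /* next sibling in the parent's list */
};

/* Assumption: a new child is pushed at the head of the sibling list,
   so outer->inner always names the child that was added last. */
static void toy_add_child(struct toy_loop *outer, struct toy_loop *child)
{
    child->outer = outer;
    child->next = outer->inner;
    outer->inner = child;
}

/* Only-child test modeled on the check quoted in the comment:
   the parent's first child is this loop and it has no sibling. */
static int toy_is_only_child(const struct toy_loop *loop)
{
    return loop->outer != NULL
        && loop->outer->inner == loop
        && loop->next == NULL;
}

/* Build outer loop 3 with either loop 7 alone (copy_only != 0) or
   loops 7..10, and report whether loop 7 is seen as the only child. */
static int toy_loop7_is_only_child(int copy_only)
{
    struct toy_loop l3 = {3, NULL, NULL, NULL};
    struct toy_loop l7 = {7, NULL, NULL, NULL};
    struct toy_loop l8 = {8, NULL, NULL, NULL};
    struct toy_loop l9 = {9, NULL, NULL, NULL};
    struct toy_loop l10 = {10, NULL, NULL, NULL};

    toy_add_child(&l3, &l7);
    if (!copy_only) {
        toy_add_child(&l3, &l8);
        toy_add_child(&l3, &l9);
        toy_add_child(&l3, &l10);
    }
    return toy_is_only_child(&l7);
}
```

With copy_only set, loop 7 passes the only-child test, so a caller would move up to loop 3; with all four loops present the test fails and loop 7 is handled on its own, matching the divergence described above.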
From: vineetg at rivosinc dot com @ 2022-08-05 7:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
--- Comment #2 from Vineet Gupta <vineetg at rivosinc dot com> ---
Original stream benchmark [1] at https://github.com/jeffhammond/STREAM
From: vineetg at rivosinc dot com @ 2022-08-05 7:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
--- Comment #3 from Vineet Gupta <vineetg at rivosinc dot com> ---
FWIW this was seen with a riscv64 build of gcc, but the same tree-level
behavior is seen with aarch64 gcc 12.1.
For the single-copy-loop source, the final output is an inline copy loop:
-->8--
<bb 11> [local count: 1063004409]:
# j_131 = PHI <j_92(28), 0(10)>
# ivtmp_135 = PHI <ivtmp_88(28), 10000000(10)>
_10 = a[j_131];
c[j_131] = _10;
j_92 = j_131 + 1;
ivtmp_88 = ivtmp_135 - 1;
if (ivtmp_88 != 0)
goto <bb 28>; [99.00%]
else
goto <bb 12>; [1.00%]
.L74:
// ../stream-4-loop.c:315: c[j] = a[j];
ldr q0, [x27, x0] // MEM <vector(2) double> [(double *)&a + ivtmp.224_247 * 1], MEM <vector(2) double> [(double *)&a + ivtmp.224_247 * 1]
str q0, [x19, x0] // MEM <vector(2) double> [(double *)&a + ivtmp.224_247 * 1], MEM <vector(2) double> [(double *)&c + ivtmp.224_247 * 1]
add x0, x0, 16 // ivtmp.224, ivtmp.224,
cmp x0, x28 // ivtmp.224, tmp291
bne .L74
-->8--
For the multi-loop source, on the other hand, we see:
-->8--
MEM <unsigned char[80000000]> [(char * {ref-all})&c] = MEM <unsigned char[80000000]> [(char * {ref-all})&a];
bl memcpy
-->8--
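Editor's sketch: the two outputs above are two forms of the same copy. The block below is a minimal illustration, not taken from the attached test; the function names are hypothetical and the array size is reduced. It assumes source and destination do not overlap and that double is 8 bytes, under which 10000000 elements give the 80000000-byte block move shown in the dump.

```c
#include <stddef.h>
#include <string.h>

/* Element-by-element copy, the same shape as loop 7 in the report. */
static void copy_loop(double *dst, const double *src, size_t n)
{
    for (size_t j = 0; j < n; j++)
        dst[j] = src[j];
}

/* The single library call that can replace the loop when dst and src
   do not overlap. */
static void copy_libcall(double *dst, const double *src, size_t n)
{
    memcpy(dst, src, n * sizeof(double));
}

/* Returns 1 if both forms leave identical bytes in the destination. */
static int copies_match(void)
{
    enum { N = 1000 };  /* reduced size, for illustration only */
    static double a[N], c1[N], c2[N];

    for (size_t j = 0; j < N; j++)
        a[j] = 2.0 * (double)j;

    copy_loop(c1, a, N);
    copy_libcall(c2, a, N);
    return memcmp(c1, c2, sizeof c1) == 0;
}
```

Calling copies_match() should return 1 at any optimization level, since the compiler is free to turn copy_loop into the same library call.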
From: rguenth at gcc dot gnu.org @ 2022-08-05 8:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
Keywords| |missed-optimization
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed| |2022-08-05
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The code should iterate and eventually distribute inner loops of perfect loop
nests. In fact, for
void bar (int *a, int * __restrict b)
{
  for (int k = 0; k < 10; k++)
    for (int j = 0; j < 100000; ++j)
      a[j] = b[j];
}
I do see
> ./cc1 -quiet t.c -O2 -fopt-info
t.c:4:23: optimized: Loop 2 distributed: split to 0 loops and 1 library calls.
The issue is likely that
void bar (int *a, int * __restrict b)
{
  for (int k = 0; k < 10; k++)
    {
      for (int j = 0; j < 100000; ++j)
        a[j] = b[j];
      __builtin_printf ("Foo!");
    }
}
causes distribution to fail early in find_seed_stmts_for_distribution, and
that failure is treated as fatal. I have a fix.
From: cvs-commit at gcc dot gnu.org @ 2022-08-05 10:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:36bc2a8f24f9c8f6eb2c579d520d7fc73a113ae1
commit r13-1972-g36bc2a8f24f9c8f6eb2c579d520d7fc73a113ae1
Author: Richard Biener <rguenther@suse.de>
Date: Fri Aug 5 10:40:18 2022 +0200
tree-optimization/106533 - loop distribution of inner loop of nest
Loop distribution currently gives up if the outer loop of a loop
nest it analyzes contains a stmt with side-effects instead of
continuing to analyze the innermost loop. The following fixes that
by continuing anyway.
PR tree-optimization/106533
* tree-loop-distribution.cc (loop_distribution::execute): Continue
analyzing the inner loops when find_seed_stmts_for_distribution
fails.
* gcc.dg/tree-ssa/ldist-39.c: New testcase.
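Editor's sketch: the behaviour change described in the commit message can be modeled with a toy walk over a loop nest, outermost level first. This is not the code from tree-loop-distribution.cc; the toy_* names and the two per-level flags are hypothetical simplifications, with has_side_effects standing in for a failing seed-statement search.

```c
#include <stddef.h>

/* Hypothetical summary of one level of a loop nest. */
struct toy_level {
    int has_side_effects;  /* e.g. a call such as printf in the body */
    int is_copy_pattern;   /* body is a plain a[j] = b[j] copy */
};

/* Walk the nest from outermost (index 0) to innermost and return the
   index of the level that would become a library call, or -1 if none.
   continue_on_failure == 0 models giving up at the first level that
   cannot be analyzed; non-zero models continuing with inner levels. */
static int toy_distribute(const struct toy_level *nest, size_t depth,
                          int continue_on_failure)
{
    for (size_t i = 0; i < depth; i++) {
        if (nest[i].has_side_effects) {
            if (!continue_on_failure)
                return -1;      /* old behaviour: abandon the nest */
            continue;           /* new behaviour: try the next level */
        }
        if (nest[i].is_copy_pattern)
            return (int)i;
    }
    return -1;
}

/* Two-level nest: an outer loop (optionally containing a call) around
   an inner copy loop. */
static int toy_demo(int outer_has_call, int continue_on_failure)
{
    struct toy_level nest[2] = {
        { outer_has_call, 0 },  /* outer loop */
        { 0, 1 },               /* inner copy loop */
    };
    return toy_distribute(nest, 2, continue_on_failure);
}
```

In this model toy_demo(1, 0) finds nothing to distribute, while toy_demo(1, 1) reaches the inner copy loop at index 1; a nest without the call reaches it either way.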
From: rguenth at gcc dot gnu.org @ 2022-08-05 10:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed for GCC 13.
From: vineetg at rivosinc dot com @ 2022-08-16 4:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
--- Comment #7 from Vineet Gupta <vineetg at rivosinc dot com> ---
Can this be back-ported to gcc 12 please ?
From: rguenth at gcc dot gnu.org @ 2022-08-16 7:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #7)
> Can this be back-ported to gcc 12 please ?
Do you have an indication that this is a regression?
From: vineetg at rivosinc dot com @ 2022-08-16 16:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
--- Comment #9 from Vineet Gupta <vineetg at rivosinc dot com> ---
> > Can this be back-ported to gcc 12 please ?
>
> Do you have an indication that this is a regression?
No, but this does seem like a bug, aren't those backported ?
From: pinskia at gcc dot gnu.org @ 2022-08-16 16:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to fail| |5.1.0
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #9)
> > > Can this be back-ported to gcc 12 please ?
> >
> > Do you have an indication that this is a regression?
>
> No, but this does seem like a bug, aren't those backported ?
Only regressions are backported really; especially when it comes to
optimizations.