public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/106533] New: loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: vineetg at rivosinc dot com @ 2022-08-05  7:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

            Bug ID: 106533
           Summary: loop distribution not distributing inner loop (to
                    memcpy) when perfect loop nest
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vineetg at rivosinc dot com
  Target Milestone: ---

Created attachment 53415
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53415&action=edit
test case

While tinkering with a slightly modified version of the STREAM benchmark [1], I
observed that loop distribution does not turn a nested copy loop into "0 loops
and 1 library call" (memcpy).

This is with the test built at -O2 using mainline GCC as of June 14, 2022
(commit 6abe341558ab).

The actual test is attached, but the loops look like the following. Loop 7
(copy) is distributed to memcpy in the general case, but not if the benchmark is
built with #define COPYONLY (which elides loops 8, 9 and 10 from compilation).

-->8---
    for (j=0; j<10000000; j++) {                            // 1
        a[j] = 1.0;
        b[j] = 2.0;
        c[j] = 0.0;
    }

    for (j = 0; j < 10000000; j++)                          // 2
        a[j] = 2.0E0 * a[j];

    for (k=0; k<10; k++)                                    // 3
    {
        for (j=0; j<10000000; j++) c[j] = a[j];             // 7
#ifndef COPYONLY
        for (j=0; j<10000000; j++) b[j] = scalar*c[j];      // 8
        for (j=0; j<10000000; j++) c[j] = a[j]+b[j];        // 9
        for (j=0; j<10000000; j++) a[j] = b[j]+scalar*c[j]; // 10
#endif
    }

    for (k=1; k<10; k++)
        for (j=0; j<4; j++)                                 // 6
            avgtime[j] = avgtime[j] + times[j][k];
            ..

    for (j=0; j<4; j++)                                     // 5
        avgtime[j] = avgtime[j]/(double)(NTIMES-1);
            ..

-->8---

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: vineetg at rivosinc dot com @ 2022-08-05  7:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

Vineet Gupta <vineetg at rivosinc dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vineetg at rivosinc dot com

--- Comment #1 from Vineet Gupta <vineetg at rivosinc dot com> ---
I'm not familiar with the actual loop distribution algorithm, but I debugged it
and found the point of divergence.

loop_distribution::execute() iterates over loops_list (cfun, LI_ONLY_INNERMOST).
The copy loop 7 is processed in both builds, but prepare_perfect_loop_nest()
returns different values.

For the source with only the copy loop, it deduces "perfect nesting" and returns
outer loop 3. This essentially skips any further distribution of loop 7.

For the multi-loop source, prepare_perfect_loop_nest() exits early because the
check outer->inner == loop fails (outer loop 3 has inner pointing to scaling
loop 10, the last loop inside it, not to loop 7, which is the first). The
subsequent logic then eventually distributes loop 7 into 0 loops and a memcpy.

I'm not sure whether this is a bug or intended, hence this report.
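
To make the divergence described above concrete, here is a minimal toy model in
C. It is only a sketch and not GCC's actual code: struct toy_loop,
toy_add_child() and toy_perfect_nest_root() are hypothetical names, and the
assumption that a newly added child becomes the parent's inner pointer is taken
from the observation above that loop 3's inner points at the last loop inside
it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a loop-tree node; NOT GCC's real data structure.  */
struct toy_loop {
    int num;                 /* loop number as used in this report */
    struct toy_loop *outer;  /* enclosing loop, NULL for the root */
    struct toy_loop *inner;  /* most recently added child loop */
    struct toy_loop *next;   /* next sibling in the parent's child list */
};

/* Add CHILD at the head of PARENT's child list, so that `inner` ends up
   pointing at the child that was added last (an assumption of this model).  */
static void toy_add_child(struct toy_loop *parent, struct toy_loop *child)
{
    assert(parent != NULL && child != NULL);
    child->outer = parent;
    child->next = parent->inner;
    parent->inner = child;
}

/* Walk outward while LOOP is the one and only child of its parent and the
   parent is not the root.  Return the loop at which the walk stops.  */
static struct toy_loop *toy_perfect_nest_root(struct toy_loop *loop)
{
    struct toy_loop *outer = loop->outer;

    while (outer != NULL && outer->outer != NULL
           && outer->inner == loop && loop->next == NULL) {
        loop = outer;
        outer = loop->outer;
    }
    return loop;
}
```

In this model, a nest where loop 3 contains only loop 7 yields loop 3 as the
nest root, whereas a nest where loop 3 contains loops 7, 8, 9 and 10 leaves
loop 7 as its own root, because the inner pointer of loop 3 refers to loop 10.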

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: vineetg at rivosinc dot com @ 2022-08-05  7:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

--- Comment #2 from Vineet Gupta <vineetg at rivosinc dot com> ---
Original stream benchmark [1] at https://github.com/jeffhammond/STREAM

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: vineetg at rivosinc dot com @ 2022-08-05  7:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

--- Comment #3 from Vineet Gupta <vineetg at rivosinc dot com> ---
FWIW this was seen with a riscv64 build of gcc, but the same tree-level behavior
is seen with aarch64 gcc 12.1.

For the single copy-loop source, the final output is an inline copy loop:

-->8--
  <bb 11> [local count: 1063004409]:
  # j_131 = PHI <j_92(28), 0(10)>
  # ivtmp_135 = PHI <ivtmp_88(28), 10000000(10)>
  _10 = a[j_131];
  c[j_131] = _10;
  j_92 = j_131 + 1;
  ivtmp_88 = ivtmp_135 - 1;
  if (ivtmp_88 != 0)
    goto <bb 28>; [99.00%]
  else
    goto <bb 12>; [1.00%]


.L74:
// ../stream-4-loop.c:315:          c[j] = a[j];
        ldr     q0, [x27, x0]   // MEM <vector(2) double> [(double *)&a + ivtmp.224_247 * 1], MEM <vector(2) double> [(double *)&a + ivtmp.224_247 * 1]
        str     q0, [x19, x0]   // MEM <vector(2) double> [(double *)&a + ivtmp.224_247 * 1], MEM <vector(2) double> [(double *)&c + ivtmp.224_247 * 1]
        add     x0, x0, 16      // ivtmp.224, ivtmp.224,
        cmp     x0, x28 // ivtmp.224, tmp291
        bne     .L74
-->8--


For the multi-loop source, by contrast, we see:

-->8--

  MEM <unsigned char[80000000]> [(char * {ref-all})&c] = MEM <unsigned char[80000000]> [(char * {ref-all})&a];

        bl      memcpy
-->8--
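
The 80000000 in the block copy above is the byte size of the arrays: 10000000
elements times sizeof(double), assuming an 8-byte double as on the targets
discussed here. The following C sketch illustrates the equivalence between the
element-wise loop and a single memcpy call. The helper names copy_loop(),
copy_memcpy() and copy_bytes() are made up for this illustration and do not
come from the test case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element-wise copy, the same shape as loop 7 in the report.  */
static void copy_loop(double *dst, const double *src, size_t n)
{
    for (size_t j = 0; j < n; j++)
        dst[j] = src[j];
}

/* The same effect expressed as one library call over n * sizeof(double)
   bytes.  As with any memcpy, DST and SRC must not overlap.  */
static void copy_memcpy(double *dst, const double *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(double));
}

/* Number of bytes occupied by an array of N doubles.  */
static size_t copy_bytes(size_t n)
{
    return n * sizeof(double);
}
```

Calling both helpers on distinct, non-overlapping arrays should leave identical
destination contents, which is the property that allows the compiler to replace
the loop with the library call.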

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: rguenth at gcc dot gnu.org @ 2022-08-05  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2022-08-05

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The code should iterate and eventually distribute inner loops of perfect loop
nests.  In fact for

void bar (int *a, int * __restrict b)
{
  for (int k = 0; k < 10; k++)
    for (int j = 0; j < 100000; ++j)
      a[j] = b[j];
}

I do see

> ./cc1 -quiet t.c -O2 -fopt-info
t.c:4:23: optimized: Loop 2 distributed: split to 0 loops and 1 library calls.

The issue is likely that

void bar (int *a, int * __restrict b)
{
  for (int k = 0; k < 10; k++)
    {
    for (int j = 0; j < 100000; ++j)
      a[j] = b[j];
    __builtin_printf ("Foo!");
    }
}

causes distribution to fail early in find_seed_stmts_for_distribution, and
that is treated as a fatal error.  I have a fix.

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: cvs-commit at gcc dot gnu.org @ 2022-08-05 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:36bc2a8f24f9c8f6eb2c579d520d7fc73a113ae1

commit r13-1972-g36bc2a8f24f9c8f6eb2c579d520d7fc73a113ae1
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Aug 5 10:40:18 2022 +0200

    tree-optimization/106533 - loop distribution of inner loop of nest

    Loop distribution currently gives up if the outer loop of a loop
    nest it analyzes contains a stmt with side-effects instead of
    continuing to analyze the innermost loop.  The following fixes that
    by continuing anyway.

            PR tree-optimization/106533
            * tree-loop-distribution.cc (loop_distribution::execute): Continue
            analyzing the inner loops when find_seed_stmts_for_distribution
            fails.

            * gcc.dg/tree-ssa/ldist-39.c: New testcase.
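
The behaviour change described in this commit message can be illustrated with a
small toy model in C. This is a sketch under stated assumptions, not the code
in tree-loop-distribution.cc: struct toy_level, toy_find_seeds() and
toy_distribute() are hypothetical, and the model reduces the seed search to a
single flag that fails whenever a level contains a statement with side effects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of one level of a loop nest; hypothetical, not GCC's IR.  */
struct toy_level {
    bool has_side_effect_stmt;  /* e.g. a call such as printf in the body */
    bool is_plain_copy;         /* body is just dst[j] = src[j] */
};

/* Stand-in for the seed-statement search: in this model it fails when the
   level contains a statement with side effects.  */
static bool toy_find_seeds(const struct toy_level *lvl)
{
    return !lvl->has_side_effect_stmt;
}

/* Walk the nest from outermost (index 0) to innermost and return the index
   of the first level that could become a library call, or -1 if none.
   With STOP_ON_FAILURE set, a failed seed search aborts the whole walk
   (the behaviour reported here); otherwise that level is skipped and the
   walk continues inward (the behaviour after the fix).  */
static int toy_distribute(const struct toy_level *nest, size_t depth,
                          bool stop_on_failure)
{
    assert(nest != NULL);
    for (size_t i = 0; i < depth; i++) {
        if (!toy_find_seeds(&nest[i])) {
            if (stop_on_failure)
                return -1;
            continue;
        }
        if (nest[i].is_plain_copy)
            return (int)i;
    }
    return -1;
}
```

For a two-level nest whose outer level has a side-effecting statement and whose
inner level is a plain copy, the model reaches the inner level only when
stop_on_failure is false.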

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: rguenth at gcc dot gnu.org @ 2022-08-05 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed for GCC 13.

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: vineetg at rivosinc dot com @ 2022-08-16  4:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

--- Comment #7 from Vineet Gupta <vineetg at rivosinc dot com> ---
Can this be backported to GCC 12, please?

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: rguenth at gcc dot gnu.org @ 2022-08-16  7:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #7)
> Can this be back-ported to gcc 12 please ?

Do you have an indication that this is a regression?

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: vineetg at rivosinc dot com @ 2022-08-16 16:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

--- Comment #9 from Vineet Gupta <vineetg at rivosinc dot com> ---
> > Can this be back-ported to gcc 12 please ?
> 
> Do you have an indication that this is a regression?

No, but this does seem like a bug. Aren't those backported?

* [Bug tree-optimization/106533] loop distribution not distributing inner loop (to memcpy) when perfect loop nest
From: pinskia at gcc dot gnu.org @ 2022-08-16 16:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |5.1.0

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #9)
> > > Can this be back-ported to gcc 12 please ?
> > 
> > Do you have an indication that this is a regression?
> 
> No, but this does seem like a bug, aren't those backported ?

Really, only regressions are backported, especially when it comes to
optimizations.
