public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/94828] New: Failure to merge merge-able loops
@ 2020-04-28 17:49 gabravier at gmail dot com
  2020-04-28 21:52 ` [Bug tree-optimization/94828] Loop fusion is not implemented pinskia at gcc dot gnu.org
  2020-04-29  7:02 ` [Bug tree-optimization/94828] Loop fusion is not implemented outside of ISL rguenth at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: gabravier at gmail dot com @ 2020-04-28 17:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94828

            Bug ID: 94828
           Summary: Failure to merge merge-able loops
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

void f(int *__restrict a, int *__restrict b, size_t sz)
{
    for (int i = 0; i < sz; ++i)
        a[i] += b[i];

    for (int i = 0; i < sz; ++i)
        a[i] += b[i];
}

These two loops could be merged into a single loop that performs both
additions in each iteration. ICC performs this transformation; GCC does not.
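
For illustration, a hand-fused version of the test case might look like the
sketch below. The names f_split and f_fused are hypothetical and not part of
the report, and the index type is changed to size_t to avoid the
signed/unsigned comparison. Because a and b are restrict-qualified, the two
forms compute the same result.

```c
#include <stddef.h>

/* Two separate passes over a[], mirroring the loops in the report. */
void f_split(int *restrict a, const int *restrict b, size_t sz)
{
    for (size_t i = 0; i < sz; ++i)
        a[i] += b[i];

    for (size_t i = 0; i < sz; ++i)
        a[i] += b[i];
}

/* Hand-fused form: one pass, two additions per iteration. */
void f_fused(int *restrict a, const int *restrict b, size_t sz)
{
    for (size_t i = 0; i < sz; ++i) {
        a[i] += b[i];
        a[i] += b[i];
    }
}
```

The fused form traverses a[] and b[] once instead of twice, which is the
saving the requested transformation is after.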

Also, I'd like to note that from the GCC output, only the first loop is
vectorized, and not the second one.

See https://godbolt.org/z/DfGpSi

* [Bug tree-optimization/94828] Loop fusion is not implemented
  2020-04-28 17:49 [Bug tree-optimization/94828] New: Failure to merge merge-able loops gabravier at gmail dot com
@ 2020-04-28 21:52 ` pinskia at gcc dot gnu.org
  2020-04-29  7:02 ` [Bug tree-optimization/94828] Loop fusion is not implemented outside of ISL rguenth at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-04-28 21:52 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94828

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-04-28
            Summary|Failure to merge merge-able |Loop fusion is not
                   |loops                       |implemented

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Correct, loop fusion is not implemented.

See PR 45661 and PR 85531 also.

* [Bug tree-optimization/94828] Loop fusion is not implemented outside of ISL
  2020-04-28 17:49 [Bug tree-optimization/94828] New: Failure to merge merge-able loops gabravier at gmail dot com
  2020-04-28 21:52 ` [Bug tree-optimization/94828] Loop fusion is not implemented pinskia at gcc dot gnu.org
@ 2020-04-29  7:02 ` rguenth at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-29  7:02 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94828

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Loop fusion is not          |Loop fusion is not
                   |implemented                 |implemented outside of ISL

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Both loops are vectorized:

> ./cc1 -quiet y.c -O3 -fopt-info-vec
y.c:6:11: optimized: loop vectorized using 16 byte vectors
y.c:3:7: optimized: loop vectorized using 16 byte vectors

GCC fuses the loops with -floop-nest-optimize:

[scheduler] original ast:
{
  for (int c0 = 0; c0 < P_20; c0 += 1)
    S_3(c0);
  for (int c0 = 0; c0 < P_20; c0 += 1)
    S_4(c0);
}

[scheduler] AST generated by isl:
for (int c0 = 0; c0 < P_20; c0 += 1) {
  S_3(c0);
  S_4(c0);
}

producing

.L4:
        movdqu  (%rdi,%rax), %xmm0
        movdqu  (%rsi,%rax), %xmm2
        paddd   %xmm2, %xmm0
        paddd   %xmm2, %xmm0
        movups  %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L4

but it's true that GCC does not implement classical loop fusion.
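
As a reading aid, the vector loop in the listing above can be paraphrased in
portable C roughly as follows. This is only a sketch under assumptions: the
function name fused_block_loop and the constant LANES are hypothetical, each
16-byte vector is modelled as four ints, and the scalar remainder loop that a
compiler would emit for sizes not divisible by four is omitted.

```c
#include <stddef.h>
#include <string.h>

enum { LANES = 4 };  /* 16-byte vector / 4-byte int (assumed) */

/* Processes nblocks groups of LANES ints; each group stands for one
   iteration of the .L4 loop. */
void fused_block_loop(int *restrict a, const int *restrict b, size_t nblocks)
{
    for (size_t blk = 0; blk < nblocks; ++blk) {
        int va[LANES], vb[LANES];

        memcpy(va, a + blk * LANES, sizeof va);  /* movdqu (%rdi,%rax), %xmm0 */
        memcpy(vb, b + blk * LANES, sizeof vb);  /* movdqu (%rsi,%rax), %xmm2 */

        for (int l = 0; l < LANES; ++l)          /* first paddd */
            va[l] += vb[l];
        for (int l = 0; l < LANES; ++l)          /* second paddd */
            va[l] += vb[l];

        memcpy(a + blk * LANES, va, sizeof va);  /* movups %xmm0, (%rdi,%rax) */
    }
}
```

Each block of a and b is loaded once and a is stored once, while b is added
twice, which matches the effect of running both original loops.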
