* [Bug tree-optimization/60577] New: inefficient FDO instrumentation code
From: carrot at google dot com @ 2014-03-19  3:02 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60577

            Bug ID: 60577
           Summary: inefficient FDO instrumentation code
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: carrot at google dot com

This is actually a regression caused by r175916.

Compile the following code with the options -O2 -fno-strict-aliasing
-fprofile-generate:


struct thread_param
{
  long* buf;
  long iterations;
  long accesses;
} param;

void access_buf(struct thread_param* p)
{
  long i,j;
  long iterations = p->iterations;
  long accesses = p->accesses;
  for (i=0; i<iterations; i++)
  {
    long* pbuf = p->buf;
    for (j=0; j<accesses; j++)
      pbuf[j] += 1;
  }
}



Trunk GCC generates the following for the innermost loop:

.L9:
        addq    $1, __gcov0.access_buf(%rip)
        addq    $1, (%rax)
        addq    $8, %rax
        cmpq    %rdx, %rax
        jne     .L9

The FDO counter in memory is incremented on every iteration.



GCC at revision r175915 generates the following for the innermost loop:

        movq    .LPBX1(%rip), %rsi
    ...
.L4:
        addq    $1, (%rax)
        addq    $8, %rax
        cmpq    %rdx, %rax
        jne     .L4
        leaq    1(%rsi,%r9), %rsi
    ...
        movq    %rsi, .LPBX1(%rip)

The FDO counter adds no overhead to the innermost loop: it is loaded into a
register before the loop and stored back to memory after it.



GCC at revision r175916 generates the following for the innermost loop:

        movq    .LPBX1(%rip), %rcx
        xorl    %eax, %eax
        leaq    1(%rcx), %r8 
        .p2align 4,,10
        .p2align 3
.L4:
        leaq    (%r8,%rax), %rcx
        movq    %rcx, .LPBX1(%rip)
        addq    $1, (%rdx,%rax,8)
        addq    $1, %rax
        cmpq    %rsi, %rax
        jne     .L4

The FDO counter is incremented and written to memory on every iteration.
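
For illustration only, the difference between the listings can be pictured at
the source level roughly as below. This is a sketch, not what the profiling
pass actually emits: edge_counter is a hypothetical stand-in for the gcov
counter (__gcov0.access_buf / .LPBX1), and the two function names are made up
for this example.

```c
/* Hypothetical stand-in for the gcov edge counter of the inner loop. */
static long edge_counter;

/* Roughly the shape of the trunk and r175916 inner loops:
   the counter in memory is updated on every iteration. */
static void inner_counter_in_memory(long *pbuf, long accesses)
{
  long j;
  for (j = 0; j < accesses; j++)
    {
      edge_counter += 1;        /* memory update inside the loop */
      pbuf[j] += 1;
    }
}

/* Roughly the shape of the r175915 inner loop:
   the counter is kept in a local and stored once after the loop. */
static void inner_counter_hoisted(long *pbuf, long accesses)
{
  long j;
  long count = edge_counter;    /* load once before the loop */
  for (j = 0; j < accesses; j++)
    {
      count += 1;               /* no memory traffic for the counter */
      pbuf[j] += 1;
    }
  edge_counter = count;         /* store once after the loop */
}
```

Both variants leave the same final count in edge_counter; only the number of
memory accesses to the counter inside the loop differs.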


Thread overview: 12+ messages
2014-03-19  3:02 [Bug tree-optimization/60577] New: inefficient FDO instrumentation code carrot at google dot com
2014-03-19 10:02 ` [Bug tree-optimization/60577] [4.9 Regression] " rguenth at gcc dot gnu.org
2014-03-19 10:54 ` [Bug tree-optimization/60577] [4.7/4.8/4.9 " rguenth at gcc dot gnu.org
2014-03-19 11:51 ` rguenth at gcc dot gnu.org
2014-03-19 12:04 ` rguenth at gcc dot gnu.org
2014-03-19 12:36 ` rguenth at gcc dot gnu.org
2014-03-20 11:36 ` rguenth at gcc dot gnu.org
2014-03-21 11:53 ` [Bug tree-optimization/60577] [4.7/4.8 " rguenth at gcc dot gnu.org
2014-03-21 11:53 ` rguenth at gcc dot gnu.org
2014-04-22 11:38 ` jakub at gcc dot gnu.org
2014-07-16 13:31 ` [Bug tree-optimization/60577] [4.8 " jakub at gcc dot gnu.org
2014-10-30 10:43 ` jakub at gcc dot gnu.org
