public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/66890] New: function splitting only works with profile feedback
@ 2015-07-15 22:17 andi-gcc at firstfloor dot org
  2015-07-16  7:05 ` [Bug rtl-optimization/66890] " andi-gcc at firstfloor dot org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: andi-gcc at firstfloor dot org @ 2015-07-15 22:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

            Bug ID: 66890
           Summary: function splitting only works with profile feedback
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Consider this simple example:

volatile int count;

int main()
{
        int i;
        for (i = 0; i < 100000; i++) {
                if (i == 999)
                        count *= 2;
                count++;
        }
}

The default "EQ is unlikely" heuristic in predict.* predicts that the branch
if (i == 999) is unlikely to be taken. So the tracer moves the count *= 2 basic
block out of line to save instruction cache space.

gcc50 -O2 -S thotcold.c

        movl    $1, %edx
        jmp     .L2
        .p2align 4,,10
        .p2align 3
.L4:
        addl    $1, %edx
.L2:
        cmpl    $1000, %edx
        movl    count(%rip), %eax
        je      .L6
        addl    $1, %eax
        cmpl    $100000, %edx
        movl    %eax, count(%rip)
        jne     .L4
        xorl    %eax, %eax
        ret
# out of line code
.L6:
        addl    %eax, %eax
        movl    %eax, count(%rip)
        movl    count(%rip), %eax
        addl    $1, %eax
        movl    %eax, count(%rip)
        jmp     .L4
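
The prediction that the heuristic makes for if (i == 999) can also be stated
explicitly in the source with GCC's __builtin_expect. The following is only a
minimal sketch for illustration: the unlikely() macro and the run_loop() name
are assumptions introduced here and are not part of the test case above, and no
claim is made that the generated code differs from what the heuristic already
produces.

```c
/* Illustrative sketch: same loop as the test case, with the branch
 * expectation written out by hand instead of left to predict.*. */
#define unlikely(x) __builtin_expect(!!(x), 0)   /* hypothetical macro name */

volatile int count;

/* Hypothetical wrapper so the loop can be called and its result checked. */
int run_loop(void)
{
        int i;

        count = 0;
        for (i = 0; i < 100000; i++) {
                if (unlikely(i == 999))   /* hint: this is rarely true */
                        count *= 2;
                count++;
        }
        return count;
}
```

Calling run_loop() from main() and compiling with -O2 -S makes it possible to
compare the block layout against the listing above.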


Now if we enable -freorder-blocks-and-partition, I would expect the block to
also be put into .text.unlikely to give an even better cache layout. But that
is not what happens: it generates the same code.

Only when I use actual profile feedback together with
-freorder-blocks-and-partition does the code actually end up in a separate
section (it also unrolled the loop, so the code looks a bit different):

gcc -O2 -fprofile-generate -freorder-blocks-and-partition thotcold.c
./a.out 
gcc -O2 -fprofile-use -freorder-blocks-and-partition thotcold.c 
...
       .cfi_endproc
        .section        .text.unlikely
        .cfi_startproc
.L55:
        movl    count(%rip), %ecx
        addl    $1, %eax
        addl    $1, %ecx
        cmpl    $100000, %eax
        movl    %ecx, count(%rip)
        je      .L6
        cmpl    $1, %edx
        je      .L5
        cmpl    $2, %edx
        je      .L28
        cmpl    $3, %edx


-freorder-blocks-and-partition should already use the extra section even
without profile feedback. 

I tested some larger programs and without profile feedback the unlikely section
is always empty.

The heuristics in predict.* often work quite well and a lot of code would
benefit from moving cold code out of the way of the caches.

This would make it possible to use the option to improve frontend-bound code
without needing full profile feedback.
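
Until block partitioning works without profile data, one manual way to get a
rarely executed path out of the hot text is to outline it into its own function
and mark that function cold. The GCC documentation says that cold functions
are, on many targets, placed into a special subsection of the text section.
This is a sketch of a workaround, not the behavior requested in this report;
rare_path() and run_loop_outlined() are hypothetical names.

```c
volatile int count;

/* Hypothetical helper: the rarely executed work, outlined by hand.
 * cold marks it as unlikely to run; noinline keeps it out of the loop body. */
__attribute__((cold, noinline))
static void rare_path(void)
{
        count *= 2;
}

/* Hypothetical wrapper around the loop from the test case. */
int run_loop_outlined(void)
{
        int i;

        count = 0;
        for (i = 0; i < 100000; i++) {
                if (i == 999)
                        rare_path();   /* call into the cold function */
                count++;
        }
        return count;
}
```

This only helps for code that is practical to outline by hand, which is why
doing it automatically from the predict.* heuristics would be more useful.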



* [Bug rtl-optimization/66890] function splitting only works with profile feedback
  2015-07-15 22:17 [Bug rtl-optimization/66890] New: function splitting only works with profile feedback andi-gcc at firstfloor dot org
@ 2015-07-16  7:05 ` andi-gcc at firstfloor dot org
  2015-07-17 17:42 ` andi-gcc at firstfloor dot org
  2023-05-16 22:26 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: andi-gcc at firstfloor dot org @ 2015-07-16  7:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #3 from Andi Kleen <andi-gcc at firstfloor dot org> ---
I suspect the patch may be too simple, because it could get stuck on unlikely
but high-frequency edges in the cold area. It may be necessary to adapt more of
the code from the non-partitioning reordering.



* [Bug rtl-optimization/66890] function splitting only works with profile feedback
  2015-07-15 22:17 [Bug rtl-optimization/66890] New: function splitting only works with profile feedback andi-gcc at firstfloor dot org
  2015-07-16  7:05 ` [Bug rtl-optimization/66890] " andi-gcc at firstfloor dot org
@ 2015-07-17 17:42 ` andi-gcc at firstfloor dot org
  2023-05-16 22:26 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: andi-gcc at firstfloor dot org @ 2015-07-17 17:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #4 from Andi Kleen <andi-gcc at firstfloor dot org> ---
Created attachment 36008
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36008&action=edit
Updated patch with documentation and param

I updated the patch with proper documentation and a param for the cut off.
In some tests it appears to do the right thing when building a Linux kernel.



* [Bug rtl-optimization/66890] function splitting only works with profile feedback
  2015-07-15 22:17 [Bug rtl-optimization/66890] New: function splitting only works with profile feedback andi-gcc at firstfloor dot org
  2015-07-16  7:05 ` [Bug rtl-optimization/66890] " andi-gcc at firstfloor dot org
  2015-07-17 17:42 ` andi-gcc at firstfloor dot org
@ 2023-05-16 22:26 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-16 22:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Really, for this loop I would have assumed it would be split into 3 different
loops, like:
volatile int count;

int main()
{
        int i;
        for (i = 0; i < 999; i++) {
                if (i == 999)
                        count *= 2;
                count++;
        }
        for (; i < 999+1; i++) {
                if (i == 999)
                        count *= 2;
                count++;
        }
        for (; i < 100000; i++) {
                if (i == 999)
                        count *= 2;
                count++;
        }
}

And then it would not have an extra branch inside the loop itself either.
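
To spell that out: once the loop is split at i == 999, the condition is known
to be false in the first and third loops and true in the single middle
iteration, so the tests can be folded away. A hand-written sketch of the
result, assuming the compiler performs that folding (run_split() is a
hypothetical name used here so the result can be checked):

```c
volatile int count;

/* Hand-simplified form of the three split loops; no branch on i == 999
 * remains inside any loop body. */
int run_split(void)
{
        int i;

        count = 0;

        /* i = 0 .. 998: condition is always false */
        for (i = 0; i < 999; i++)
                count++;

        /* i == 999: the middle loop runs exactly once, condition is true */
        count *= 2;
        count++;
        i++;

        /* i = 1000 .. 99999: condition is always false */
        for (; i < 100000; i++)
                count++;

        return count;
}
```

Because count is volatile, each load and store still has to be performed in
order; only the comparison against 999 disappears from the loop bodies.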

