From: "andi-gcc at firstfloor dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/66890] New: function splitting only works with profile feedback
Date: Wed, 15 Jul 2015 22:17:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

            Bug ID: 66890
           Summary: function splitting only works with profile feedback
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Consider this simple example:

volatile int count;

int main()
{
        int i;
        for (i = 0; i < 100000; i++) {
                if (i == 999)
                        count *= 2;
                count++;
        }
}

The default "EQ is unlikely" heuristic in predict.* predicts that the
if (i == 999) branch is unlikely.
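For comparison, the same hint that the heuristic infers for if (i == 999) can be written by hand with GCC's __builtin_expect. This is a minimal sketch, assuming only a compiler that supports that builtin (GCC or Clang); run_hinted_loop is a hypothetical wrapper added so the loop can be called and its result checked. It makes no claim about whether an explicit hint changes section placement under -freorder-blocks-and-partition.

```c
/* Sketch: the loop from the report with the "unlikely" prediction made
   explicit.  run_hinted_loop is a hypothetical name used for illustration. */
volatile int count;

int run_hinted_loop(void)
{
    int i;

    count = 0;
    for (i = 0; i < 100000; i++) {
        /* __builtin_expect(expr, 0): expr is expected to be false,
           so the body of this if is treated as the unlikely path. */
        if (__builtin_expect(i == 999, 0))
            count *= 2;
        count++;
    }
    /* 999 increments, doubled to 1998, +1, then 99000 more: 100999 */
    return count;
}
```

The loop computes the same value as the original program; only the branch-probability information given to the compiler differs.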
Based on that prediction, the tracer moves the count *= 2 basic block out of
line to make better use of the instruction cache.

gcc50 -O2 -S thotcold.c

        movl    $1, %edx
        jmp     .L2
        .p2align 4,,10
        .p2align 3
.L4:
        addl    $1, %edx
.L2:
        cmpl    $1000, %edx
        movl    count(%rip), %eax
        je      .L6
        addl    $1, %eax
        cmpl    $100000, %edx
        movl    %eax, count(%rip)
        jne     .L4
        xorl    %eax, %eax
        ret

# out of line code
.L6:
        addl    %eax, %eax
        movl    %eax, count(%rip)
        movl    count(%rip), %eax
        addl    $1, %eax
        movl    %eax, count(%rip)
        jmp     .L4

Now if we enable -freorder-blocks-and-partition, I would expect this block to
also be put into .text.unlikely to give an even better cache layout. But that
is not what happens: GCC generates the same code.

Only when I use actual profile feedback together with
-freorder-blocks-and-partition does the code end up in a separate section (it
also unrolled the loop, so the code looks a bit different):

gcc -O2 -fprofile-generate -freorder-blocks-and-partition thotcold.c
./a.out
gcc -O2 -fprofile-use -freorder-blocks-and-partition thotcold.c

...
        .cfi_endproc
        .section        .text.unlikely
        .cfi_startproc
.L55:
        movl    count(%rip), %ecx
        addl    $1, %eax
        addl    $1, %ecx
        cmpl    $100000, %eax
        movl    %ecx, count(%rip)
        je      .L6
        cmpl    $1, %edx
        je      .L5
        cmpl    $2, %edx
        je      .L28
        cmpl    $3, %edx

-freorder-blocks-and-partition should already use the extra section even
without profile feedback. In the larger programs I tested, the unlikely
section was always empty without profile feedback.

The heuristics in predict.* often work quite well, and a lot of code would
benefit from moving cold code out of the way of the caches. This would allow
the option to be used to improve frontend-bound code without needing full
profile feedback.
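Until the partitioning works from static predictions, one manual workaround (not a fix for the missed optimization) is to outline the rare path into a function marked with GCC's cold attribute. The GCC manual documents that calls to cold functions are predicted unlikely and that such functions are placed in a separate text subsection; the exact placement may depend on the GCC version and on options such as -freorder-functions. This is a sketch under those assumptions; double_count and run_cold_loop are hypothetical names.

```c
/* Sketch of a manual workaround: move the rare path into a cold,
   non-inlined helper so the compiler can keep it away from the hot loop.
   double_count and run_cold_loop are hypothetical names. */
volatile int count;

/* cold: paths leading to calls of this function are predicted unlikely,
   and the body is optimized for size and placed in a cold subsection.
   noinline: keep the rare code out of the caller's loop body. */
__attribute__((cold, noinline))
static void double_count(void)
{
    count *= 2;
}

int run_cold_loop(void)
{
    int i;

    count = 0;
    for (i = 0; i < 100000; i++) {
        if (i == 999)
            double_count();   /* rare path, outlined */
        count++;
    }
    return count;
}
```

One way to check the effect is to compile with gcc -O2 -S and look for a .section .text.unlikely directive ahead of double_count in the generated assembly. The drawback, and the reason the report asks for better automatic partitioning, is that this requires restructuring the source by hand.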