public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/53791] New: Branches not re-ordered using profile-information
@ 2012-06-27 23:25 steven at gcc dot gnu.org
2012-06-28 9:52 ` [Bug tree-optimization/53791] " rguenth at gcc dot gnu.org
2012-06-28 12:05 ` steven at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: steven at gcc dot gnu.org @ 2012-06-27 23:25 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53791
Bug #: 53791
Summary: Branches not re-ordered using profile-information
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: steven@gcc.gnu.org
Consider the following test case:
extern void abort(void) __attribute__((__noreturn__));
static int __attribute__((__noinline__,__noclone__))
candidate_for_reordering (int x)
{
asm volatile ("" : : : "memory");
if (x == 1)
return 2;
else if (x == 2)
return 4;
else if (x == 3)
return 8;
abort ();
}
int accum = 0;
int main (int argc __attribute__((__unused__)),
char **argv __attribute__((__unused__)))
{
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (1);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (2);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
accum += candidate_for_reordering (3);
return accum; /* 30*8 + 4 + 2 = 246 */
}
Compiling with -fprofile-generate, doing 3 test runs, and compiling with
-fprofile-use, results in this .040t.feedback_fnsplit dump for the
candidate_for_reordering function:
candidate_for_reordering (intD.0 xD.1241)
{
intD.0 D.1319;
# BLOCK 2 freq:10000 count:96
# PRED: ENTRY [100.0%] count:96 (fallthru,exec)
# .MEMD.1325_7 = VDEF <.MEMD.1325_6(D)>
__asm__ __volatile__("" : : : "memory");
# SUCC: 3 [100.0%] count:96 (fallthru)
# BLOCK 3 freq:10000 count:96
# PRED: 2 [100.0%] count:96 (fallthru)
if (xD.1241_2(D) == 1)
goto <bb 7>;
else
goto <bb 4>;
# SUCC: 7 [3.1%] count:3 (true,exec) 4 [96.9%] count:93 (false,exec)
# BLOCK 4 freq:9688 count:93
# PRED: 3 [96.9%] count:93 (false,exec)
if (xD.1241_2(D) == 2)
goto <bb 7>;
else
goto <bb 5>;
# SUCC: 7 [3.2%] count:3 (true,exec) 5 [96.8%] count:90 (false,exec)
# BLOCK 5 freq:9375 count:90
# PRED: 4 [96.8%] count:90 (false,exec)
if (xD.1241_2(D) == 3)
goto <bb 7>;
else
goto <bb 6>;
# SUCC: 7 [100.0%] count:90 (true,exec) 6 (false,exec)
# BLOCK 6
# PRED: 5 (false,exec)
# VUSE <.MEMD.1325_7>
# USE = nonlocal
# CLB = nonlocal
abortD.830 ();
# SUCC:
# BLOCK 7 freq:10000 count:96
# PRED: 3 [3.1%] count:3 (true,exec) 4 [3.2%] count:3 (true,exec) 5
[100.0%] count:90 (true,exec)
# D.1319_1 = PHI <2(3), 4(4), 8(5)>
return D.1319_1;
# SUCC: EXIT [100.0%] count:96
}
... and the following assembly (powerpc64):
candidate_for_reordering:
.quad .L.candidate_for_reordering,.TOC.@tocbase
.previous
.type candidate_for_reordering, @function
.L.candidate_for_reordering:
mflr 0
std 0,16(1)
stdu 1,-112(1)
cmpwi 7,3,1
beq- 7,.L4
cmpwi 0,3,2
beq- 0,.L5
cmpwi 1,3,3
bne- 1,.L3
li 3,8
.L1:
addi 1,1,112
ld 4,16(1)
mtlr 4
blr
.L4:
li 3,2
b .L1
.L5:
li 3,4
b .L1
.L3:
bl abort
nop
Note the very obvious (to the human eye, anyway) optimization opportunity to
test for (x == 3) first. GCC is currently (AFAICT) not able to optimize the
order of branches for mutually exclusive conditions using profile info.
This prevents my work to re-implement emit_case_bit_tests as a GIMPLE lowering
pass to have any meaningful impact on the compiler speed with
profiledbootstrap: There are a couple of tree and GIMPLE predicates that would
benefit from this transformation, but it's not happening...
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug tree-optimization/53791] Branches not re-ordered using profile-information
2012-06-27 23:25 [Bug tree-optimization/53791] New: Branches not re-ordered using profile-information steven at gcc dot gnu.org
@ 2012-06-28 9:52 ` rguenth at gcc dot gnu.org
2012-06-28 12:05 ` steven at gcc dot gnu.org
1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-06-28 9:52 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53791
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-06-28
Ever Confirmed|0 |1
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-28 09:51:48 UTC ---
Confirmed. tree-ssa-ifcombine.c deals with two-comparison CFG patterns, so
it could handle transforming
# BLOCK 4 freq:9688 count:93
# PRED: 3 [96.9%] count:93 (false,exec)
if (xD.1241_2(D) == 2)
goto <bb 7>;
else
goto <bb 5>;
# SUCC: 7 [3.2%] count:3 (true,exec) 5 [96.8%] count:90 (false,exec)
# BLOCK 5 freq:9375 count:90
# PRED: 4 [96.8%] count:90 (false,exec)
if (xD.1241_2(D) == 3)
goto <bb 7>;
else
goto <bb 6>;
at least (and by iterating would then catch the 2nd opportunity). That's
of course somewhat ad-hoc (but the easiest place to implement).
The other trivially obvious possibility is to pattern-match this open-coded
switch/case and transform it back to switch/case early.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug tree-optimization/53791] Branches not re-ordered using profile-information
2012-06-27 23:25 [Bug tree-optimization/53791] New: Branches not re-ordered using profile-information steven at gcc dot gnu.org
2012-06-28 9:52 ` [Bug tree-optimization/53791] " rguenth at gcc dot gnu.org
@ 2012-06-28 12:05 ` steven at gcc dot gnu.org
1 sibling, 0 replies; 3+ messages in thread
From: steven at gcc dot gnu.org @ 2012-06-28 12:05 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53791
--- Comment #2 from Steven Bosscher <steven at gcc dot gnu.org> 2012-06-28 12:05:38 UTC ---
(In reply to comment #1)
> The other trivially obvious possibility is to pattern-match this open-coded
> switch/case and transform it back to switch/case early.
The test case is a simplified example. More complex predicates may not be
transformed to a switch. Also, the point is that this kind of code results from
lowering of switches to the series of if-branches.
(What my emit_case_bit_tests patch does, actually, is to output the branch with
the highest count first. But that means I have to lower after reading in the
profile.)
I'm looking at "Efficient and effective branch reordering using profile data"
(2002) to see if this is something that could be implemented for GCC...
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2012-06-28 12:05 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-27 23:25 [Bug tree-optimization/53791] New: Branches not re-ordered using profile-information steven at gcc dot gnu.org
2012-06-28 9:52 ` [Bug tree-optimization/53791] " rguenth at gcc dot gnu.org
2012-06-28 12:05 ` steven at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).