From mboxrd@z Thu Jan  1 00:00:00 1970
From: Brad Lucier
To: zack@codesourcery.com (Zack Weinberg)
Cc: lucier@math.purdue.edu (Brad Lucier), gcc@gcc.gnu.org, jh@suse.cz (Jan Hubicka)
Subject: Re: Some statement counts for gcc
Date: Mon, 26 Aug 2002 06:18:00 -0000
Message-id: <200208261318.g7QDIFL04502@banach.math.purdue.edu>
References: <20020826071657.GD28065@codesourcery.com>
X-SW-Source: 2002-08/msg01588.html

>
> On Sun, Aug 25, 2002 at 06:04:11PM -0500, Brad Lucier wrote:
>
> >  branch prediction     : 165.11 (33%) usr   0.07 ( 1%) sys 165.50 (33%) wall
> ...
> > A surprising (to me) number of lines in real.c are executed; I might look
> > there to see what's going on.
>
> The branch predictor uses emulated floating point numbers internally.
> Jan Hubicka has explained why this is currently necessary -- can't
> find the message at the moment though.  However, real.c is indeed
> quite slow; I suspect that accounts entirely for the amount of time
> spent in this patch

The problem is that gcc's fp code generator on x87 is broken enough
that you can get different results for the same expression, hence the
use of the simulator which does not use extended-precision arithmetic
by default.  I'm not really sure that the simulator is unreasonably slow.

> If I remember correctly this code has a very complicated flow graph,
> and branch prediction may not help much; perhaps the right thing is
> to detect code like this and disable that optimization.

This has been the response to several of my recent observations about
gcc's algorithms, etc.  I'd prefer that if there are problems they be
fixed rather than papered over by a -fbrad's_code_don't_optimize flag.

At any rate, here's some data for compiling the larger file:

banach-139% gcc/cc1 -fnew-ra -m64 -O1 -fschedule-insns2 -fno-strict-aliasing -fno-math-errno -mcpu=ultrasparc -mtune=ultrasparc _io.i
 ___H__20___io {GC 75305k -> 25172k} {GC 42902k -> 24245k} {GC 32413k -> 24597k}
 {GC 35635k -> 27127k} {GC 45565k -> 27034k} {GC 50659k -> 26850k}
 {GC 37567k -> 30050k} {GC 52691k -> 31806k} ___init_proc ____20___io
Execution times (seconds)
 garbage collection    :    10.79 ( 0%) usr     0.24 ( 0%) sys    21.50 ( 0%) wall
 cfg construction      :    29.83 ( 0%) usr     4.03 ( 1%) sys    34.50 ( 0%) wall
 cfg cleanup           :    99.46 ( 0%) usr     0.03 ( 0%) sys    99.00 ( 0%) wall
 trivially dead code   :     5.48 ( 0%) usr     0.00 ( 0%) sys     5.50 ( 0%) wall
 life analysis         :   435.03 ( 2%) usr     0.08 ( 0%) sys   442.50 ( 2%) wall
 life info update      :    58.31 ( 0%) usr     0.00 ( 0%) sys    58.50 ( 0%) wall
 preprocessing         :     2.93 ( 0%) usr     2.57 ( 1%) sys     5.00 ( 0%) wall
 lexical analysis      :     1.64 ( 0%) usr     5.18 ( 1%) sys    10.00 ( 0%) wall
 parser                :    12.39 ( 0%) usr     2.87 ( 1%) sys    13.50 ( 0%) wall
 expand                :     4.24 ( 0%) usr     0.17 ( 0%) sys     5.00 ( 0%) wall
 varconst              :     1.49 ( 0%) usr     0.01 ( 0%) sys     1.50 ( 0%) wall
 integration           :     1.04 ( 0%) usr     0.02 ( 0%) sys     1.00 ( 0%) wall
 jump                  :    64.02 ( 0%) usr     0.00 ( 0%) sys    63.50 ( 0%) wall
 CSE                   :    12.37 ( 0%) usr     0.00 ( 0%) sys    12.50 ( 0%) wall
 loop analysis         :     0.12 ( 0%) usr     0.00 ( 0%) sys     0.00 ( 0%) wall
 branch prediction     :  1211.11 ( 5%) usr     1.25 ( 0%) sys  1214.50 ( 5%) wall
 flow analysis         :     3.76 ( 0%) usr     0.00 ( 0%) sys     3.50 ( 0%) wall
 combiner              :    18.08 ( 0%) usr     0.00 ( 0%) sys    18.50 ( 0%) wall
 if-conversion         :     5.82 ( 0%) usr     0.00 ( 0%) sys     6.00 ( 0%) wall
 local alloc           : 20603.26 (91%) usr   399.06 (96%) sys 21695.50 (91%) wall
 global alloc          :    16.60 ( 0%) usr     0.05 ( 0%) sys    22.00 ( 0%) wall
 reload CSE regs       :    62.53 ( 0%) usr     0.06 ( 0%) sys    69.00 ( 0%) wall
 flow 2                :     1.11 ( 0%) usr     0.00 ( 0%) sys     0.50 ( 0%) wall
 if-conversion 2       :     5.83 ( 0%) usr     0.01 ( 0%) sys     6.50 ( 0%) wall
 rename registers      :    10.91 ( 0%) usr     0.00 ( 0%) sys    11.00 ( 0%) wall
 scheduling 2          :     9.92 ( 0%) usr     0.02 ( 0%) sys    10.00 ( 0%) wall
 delay branch sched    :    14.78 ( 0%) usr     0.00 ( 0%) sys    14.50 ( 0%) wall
 shorten branches      :     0.85 ( 0%) usr     0.00 ( 0%) sys     1.00 ( 0%) wall
 final                 :     4.50 ( 0%) usr     0.05 ( 0%) sys     5.00 ( 0%) wall
 rest of compilation   :     8.71 ( 0%) usr     0.02 ( 0%) sys     9.50 ( 0%) wall
 TOTAL                 : 22716.96             415.73           23860.50

Although the GC statistics indicate not much memory use, this code took
up to 3.8 GB of swap when running.  This is for the file at

http://www.math.purdue.edu/~lucier/_io.i.gz

Here we see an excessive amount of time spent in the new register
allocator; the most-executed lines in the ra-* files are at the end
of this message.  The complete .c.gcov files for this run are at

http://www.math.purdue.edu/~lucier/gcovs_io.tgz

The ra-*.c.gcov files are at

http://www.math.purdue.edu/~lucier/ra-gcovs_io.tgz

and the executed lines sorted in decreasing order of execution are at

http://www.math.purdue.edu/~lucier/ra-sorted-lines.gz

Brad

ra-build.c.gcov:    79915683: 1625: if (web1 == web2 || TEST_BIT (igraph, index))
ra-build.c.gcov:    79915683: 1623: unsigned int index = igraph_index (id1, id2);
ra-build.c.gcov:    79915683: 1622: unsigned int id1 = web1->id, id2 = web2->id;
ra-build.c.gcov:    79915683: 1621: {
ra-colorize.c.gcov: 79134101:  741: for (wl = v->conflict_list; wl; wl = wl->next)
ra-colorize.c.gcov: 79111468:  764: for (i = 0; i < nregs; i++)
ra-colorize.c.gcov: 79111383:  802: if (pweb->type != SELECT && pweb->type != COALESCED)
ra-colorize.c.gcov: 79111383:  799: if (u->type != PRECOLORED)
ra-colorize.c.gcov: 79111383:  769: record_conflict (web, pweb);
ra-colorize.c.gcov: 79111383:  768: if (wl->sub == NULL)
ra-colorize.c.gcov: 79111383:  766: if (u->type == PRECOLORED)
ra-colorize.c.gcov: 79111383:  755: if (u->type == PRECOLORED)
ra-colorize.c.gcov: 79111383:  754: int nregs = 1 + v->add_hardregs;
ra-colorize.c.gcov: 79111383:  753: struct web *web = u;
ra-colorize.c.gcov: 79111383:  751: if (1)
ra-colorize.c.gcov: 79111383:  743: struct web *pweb = wl->t;
ra-colorize.c.gcov: 79111298:  800: break;
ra-build.c.gcov:    69833156: 1636: if ((web1->type == PRECOLORED
ra-build.c.gcov:    69833156: 1633: return;
ra-build.c.gcov:    69833156: 1631: if ((web1->regno < FIRST_PSEUDO_REGISTER && fixed_regs[web1->regno])
ra-build.c.gcov:    69833156: 1627: if (id1 == id2)
ra-build.c.gcov:    69833156: 1626: return;
ra-build.c.gcov:    69827720: 1652: if (web1->type != PRECOLORED && web2->type != PRECOLORED
ra-build.c.gcov:    69827615: 1665: add_conflict_edge (web2, web1);
ra-build.c.gcov:    69827615: 1664: add_conflict_edge (web1, web2);
ra-build.c.gcov:    69827615: 1663: SET_BIT (igraph, index);
ra-rewrite.c.gcov:  66002982: 1141: for (d = (pass) ? WEBS(SPILLED) : WEBS(COALESCED); d; d = d->next)
ra-rewrite.c.gcov:  65990988: 1145: if (aweb->type != SPILLED)
ra-rewrite.c.gcov:  65990988: 1144: struct web *aweb = alias (web);
ra-rewrite.c.gcov:  65990988: 1143: struct web *web = DLIST_WEB (d);
ra-rewrite.c.gcov:  65972997: 1146: continue;
ra-colorize.c.gcov: 58624158: 1759: for (nn = web2->conflict_list; nn && !wide_p; nn = nn->next)
ra-colorize.c.gcov: 58583640: 1760: if (alias (nn->t)->add_hardregs)
ra-colorize.c.gcov: 55463179:  431: if (web->num_conflicts < NUM_REGS (web) && before >= NUM_REGS (web))
ra-colorize.c.gcov: 55463179:  430: web->num_conflicts -= dec;
ra-colorize.c.gcov: 55463179:  429: int before = web->num_conflicts;
ra-colorize.c.gcov: 55463179:  428: {
ra-colorize.c.gcov: 54992227:  803: decrement_degree (pweb, 1 + v->add_hardregs);
ra-colorize.c.gcov: 46572957: 1306: for (wl = web->conflict_list; wl; wl = wl->next)
ra-colorize.c.gcov: 46540347: 1313: if (ptarget->type != COLORED && ptarget->type != PRECOLORED
ra-colorize.c.gcov: 46540347: 1312: w = sl ? sl->t : wl->t;
ra-colorize.c.gcov: 46540347: 1311: IOR_HARD_REG_SET (bias, ptarget->bias_colors);
ra-colorize.c.gcov: 46540347: 1310: struct sub_conflict *sl = wl->sub;
ra-colorize.c.gcov: 46540347: 1309: struct web *ptarget = alias (wl->t);
ra-colorize.c.gcov: 46540347: 1308: struct web *w;
ra-colorize.c.gcov: 46512519:  480: for (wl = web->conflict_list; wl; wl = wl->next)
ra-colorize.c.gcov: 46479912:  483: if (pweb->type != SELECT && pweb->type != COALESCED)
ra-colorize.c.gcov: 46479912:  482: struct web *pweb = wl->t;
ra-rewrite.c.gcov:  19823503:  814: for (; size--;)
ra-build.c.gcov:    16465236:  546: return r1;
ra-build.c.gcov:    16465236:  488: if (r1 != r2)
ra-build.c.gcov:    16465236:  487: {
ra-colorize.c.gcov: 11909293: 1959: if (web1->spill_cost > web2->spill_cost)
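
The hottest ra-build.c lines listed above (1621-1665) follow a common
pattern: compute a bit index for a pair of webs, return early if the pair
is identical or its bit is already set, otherwise set the bit and add a
conflict edge in both directions.  What follows is only a minimal sketch of
that pattern, not the code from ra-build.c.  The struct layouts, the
triangular index formula, and the helpers igraph_init, igraph_free,
test_bit and set_bit are assumptions made for illustration; the checks on
fixed and precolored registers visible at lines 1631-1652 are left out, and
a simple counter stands in for add_conflict_edge.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, stripped-down web: just an id and a conflict count. */
struct web {
    unsigned id;
    unsigned num_conflicts;
};

/* Hypothetical interference graph: one bit per unordered pair of webs. */
struct igraph {
    unsigned char *bits;
    size_t nbits;
};

/* Assumed layout: the pair {i, j}, i != j, maps to slot hi*(hi-1)/2 + lo
   of a lower-triangular matrix, so (i, j) and (j, i) share one bit. */
static size_t igraph_index(unsigned i, unsigned j)
{
    size_t hi = i > j ? i : j;
    size_t lo = i > j ? j : i;
    assert(hi != lo);
    return hi * (hi - 1) / 2 + lo;
}

/* Allocate a zeroed matrix for nwebs webs; 0 on success, -1 on failure. */
static int igraph_init(struct igraph *g, unsigned nwebs)
{
    g->nbits = nwebs < 2 ? 0 : (size_t)nwebs * (nwebs - 1) / 2;
    g->bits = calloc(g->nbits / CHAR_BIT + 1, 1);
    return g->bits ? 0 : -1;
}

static void igraph_free(struct igraph *g)
{
    free(g->bits);
    g->bits = NULL;
    g->nbits = 0;
}

static int test_bit(const struct igraph *g, size_t idx)
{
    return (g->bits[idx / CHAR_BIT] >> (idx % CHAR_BIT)) & 1;
}

static void set_bit(struct igraph *g, size_t idx)
{
    g->bits[idx / CHAR_BIT] |= (unsigned char)(1u << (idx % CHAR_BIT));
}

/* Record that w1 and w2 conflict.  Returns 1 if a new edge was added,
   0 if both arguments are the same web or the edge already existed. */
static int record_conflict(struct igraph *g, struct web *w1, struct web *w2)
{
    size_t idx;

    if (w1 == w2 || w1->id == w2->id)
        return 0;
    idx = igraph_index(w1->id, w2->id);
    assert(idx < g->nbits);
    if (test_bit(g, idx))
        return 0;          /* early exit: this pair was recorded before */
    set_bit(g, idx);
    w1->num_conflicts++;   /* stands in for add_conflict_edge (web1, web2) */
    w2->num_conflicts++;   /* stands in for add_conflict_edge (web2, web1) */
    return 1;
}
```

With three webs numbered 0 to 2, calling record_conflict on webs 0 and 1
adds an edge, and repeating the call with the arguments swapped returns 0
because the shared bit is already set.  In this sketch the matrix needs
n*(n-1)/2 bits for n webs and each call does constant work, so an execution
count like the 79915683 reported for line 1625 measures how often the
function is entered rather than the cost of a single call.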