From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Wilhelm
To: egcs@cygnus.com
Subject: Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
Date: Tue, 19 Aug 1997 07:36:17 -0000
Message-id: <199708190621.IAA04731@haegar.physiol.med.tu-muenchen.de>
In-reply-to: Pine.GSO.3.96.970819004905.4653A-100000@drabble
X-SW-Source: 1997-08/0133.html

>
> Can someone point me the location of mdbench?
>

http://www.sissa.it/furio/Mdbnch/info.html

Robert

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bernd Schmidt
To: egcs@cygnus.com
Subject: Re: Reload patch to improve 386 code
Date: Tue, 19 Aug 1997 08:08:21 -0000
Message-ID: <199708190621.IAA04731@haegar.physiol.med.tu-muenchen.de>
In-reply-to: 199708181517.LAA23280@tweedledumb.cygnus.com
X-SW-Source: 1997-08/0138.html
Message-ID: <19970819080821.AuRKay0Wf9ttwEfLjxCPC1ws9H_pELzzq6Fv7FMeWfo@z>

> My comment on reload is meant for more
> reload as we currently know it. I think it is an horrible kludge that reload
> is run as a pass after global-alloc, and that it forces reload registers not to
> be used for any other purpose (which is murder on the x86 with each register
> being special purpose in some way).

That problem can be solved; actually it is mostly solved in my patch.

> I think it should be integrated into global-alloc, taking lifetimes, ranges,
> etc. into consideration, possibly leaving the fossil reload for when you are
> not optimizing. Given that reload has pretty much been the place where we all
> fear to tread, at least since RMS handed over the gcc2 reigns to Kenner (and
> even in the 1.3x time frame, I got the sense that RMS no longer had a good
> handle on reload anymore), I think it is time for a rethought. Now, given
> it is a rather critical piece of the compiler, it may take months if not
> years to get it better than it currently is.

I have some ideas for changing reload that could go on top of what I have.
I'll describe what I'm planning to do, if you have any additional ideas I'd
be happy to hear about them.

First, I'd like to eliminate the code that counts the needs for registers
globally. The patch I sent is a first step in that direction; it could
probably easily be extended to spill registers locally for every instruction.
This would require a slightly different way of calculating the possible
damage from spilling a register, but I think it would not be hard to do.

It would be nice to have code that detects that an insn needs a spill reg in
a one-register class (like ecx for variable shifts on the 386), but a
different register is free. In that case, reload should move ecx to the free
register before the instruction, and back afterwards, if this is profitable.
I'm not sure how hard that would be, and I haven't experimented with that
kind of thing yet.

Then, there are some simplifications that could be done. I don't like the
inheritance code, find_equiv_reg and all that. IMHO reload shouldn't try to
be very clever about this sort of thing - the reload_cse_regs pass can be
made more clever. I've already submitted a patch to Kenner that enables
reload_cse_regs to generate optional reloads. If we could add some more
cleverness (e.g. deleting redundant stores into spill slots or eliminating
register-register copies), quite a bit of code in reload could be deleted.
I've already made some experiments in this direction which indicate that
this approach may be feasible.

Another way of simplifying reload would be to try to generate auto-inc
addressing after reload has run, not before. That would eliminate some more
special-case code (and it might make other parts of the compiler simpler as
well).
Bernd

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeffrey A Law
To: egcs@cygnus.com
Subject: Testsuite stuff
Date: Tue, 19 Aug 1997 08:08:21 -0000
Message-ID: <199708190621.IAA04731@haegar.physiol.med.tu-muenchen.de>
X-SW-Source: 1997-08/0136.html
Message-ID: <19970819080821.ZCQjhnjGiIf49cyde85PaRZ9Z2gAQg_X5_azCDyD-hU@z>

I'm hoping to include the basic testsuite for c & c++ in the next snapshot.

The testsuites use the dejagnu testing framework; therefore, you'll need
a copy of dejagnu installed to be able to run the testsuites.

So, I've put a link to a recent dejagnu snapshot in
ftp.cygnus.com:/pub/egcs/infrastructure/dejagnu-970707.tar.gz

[ Note, old releases of dejagnu will not work -- a great deal of dejagnu
  and the testsuite harnesses changed recently to improve testing of
  cross compilers, dos hosted toolchains, etc etc. ]

Running the testsuite is easy. After you've built the compiler, in theory
all you have to do is say "make check" and wait.

This is one small step in the rather large job of exporting Cygnus's
internal testing infrastructure to the egcs community.

Jeff

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeffrey A Law
To: egcs@cygnus.com
Subject: Re: Reload patch to improve 386 code
Date: Tue, 19 Aug 1997 08:08:21 -0000
Message-ID: <199708190621.IAA04731@haegar.physiol.med.tu-muenchen.de>
In-reply-to: Reload patch to improve 386 code
X-SW-Source: 1997-08/0137.html
Message-ID: <19970819080821.YdSdbJ2C0StzYtJeuSpxNO2BSBwRC0nq-kOap5ls4zI@z>

In message you write:
> The idea of running sched before reload seems to be to improve code like
> this:
>   move mem1 => pseudo1
>   move pseudo1 => mem2
>   move mem3 => pseudo2
>   move pseudo2 => mem4
>   move mem5 => pseudo3
>   move pseudo3 => mem6

Or, more generally:

  * The scheduling pass before reload has fewer data dependencies
    (because it works with pseudos), and thus can move instructions
    more freely to produce better schedules.

    However, the downside is potentially increased register pressure.
  * The scheduling pass after reload primarily helps with scheduling
    of spill code and prologue/epilogue insns. With pseudos mapped
    to hard registers, there are far fewer opportunities for the
    scheduler to move instructions.

> If this is left as it stands, register allocation will most likely allocate
> the pseudo to the same hard register. This means the post-reload sched pass
> can't do anything with it, and the CPU can't either because there is no
> parallelism in the code (well, at least the Pentium can't).

Yup.

> you suddenly have two blocks of three independent instructions which could
> run in parallel. However, this will lose badly once you don't have three
> instructions of that kind, but a hundred (since your average CPU doesn't
> have a hundred hard registers).

Yup. Hence my message about throttling the scheduler once the ratio of
pseudos to hard registers hits some magic number.

> Another approach I've been thinking about is to add code that analyzes code
> like this after reload
>
>   move mem1 => hardreg1
>   move hardreg1 => mem2
>   move mem3 => hardreg1
>   move hardreg1 => mem4
>   move mem5 => hardreg1
>   move hardreg1 => mem6
>
> and tries to make it use as many independent hard registers as possible.
> That would make the scheduling opportunities available without the risk
> of over-scheduling before reload. I don't know how feasible this is.

Yes, I've considered doing similar things myself, but I didn't think
it was really worth the effort.

Jeff

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bernd Schmidt
To: egcs@cygnus.com
Subject: Re: Reload patch to improve 386 code
Date: Tue, 19 Aug 1997 08:08:21 -0000
Message-ID: <199708190621.IAA04731@haegar.physiol.med.tu-muenchen.de>
In-reply-to: 199708181855.OAA03711@jenolan.rutgers.edu
X-SW-Source: 1997-08/0135.html
Message-ID: <19970819080821.S1vj85E23h4XVB_8oZ-M_HFWyV5ba-M9yv10QdkxNmU@z>

> Before this leaves my head, I wanted to point something out which
> you've reminded me of.
> When the scheduler (this applies to both the
> original and Haifa versions equally) becomes aggressive, it produces a
> large number of reloads in certain situations.

The idea of running sched before reload seems to be to improve code like
this:

  move mem1 => pseudo1
  move pseudo1 => mem2
  move mem3 => pseudo2
  move pseudo2 => mem4
  move mem5 => pseudo3
  move pseudo3 => mem6

If this is left as it stands, register allocation will most likely allocate
the pseudo to the same hard register. This means the post-reload sched pass
can't do anything with it, and the CPU can't either because there is no
parallelism in the code (well, at least the Pentium can't).

If sched modifies the above to look like this

  move mem1 => pseudo1
  move mem3 => pseudo2
  move mem5 => pseudo3
  move pseudo1 => mem2
  move pseudo2 => mem4
  move pseudo3 => mem6

you suddenly have two blocks of three independent instructions which could
run in parallel. However, this will lose badly once you don't have three
instructions of that kind, but a hundred (since your average CPU doesn't
have a hundred hard registers).

Another approach I've been thinking about is to add code that analyzes code
like this after reload

  move mem1 => hardreg1
  move hardreg1 => mem2
  move mem3 => hardreg1
  move hardreg1 => mem4
  move mem5 => hardreg1
  move hardreg1 => mem6

and tries to make it use as many independent hard registers as possible.
That would make the scheduling opportunities available without the risk
of over-scheduling before reload. I don't know how feasible this is.

Bernd

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bernd Schmidt
To: egcs@cygnus.com
Subject: Some Haifa scheduler bugs
Date: Tue, 19 Aug 1997 09:34:19 -0000
Message-ID: <199708190621.IAA04731@haegar.physiol.med.tu-muenchen.de>
X-SW-Source: 1997-08/0134.html
Message-ID: <19970819093419.qr1BbvYpBQQZaLOVZ1o9l-70KTPZxRO_mShZqu6jFkY@z>

I've run c-torture on the egcs snapshot, using the Haifa scheduler with all
flags turned on. My system is an i586-linux one.
Here's a patch to fix some of the failures. There seems to be at least
one additional problem, which I haven't really investigated yet. It appears
as if the scheduler extends the lifetime of hard registers, which is deadly
on SMALL_REGISTER_CLASSES machines. More specifically, the broken code looks
like this:

  call somefunction
  do a multiplication; clobbers eax
  movl eax,ebx

while the correct code moves eax to ebx directly after the call. I have not
found any code in either haifa-sched.c or sched.c to prevent this problem, so
it's quite possible that the normal scheduler may also suffer from this.

        * haifa-sched.c (target_bb, bbset_size, dom, prob, rgn_nr_edges,
        rgn_edges, edgeset_size, edge_to_bit, pot_split, ancestor_edges):
        Make static.
        (move_insn): Call reemit_notes for every insn in a SCHED_GROUP.
        (schedule_block): When checking whether an insn is the
        basic_block_head of a different block, use the first insn in the
        same SCHED_GROUP instead of the insn itself.
        (debug_dependencies): GET_RTX_NAME takes an rtx code, not an rtx.

*** haifa-sched.c.orig-1 Mon Aug 18 13:03:47 1997
--- haifa-sched.c Mon Aug 18 21:00:48 1997
*************** int *bblst_table, bblst_size, bblst_last
*** 626,632 ****
  #define SRC_PROB(src) ( candidate_table[src].src_prob )
  
  /* The bb being currently scheduled. */
! int target_bb;
  
  /* List of edges. */
  typedef bitlst edgelst;
--- 626,632 ----
  #define SRC_PROB(src) ( candidate_table[src].src_prob )
  
  /* The bb being currently scheduled. */
! static int target_bb;
  
  /* List of edges. */
  typedef bitlst edgelst;
*************** void debug_candidates PROTO ((int));
*** 642,652 ****
  typedef bitset bbset;
  
  /* Number of words of the bbset. */
! int bbset_size;
  
  /* Dominators array: dom[i] contains the bbset of dominators of
     bb i in the region. */
! bbset *dom;
  
  /* bb 0 is the only region entry */
  #define IS_RGN_ENTRY(bb) (!bb)
--- 642,652 ----
  typedef bitset bbset;
  
  /* Number of words of the bbset. */
! static int bbset_size;
  
  /* Dominators array: dom[i] contains the bbset of dominators of
     bb i in the region. */
! static bbset *dom;
  
  /* bb 0 is the only region entry */
  #define IS_RGN_ENTRY(bb) (!bb)
*************** bbset *dom;
*** 657,663 ****
  
  /* Probability: Prob[i] is a float in [0, 1] which is the probability
     of bb i relative to the region entry. */
! float *prob;
  
  /* The probability of bb_src, relative to bb_trg. Note, that while the
     'prob[bb]' is a float in [0, 1], this macro returns an integer
--- 657,663 ----
  
  /* Probability: Prob[i] is a float in [0, 1] which is the probability
     of bb i relative to the region entry. */
! static float *prob;
  
  /* The probability of bb_src, relative to bb_trg. Note, that while the
     'prob[bb]' is a float in [0, 1], this macro returns an integer
*************** float *prob;
*** 669,684 ****
  typedef bitset edgeset;
  
  /* Number of edges in the region. */
! int rgn_nr_edges;
  
  /* Array of size rgn_nr_edges. */
! int *rgn_edges;
  
  /* Number of words in an edgeset. */
! int edgeset_size;
  
  /* Mapping from each edge in the graph to its number in the rgn. */
! int *edge_to_bit;
  #define EDGE_TO_BIT(edge) (edge_to_bit[edge])
  
  /* The split edges of a source bb is different for each target
--- 669,684 ----
  typedef bitset edgeset;
  
  /* Number of edges in the region. */
! static int rgn_nr_edges;
  
  /* Array of size rgn_nr_edges. */
! static int *rgn_edges;
  
  /* Number of words in an edgeset. */
! static int edgeset_size;
  
  /* Mapping from each edge in the graph to its number in the rgn. */
! static int *edge_to_bit;
  #define EDGE_TO_BIT(edge) (edge_to_bit[edge])
  
  /* The split edges of a source bb is different for each target
*************** int *edge_to_bit;
*** 687,696 ****
     the split edges of each bb relative to the region entry.
  
     pot_split[bb] is the set of potential split edges of bb. */
! edgeset *pot_split;
  
  /* For every bb, a set of its ancestor edges. */
! edgeset *ancestor_edges;
  
  static void compute_dom_prob_ps PROTO ((int));
  
--- 687,696 ----
     the split edges of each bb relative to the region entry.
  
     pot_split[bb] is the set of potential split edges of bb. */
! static edgeset *pot_split;
  
  /* For every bb, a set of its ancestor edges. */
! static edgeset *ancestor_edges;
  
  static void compute_dom_prob_ps PROTO ((int));
  
*************** build_control_flow ()
*** 1277,1284 ****
  /* construct edges in the control flow graph, from 'source' block, to
     blocks refered to by 'pattern'. */
  
! static
! void
  build_jmp_edges (pattern, source)
       rtx pattern;
       int source;
--- 1277,1283 ----
  /* construct edges in the control flow graph, from 'source' block, to
     blocks refered to by 'pattern'. */
  
! static void
  build_jmp_edges (pattern, source)
       rtx pattern;
       int source;
*************** move_insn (insn, last)
*** 6512,6520 ****
        move_insn1 (insn, last);
        insn = prev;
      }
- 
    move_insn1 (insn, last);
!   return reemit_notes (new_last, new_last);
  }
  
  /* Return an insn which represents a SCHED_GROUP, which is
--- 6511,6524 ----
        move_insn1 (insn, last);
        insn = prev;
      }
    move_insn1 (insn, last);
!   while (insn != new_last)
!     {
!       rtx next = NEXT_INSN (insn);
!       reemit_notes (insn, insn);
!       insn = next;
!     }
!   return reemit_notes (insn, insn);
  }
  
  /* Return an insn which represents a SCHED_GROUP, which is
*************** schedule_block (bb, rgn, rgn_n_insns)
*** 6840,6848 ****
        /* an interblock motion? */
        if (INSN_BB (insn) != target_bb)
          {
            if (IS_SPECULATIVE_INSN (insn))
              {
- 
                if (!check_live (insn, INSN_BB (insn), target_bb))
                  {
                    /* speculative motion, live check failed, remove
--- 6844,6853 ----
        /* an interblock motion? */
        if (INSN_BB (insn) != target_bb)
          {
+           rtx tmp;
+ 
            if (IS_SPECULATIVE_INSN (insn))
              {
                if (!check_live (insn, INSN_BB (insn), target_bb))
                  {
                    /* speculative motion, live check failed, remove
*************** schedule_block (bb, rgn, rgn_n_insns)
*** 6861,6878 ****
            nr_inter++;
  
            /* update source block boundaries */
            b1 = INSN_BLOCK (insn);
!           if (insn == basic_block_head[b1]
                && insn == basic_block_end[b1])
              {
!               emit_note_after (NOTE_INSN_DELETED, basic_block_head[b1]);
                basic_block_end[b1] = basic_block_head[b1] = NEXT_INSN (insn);
              }
            else if (insn == basic_block_end[b1])
              {
!               basic_block_end[b1] = PREV_INSN (insn);
              }
!           else if (insn == basic_block_head[b1])
              {
                basic_block_head[b1] = NEXT_INSN (insn);
              }
--- 6866,6886 ----
            nr_inter++;
  
            /* update source block boundaries */
+           tmp = insn;
+           while (SCHED_GROUP_P (tmp))
+             tmp = PREV_INSN (tmp);
            b1 = INSN_BLOCK (insn);
!           if (tmp == basic_block_head[b1]
                && insn == basic_block_end[b1])
              {
!               emit_note_after (NOTE_INSN_DELETED, insn);
                basic_block_end[b1] = basic_block_head[b1] = NEXT_INSN (insn);
              }
            else if (insn == basic_block_end[b1])
              {
!               basic_block_end[b1] = PREV_INSN (tmp);
              }
!           else if (tmp == basic_block_head[b1])
              {
                basic_block_head[b1] = NEXT_INSN (insn);
              }
*************** debug_dependencies ()
*** 7383,7389 ****
                         NOTE_SOURCE_FILE (insn));
              }
            else
!             fprintf (dump, " {%s}\n", GET_RTX_NAME (insn));
            continue;
          }
  
--- 7391,7397 ----
                         NOTE_SOURCE_FILE (insn));
              }
            else
!             fprintf (dump, " {%s}\n", GET_RTX_NAME (GET_CODE (insn)));
            continue;
          }
  
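
The schedule_block change in the patch above can be illustrated with a small
stand-alone model. This is only a sketch under stated assumptions, not GCC
code: struct toy_insn, group_head, detach_group and demo_head_uid_after_move
are hypothetical names invented for the illustration, and the in_group flag
merely plays the role that SCHED_GROUP_P plays for a real insn (the insn has
to stay attached to its predecessor). The point it tries to show is that the
block-head test has to use the first insn of the group, because the group
moves as a unit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an insn.  This is NOT GCC's rtx; it is a hypothetical
   type used only for this illustration.  */
struct toy_insn
{
  int uid;
  int in_group;			/* Models SCHED_GROUP_P: glued to prev.  */
  struct toy_insn *prev, *next;
};

/* Walk back to the first insn of the group that ends at INSN, in the
   spirit of the "while (SCHED_GROUP_P (tmp))" loop in the patch.  */
struct toy_insn *
group_head (struct toy_insn *insn)
{
  assert (insn != NULL);
  while (insn->in_group && insn->prev != NULL)
    insn = insn->prev;
  return insn;
}

/* Update the boundaries *HEAD / *END of a block when the group ending at
   INSN is moved elsewhere.  Returns 1 if the block becomes empty.  The
   real code keeps a NOTE_INSN_DELETED as placeholder in that case; the
   toy model simply stores NULL.  */
int
detach_group (struct toy_insn **head, struct toy_insn **end,
	      struct toy_insn *insn)
{
  struct toy_insn *first = group_head (insn);

  if (first == *head && insn == *end)
    {
      *head = *end = NULL;
      return 1;
    }
  if (insn == *end)
    *end = first->prev;		/* Block now ends before the group.  */
  else if (first == *head)
    *head = insn->next;		/* Block now starts after the group.  */
  return 0;
}

/* Block of three insns 1, 2, 3 where insn 2 is grouped with insn 1.
   Moving the group that ends at insn 2 must advance the block head to
   insn 3, even though insn 2 itself is not the block head.  */
int
demo_head_uid_after_move (void)
{
  struct toy_insn a = { 1, 0, NULL, NULL };
  struct toy_insn b = { 2, 1, NULL, NULL };
  struct toy_insn c = { 3, 0, NULL, NULL };
  struct toy_insn *head = &a, *end = &c;

  a.next = &b;
  b.prev = &a;
  b.next = &c;
  c.prev = &b;

  detach_group (&head, &end, &b);
  return head->uid;
}
```

In this model, comparing insn 2 itself against the block head (as the code
did before the patch) would leave the head pointing at insn 1, which has
already been moved away together with its group.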