* Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
@ 1997-08-19 7:36 Robert Wilhelm
1997-08-19 8:08 ` Reload patch to improve 386 code Jeffrey A Law
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Robert Wilhelm @ 1997-08-19 7:36 UTC (permalink / raw)
To: egcs
>
> Can someone point me the location of mdbench?
>
http://www.sissa.it/furio/Mdbnch/info.html
Robert
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Reload patch to improve 386 code
  1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
@ 1997-08-19  8:08 ` Jeffrey A Law
  1997-08-19  8:08 ` Testsuite stuff Jeffrey A Law
  ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Jeffrey A Law @ 1997-08-19 8:08 UTC (permalink / raw)
To: egcs
In message <Pine.SOL.3.90.970819094031.291D-100000@starsky.informatik.rwth-aachen.de> you write:
> The idea of running sched before reload seems to be to improve code like
> this:
>	move mem1 => pseudo1
>	move pseudo1 => mem2
>	move mem3 => pseudo2
>	move pseudo2 => mem4
>	move mem5 => pseudo3
>	move pseudo3 => mem6
Or, more generally:

  * The scheduling pass before reload has fewer data dependencies (because
    it works with pseudos), and thus can move instructions more freely to
    produce better schedules.  However, the downside is potentially
    increased register pressure.

  * The scheduling pass after reload primarily helps with scheduling of
    spill code and prologue/epilogue insns.  With pseudos mapped to hard
    registers, there are far fewer opportunities for the scheduler to move
    instructions.

> If this is left as it stands, register allocation will most likely allocate
> the pseudos to the same hard register.  This means the post-reload sched pass
> can't do anything with it, and the CPU can't either because there is no
> parallelism in the code (well, at least the Pentium can't).
Yup.

> you suddenly have two blocks of three independent instructions which could
> run in parallel.  However, this will lose badly once you don't have three
> instructions of that kind, but a hundred (since your average CPU doesn't
> have a hundred hard registers).
Yup.  Hence my message about throttling the scheduler once the ratio of
pseudos to hard registers hits some magic number.

> Another approach I've been thinking about is to add code that analyzes code
> like this after reload
>
>	move mem1 => hardreg1
>	move hardreg1 => mem2
>	move mem3 => hardreg1
>	move hardreg1 => mem4
>	move mem5 => hardreg1
>	move hardreg1 => mem6
>
> and tries to make it use as many independent hard registers as possible.
> That would make the scheduling opportunities available without the risk
> of over-scheduling before reload.  I don't know how feasible this is.
Yes, I've considered doing similar things myself, but I didn't think it
was really worth the effort.

Jeff
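The throttling heuristic mentioned above could be sketched roughly as follows. This is purely illustrative C, not GCC code: the function name, the cutoff constant, and the inputs are all invented to show the shape of the check (live pseudos versus available hard registers).

```c
#include <stdbool.h>

/* Hypothetical cutoff -- the "magic number" is an assumption, not a
   value from GCC.  */
#define SCHED_PRESSURE_RATIO 2

/* Return true if aggressive pre-reload scheduling should be throttled
   because register pressure is too high: the count of live pseudos
   exceeds SCHED_PRESSURE_RATIO times the number of hard registers
   actually available for allocation.  */
static bool
sched_throttle_p (int n_live_pseudos, int n_hard_regs)
{
  /* Multiply instead of dividing so the test is exact for integers.  */
  return n_live_pseudos > SCHED_PRESSURE_RATIO * n_hard_regs;
}
```

With 8 allocatable hard registers and a ratio of 2, the sketch would throttle at 17 or more live pseudos.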
* Testsuite stuff
  1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
  1997-08-19  8:08 ` Reload patch to improve 386 code Jeffrey A Law
@ 1997-08-19  8:08 ` Jeffrey A Law
  1997-08-19  8:08 ` Reload patch to improve 386 code Bernd Schmidt
  ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Jeffrey A Law @ 1997-08-19 8:08 UTC (permalink / raw)
To: egcs
I'm hoping to include the basic testsuites for C & C++ in the next
snapshot.

The testsuites use the dejagnu testing framework; therefore, you'll need a
copy of dejagnu installed to be able to run them.  So, I've put a link to a
recent dejagnu snapshot in

	ftp.cygnus.com:/pub/egcs/infrastructure/dejagnu-970707.tar.gz

[ Note, old releases of dejagnu will not work -- a great deal of dejagnu
  and the testsuite harnesses changed recently to improve testing of cross
  compilers, DOS-hosted toolchains, etc. ]

Running the testsuite is easy.  After you've built the compiler, in theory
all you have to do is say "make check" and wait.

This is one small step in the rather large job of exporting Cygnus's
internal testing infrastructure to the egcs community.

Jeff
* Re: Reload patch to improve 386 code
  1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
  1997-08-19  8:08 ` Reload patch to improve 386 code Jeffrey A Law
  1997-08-19  8:08 ` Testsuite stuff Jeffrey A Law
@ 1997-08-19  8:08 ` Bernd Schmidt
  1997-08-19  8:08 ` Bernd Schmidt
  1997-08-19  9:34 ` Some Haifa scheduler bugs Bernd Schmidt
  4 siblings, 0 replies; 14+ messages in thread
From: Bernd Schmidt @ 1997-08-19 8:08 UTC (permalink / raw)
To: egcs
> My comment on reload is meant for more than reload as we currently know it.
> I think it is a horrible kludge that reload is run as a pass after
> global-alloc, and that it forces reload registers not to be used for any
> other purpose (which is murder on the x86 with each register being special
> purpose in some way).

That problem can be solved; actually, it is mostly solved in my patch.

> I think it should be integrated into global-alloc, taking lifetimes, ranges,
> etc. into consideration, possibly leaving the fossil reload for when you are
> not optimizing.  Given that reload has pretty much been the place where we
> all fear to tread, at least since RMS handed over the gcc2 reins to Kenner
> (and even in the 1.3x time frame, I got the sense that RMS no longer had a
> good handle on reload anymore), I think it is time for a rethink.  Now,
> given it is a rather critical piece of the compiler, it may take months if
> not years to get it better than it currently is.

I have some ideas for changing reload that could go on top of what I have.
I'll describe what I'm planning to do; if you have any additional ideas,
I'd be happy to hear about them.

First, I'd like to eliminate the code that counts the needs for registers
globally.  The patch I sent is a first step in that direction; it could
probably easily be extended to spill registers locally for every
instruction.  This would require a slightly different way of calculating
the possible damage from spilling a register, but I think it would not be
hard to do.

It would be nice to have code that detects that an insn needs a spill reg
in a one-register class (like ecx for variable shifts on the 386), but a
different register is free.  In that case, reload should move ecx to the
free register before the instruction, and back afterwards, if this is
profitable.  I'm not sure how hard that would be, and I haven't
experimented with that kind of thing yet.

Then, there are some simplifications that could be done.  I don't like the
inheritance code, find_equiv_reg and all that.  IMHO reload shouldn't try
to be very clever about this sort of thing - the reload_cse_regs pass can
be made more clever.  I've already submitted a patch to Kenner that enables
reload_cse_regs to generate optional reloads.  If we could add some more
cleverness (e.g. deleting redundant stores into spill slots or eliminating
register-register copies), quite a bit of code in reload could be deleted.
I've already made some experiments in this direction which indicate that
this approach may be feasible.

Another way of simplifying reload would be to try to generate auto-inc
addressing after reload has run, not before.  That would eliminate some
more special-case code (and it might make other parts of the compiler
simpler as well).

Bernd
* Re: Reload patch to improve 386 code
  1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
  ` (2 preceding siblings ...)
  1997-08-19  8:08 ` Reload patch to improve 386 code Bernd Schmidt
@ 1997-08-19  8:08 ` Bernd Schmidt
  1997-08-19  9:34 ` Some Haifa scheduler bugs Bernd Schmidt
  4 siblings, 0 replies; 14+ messages in thread
From: Bernd Schmidt @ 1997-08-19 8:08 UTC (permalink / raw)
To: egcs
> Before this leaves my head, I wanted to point something out which
> you've reminded me of.  When the scheduler (this applies to both the
> original and Haifa versions equally) becomes aggressive, it produces a
> large number of reloads in certain situations.

The idea of running sched before reload seems to be to improve code like
this:

	move mem1 => pseudo1
	move pseudo1 => mem2
	move mem3 => pseudo2
	move pseudo2 => mem4
	move mem5 => pseudo3
	move pseudo3 => mem6

If this is left as it stands, register allocation will most likely allocate
the pseudos to the same hard register.  This means the post-reload sched
pass can't do anything with it, and the CPU can't either, because there is
no parallelism in the code (well, at least the Pentium can't).  If sched
modifies the above to look like this

	move mem1 => pseudo1
	move mem3 => pseudo2
	move mem5 => pseudo3
	move pseudo1 => mem2
	move pseudo2 => mem4
	move pseudo3 => mem6

you suddenly have two blocks of three independent instructions which could
run in parallel.  However, this will lose badly once you don't have three
instructions of that kind, but a hundred (since your average CPU doesn't
have a hundred hard registers).

Another approach I've been thinking about is to add code that analyzes code
like this after reload

	move mem1 => hardreg1
	move hardreg1 => mem2
	move mem3 => hardreg1
	move hardreg1 => mem4
	move mem5 => hardreg1
	move hardreg1 => mem6

and tries to make it use as many independent hard registers as possible.
That would make the scheduling opportunities available without the risk
of over-scheduling before reload.  I don't know how feasible this is.

Bernd
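For readers unfamiliar with the pattern under discussion, here is a tiny C sketch (invented names, not from the thread) of the kind of source that compiles to the mem-to-pseudo-to-mem sequence above. Each copy goes through a compiler temporary (a "pseudo"); if all three temporaries end up in the same hard register the copies form one serial chain, while three distinct registers let the loads be issued back to back.

```c
/* Hypothetical example: three independent memory-to-memory copies.
   Each assignment conceptually compiles to
       move mem  => pseudo
       move pseudo => mem
   and the three pairs have no data dependencies on each other.  */
struct vec { double a, b, c; };

void
copy3 (struct vec *dst, const struct vec *src)
{
  double t1 = src->a;   /* move mem1 => pseudo1 */
  double t2 = src->b;   /* move mem3 => pseudo2 */
  double t3 = src->c;   /* move mem5 => pseudo3 */
  dst->a = t1;          /* move pseudo1 => mem2 */
  dst->b = t2;          /* move pseudo2 => mem4 */
  dst->c = t3;          /* move pseudo3 => mem6 */
}
```

Scale the struct up to a hundred fields and the trade-off in the message becomes visible: interleaving all the loads first would need a hundred live temporaries at once.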
* Some Haifa scheduler bugs
  1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
  ` (3 preceding siblings ...)
  1997-08-19  8:08 ` Bernd Schmidt
@ 1997-08-19  9:34 ` Bernd Schmidt
  4 siblings, 0 replies; 14+ messages in thread
From: Bernd Schmidt @ 1997-08-19 9:34 UTC (permalink / raw)
To: egcs
I've run c-torture on the egcs snapshot, using the Haifa scheduler with
all flags turned on.  My system is an i586-linux one.  Here's a patch to
fix some of the failures.

There seems to be at least one additional problem, which I haven't really
investigated yet.  It appears as if the scheduler extends the lifetime of
hard registers, which is deadly on SMALL_REGISTER_CLASSES machines.  More
specifically, the broken code looks like this:

	call somefunction
	do a multiplication; clobbers eax
	movl eax,ebx

while the correct code moves eax to ebx directly after the call.  I have
not found any code in either haifa-sched.c or sched.c to prevent this
problem, so it's quite possible that the normal scheduler may also suffer
from this.

	* haifa-sched.c (target_bb, bbset_size, dom, prob, rgn_nr_edges,
	rgn_edges, edgeset_size, edge_to_bit, pot_split, ancestor_edges):
	Make static.
	(move_insn): Call reemit_notes for every insn in a SCHED_GROUP.
	(schedule_block): When checking whether an insn is the
	basic_block_head of a different block, use the first insn in the
	same SCHED_GROUP instead of the insn itself.
	(debug_dependencies): GET_RTX_NAME takes an rtx code, not an rtx.

*** haifa-sched.c.orig-1	Mon Aug 18 13:03:47 1997
--- haifa-sched.c	Mon Aug 18 21:00:48 1997
*************** int *bblst_table, bblst_size, bblst_last
*** 626,632 ****
  #define SRC_PROB(src) ( candidate_table[src].src_prob )
  
  /* The bb being currently scheduled.  */
! int target_bb;
  
  /* List of edges.  */
  typedef bitlst edgelst;
--- 626,632 ----
  #define SRC_PROB(src) ( candidate_table[src].src_prob )
  
  /* The bb being currently scheduled.  */
! static int target_bb;
  
  /* List of edges.  */
  typedef bitlst edgelst;
*************** void debug_candidates PROTO ((int));
*** 642,652 ****
  typedef bitset bbset;
  
  /* Number of words of the bbset.  */
! int bbset_size;
  
  /* Dominators array: dom[i] contains the bbset of dominators of bb i
     in the region.  */
! bbset *dom;
  
  /* bb 0 is the only region entry */
  #define IS_RGN_ENTRY(bb) (!bb)
--- 642,652 ----
  typedef bitset bbset;
  
  /* Number of words of the bbset.  */
! static int bbset_size;
  
  /* Dominators array: dom[i] contains the bbset of dominators of bb i
     in the region.  */
! static bbset *dom;
  
  /* bb 0 is the only region entry */
  #define IS_RGN_ENTRY(bb) (!bb)
*************** bbset *dom;
*** 657,663 ****
  
  /* Probability: Prob[i] is a float in [0, 1] which is the probability of
     bb i relative to the region entry.  */
! float *prob;
  
  /* The probability of bb_src, relative to bb_trg.  Note, that while the
     'prob[bb]' is a float in [0, 1], this macro returns an integer
--- 657,663 ----
  
  /* Probability: Prob[i] is a float in [0, 1] which is the probability of
     bb i relative to the region entry.  */
! static float *prob;
  
  /* The probability of bb_src, relative to bb_trg.  Note, that while the
     'prob[bb]' is a float in [0, 1], this macro returns an integer
*************** float *prob;
*** 669,684 ****
  typedef bitset edgeset;
  
  /* Number of edges in the region.  */
! int rgn_nr_edges;
  
  /* Array of size rgn_nr_edges.  */
! int *rgn_edges;
  
  /* Number of words in an edgeset.  */
! int edgeset_size;
  
  /* Mapping from each edge in the graph to its number in the rgn.  */
! int *edge_to_bit;
  #define EDGE_TO_BIT(edge) (edge_to_bit[edge])
  
  /* The split edges of a source bb is different for each target
--- 669,684 ----
  typedef bitset edgeset;
  
  /* Number of edges in the region.  */
! static int rgn_nr_edges;
  
  /* Array of size rgn_nr_edges.  */
! static int *rgn_edges;
  
  /* Number of words in an edgeset.  */
! static int edgeset_size;
  
  /* Mapping from each edge in the graph to its number in the rgn.  */
! static int *edge_to_bit;
  #define EDGE_TO_BIT(edge) (edge_to_bit[edge])
  
  /* The split edges of a source bb is different for each target
*************** int *edge_to_bit;
*** 687,696 ****
  
     the split edges of each bb relative to the region entry.
  
     pot_split[bb] is the set of potential split edges of bb.  */
! edgeset *pot_split;
  
  /* For every bb, a set of its ancestor edges.  */
! edgeset *ancestor_edges;
  
  static void compute_dom_prob_ps PROTO ((int));
--- 687,696 ----
  
     the split edges of each bb relative to the region entry.
  
     pot_split[bb] is the set of potential split edges of bb.  */
! static edgeset *pot_split;
  
  /* For every bb, a set of its ancestor edges.  */
! static edgeset *ancestor_edges;
  
  static void compute_dom_prob_ps PROTO ((int));
*************** build_control_flow ()
*** 1277,1284 ****
  /* construct edges in the control flow graph, from 'source' block,
     to blocks refered to by 'pattern'.  */
  
! static
! void
  build_jmp_edges (pattern, source)
       rtx pattern;
       int source;
--- 1277,1283 ----
  /* construct edges in the control flow graph, from 'source' block,
     to blocks refered to by 'pattern'.  */
  
! static void
  build_jmp_edges (pattern, source)
       rtx pattern;
       int source;
*************** move_insn (insn, last)
*** 6512,6520 ****
        move_insn1 (insn, last);
        insn = prev;
      }
- 
    move_insn1 (insn, last);
!   return reemit_notes (new_last, new_last);
  }
  
  /* Return an insn which represents a SCHED_GROUP, which is
--- 6511,6524 ----
        move_insn1 (insn, last);
        insn = prev;
      }
    move_insn1 (insn, last);
!   while (insn != new_last)
!     {
!       rtx next = NEXT_INSN (insn);
!       reemit_notes (insn, insn);
!       insn = next;
!     }
!   return reemit_notes (insn, insn);
  }
  
  /* Return an insn which represents a SCHED_GROUP, which is
*************** schedule_block (bb, rgn, rgn_n_insns)
*** 6840,6848 ****
  	      /* an interblock motion? */
  	      if (INSN_BB (insn) != target_bb)
  		{
  		  if (IS_SPECULATIVE_INSN (insn))
  		    {
- 
  		      if (!check_live (insn, INSN_BB (insn), target_bb))
  			{
  			  /* speculative motion, live check failed, remove
--- 6844,6853 ----
  	      /* an interblock motion? */
  	      if (INSN_BB (insn) != target_bb)
  		{
+ 		  rtx tmp;
+ 
  		  if (IS_SPECULATIVE_INSN (insn))
  		    {
  		      if (!check_live (insn, INSN_BB (insn), target_bb))
  			{
  			  /* speculative motion, live check failed, remove
*************** schedule_block (bb, rgn, rgn_n_insns)
*** 6861,6878 ****
  		  nr_inter++;
  
  		  /* update source block boundaries */
  		  b1 = INSN_BLOCK (insn);
! 		  if (insn == basic_block_head[b1]
  		      && insn == basic_block_end[b1])
  		    {
! 		      emit_note_after (NOTE_INSN_DELETED, basic_block_head[b1]);
  		      basic_block_end[b1] = basic_block_head[b1] = NEXT_INSN (insn);
  		    }
  		  else if (insn == basic_block_end[b1])
  		    {
! 		      basic_block_end[b1] = PREV_INSN (insn);
  		    }
! 		  else if (insn == basic_block_head[b1])
  		    {
  		      basic_block_head[b1] = NEXT_INSN (insn);
  		    }
--- 6866,6886 ----
  		  nr_inter++;
  
  		  /* update source block boundaries */
+ 		  tmp = insn;
+ 		  while (SCHED_GROUP_P (tmp))
+ 		    tmp = PREV_INSN (tmp);
  		  b1 = INSN_BLOCK (insn);
! 		  if (tmp == basic_block_head[b1]
  		      && insn == basic_block_end[b1])
  		    {
! 		      emit_note_after (NOTE_INSN_DELETED, insn);
  		      basic_block_end[b1] = basic_block_head[b1] = NEXT_INSN (insn);
  		    }
  		  else if (insn == basic_block_end[b1])
  		    {
! 		      basic_block_end[b1] = PREV_INSN (tmp);
  		    }
! 		  else if (tmp == basic_block_head[b1])
  		    {
  		      basic_block_head[b1] = NEXT_INSN (insn);
  		    }
*************** debug_dependencies ()
*** 7383,7389 ****
  			   NOTE_SOURCE_FILE (insn));
  		}
  	      else
! 		fprintf (dump, "  {%s}\n", GET_RTX_NAME (insn));
  	      continue;
  	    }
--- 7391,7397 ----
  			   NOTE_SOURCE_FILE (insn));
  		}
  	      else
! 		fprintf (dump, "  {%s}\n", GET_RTX_NAME (GET_CODE (insn)));
  	      continue;
  	    }
* Re: Some Haifa scheduler bugs
@ 1997-08-19 17:54 Jeffrey A Law
  1997-08-19 17:54 ` egcs: A new compiler project to merge the existing GCC forks (fwd) Dave Love
  0 siblings, 1 reply; 14+ messages in thread
From: Jeffrey A Law @ 1997-08-19 17:54 UTC (permalink / raw)
To: egcs
In message <Pine.SOL.3.90.970819092828.291B-100000@starsky.informatik.rwth-aachen.de> you write:
> I've run c-torture on the egcs snapshot, using the Haifa scheduler with
> all flags turned on.  My system is an i586-linux one.  Here's a patch to
> fix some of the failures.
Thanks.  Just some comments:

  * Get a copyright assignment + disclaimer signed and sent to the FSF as
    soon as possible.  Until that time we can't take any of your patches
    and include them without rewriting them first.

  * When submitting patches, send bugfix patches separately from random
    cleanups.  The cleanups are greatly appreciated, especially for haifa.

  * When submitting bugfix patches, please submit a testcase, or refer us
    to a c-torture testcase and the options needed to expose the bug.

Specific questions/comments:

In move_insn, is there some reason why you can't call reemit_notes during
the loop on SCHED_GROUP_P insns?  ie, does this work instead?  Seems
cleaner than using another loop if it works.

  {
    rtx new_last = insn;

    while (SCHED_GROUP_P (insn))
      {
	rtx prev = PREV_INSN (insn);
	move_insn1 (insn, last);
	reemit_notes (insn, insn);
	insn = prev;
      }
    move_insn1 (insn, last);
    return reemit_notes (new_last, new_last);
  }

[ Of course if you had referred us to a testcase, we could check this
  ourselves.... ]

Jeff
* Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
  1997-08-19 17:54 Jeffrey A Law
@ 1997-08-19 17:54 ` Dave Love
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Love @ 1997-08-19 17:54 UTC (permalink / raw)
To: egcs
>>>>> "Jeffrey" == Jeffrey A Law <law@hurl.cygnus.com> writes:

 Jeffrey> Could be -- I don't think gcc-2.7* scheduled instructions on
 Jeffrey> the x86 machines at all.

FWIW, the last gcc2 snapshot I could build (with -m586 in) typically
seemed to gain about 20% on a 586 with single-precision Fortran code,
roughly consistent with numbers reported by proprietary offerings at the
time, though some with `pentium optimization' only seemed to perform about
as well as the gcc-2.7-based g77 (generating 486 code).

If people care about Fortran performance I'll eventually do some realistic
tests on a 586, but have no access to a 686; this isn't necessarily
trivial, though, and I'm more interested in the release of a correct g77
0.5.21 at this stage.
* Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
@ 1997-08-19 17:18 Joern Rennecke
0 siblings, 0 replies; 14+ messages in thread
From: Joern Rennecke @ 1997-08-19 17:18 UTC (permalink / raw)
To: egcs
> In message <Pine.GSO.3.96.970819004905.4653A-100000@drabble>you write:
> > can old scheduler be the source of the problem?
> Could be -- I don't think gcc-2.7* scheduled instructions on the
> x86 machines at all.
>
> So, one interesting test would be to run the benchmark with "-O2",
> then again with "-O2 -fno-schedule-insns -fno-schedule-insns2".
>
> That would tell us if we need to focus on the scheduler or not.
And if you are looking for the best performance right now, I suggest trying
-O2 -fno-schedule-insns .  Scheduling after reload can't hurt register
allocation, and it might do some good.  OTOH, it can hurt when it
disables peepholes.  It's a pity we don't have an actual peephole
optimization pass - combine.c works before reload and is limited in the
number and kind of insns it can combine, and the peepholes used by final
don't allow re-iteration and are not designed to recognize insn sequences
with some unrelated insns in between.
* Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
@ 1997-08-19 13:19 H.J. Lu
0 siblings, 0 replies; 14+ messages in thread
From: H.J. Lu @ 1997-08-19 13:19 UTC (permalink / raw)
To: egcs
>
>
> HJ, can you work with this person to find out _why_ performance
> is suffering?
>
I am still working on prototyping.  But if no one is looking at it,
I will take a look.
H.J.
* Re: Reload patch to improve 386 code
@ 1997-08-19 8:50 Jakub Jelinek
1997-08-19 9:47 ` egcs: A new compiler project to merge the existing GCC forks (fwd) Dave Love
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Jelinek @ 1997-08-19 8:50 UTC (permalink / raw)
To: egcs
>
> Date: Mon, 18 Aug 1997 11:17:38 -0400 (EDT)
> From: meissner@cygnus.com
>
> I think it is an horrible kludge that reload is run as a pass after
> global-alloc, and that it forces reload registers not to be used
> for any other purpose (which is murder on the x86 with each
> register being special purpose in some way).
>
> Before this leaves my head, I wanted to point something out which
> you've reminded me of. When the scheduler (this applies to both the
> original and Haifa versions equally) becomes aggressive, it produces a
> large number of reloads in certain situations. Reloads which would
> not have happened if scheduling did not take place. This happens
> especially if register pressure is high already. I noticed this
> particularly on RISC platforms, seems in this case the more registers
> available the worse things became when the register usage was
> saturated.
I thought about a quick solution, which would be during global-alloc, if it
finds out that the number of hard registers is exceeded, it could try to
undo some short pseudo setup RTL sequence merges and move them to the place
of the actual use, if the pseudo being set up is a constant and computable
in small number of instructions not involving memory loads.
That way, we could get rid of the following horror on sparc64:
sethi %hi(var1), %r1
stx %r1, [%sp + NN]
...
ldx [%sp + NN], %r1
or %r1, %lo(var1), %r1
stx %r1, [%sp + NN]
...
some loop:
...
ldx [%sp + NN], %r1
ldx [%r1], %r1
...
and could have:
some loop:
...
sethi %hi(var1), %r1
ldx [%r1 + %lo(var1)], %r1
...
instead...
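A C sketch of the scenario (invented names; not from the thread) may help make the rematerialization idea concrete: the address of a global is a constant computable in two instructions (sethi + or on sparc64), so when register pressure forces a spill, recomputing it at the point of use is cheaper than storing and reloading it through a stack slot.

```c
/* Hypothetical example: a loop that repeatedly reads a global.  The
   pseudo holding the address of var1 is exactly the kind of value
   that should be rematerialized (recomputed from the constant) under
   pressure instead of being spilled to and reloaded from the stack.  */
long var1;

long
sum_remat (int n)
{
  long s = 0;
  int i;

  for (i = 0; i < n; i++)
    /* Forming &var1 here corresponds to re-emitting
         sethi %hi(var1), %r1
         ldx   [%r1 + %lo(var1)], %r1
       inside the loop, rather than reloading a spilled copy
       of the address from [%sp + NN].  */
    s += *(volatile long *) &var1;
  return s;
}
```

The trade-off is the one Jakub describes: two cheap ALU instructions per use versus a store plus one memory load per use.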
Cheers,
Jakub
___________________________________________________________________
Jakub Jelinek | jj@sunsite.mff.cuni.cz | http://sunsite.mff.cuni.cz
Administrator of SunSITE Czech Republic, MFF, Charles University
___________________________________________________________________
Ultralinux - first 64bit OS to take fool power from the UltraSparc
Linux version 2.0.30 on a sparc machine (291.64 BogoMips).
___________________________________________________________________
* Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
  1997-08-19  8:50 Reload patch to improve 386 code Jakub Jelinek
@ 1997-08-19  9:47 ` Dave Love
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Love @ 1997-08-19 9:47 UTC (permalink / raw)
To: egcs
>>>>> "H" == H J Lu <hjl@lucon.org> writes:

 H> I downloaded this a few days ago - compiles and runs without any
 H> problems on a PentiumPro Linux 2.0.30 (Redhat 4.2) system - but:
 H> execution speed (floating point) of a test case (mdbench)

Note that mdbnch (at least the version I know) is in double precision.
Thus other performance considerations are typically overshadowed on ppro
by the double alignment problems.  See the g77 manual.

 H> - I am back right now to the old
 H> stuff, unless I get to hear a convincing reason why to switch.

I doubt it's wise to use the g77 in egcs seriously at least until it's
based on a version that's completed alpha testing for g77 0.5.21.
* egcs: A new compiler project to merge the existing GCC forks (fwd)
@ 1997-08-19  3:52 H.J. Lu
  1997-08-19  4:27 ` Jeffrey A Law
  ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: H.J. Lu @ 1997-08-19 3:52 UTC (permalink / raw)
To: egcs
Forwarded message:

Date: Tue, 19 Aug 1997 11:09:07 +0900
From: Arno PAHLER <paehler@atlas.rc.m-kagaku.co.jp>
Message-Id: <199708190209.LAA04886@atlas.rc.m-kagaku.co.jp>
To: "H.J. Lu" <hjl@lucon.org>
In-reply-to: "H.J. Lu"'s message of Sun, 17 Aug 1997 09:12:40 -0700
Subject: egcs: A new compiler project to merge the existing GCC forks

I downloaded this a few days ago - compiles and runs without any
problems on a PentiumPro Linux 2.0.30 (Redhat 4.2) system - but:

execution speed (floating point) of a test case (mdbench) compiled
with f2c+gcc is about 10% slower than using gcc 2.7.2.1 - it is
about the same or very slightly faster than g77 0.5.19.1 when using
g77 0.5.21 - when using single precision both f2c+gcc and g77 are
about 10-25% slower than their gcc 2.7.2.1/g77 0.5.19.1 counterparts.

I had hoped that performance would improve rather than get worse -
is it so hard to optimize for x86? - I am back right now to the old
stuff, unless I get to hear a convincing reason why to switch.

Arno

--
H.J. Lu (hjl@gnu.ai.mit.edu)
* Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
  1997-08-19  3:52 H.J. Lu
@ 1997-08-19  4:27 ` Jeffrey A Law
  1997-08-19  5:08 ` Oleg Krivosheev
  1997-08-19  6:01 ` Jeffrey A Law
  2 siblings, 0 replies; 14+ messages in thread
From: Jeffrey A Law @ 1997-08-19 4:27 UTC (permalink / raw)
To: egcs
HJ, can you work with this person to find out _why_ performance
is suffering?

If nobody takes the time to analyze these problems, then performance
is never going to get significantly better.

In message <m0x0fih-0004ecC@ocean.lucon.org> you write:
> Forwarded message:
> >From paehler@atlas.rc.m-kagaku.co.jp Mon Aug 18 19:09:18 1997
> Date: Tue, 19 Aug 1997 11:09:07 +0900
> From: Arno PAHLER <paehler@atlas.rc.m-kagaku.co.jp>
> Message-Id: <199708190209.LAA04886@atlas.rc.m-kagaku.co.jp>
> To: "H.J. Lu" <hjl@lucon.org>
> In-reply-to: "H.J. Lu"'s message of Sun, 17 Aug 1997 09:12:40 -0700
> Subject: egcs: A new compiler project to merge the existing GCC forks
>
> I downloaded this a few days ago - compiles and runs without any
> problems on a PentiumPro Linux 2.0.30 (Redhat 4.2) system - but:
>
> execution speed (floating point) of a test case (mdbench) compiled
> with f2c+gcc is about 10% slower than using gcc 2.7.2.1 - it is
> about the same or very slighly faster than g77 0.5.19.1 when using
> g77 0.5.21 - when using single precision both f2c+gcc and g77 are
> about 10-25% slower than their gcc 2.7.2.1/g77 0.5.19.1 counter-
> parts.
>
> I had hoped that performance would improve rather than get worse -
> is it so hard to optimize for x86? - I am back right now to the old
> stuff, unless I get to hear a convincing reason why to switch.
>
> Arno
>
> --
> H.J. Lu (hjl@gnu.ai.mit.edu)
* Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
  1997-08-19  3:52 H.J. Lu
  1997-08-19  4:27 ` Jeffrey A Law
@ 1997-08-19  5:08 ` Oleg Krivosheev
  1997-08-19  6:01 ` Jeffrey A Law
  2 siblings, 0 replies; 14+ messages in thread
From: Oleg Krivosheev @ 1997-08-19 5:08 UTC (permalink / raw)
To: egcs
Hi,

On Mon, 18 Aug 1997, Jeffrey A Law wrote:

> HJ, can you work with this person to find out _why_ performance
> is suffering?
>
> If nobody takes the time to analyze these problems, then performance
> is never going to get significantly better.

Can the old scheduler be the source of the problem?  I was only able to
figure out the --enable-haifa switch by looking into the ./configure
script.  The new scheduler is off by default.

Can someone point me to the location of mdbench?  I'll benchmark it...

regards,

OK

> In message <m0x0fih-0004ecC@ocean.lucon.org> you write:
> > Forwarded message:
> > >From paehler@atlas.rc.m-kagaku.co.jp Mon Aug 18 19:09:18 1997
> > Date: Tue, 19 Aug 1997 11:09:07 +0900
> > From: Arno PAHLER <paehler@atlas.rc.m-kagaku.co.jp>
> > Message-Id: <199708190209.LAA04886@atlas.rc.m-kagaku.co.jp>
> > To: "H.J. Lu" <hjl@lucon.org>
> > In-reply-to: "H.J. Lu"'s message of Sun, 17 Aug 1997 09:12:40 -0700
> > Subject: egcs: A new compiler project to merge the existing GCC forks
> >
> > I downloaded this a few days ago - compiles and runs without any
> > problems on a PentiumPro Linux 2.0.30 (Redhat 4.2) system - but:
> >
> > execution speed (floating point) of a test case (mdbench) compiled
> > with f2c+gcc is about 10% slower than using gcc 2.7.2.1 - it is
> > about the same or very slighly faster than g77 0.5.19.1 when using
> > g77 0.5.21 - when using single precision both f2c+gcc and g77 are
> > about 10-25% slower than their gcc 2.7.2.1/g77 0.5.19.1 counter-
> > parts.
> >
> > I had hoped that performance would improve rather than get worse -
> > is it so hard to optimize for x86? - I am back right now to the old
> > stuff, unless I get to hear a convincing reason why to switch.
> >
> > Arno
> >
> > --
> > H.J. Lu (hjl@gnu.ai.mit.edu)

Oleg Krivosheev,
MS 345, AD/Physics,
Fermi National Accelerator Laboratory,
P.O.Box 500, Batavia, Illinois, 60510.
phone: (630) 840 8460
FAX:   (630) 840 4552
Email: kriol@fnal.gov
* Re: egcs: A new compiler project to merge the existing GCC forks (fwd)
  1997-08-19  3:52 H.J. Lu
  1997-08-19  4:27 ` Jeffrey A Law
  1997-08-19  5:08 ` Oleg Krivosheev
@ 1997-08-19  6:01 ` Jeffrey A Law
  2 siblings, 0 replies; 14+ messages in thread
From: Jeffrey A Law @ 1997-08-19 6:01 UTC (permalink / raw)
To: egcs
In message <Pine.GSO.3.96.970819004905.4653A-100000@drabble> you write:
> can old scheduler be the source of the problem?
Could be -- I don't think gcc-2.7* scheduled instructions on the
x86 machines at all.

So, one interesting test would be to run the benchmark with "-O2",
then again with "-O2 -fno-schedule-insns -fno-schedule-insns2".

That would tell us if we need to focus on the scheduler or not.

Seems like haifa could help the pentium pro; however, the i386.md file
would have to be tweaked to get the best performance out of haifa.

Jeff
end of thread, other threads:[~1997-08-19 17:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1997-08-19  7:36 egcs: A new compiler project to merge the existing GCC forks (fwd) Robert Wilhelm
1997-08-19  8:08 ` Reload patch to improve 386 code Jeffrey A Law
1997-08-19  8:08 ` Testsuite stuff Jeffrey A Law
1997-08-19  8:08 ` Reload patch to improve 386 code Bernd Schmidt
1997-08-19  8:08 ` Bernd Schmidt
1997-08-19  9:34 ` Some Haifa scheduler bugs Bernd Schmidt
  -- strict thread matches above, loose matches on Subject: below --
1997-08-19 17:54 Jeffrey A Law
1997-08-19 17:54 ` egcs: A new compiler project to merge the existing GCC forks (fwd) Dave Love
1997-08-19 17:18 Joern Rennecke
1997-08-19 13:19 H.J. Lu
1997-08-19  8:50 Reload patch to improve 386 code Jakub Jelinek
1997-08-19  9:47 ` egcs: A new compiler project to merge the existing GCC forks (fwd) Dave Love
1997-08-19  3:52 H.J. Lu
1997-08-19  4:27 ` Jeffrey A Law
1997-08-19  5:08 ` Oleg Krivosheev
1997-08-19  6:01 ` Jeffrey A Law