public inbox for gcc@gcc.gnu.org
* Re: register allocation vs. scheduling and other stuff
@ 2003-01-07 12:47 Robert Dewar
  2003-01-07 19:53 ` tm_gccmail
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Dewar @ 2003-01-07 12:47 UTC (permalink / raw)
  To: dberlin, tm; +Cc: dewar, gcc, lucier

> They also aren't necessarily doing better than a good scheduler and 
> good allocator, in terms of runtime performance or compile time 
> performance.

I disagree for architectures like the ia64. Have a look at some of the
recent literature in this area.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07 12:47 register allocation vs. scheduling and other stuff Robert Dewar
@ 2003-01-07 19:53 ` tm_gccmail
  0 siblings, 0 replies; 15+ messages in thread
From: tm_gccmail @ 2003-01-07 19:53 UTC (permalink / raw)
  To: Robert Dewar; +Cc: dberlin, tm, gcc, lucier

On Tue, 7 Jan 2003, Robert Dewar wrote:

> > They also aren't necessarily doing better than a good scheduler and 
> > good allocator, in terms of runtime performance or compile time 
> > performance.
> 
> I disagree for architectures like the ia64. Have a look at some of the
> recent literature in this area.

Do you mean ia64, or do you mean Merced?

Merced doesn't seem to be a very good implementation of the ia64
architecture. McKinley seems to incorporate more opportunities for 
out-of-order execution and is therefore far less sensitive to scheduling
issues.

I would guess you actually mean "Merced" in this context.

If you do mean Merced, there are fewer than 5,000 Merced processors
actually in use in the world, and those have been obsoleted anyway by
McKinley. So I'm not sure it makes much sense to optimize well for Merced.

Can you cite a specific reference for "recent literature"?

Toshi



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07 23:26 ` tm
@ 2003-01-08  0:34   ` Mike Stump
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Stump @ 2003-01-08  0:34 UTC (permalink / raw)
  To: tm; +Cc: Robert Dewar, tm_gccmail, dberlin, gcc, lucier

On Tuesday, January 7, 2003, at 03:10 PM, tm wrote:
> On Tue, 7 Jan 2003, Robert Dewar wrote:
>  "Intel and Microsoft are the No. 1 buyers of Itanium to date,"

Remember, Enron used to buy power from Enron.  That strategy worked for 
a while.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07 23:18 ` Florian Weimer
@ 2003-01-08  0:32   ` tm_gccmail
  0 siblings, 0 replies; 15+ messages in thread
From: tm_gccmail @ 2003-01-08  0:32 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Robert Dewar, gcc

On Tue, 7 Jan 2003, Florian Weimer wrote:

> dewar@gnat.com (Robert Dewar) writes:
> 
> > Are you really sure that only 5000 Itanium machines shipped?
> 
> 5000 processors are even fewer than 5000 machines!
> 
> (We've got a 16-way system at the university, and a few more smaller
> boxen, and I can't believe that we've got a measurable portion of all
> Itaniums installed world-wide over here.)

Well, 2,716 Itanium 1s were shipped in 2001, and roughly 8,000 Itanium
1+2s in 2002, according to my previous message.

So if you have a total of 30 Itanium processors, then you have about
30/10,716 or about 0.3% of all Itanium processors in existence.

Toshi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07 20:41 Robert Dewar
  2003-01-07 23:04 ` tm
  2003-01-07 23:18 ` Florian Weimer
@ 2003-01-07 23:26 ` tm
  2003-01-08  0:34   ` Mike Stump
  2 siblings, 1 reply; 15+ messages in thread
From: tm @ 2003-01-07 23:26 UTC (permalink / raw)
  To: Robert Dewar; +Cc: tm_gccmail, dberlin, gcc, lucier

On Tue, 7 Jan 2003, Robert Dewar wrote:

> Are you really sure that only 5000 Itanium machines shipped? I knew the
> number was low, but I am surprised it was that low.

Few more:

http://www.itworld.com/Comp/4211/IDG020104itanium/index.html

"During the third quarter of 2001, hardware vendors sold a grand total of
 1,135 workstations, with HP accounting for 650 (57 percent) of those 
 units, according to Kara Yokley, workstation research analyst at
 International Data Corp. (IDC), in Framingham, Massachusetts. IBM
 Corp. sold 385 workstations and Dell Computer Corp. and Silicon Graphics
 Inc. each shipped 50.

 Overall, hardware companies shipped 346,846 workstations in the third
 quarter, according to recent numbers from Dataquest Inc., a unit of
 Gartner Inc.

 What makes the lackluster sales even more disheartening is that the two
 companies banking on the future of Itanium -- Intel and Microsoft
 Corp. -- may have purchased as much as 80 percent of the workstations
 sold in the third quarter, Yokley said.

 "Intel and Microsoft are the No. 1 buyers of Itanium to date," she
 said. "They are pretty much the two major customers for Itanium."

 Only "a handful" of customers bought the remaining Itanium servers not
 scooped up by Intel and Microsoft, said Gartner Dataquest workstation
 analyst Pia Rieppo."
...
 HP had urged customers to begin testing first-generation servers now and
 make large purchases when McKinley comes out. As it turns out, most users
 may begin testing Itanium with the McKinley generation of chips and not
 make large purchases until Madison arrives, analysts said."


I found other articles mentioning that 2002 sales of Itanium 1/2 chips
seem to be around 8,000, with 50,000 "projected" for 2003.

But personally, I think analysts are usually overly optimistic - I was in
the games industry in 1997, when analysts projected online games would
be a $4 billion industry by 2002, and all the game companies rushed
to build their own online gaming networks... it was actually about $138
million in 2002.

Toshi



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07 20:41 Robert Dewar
  2003-01-07 23:04 ` tm
@ 2003-01-07 23:18 ` Florian Weimer
  2003-01-08  0:32   ` tm_gccmail
  2003-01-07 23:26 ` tm
  2 siblings, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2003-01-07 23:18 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc

dewar@gnat.com (Robert Dewar) writes:

> Are you really sure that only 5000 Itanium machines shipped?

5000 processors are even fewer than 5000 machines!

(We've got a 16-way system at the university, and a few more smaller
boxen, and I can't believe that we've got a measurable portion of all
Itaniums installed world-wide over here.)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07 20:41 Robert Dewar
@ 2003-01-07 23:04 ` tm
  2003-01-07 23:18 ` Florian Weimer
  2003-01-07 23:26 ` tm
  2 siblings, 0 replies; 15+ messages in thread
From: tm @ 2003-01-07 23:04 UTC (permalink / raw)
  To: Robert Dewar; +Cc: tm_gccmail, dberlin, gcc, lucier

On Tue, 7 Jan 2003, Robert Dewar wrote:

> Are you really sure that only 5000 Itanium machines shipped? I knew the
> number was low, but I am surprised it was that low.

http://news.com.com/2100-1001-816654.html?legacy=cnet&tag=lh

"The first Itanium processor, based on the Merced core, received
 a lukewarm welcome from customers. Only 2,600 Itanium servers shipped in
 the second and third quarter, according to Gartner, while IDC puts the
 figure at closer to 500. Most of the companies that have bought Itanium
 systems have been using them in pilot projects."

http://thewhir.com/marketwatch/ita083002.cfm

"Itanium, however, has yet to establish real traction in the market, with
 only 772 servers built with Itanium or Itanium 2 processors shipped in
 the second quarter of this year, generating a small $17 million in
 business.

 Last year 2,716 Itanium servers shipped, but the figure included two
 1,000-server clusters, said Gartner."

So maybe 5,000 Merceds shipped by now.

Toshi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
@ 2003-01-07 20:41 Robert Dewar
  2003-01-07 23:04 ` tm
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Robert Dewar @ 2003-01-07 20:41 UTC (permalink / raw)
  To: dewar, tm_gccmail; +Cc: dberlin, gcc, lucier, tm

Are you really sure that only 5000 Itanium machines shipped? I knew the
number was low, but I am surprised it was that low.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
@ 2003-01-07 20:40 Robert Dewar
  0 siblings, 0 replies; 15+ messages in thread
From: Robert Dewar @ 2003-01-07 20:40 UTC (permalink / raw)
  To: dewar, tm_gccmail; +Cc: dberlin, gcc, lucier, tm

I assume that in your note Merced = Itanium, and McKinley = Itanium 2 (the
code names are long out of use at this stage). I really meant ia64, the
architecture, since regardless of the implementation, the scheduling and
register allocation issues are very much more important with all
implementations of ia64.

A lot of useful material can be found at www.trimaran.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07  5:08 ` tm
@ 2003-01-07  6:53   ` Daniel Berlin
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Berlin @ 2003-01-07  6:53 UTC (permalink / raw)
  To: tm; +Cc: Robert Dewar, lucier, gcc


On Monday, January 6, 2003, at 10:35  PM, tm wrote:

> On Mon, 6 Jan 2003, Robert Dewar wrote:
>
>>> The point I was trying to make that got lost in all the verbiage is
>>> that one might want to try to teach the register allocator a bit
>>> about scheduling rather than trying to teach the scheduler a bit 
>>> about
>>> register allocation.
>>
>> At this point there are very good combined algorithms that are neither
>> of the above, but instead treat the allocation/scheduling problem as
>> a single problem with a unified solution.
>
> Will these combined algorithms handle register rematerialization,
> reload_cse, and other things cleanly? It sounds like it would become a
> huge, monolithic block of code.
I agree.
They also aren't necessarily doing better than a good scheduler and 
good allocator, in terms of runtime performance or compile time 
performance.

For the most part, the place you'd most want to combine the two is
loops, which is what modulo scheduling (in some forms) would do for us.

If register usage is the main problem, then we could use a form of
minimum register instruction sequencing (generating an instruction
sequence S that is optimal in terms of the number of registers used)
before register allocation, then let the scheduler go wild after regalloc.

There are relatively simple heuristics to give near-optimal results.
See http://citeseer.nj.nec.com/427865.html.
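
A classical instance of such a heuristic is Sethi-Ullman numbering, which
labels each node of an expression tree with the minimum number of registers
needed to evaluate its subtree without spilling. A minimal sketch
(illustrative Python, not GCC code; the tree representation is invented
for the example):

```python
# Sethi-Ullman numbering: label each expression-tree node with the
# minimum number of registers needed to evaluate its subtree without
# spilling (simple model: every operand value occupies a register).

class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

def su_label(n):
    if n.left is None:                  # leaf: one register for its value
        return 1
    l, r = su_label(n.left), su_label(n.right)
    # Unequal needs: evaluate the hungrier subtree first and reuse its
    # registers.  Equal needs: one extra register must hold the first
    # result while the second subtree is computed.
    return max(l, r) if l != r else l + 1

# (a + b) * (c + d) needs 3 registers under this model:
expr = Node("*", Node("+", Node("a"), Node("b")),
                 Node("+", Node("c"), Node("d")))
```

Evaluating the higher-labeled operand first realizes this minimum on
expression trees; DAGs and real instruction sets need further heuristics
on top.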

This might be interesting to implement, even if one doesn't want
minimum register need all the time, just to see how good we *could* do
(i.e., even if it were only used as a comparison point to see whether we
could do better on some test cases, assuming we do get some register
pressure heuristics in the scheduler that are used instead).
> Toshi
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07  3:42 Robert Dewar
@ 2003-01-07  5:08 ` tm
  2003-01-07  6:53   ` Daniel Berlin
  0 siblings, 1 reply; 15+ messages in thread
From: tm @ 2003-01-07  5:08 UTC (permalink / raw)
  To: Robert Dewar; +Cc: lucier, gcc

On Mon, 6 Jan 2003, Robert Dewar wrote:

> > The point I was trying to make that got lost in all the verbiage is
> > that one might want to try to teach the register allocator a bit
> > about scheduling rather than trying to teach the scheduler a bit about
> > register allocation.
> 
> At this point there are very good combined algorithms that are neither
> of the above, but instead treat the allocation/scheduling problem as
> a single problem with a unified solution.

Will these combined algorithms handle register rematerialization,
reload_cse, and other things cleanly? It sounds like it would become a
huge, monolithic block of code.

Toshi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
@ 2003-01-07  3:42 Robert Dewar
  2003-01-07  5:08 ` tm
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Dewar @ 2003-01-07  3:42 UTC (permalink / raw)
  To: lucier, tm; +Cc: gcc

> The point I was trying to make that got lost in all the verbiage is
> that one might want to try to teach the register allocator a bit
> about scheduling rather than trying to teach the scheduler a bit about
> register allocation.

At this point there are very good combined algorithms that are neither
of the above, but instead treat the allocation/scheduling problem as
a single problem with a unified solution.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-07  0:18 ` Brad Lucier
@ 2003-01-07  1:27   ` tm_gccmail
  0 siblings, 0 replies; 15+ messages in thread
From: tm_gccmail @ 2003-01-07  1:27 UTC (permalink / raw)
  To: Brad Lucier; +Cc: tm, gcc

On Mon, 6 Jan 2003, Brad Lucier wrote:

> > >Since we will have a new register allocator for 3.4 that is based on
> > >graph coloring, perhaps one could add a new flag
> > >
> > >--fuse-all-registers
> > >
> > >that would not try to use the *minimum* number of colors to color an
> > >interference graph, but, if the minimum number is less than the actual
> > >number of available registers, it could use the actual numbers of
> > >registers to color the graph, perhaps guided by liveness information.
> > 
> > This is a kludgy solution IMHO.
> 
> OK, the special flag might be a kludge.
> 
> And it's a known problem after all.
> 
> The point I was trying to make that got lost in all the verbiage is
> that one might want to try to teach the register allocator a bit
> about scheduling rather than trying to teach the scheduler a bit about
> register allocation.

This makes sense to me too. Scheduler modifications to estimate register
pressure are dependent on the register allocation algorithm (and
possibly sensitive to small tweaks in the register allocator) whereas
register allocator modifications to reduce register pressure are not
dependent on the scheduling algorithm.

Toshi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
  2003-01-06 23:52 tm
@ 2003-01-07  0:18 ` Brad Lucier
  2003-01-07  1:27   ` tm_gccmail
  0 siblings, 1 reply; 15+ messages in thread
From: Brad Lucier @ 2003-01-07  0:18 UTC (permalink / raw)
  To: tm; +Cc: lucier, gcc

> >Since we will have a new register allocator for 3.4 that is based on
> >graph coloring, perhaps one could add a new flag
> >
> >--fuse-all-registers
> >
> >that would not try to use the *minimum* number of colors to color an
> >interference graph, but, if the minimum number is less than the actual
> >number of available registers, it could use the actual numbers of
> >registers to color the graph, perhaps guided by liveness information.
> 
> This is a kludgy solution IMHO.

OK, the special flag might be a kludge.

And it's a known problem after all.

The point I was trying to make that got lost in all the verbiage is
that one might want to try to teach the register allocator a bit
about scheduling rather than trying to teach the scheduler a bit about
register allocation.

For example, one *must* assign different colors to pseudo-registers if their
lifetimes overlap.  However, if one has more than enough registers for
a particular routine, one may want to assign different colors to
pseudo-registers even if their lifetimes are "close", in order to allow
the post-register-allocation scheduler a bit more flexibility.
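
The "must" constraint is exactly graph coloring: pseudo-registers whose
lifetimes overlap interfere, and interfering nodes need distinct colors
(hard registers). A toy greedy colorer, purely illustrative (the new
allocator is far more sophisticated, and every name here is invented):

```python
# Greedy coloring of an interference graph: interfering pseudos
# (overlapping lifetimes) must receive distinct colors (hard registers).

def color(graph, num_regs):
    """graph maps each pseudo to the set of pseudos it interferes with.
    Returns pseudo -> color, or None where the pseudo must be spilled."""
    assignment = {}
    for n in sorted(graph, key=lambda v: -len(graph[v])):  # hardest first
        used = {assignment.get(m) for m in graph[n]}
        free = [c for c in range(num_regs) if c not in used]
        assignment[n] = free[0] if free else None          # None = spill
    return assignment

# p1/p2 overlap and p2/p3 overlap, but p1 and p3 do not:
regs = color({"p1": {"p2"}, "p2": {"p1", "p3"}, "p3": {"p2"}}, 2)
```

With two hard registers, p1 and p3 may legally share one; the suggestion
above is that when registers are plentiful, the allocator could
deliberately avoid such sharing to leave the second scheduling pass more
room.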

But Dan Berlin claims that the new register allocator already uses all
the registers available to it, and a quick test on alphaev6 shows this
to indeed be the case (but I'm a bit surprised I can't get the scheduler
to hoist more of the loads to the top of the basic block---perhaps I
have to rebuild my powerpc-darwin compiler to see what happens there again).

Brad

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: register allocation vs. scheduling and other stuff
@ 2003-01-06 23:52 tm
  2003-01-07  0:18 ` Brad Lucier
  0 siblings, 1 reply; 15+ messages in thread
From: tm @ 2003-01-06 23:52 UTC (permalink / raw)
  To: lucier; +Cc: gcc


Brad Lucier wrote:

>I've been playing around with -fnew-ra, -fno-trapping-math, and various
>schedule options for 3.4 on powerpc-darwin.
>
>For a molecular energy minimization code, where an affine transformation
>is applied consecutively to the location of each atom in the molecule, 
>
>-O1 -fno-trapping-math -fschedule-insns2 -fnew-ra -mcpu=7400
>
>works very well, the code is a bunch of overlapped loads, stores,
>and floating-point operations, but
>
>-O1 -fno-trapping-math -fschedule-insns -fschedule-insns2 -fnew-ra
>-mcpu=7400
>
>which also schedules *before* register allocation, is 50% slower, since
>the scheduling pass before hard register allocation loads *all* the x-y-z
>information for all the atoms into pseudo-registers at the top of the
>routine, and requires many moves between the stack and registers when
>these values are actually needed for computations.

Actually, I've seen it much worse than this.

The PowerPC has 32 registers. The SH only has 16 registers, and when
the first instruction scheduling pass was enabled, the code could run
over 3x slower in extreme cases. I would see assembly listings where
about 80% of the page was spent thrashing registers to/from the stack.

>Perhaps, since -fno-trapping-math is a relatively new option, this is
>a recent concern.
>
>I've heard people on these lists talk about making scheduling smarter, so
>it knows something about register pressure.  It seems that the general
>solution would be to do register allocation and scheduling together.

It may be possible to have something simpler. In many cases where I've
seen this problem, it's because the scheduler has hoisted multiple loads
up too high, which starves the register allocator. If the register
lifetimes could be shortened by moving the loads down, much of the problem
would be solved.
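
The effect is easy to model with a toy liveness calculation (the
instruction representation below is invented, nothing GCC-specific):
hoisting all loads to the top stretches their live ranges and raises peak
register pressure, while sinking each load next to its use keeps pressure
flat.

```python
# Each instruction is (value_defined_or_None, [values_used]).
# A value dies at its last use; pressure = number of live values.

def max_pressure(schedule):
    last_use = {}
    for i, (_, uses) in enumerate(schedule):
        for u in uses:
            last_use[u] = i
    live, peak = set(), 0
    for i, (d, uses) in enumerate(schedule):
        if d is not None:
            live.add(d)
        peak = max(peak, len(live))
        live -= {u for u in uses if last_use[u] == i}  # last use: value dies
    return peak

# Three loads hoisted to the top, each consumed later (say, by a store):
hoisted = [("l1", []), ("l2", []), ("l3", []),
           (None, ["l1"]), (None, ["l2"]), (None, ["l3"])]
# The same work with each load sunk next to its use:
sunk = [("l1", []), (None, ["l1"]),
        ("l2", []), (None, ["l2"]),
        ("l3", []), (None, ["l3"])]
# max_pressure(hoisted) == 3, max_pressure(sunk) == 1
```

In the hoisted schedule all three load results are live at once; in the
sunk schedule at most one is, which is the lifetime-shortening effect
described above.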

>Since we will have a new register allocator for 3.4 that is based on
>graph coloring, perhaps one could add a new flag
>
>--fuse-all-registers
>
>that would not try to use the *minimum* number of colors to color an
>interference graph, but, if the minimum number is less than the actual
>number of available registers, it could use the actual numbers of
>registers to color the graph, perhaps guided by liveness information.

This is a kludgy solution IMHO.

1. If you have multiple functions in a source file, then you wind up
applying this option to all the functions in that file, and to have
function-level granularity you would need to split the file.

2. This doesn't permit basic-block level control of the optimization.

3. Most compiler end-users don't understand the concept of machine
registers much less "high register pressure". In order to be really
effective, it needs to be enabled automagically when needed without user
intervention.

Toshi

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2003-01-07 23:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-07 12:47 register allocation vs. scheduling and other stuff Robert Dewar
2003-01-07 19:53 ` tm_gccmail
  -- strict thread matches above, loose matches on Subject: below --
2003-01-07 20:41 Robert Dewar
2003-01-07 23:04 ` tm
2003-01-07 23:18 ` Florian Weimer
2003-01-08  0:32   ` tm_gccmail
2003-01-07 23:26 ` tm
2003-01-08  0:34   ` Mike Stump
2003-01-07 20:40 Robert Dewar
2003-01-07  3:42 Robert Dewar
2003-01-07  5:08 ` tm
2003-01-07  6:53   ` Daniel Berlin
2003-01-06 23:52 tm
2003-01-07  0:18 ` Brad Lucier
2003-01-07  1:27   ` tm_gccmail
