public inbox for gcc@gcc.gnu.org
* RE: gcc and the IA64 ABI
From: Winalski, Paul @ 2003-05-23 18:41 UTC
  To: 'gcc@gcc.gnu.org'

We are experimenting with taking advantage of non-default symbol
visibility (i.e., restricting symbol preemption) to enable more
aggressive optimizations, including dispensing with the caller
save/restore of gp on external calls made via undefined external
symbols with restricted (protected, hidden, or internal) visibility.
As permitted by the ELF Spec, we are interpreting a declaration such
as:

    void foo() __attribute__((visibility("protected")));

where foo is external to the current compilation unit, as an assertion
by the programmer that foo(), although external, will be linked into
the same component as the current compilation unit, and therefore
will share the same gp value, and therefore calls to foo() can dispense
with saving/restoring gp around the call.
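
A minimal sketch of the pattern described above, assuming GCC-style attribute syntax on an ELF target. The names component_counter and call_twice are hypothetical. In a real build the definition would live in another translation unit of the same component; it is placed in the same file here only so the example is self-contained. Whether a particular compiler actually drops the gp save/restore for such calls is implementation-specific.

```c
/* Hypothetical example: all names are illustrative. */

/* Protected visibility: the symbol cannot be preempted, so the
   definition that binds is the one inside this component. */
int component_counter(void) __attribute__((visibility("protected")));

/* A caller in the same component.  On IA-64, a compiler that trusts the
   visibility assertion may skip saving/restoring gp (r1) around these
   calls, because caller and callee share one gp value. */
int call_twice(void)
{
    int first = component_counter();
    int second = component_counter();
    return first + second;
}

/* Stand-in definition; normally in a different translation unit that is
   linked into the same executable or shared object. */
int component_counter(void)
{
    static int calls;
    return ++calls;
}
```

If the definition ends up in a different component after all, the assertion made by the declaration is false and the link is expected to fail rather than silently bind across components.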

The part of the IA64 ABI under discussion restricts tail calls to undefined
external symbols to those cases where the compiler knows that the
target of the tail call will be in the same component (and hence share
the same gp value) as the routine making the tail call.  These are
essentially the same conditions we are using for the optimization that
dispenses with the save/restore of gp around a call.
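
As an illustration of the case the ABI permits, here is a hedged sketch of a tail call to a symbol the compiler can assume is in the same component. The names sum_to and accumulate are hypothetical, and hidden visibility is used only as one way of making the same-component assertion. Whether the call is actually emitted as a branch depends on the compiler and optimization level.

```c
/* Hypothetical example: all names are illustrative. */

/* Hidden visibility tells the compiler the definition is in this
   component, so caller and callee share the same gp. */
long accumulate(long n, long acc) __attribute__((visibility("hidden")));

long sum_to(long n)
{
    /* The call is the last action and its result is returned unchanged,
       so it is a tail-call candidate: no gp restore is needed afterwards
       because control never comes back here. */
    return accumulate(n, 0);
}

/* Stand-in definition; normally in another translation unit of the
   same component. */
long accumulate(long n, long acc)
{
    if (n <= 0)
        return acc;
    return accumulate(n - 1, acc + n);   /* also in tail position */
}
```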

Our study of the situation so far indicates that tail call opportunities
on Itanium are quite limited.  Use of the alloc instruction in the caller
precludes tail calls, for example.  But eliminating caller gp save/restore
seems to be promising.  Early testing seems to indicate significant
improvement on some important programs; we're still in the process of
collecting performance data.  If you have evidence that tail call
optimization is a significant performance win on Itanium, we'd love
to hear about it before we go charging down the wrong path.

Regards,

-Paul Winalski, Intel Corporation

* RE: gcc and the IA64 ABI
From: Winalski, Paul @ 2003-05-28 15:12 UTC
  To: 'Steve Ellcey', gcc, rth

Actually, I was speaking about our (Intel's) compiler, not gcc.

-Paul W.

-----Original Message-----
From: Steve Ellcey [mailto:sje@cup.hp.com]
Sent: Friday, May 23, 2003 4:36 PM
To: Winalski, Paul; gcc@gcc.gnu.org; rth@redhat.com
Subject: RE: gcc and the IA64 ABI


> > But eliminating caller gp save/restore
> > seems to be promising.  Early testing seems to indicate significant
> > improvement on some important programs; we're still in the process of
> > collecting performance data.
> 
> I wouldn't have suspected that this would produce that much of a win,
> given that restoring the gp is so cheap; just a single mov instruction.

I wonder if the speedup you are seeing is due to a side effect of
how the GP is saved and restored.  GCC on IA64 was (is?) using a single
register to save/restore gp, and it always used the same register.  This
made otherwise loop-invariant function calls (like integer division with
loop-invariant arguments) look like they were never loop-invariant and
could not be moved out of the loop, especially when there were multiple
divisions (all loop-invariant) in one loop.  See
http://gcc.gnu.org/ml/gcc-patches/2002-03/msg00179.html for a change I
proposed that allowed loop-invariant code motion of calls.  This change
was not checked in, and I believe this situation still exists in 3.3.  It
may have been fixed on the top-of-tree, as the way calls are expanded has
been changed and the routine ia64_gp_save_reg no longer exists.  See
http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01196.html for where that
was removed by Richard Henderson.

Steve Ellcey
sje@cup.hp.com

* RE: gcc and the IA64 ABI
From: Steve Ellcey @ 2003-05-23 20:38 UTC
  To: paul.winalski, gcc, rth

> > But eliminating caller gp save/restore
> > seems to be promising.  Early testing seems to indicate significant
> > improvement on some important programs; we're still in the process of
> > collecting performance data.
> 
> I wouldn't have suspected that this would produce that much of a win,
> given that restoring the gp is so cheap; just a single mov instruction.

I wonder if the speedup you are seeing is due to a side effect of
how the GP is saved and restored.  GCC on IA64 was (is?) using a single
register to save/restore gp, and it always used the same register.  This
made otherwise loop-invariant function calls (like integer division with
loop-invariant arguments) look like they were never loop-invariant and
could not be moved out of the loop, especially when there were multiple
divisions (all loop-invariant) in one loop.  See
http://gcc.gnu.org/ml/gcc-patches/2002-03/msg00179.html for a change I
proposed that allowed loop-invariant code motion of calls.  This change
was not checked in, and I believe this situation still exists in 3.3.  It
may have been fixed on the top-of-tree, as the way calls are expanded has
been changed and the routine ia64_gp_save_reg no longer exists.  See
http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01196.html for where that
was removed by Richard Henderson.
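
To make the loop-invariant case concrete, here is a source-level sketch of the transformation the optimizer is being prevented from doing. It is only an illustration: divide, fill_naive, and fill_hoisted are hypothetical names, divide stands in for an out-of-line division routine, and the hoisting is written by hand to show the intended result, not how GCC performs it.

```c
/* Hypothetical example: all names are illustrative. */

/* Stand-in for an out-of-line integer division routine. */
static long divide(long a, long b)
{
    return a / b;
}

/* The call has loop-invariant arguments, but if every call also writes
   the same gp save register, the call looks as if it changes state on
   each iteration and stays inside the loop. */
void fill_naive(long *out, int n, long a, long b)
{
    for (int i = 0; i < n; i++)
        out[i] = i + divide(a, b);
}

/* What loop-invariant code motion would produce: one call, hoisted.
   Assumes b != 0, since the division now runs even when n == 0. */
void fill_hoisted(long *out, int n, long a, long b)
{
    long quotient = divide(a, b);
    for (int i = 0; i < n; i++)
        out[i] = i + quotient;
}
```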

Steve Ellcey
sje@cup.hp.com

* Re: gcc and the IA64 ABI
From: Richard Henderson @ 2003-05-23 19:09 UTC
  To: Winalski, Paul
  Cc: 'gcc@gcc.gnu.org',
	Lu, Hongjiu, Sehr, David C, Kirkegaard, Knud J, Rao, Suresh K

On Fri, May 23, 2003 at 11:04:27AM -0700, Winalski, Paul wrote:
> The part of the IA64 ABI under discussion restricts tail calls to undefined
> external symbols to those cases where the compiler knows that the
> target of the tail call will be in the same component (and hence share
> the same gp value) as the routine making the tail call.

Yes.

Though we missed this restriction in GCC.  So at present GCC does
not follow this bit of the ABI, and doesn't preserve *any* value
in the GP after a call.

But I suspect you know this and that's why you're writing this mail.  ;-)

> Our study of the situation so far indicates that tail call opportunities
> on Itanium are quite limited.  Use of the alloc instruction in the caller
> precludes tail calls, for example.

This is false.  All that is required is that there be another
alloc instruction preceding the actual jump that deallocates
the caller's register stack frame.

For instance, 

	void foo() { bar(baz()); }

foo:
        .prologue 12, 34
        .mii
        .save ar.pfs, r35
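        alloc r35 = ar.pfs, 1, 3, 0, 0    // 1 input, 3 locals; old ar.pfs kept in r35
        .save rp, r34
        mov r34 = b0                      // save return address
        mov r33 = r1                      // save gp
        .body
        ;;
        .bbb
        nop 0
        nop 0
        br.call.sptk.many b0 = baz#       // ordinary call to baz
        ;;
        .mmi
        mov r32 = r8                      // baz's result becomes bar's argument
        mov r1 = r33                      // restore gp
        mov ar.pfs = r35                  // restore ar.pfs as it was on entry to foo
        .mii
        nop 0
        mov b0 = r34                      // restore return address
        ;;
        nop 0
        .mfb
        alloc r2 = ar.pfs, 0, 0, 1, 0     // second alloc: frame shrinks to one output (r32)
        nop 0
        br.sptk.many bar#                 // tail call: plain branch, not br.call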
        ;;
        break.f 0
        ;;

> But eliminating caller gp save/restore
> seems to be promising.  Early testing seems to indicate significant
> improvement on some important programs; we're still in the process of
> collecting performance data.

I wouldn't have suspected that this would produce that much of a win,
given that restoring the gp is so cheap; just a single mov instruction.

> If you have evidence that tail call optimization is a significant
> performance win on Itanium, we'd love to hear about it before we
> go charging down the wrong path.

I have no hard data, but I would think that the more tail calls we
can allow, the shallower the register stack needs to be, which should
result in less traffic by the RSE.


r~
