public inbox for gcc@gcc.gnu.org
* Inlining Improvements
@ 1999-12-21  2:49 Oskar Enoksson
  1999-12-21  4:59 ` Martin v. Loewis
  1999-12-31 23:54 ` Oskar Enoksson
  0 siblings, 2 replies; 32+ messages in thread
From: Oskar Enoksson @ 1999-12-21  2:49 UTC (permalink / raw)
  To: gcc

I read the announcement about the "inlining improvements" on the website.
It's great news! Is this code checked in already? If not, how soon could
it be available?

Thanks!

/*              Oskar Enoksson, Linkoping, Sweden                  */


* Re: Inlining Improvements
  1999-12-21  2:49 Inlining Improvements Oskar Enoksson
@ 1999-12-21  4:59 ` Martin v. Loewis
  1999-12-21  8:04   ` Jamie Lokier
  1999-12-31 23:54   ` Martin v. Loewis
  1999-12-31 23:54 ` Oskar Enoksson
  1 sibling, 2 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-21  4:59 UTC (permalink / raw)
  To: osken393; +Cc: gcc

> I read the announcement about the "inlining improvements" on the website.
> It's great news! Is this code checked in already?

Yes, it is. Have a look at cp/ChangeLog, in particular

1999-12-05  Mark Mitchell  <mark@codesourcery.com>
1999-12-04  Mark Mitchell  <mark@codesourcery.com>
1999-11-25  Mark Mitchell  <mark@codesourcery.com>

and others.

Regards,
Martin


* Re: Inlining Improvements
  1999-12-21  4:59 ` Martin v. Loewis
@ 1999-12-21  8:04   ` Jamie Lokier
  1999-12-21  8:55     ` Mark Mitchell
                       ` (3 more replies)
  1999-12-31 23:54   ` Martin v. Loewis
  1 sibling, 4 replies; 32+ messages in thread
From: Jamie Lokier @ 1999-12-21  8:04 UTC (permalink / raw)
  To: Martin v. Loewis, n; +Cc: osken393, gcc

Martin v. Loewis wrote:
> > I read the announcement about the "inlining improvements" on the website.
> > It's great news! Is this code checked in already?
> 
> Yes, it is. Have a look at cp/ChangeLog, [...]

So we have a situation where the C++ compiler generates better code than
the C compiler from the same source?

Are there plans to add the tree inlining to C any time soon?

thanks,
-- Jamie


* Re: Inlining Improvements
  1999-12-21  8:04   ` Jamie Lokier
@ 1999-12-21  8:55     ` Mark Mitchell
  1999-12-21  9:06       ` Jamie Lokier
  1999-12-31 23:54       ` Mark Mitchell
  1999-12-21  9:43     ` Jeffrey A Law
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 32+ messages in thread
From: Mark Mitchell @ 1999-12-21  8:55 UTC (permalink / raw)
  To: jamie.lokier; +Cc: martin, n, osken393, gcc

>>>>> "Jamie" == Jamie Lokier <jamie.lokier@cern.ch> writes:

    Jamie> So we have a situation where the C++ compiler generates
    Jamie> better code than the C compiler from the same source?

    Jamie> Are there plans to add the tree inlining to C any time
    Jamie> soon?

We (CodeSourcery) don't have any such plans, although we're actively
encouraging customers to do that work.

I believe that Cygnus is working on moving some of the
function-at-a-time work, which is a necessary prerequisite for the new
inliner, into language-independent code.

I'm actually quite surprised that the tree-based inlining has made as
much a difference (in the quality of the generated code) as it has in
some cases.  Some MIPS benchmarks one of our customers had now run
twice as quickly -- somehow, the new inliner is making it easier for
the back-end to do its job, at least in some situations.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com


* Re: Inlining Improvements
  1999-12-21  8:55     ` Mark Mitchell
@ 1999-12-21  9:06       ` Jamie Lokier
  1999-12-31 23:54         ` Jamie Lokier
  1999-12-31 23:54       ` Mark Mitchell
  1 sibling, 1 reply; 32+ messages in thread
From: Jamie Lokier @ 1999-12-21  9:06 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: martin, n, osken393, gcc

Mark Mitchell wrote:
> I'm actually quite surprised that the tree-based inlining has made as
> much a difference (in the quality of the generated code) as it has in
> some cases.  Some MIPS benchmarks one of our customers had now run
> twice as quickly -- somehow, the new inliner is making it easier for
> the back-end to do its job, at least in some situations.

I'm not surprised.  I long ago complained that the "inline function is
as fast as a macro" claim was totally bogus.

With tree-based inlining changes hopefully the claim will finally
reflect reality.

Just recently I noticed that in an inline (C) function,
__builtin_constant_p was returning 1 just fine.  But in a nested inline
function, it was not.  Perhaps the large expression was too much for the
constant folder after RTL inlining, and it gave up.

Presumably if __builtin_constant_p is not reflecting constantness even
in some simple cases due to RTL inlining, early code generation
decisions based on "is this a constant" are also assuming "no".

Perhaps this gives some clue as to the kind of transformation the back
end should have been doing all along to do good RTL-based inlining?  I
would not be surprised if such a transformation would be effective on
other kinds of code too.

I look forward to seeing if tree inlining gives better
__builtin_constant_p results.
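
A minimal sketch of the kind of test described above, assuming GCC (or
a compatible compiler) and its __builtin_constant_p builtin.  The names
is_const, is_const_nested, probe_direct and probe_nested are
hypothetical and only serve the illustration.  Whether either probe
yields 1 depends on the compiler version, the optimization level, and
on whether inlining happens before constant folding, so no particular
result is claimed here.

```c
#include <stdio.h>

/* Hypothetical helper: asks whether the compiler can prove that x is
   a compile-time constant at this point. */
static inline int is_const(int x)
{
    return __builtin_constant_p(x);
}

/* Hypothetical helper: the same query, one level of inlining deeper. */
static inline int is_const_nested(int x)
{
    return is_const(x);
}

/* Both probes pass a literal, so any difference between them comes
   from how the compiler inlines and folds, not from the source. */
int probe_direct(void) { return is_const(42); }
int probe_nested(void) { return is_const_nested(42); }

/* Prints both answers so different compilers and flags can be compared. */
void report(void)
{
    printf("direct: %d, nested: %d\n", probe_direct(), probe_nested());
}
```

Calling report() from a small driver, built once with -O0 and once
with -O2, is one way to see whether the nested case loses the
"constant" property on a given compiler.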

-- Jamie


* Re: Inlining Improvements
  1999-12-21  8:04   ` Jamie Lokier
  1999-12-21  8:55     ` Mark Mitchell
@ 1999-12-21  9:43     ` Jeffrey A Law
  1999-12-31 23:54       ` Jeffrey A Law
  1999-12-21  9:46     ` Martin v. Loewis
  1999-12-31 23:54     ` Jamie Lokier
  3 siblings, 1 reply; 32+ messages in thread
From: Jeffrey A Law @ 1999-12-21  9:43 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Martin v. Loewis, n, osken393, gcc

  In message <19991221170444.B10482@pcep-jamie.cern.ch> you write:
  > 
  > So we have a situation where the C++ compiler generates better code than
  > the C compiler from the same source?
  > 
  > Are there plans to add the tree inlining to C any time soon?
Cygnus is currently working on implementing functions as trees.  The plan
is to submit it for review as soon as it's working.

jeff


* Re: Inlining Improvements
  1999-12-21  8:04   ` Jamie Lokier
  1999-12-21  8:55     ` Mark Mitchell
  1999-12-21  9:43     ` Jeffrey A Law
@ 1999-12-21  9:46     ` Martin v. Loewis
  1999-12-21 16:00       ` Jamie Lokier
  1999-12-31 23:54       ` Martin v. Loewis
  1999-12-31 23:54     ` Jamie Lokier
  3 siblings, 2 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-21  9:46 UTC (permalink / raw)
  To: jamie.lokier; +Cc: n, osken393, gcc

> So we have a situation where the C++ compiler generates better code than
> the C compiler from the same source?

It might be possible to create examples. On the average, I doubt that.
If it is plain C code that also compiles as C++ code, inlining most
likely happens at the same places.

I believe that the main advantage is in terms of memory consumption in
the compiler itself.

> Are there plans to add the tree inlining to C any time soon?

I can't answer that question.

Regards,
Martin


* Re: Inlining Improvements
  1999-12-21  9:46     ` Martin v. Loewis
@ 1999-12-21 16:00       ` Jamie Lokier
  1999-12-21 16:08         ` Joe Buck
                           ` (2 more replies)
  1999-12-31 23:54       ` Martin v. Loewis
  1 sibling, 3 replies; 32+ messages in thread
From: Jamie Lokier @ 1999-12-21 16:00 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: osken393, gcc

Martin v. Loewis wrote:
> > So we have a situation where the C++ compiler generates better code than
> > the C compiler from the same source?
> 
> It might be possible to create examples. On the average, I doubt that.
> If it is plain C code that also compiles as C++ code, inlining most
> likely happens at the same places.

The point is that tree inlining seems to generate better code than RTL
inlining which the C compiler currently does.

-- Jamie


* Re: Inlining Improvements
  1999-12-21 16:00       ` Jamie Lokier
@ 1999-12-21 16:08         ` Joe Buck
  1999-12-22  0:35           ` Martin v. Loewis
  1999-12-31 23:54           ` Joe Buck
  1999-12-22  0:04         ` Martin v. Loewis
  1999-12-31 23:54         ` Jamie Lokier
  2 siblings, 2 replies; 32+ messages in thread
From: Joe Buck @ 1999-12-21 16:08 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: martin, osken393, gcc

> Martin v. Loewis wrote:
> > > So we have a situation where the C++ compiler generates better code than
> > > the C compiler from the same source?
> > 
> > It might be possible to create examples. On the average, I doubt that.
> > If it is plain C code that also compiles as C++ code, inlining most
> > likely happens at the same places.
> 
> The point is that tree inlining seems to generate better code than RTL
> inlining which the C compiler currently does.

The RTL inlining happens too late, after some objects have already been
assigned to memory.  Thus passing an automatic struct or C++ class to an
inline function often results in dead stores when the RTL inliner is used.



* Re: Inlining Improvements
  1999-12-21 16:00       ` Jamie Lokier
  1999-12-21 16:08         ` Joe Buck
@ 1999-12-22  0:04         ` Martin v. Loewis
  1999-12-22  0:15           ` Marcin Dalecki
                             ` (2 more replies)
  1999-12-31 23:54         ` Jamie Lokier
  2 siblings, 3 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-22  0:04 UTC (permalink / raw)
  To: jamie.lokier; +Cc: gcc

> The point is that tree inlining seems to generate better code than RTL
> inlining which the C compiler currently does.

Examples?

Martin


* Re: Inlining Improvements
  1999-12-22  0:04         ` Martin v. Loewis
@ 1999-12-22  0:15           ` Marcin Dalecki
  1999-12-22  1:56             ` Martin v. Loewis
  1999-12-31 23:54             ` Marcin Dalecki
  1999-12-22  6:57           ` Jamie Lokier
  1999-12-31 23:54           ` Martin v. Loewis
  2 siblings, 2 replies; 32+ messages in thread
From: Marcin Dalecki @ 1999-12-22  0:15 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: jamie.lokier, gcc

"Martin v. Loewis" wrote:
> 
> > The point is that tree inlining seems to generate better code than RTL
> > inlining which the C compiler currently does.
> 
> Examples?
> 

No problem: look at some recent linux-2.3.xxx kernel source:

root:/usr/src/linux/fs# less buffer.c 

Look for the macros:

#define _hashfn(dev,block) 
#define hash(dev,block) 

And their usage.

Later on they are used with an intermediate value that gets
optimized away in the macro version, but that does not go away
in the inline-function versions unless the calling code is
reordered.
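
The macro bodies are elided above, so the following is only a sketch
with a made-up hash, not the kernel's _hashfn()/hash() definitions.
HASH_MACRO, hash_inline and the two lookup functions are hypothetical
names.  It shows the two forms being compared: a macro that expands in
place, and an inline function that goes through a named intermediate
value.

```c
/* Hypothetical hash expressed as a macro; NOT the kernel's definition. */
#define HASH_MACRO(dev, block) \
    ((((unsigned)(dev)) ^ ((unsigned)(block))) & 0xffu)

/* The same hypothetical hash as an inline function.  The local "tmp"
   plays the role of the intermediate value mentioned above. */
static inline unsigned hash_inline(unsigned dev, unsigned block)
{
    unsigned tmp = dev ^ block;
    return tmp & 0xffu;
}

/* Two callers that differ only in which form they use. */
unsigned lookup_with_macro(unsigned dev, unsigned block)
{
    return HASH_MACRO(dev, block);
}

unsigned lookup_with_inline(unsigned dev, unsigned block)
{
    return hash_inline(dev, block);
}
```

Comparing the output of gcc -O2 -S for the two lookup functions is one
way to check whether the intermediate value disappears in both forms
on a given compiler.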

--Marcin


* Re: Inlining Improvements
  1999-12-21 16:08         ` Joe Buck
@ 1999-12-22  0:35           ` Martin v. Loewis
  1999-12-31  1:56             ` Kevin Atkinson
  1999-12-31 23:54             ` Martin v. Loewis
  1999-12-31 23:54           ` Joe Buck
  1 sibling, 2 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-22  0:35 UTC (permalink / raw)
  To: jbuck; +Cc: jamie.lokier, gcc

> The RTL inlining happens too late, after some objects have already been
> assigned to memory.  Thus passing an automatic struct or C++ class to an
> inline function often results in dead stores when the RTL inliner is used.

Given this hint, I would guess that the code

struct A{
  int i;
  int j;
};

inline
int foo(struct A a)
{
  return a.i+a.j;
}

int bar()
{
  struct A a = {1,2};
  return foo(a);
}

should compile better now, right? Compiled with g++ -V2.95.2 -O2
-fomit-frame-pointer, I get

bar__Fv:
.LFB1:
	subl $28,%esp
.LCFI0:
	movl $0,8(%esp)
	movl $1,8(%esp)
	movl 8(%esp),%eax
	movl $0,12(%esp)
	movl $2,12(%esp)
	movl 12(%esp),%edx
	addl %edx,%eax
	addl $28,%esp
.LCFI1:
	ret
.LFE1:

I can clearly see the dead stores you are talking about. Now let's try
2.96 19991221:

bar__Fv:
.LFB1:
	subl	$28, %esp
.LCFI0:
	movl	$1, %eax
	movl	$2, %edx
	movl	%eax, 8(%esp)
	movl	$3, %eax
	movl	%edx, 12(%esp)
	addl	$28, %esp
	ret
.LFE1:

Yes, it does eliminate some of the dead stores. Now compile it as
plain C (with either 2.95, or the new back-end):

bar:
	movl $3,%eax
	ret

So C is still much better than C++. I understand that 2.96 still
stores the final state of "a", because it believes the address of a
was taken, but I'm surprised it can't emit

	movl	$1, 8(%esp)
	movl	$2, 12(%esp)
	movl	$3, %eax
	ret

since the values of %eax and %edx are no longer used after the
store. Also, the stack manipulation seems unnecessary. I was blaming
it on exception handling, but -fno-exceptions does not improve the
code.

Regards,
Martin


* Re: Inlining Improvements
  1999-12-22  0:15           ` Marcin Dalecki
@ 1999-12-22  1:56             ` Martin v. Loewis
  1999-12-31 23:54               ` Martin v. Loewis
  1999-12-31 23:54             ` Marcin Dalecki
  1 sibling, 1 reply; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-22  1:56 UTC (permalink / raw)
  To: dalecki; +Cc: jamie.lokier, gcc

> > > The point is that tree inlining seems to generate better code than RTL
> > > inlining which the C compiler currently does.
> > 
> > Examples?
> > 
> 
> No problem: look at some recent linux-2.3.xxx kernel source:

Pardon? How is this example relevant to tree inlining? Tree inlining
is currently done only by the C++ compiler, and the kernel is not
compiled by the C++ compiler.

I have no doubt macros generate better code than inline functions, in
any version of gcc. But that was not my question.

Regards,
Martin


* Re: Inlining Improvements
  1999-12-22  0:04         ` Martin v. Loewis
  1999-12-22  0:15           ` Marcin Dalecki
@ 1999-12-22  6:57           ` Jamie Lokier
  1999-12-22  7:58             ` Mark Mitchell
  1999-12-31 23:54             ` Jamie Lokier
  1999-12-31 23:54           ` Martin v. Loewis
  2 siblings, 2 replies; 32+ messages in thread
From: Jamie Lokier @ 1999-12-22  6:57 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: gcc

Martin v. Loewis wrote:
> > The point is that tree inlining seems to generate better code than RTL
> > inlining which the C compiler currently does.
> 
> Examples?

Mark Mitchell said so; I believe him.  I haven't used the tree inlining
compiler yet.

There are many fine examples of trivial optimisation not being done with
inline functions that are done with macros.  I assume most of them will
occur with tree inlining too (why not?).  But I will have to wait and
see.

-- Jamie


* Re: Inlining Improvements
  1999-12-22  6:57           ` Jamie Lokier
@ 1999-12-22  7:58             ` Mark Mitchell
  1999-12-31 23:54               ` Mark Mitchell
  1999-12-31 23:54             ` Jamie Lokier
  1 sibling, 1 reply; 32+ messages in thread
From: Mark Mitchell @ 1999-12-22  7:58 UTC (permalink / raw)
  To: jamie.lokier; +Cc: martin, gcc

  Martin v. Loewis wrote:
  > > The point is that tree inlining seems to generate better code than RTL
  > > inlining which the C compiler currently does.
  > 
  > Examples?

  Mark Mitchell said so; I believe him.  I haven't used the tree inlining
  compiler yet.

The LANL Pooma II library runs faster with the changes on some of its
benchmarks.  There is *extreme* inlining going on there, and the final
loops are very small.  So saving a single instruction, say because one
dead store goes away, could make a 30% difference.

  There are many fine examples of trivial optimisation not being done with
  inline functions that are done with macros.  I assume most of them will
  occur with tree inlining too (why not?).  But I will have to wait and
  see.

I concur.  I don't expect typical code to see major wins, yet.

One of the things now easy to do (in theory) is scatter-gather of
loads and stores.  That will expose small structures (with two
members, say, like a `complex' class) to the back-end optimizers
(which deal almost exclusively with REGs).
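
A rough sketch of what exposing such a two-member structure could mean,
assuming that "scatter-gather" here amounts to splitting the aggregate
into one scalar per member.  That reading is an assumption, and struct
cplx, cplx_add, sum_re_struct and sum_re_scalar are hypothetical names.
The second function is what the first one looks like once the struct
has been taken apart by hand, so that every value can live in a
register.

```c
/* Hypothetical two-member aggregate, standing in for a `complex' class. */
struct cplx {
    double re;
    double im;
};

/* Inline helper that takes and returns the aggregate by value. */
static inline struct cplx cplx_add(struct cplx a, struct cplx b)
{
    struct cplx r = { a.re + b.re, a.im + b.im };
    return r;
}

/* Aggregate form: the operands are built as structs first. */
double sum_re_struct(double re1, double im1, double re2, double im2)
{
    struct cplx a = { re1, im1 };
    struct cplx b = { re2, im2 };
    return cplx_add(a, b).re;
}

/* Hand-scalarized form: each member is its own scalar variable, and
   the unused imaginary part is trivially dead. */
double sum_re_scalar(double re1, double im1, double re2, double im2)
{
    double r_re = re1 + re2;
    double r_im = im1 + im2;
    (void)r_im;
    return r_re;
}
```

Both functions compute the same value; the interesting question is
whether a given compiler emits the same code for the first as for the
second.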

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com


* Re: Inlining Improvements
  1999-12-22  0:35           ` Martin v. Loewis
@ 1999-12-31  1:56             ` Kevin Atkinson
  1999-12-31 23:54               ` Kevin Atkinson
  1999-12-31 23:54             ` Martin v. Loewis
  1 sibling, 1 reply; 32+ messages in thread
From: Kevin Atkinson @ 1999-12-31  1:56 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: jbuck, jamie.lokier, gcc

"Martin v. Loewis" wrote:
> 
> > The RTL inlining happens too late, after some objects have already been
> > assigned to memory.  Thus passing an automatic struct or C++ class to an
> > inline function often results in dead stores when the RTL inliner is used.
> 
> Given this hint, I would guess that the code
> 
> struct A{
>   int i;
>   int j;
> };
> 
> inline
> int foo(struct A a)
> {
>   return a.i+a.j;
> }
> 
> int bar()
> {
>   struct A a = {1,2};
>   return foo(a);
> }
> 
> should compile better now, right? Compiled with g++ -V2.95.2 -O2
> -fomit-frame-pointer, I get
> 
> bar__Fv:
> .LFB1:
>         subl $28,%esp
> .LCFI0:
>         movl $0,8(%esp)
>         movl $1,8(%esp)
>         movl 8(%esp),%eax
>         movl $0,12(%esp)
>         movl $2,12(%esp)
>         movl 12(%esp),%edx
>         addl %edx,%eax
>         addl $28,%esp
> .LCFI1:
>         ret
> .LFE1:
> 
> I can clearly see the dead stores you are talking about. Now let's try
> 2.96 19991221:
> 
> bar__Fv:
> .LFB1:
>         subl    $28, %esp
> .LCFI0:
>         movl    $1, %eax
>         movl    $2, %edx
>         movl    %eax, 8(%esp)
>         movl    $3, %eax
>         movl    %edx, 12(%esp)
>         addl    $28, %esp
>         ret
> .LFE1:
> 
> Yes, it does eliminate some of the dead stores. Now compile it as
> plain C (with either 2.95, or the new back-end):
> 
> bar:
>         movl $3,%eax
>         ret
> 
> So C is still much better than C++. I understand that 2.96 still
> stores the final state of "a", because it believes the address of a
> was taken, but I'm surprised it can't emit
> 
>         movl    $1, 8(%esp)
>         movl    $2, 12(%esp)
>         movl    $3, %eax
>         ret
> 
> since the values of %eax and %edx are not used after the store,
> anymore. Also, the stack manipulation seems unnecessary. I was blaming
> it on exception handling, but -fno-exceptions does not improve the
> code.

So when compiled as plain C, gcc does a better job at inlining than
C++? Or did you use a macro there and just not tell us?

-- 
Kevin Atkinson
kevinatk@home.com
http://metalab.unc.edu/kevina/


* Re: Inlining Improvements
  1999-12-22  0:15           ` Marcin Dalecki
  1999-12-22  1:56             ` Martin v. Loewis
@ 1999-12-31 23:54             ` Marcin Dalecki
  1 sibling, 0 replies; 32+ messages in thread
From: Marcin Dalecki @ 1999-12-31 23:54 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: jamie.lokier, gcc

"Martin v. Loewis" wrote:
> 
> > The point is that tree inlining seems to generate better code than RTL
> > inlining which the C compiler currently does.
> 
> Examples?
> 

No problem: looks at some recent linux-2.3.xxx kernel source:

root:/usr/src/linux/fs# less buffer.c 

Look for the macros:

#define _hashfn(dev,block) 
#define hash(dev,block) 

And theyr usage.

Later down they are used with some intermediate value
which get's outpotimized for the macro version, but which
doesn't go away without reordering of the usage code
for the inline versions thereof.

--Marcin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-21  9:43     ` Jeffrey A Law
@ 1999-12-31 23:54       ` Jeffrey A Law
  0 siblings, 0 replies; 32+ messages in thread
From: Jeffrey A Law @ 1999-12-31 23:54 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Martin v. Loewis, n, osken393, gcc

  In message < 19991221170444.B10482@pcep-jamie.cern.ch >you write:
  > 
  > So we have a situation where the C++ compiler generates better code than
  > the C compiler from the same source?
  > 
  > Are there plans to add the tree inlining to C any time soon?
Cygnus is currently working on implementing functions as trees.  The plan
is to submit it for review as soon as it's working.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-22  0:04         ` Martin v. Loewis
  1999-12-22  0:15           ` Marcin Dalecki
  1999-12-22  6:57           ` Jamie Lokier
@ 1999-12-31 23:54           ` Martin v. Loewis
  2 siblings, 0 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-31 23:54 UTC (permalink / raw)
  To: jamie.lokier; +Cc: gcc

> The point is that tree inlining seems to generate better code than RTL
> inlining which the C compiler currently does.

Examples?

Martin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-22  6:57           ` Jamie Lokier
  1999-12-22  7:58             ` Mark Mitchell
@ 1999-12-31 23:54             ` Jamie Lokier
  1 sibling, 0 replies; 32+ messages in thread
From: Jamie Lokier @ 1999-12-31 23:54 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: gcc

Martin v. Loewis wrote:
> > The point is that tree inlining seems to generate better code than RTL
> > inlining which the C compiler currently does.
> 
> Examples?

Mark Mitchell said so; I believe him.  I haven't used the tree inlining
compiler yet.

There are many fine examples of trivial optimisation not being done with
inline functions that are done with macros.  I assume most of them will
occur with tree inlining too (why not?).  But I will have to wait and
see.

-- Jamie

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-21 16:00       ` Jamie Lokier
  1999-12-21 16:08         ` Joe Buck
  1999-12-22  0:04         ` Martin v. Loewis
@ 1999-12-31 23:54         ` Jamie Lokier
  2 siblings, 0 replies; 32+ messages in thread
From: Jamie Lokier @ 1999-12-31 23:54 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: osken393, gcc

Martin v. Loewis wrote:
> > So we have a situation where the C++ compiler generates better code than
> > the C compiler from the same source?
> 
> It might be possible to create examples. On the average, I doubt that.
> If it is plain C code that also compiles as C++ code, inlining most
> likely happens at the same places.

The point is that tree inlining seems to generate better code than RTL
inlining which the C compiler currently does.

-- Jamie

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-21  9:06       ` Jamie Lokier
@ 1999-12-31 23:54         ` Jamie Lokier
  0 siblings, 0 replies; 32+ messages in thread
From: Jamie Lokier @ 1999-12-31 23:54 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: martin, n, osken393, gcc

Mark Mitchell wrote:
> I'm actually quite surprised that the tree-based inlining has made as
> much a difference (in the quality of the generated code) as it has in
> some cases.  Some MIPS benchmarks one of our customers had now run
> twice as quickly -- somehow, the new inliner is making it easier for
> the back-end to do its job, at least in some situations.

I'm not surprised.  I long ago complained that the "inline function is
as fast as a macro claim" was totally bogus.

With tree-based inlining changes hopefully the claim will finally
reflect reality.

Just recently I noticed that in an inline (C) function,
__builtin_constant_p was returning 1 just fine.  But in a nested inline
function, it was not.  Perhaps the large expression was too much for the
constant folder after RTL inlining, and it gave up.

Presumably if __builtin_constant_p is not reflecting constantness even
in some simple cases due to RTL inlining, early code generation
decisions based on "is this a constant" are also assuming "no".

Perhaps this gives some clue as to the kind of transformation the back
end should have been doing all along to do good RTL-based inlining?  I
would not be surprised if such a transformation would be effective on
other kinds code too.

I look forward to seeing if tree inlining gives better
__builtin_constant_p results.

-- Jamie

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-21  4:59 ` Martin v. Loewis
  1999-12-21  8:04   ` Jamie Lokier
@ 1999-12-31 23:54   ` Martin v. Loewis
  1 sibling, 0 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-31 23:54 UTC (permalink / raw)
  To: osken393; +Cc: gcc

> I read the announcement about the "inlining improvements" on the website.
> It's great news! Is this code checked in already?

Yes, it is. Have a look at cp/ChangeLog, in particular

1999-12-05  Mark Mitchell  <mark@codesourcery.com>
1999-12-04  Mark Mitchell  <mark@codesourcery.com>
1999-11-25  Mark Mitchell  <mark@codesourcery.com>

and others.

Regards,
Martin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-31  1:56             ` Kevin Atkinson
@ 1999-12-31 23:54               ` Kevin Atkinson
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Atkinson @ 1999-12-31 23:54 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: jbuck, jamie.lokier, gcc

"Martin v. Loewis" wrote:
> 
> > The RTL inlining happens too late, after some objects have already been
> > assigned to memory.  Thus passing an automatic struct or C++ class to an
> > inline function often results in dead stores when the RTL inliner is used.
> 
> Given this hint, I would guess that the code
> 
> struct A{
>   int i;
>   int j;
> };
> 
> inline
> int foo(struct A a)
> {
>   return a.i+a.j;
> }
> 
> int bar()
> {
>   struct A a = {1,2};
>   return foo(a);
> }
> 
> should compile better now, right? Compiled with g++ -V2.95.2 -O2
> -fomit-frame-pointer, I get
> 
> bar__Fv:
> .LFB1:
>         subl $28,%esp
> .LCFI0:
>         movl $0,8(%esp)
>         movl $1,8(%esp)
>         movl 8(%esp),%eax
>         movl $0,12(%esp)
>         movl $2,12(%esp)
>         movl 12(%esp),%edx
>         addl %edx,%eax
>         addl $28,%esp
> .LCFI1:
>         ret
> .LFE1:
> 
> I can clearly see the dead stores you are talking about. Now let's try
> 2.96 19991221:
> 
> bar__Fv:
> .LFB1:
>         subl    $28, %esp
> .LCFI0:
>         movl    $1, %eax
>         movl    $2, %edx
>         movl    %eax, 8(%esp)
>         movl    $3, %eax
>         movl    %edx, 12(%esp)
>         addl    $28, %esp
>         ret
> .LFE1:
> 
> Yes, it does eliminate some of the dead stores. Now compile it as
> plain C (with either 2.95, or the new back-end):
> 
> bar:
>         movl $3,%eax
>         ret
> 
> So C is still much better than C++. I understand that 2.96 still
> stores the final state of "a", because it believes the address of a
> was taken, but I'm surprised it can't emit
> 
>         movl    $1, 8(%esp)
>         movl    $2, 12(%esp)
>         movl    $3, %eax
>         ret
> 
> since the values of %eax and %edx are not used after the store,
> anymore. Also, the stack manipulation seems unnecessary. I was blaming
> it on exception handling, but -fno-exceptions does not improve the
> code.

So when compiled as plain C gcc does a better job at inlining then C++
or did you use a macro there and just not tell us?

-- 
Kevin Atkinson
kevinatk@home.com
http://metalab.unc.edu/kevina/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-21 16:08         ` Joe Buck
  1999-12-22  0:35           ` Martin v. Loewis
@ 1999-12-31 23:54           ` Joe Buck
  1 sibling, 0 replies; 32+ messages in thread
From: Joe Buck @ 1999-12-31 23:54 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: martin, osken393, gcc

> Martin v. Loewis wrote:
> > > So we have a situation where the C++ compiler generates better code than
> > > the C compiler from the same source?
> > 
> > It might be possible to create examples. On the average, I doubt that.
> > If it is plain C code that also compiles as C++ code, inlining most
> > likely happens at the same places.
> 
> The point is that tree inlining seems to generate better code than RTL
> inlining which the C compiler currently does.

The RTL inlining happens too late, after some objects have already been
assigned to memory.  Thus passing an automatic struct or C++ class to an
inline function often results in dead stores when the RTL inliner is used.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-22  1:56             ` Martin v. Loewis
@ 1999-12-31 23:54               ` Martin v. Loewis
  0 siblings, 0 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-31 23:54 UTC (permalink / raw)
  To: dalecki; +Cc: jamie.lokier, gcc

> > > The point is that tree inlining seems to generate better code than RTL
> > > inlining which the C compiler currently does.
> > 
> > Examples?
> > 
> 
> No problem: looks at some recent linux-2.3.xxx kernel source:

Pardon? How is this example relevant to tree inlining? Tree inlining
is currently done only by the C++ compiler, and the kernel is not
compiled by the C++ compiler.

I have no doubt macros generate better code than inline functions, in
any version of gcc. But that was not my question.

Regards,
Martin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-22  7:58             ` Mark Mitchell
@ 1999-12-31 23:54               ` Mark Mitchell
  0 siblings, 0 replies; 32+ messages in thread
From: Mark Mitchell @ 1999-12-31 23:54 UTC (permalink / raw)
  To: jamie.lokier; +Cc: martin, gcc

  Martin v. Loewis wrote:
  > > The point is that tree inlining seems to generate better code than RTL
  > > inlining which the C compiler currently does.
  > 
  > Examples?

  Mark Mitchell said so; I believe him.  I haven't used the tree inlining
  compiler yet.

The LANL Pooma II library runs faster with the changes on some of its
benchmarks.  There is *extreme* inlining going on there, and the final
loops are very small.  So, saving one instruction to do one dead store
going away, say, could make a 30% difference.

  There are many fine examples of trivial optimisation not being done with
  inline functions that are done with macros.  I assume most of them will
  occur with tree inlining too (why not?).  But I will have to wait and
  see.

I concur.  I don't expect typical code to see major wins, yet.

One of the things now easy to do (in theory) is scatter-gather of
loads and stores.  That will expose small structures (with two
members, say, like a `complex' class) to the back-end optimizers
(which deal almost exclusively with REGs).

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Inlining Improvements
  1999-12-21  8:04   ` Jamie Lokier
                       ` (2 preceding siblings ...)
  1999-12-21  9:46     ` Martin v. Loewis
@ 1999-12-31 23:54     ` Jamie Lokier
  3 siblings, 0 replies; 32+ messages in thread
From: Jamie Lokier @ 1999-12-31 23:54 UTC (permalink / raw)
  To: Martin v. Loewis, n; +Cc: osken393, gcc

Martin v. Loewis wrote:
> > I read the announcement about the "inlining improvements" on the website.
> > It's great news! Is this code checked in already?
> 
> Yes, it is. Have a look at cp/ChangeLog, [...]

So we have a situation where the C++ compiler generates better code than
the C compiler from the same source?

Are there plans to add the tree inlining to C any time soon?

thanks,
-- Jamie


* Re: Inlining Improvements
  1999-12-21  8:55     ` Mark Mitchell
  1999-12-21  9:06       ` Jamie Lokier
@ 1999-12-31 23:54       ` Mark Mitchell
  1 sibling, 0 replies; 32+ messages in thread
From: Mark Mitchell @ 1999-12-31 23:54 UTC (permalink / raw)
  To: jamie.lokier; +Cc: martin, n, osken393, gcc

>>>>> "Jamie" == Jamie Lokier <jamie.lokier@cern.ch> writes:

    Jamie> So we have a situation where the C++ compiler generates
    Jamie> better code than the C compiler from the same source?

    Jamie> Are there plans to add the tree inlining to C any time
    Jamie> soon?

We (CodeSourcery) don't have any such plans, although we're actively
encouraging customers to do that work.

I believe that Cygnus is working on moving some of the
function-at-a-time work, which is a necessary prerequisite for the new
inliner, into language-independent code.

I'm actually quite surprised that the tree-based inlining has made as
much of a difference (in the quality of the generated code) as it has in
some cases.  Some MIPS benchmarks one of our customers had now run
twice as quickly -- somehow, the new inliner is making it easier for
the back-end to do its job, at least in some situations.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com


* Re: Inlining Improvements
  1999-12-21  9:46     ` Martin v. Loewis
  1999-12-21 16:00       ` Jamie Lokier
@ 1999-12-31 23:54       ` Martin v. Loewis
  1 sibling, 0 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-31 23:54 UTC (permalink / raw)
  To: jamie.lokier; +Cc: n, osken393, gcc

> So we have a situation where the C++ compiler generates better code than
> the C compiler from the same source?

It might be possible to create examples. On average, I doubt it.
If it is plain C code that also compiles as C++ code, inlining most
likely happens at the same places.

I believe that the main advantage is in terms of memory consumption in
the compiler itself.

> Are there plans to add the tree inlining to C any time soon?

I can't answer that question.

Regards,
Martin


* Re: Inlining Improvements
  1999-12-22  0:35           ` Martin v. Loewis
  1999-12-31  1:56             ` Kevin Atkinson
@ 1999-12-31 23:54             ` Martin v. Loewis
  1 sibling, 0 replies; 32+ messages in thread
From: Martin v. Loewis @ 1999-12-31 23:54 UTC (permalink / raw)
  To: jbuck; +Cc: jamie.lokier, gcc

> The RTL inlining happens too late, after some objects have already been
> assigned to memory.  Thus passing an automatic struct or C++ class to an
> inline function often results in dead stores when the RTL inliner is used.

Given this hint, I would guess that the code

struct A{
  int i;
  int j;
};

inline
int foo(struct A a)
{
  return a.i+a.j;
}

int bar()
{
  struct A a = {1,2};
  return foo(a);
}

should compile better now, right? Compiled with g++ -V2.95.2 -O2
-fomit-frame-pointer, I get

bar__Fv:
.LFB1:
	subl $28,%esp
.LCFI0:
	movl $0,8(%esp)
	movl $1,8(%esp)
	movl 8(%esp),%eax
	movl $0,12(%esp)
	movl $2,12(%esp)
	movl 12(%esp),%edx
	addl %edx,%eax
	addl $28,%esp
.LCFI1:
	ret
.LFE1:

I can clearly see the dead stores you are talking about. Now let's try
2.96 19991221:

bar__Fv:
.LFB1:
	subl	$28, %esp
.LCFI0:
	movl	$1, %eax
	movl	$2, %edx
	movl	%eax, 8(%esp)
	movl	$3, %eax
	movl	%edx, 12(%esp)
	addl	$28, %esp
	ret
.LFE1:

Yes, it does eliminate some of the dead stores. Now compile it as
plain C (with either 2.95, or the new back-end):

bar:
	movl $3,%eax
	ret

So C is still much better than C++. I understand that 2.96 still
stores the final state of "a", because it believes the address of a
was taken, but I'm surprised it can't emit

	movl	$1, 8(%esp)
	movl	$2, 12(%esp)
	movl	$3, %eax
	ret

since the values of %eax and %edx are no longer used after the store.
Also, the stack manipulation seems unnecessary. I was blaming
it on exception handling, but -fno-exceptions does not improve the
code.
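
For anyone who wants to repeat the experiment, the test case can be kept
as a self-contained sketch like the one below.  struct A, foo and bar are
the code from this message; bar_folded and check_bar are hypothetical
additions.  bar_folded is a hand-written stand-in for the plain C result
shown above (movl $3,%eax; ret), so the check only confirms the two
agree on the value and makes no claim about what any compiler emits.

```cpp
#include <cassert>

struct A {
  int i;
  int j;
};

inline int foo(struct A a)
{
  return a.i + a.j;
}

int bar()
{
  struct A a = {1, 2};
  return foo(a);
}

// Hypothetical hand-folded equivalent of bar(): the result an ideal
// optimizer would reach after inlining foo() and removing the stores.
int bar_folded()
{
  return 3;
}

// Checks the value only; inspect the assembly to compare code quality.
void check_bar()
{
  assert(bar() == 3);
  assert(bar() == bar_folded());
}
```

Compiling this with -O2 -fomit-frame-pointer -S and reading the output
for bar is how the listings above were compared.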

Regards,
Martin


end of thread, other threads:[~1999-12-31 23:54 UTC | newest]

Thread overview: 32+ messages
1999-12-21  2:49 Inlining Improvements Oskar Enoksson
1999-12-21  4:59 ` Martin v. Loewis
1999-12-21  8:04   ` Jamie Lokier
1999-12-21  8:55     ` Mark Mitchell
1999-12-21  9:06       ` Jamie Lokier
1999-12-31 23:54         ` Jamie Lokier
1999-12-31 23:54       ` Mark Mitchell
1999-12-21  9:43     ` Jeffrey A Law
1999-12-31 23:54       ` Jeffrey A Law
1999-12-21  9:46     ` Martin v. Loewis
1999-12-21 16:00       ` Jamie Lokier
1999-12-21 16:08         ` Joe Buck
1999-12-22  0:35           ` Martin v. Loewis
1999-12-31  1:56             ` Kevin Atkinson
1999-12-31 23:54               ` Kevin Atkinson
1999-12-31 23:54             ` Martin v. Loewis
1999-12-31 23:54           ` Joe Buck
1999-12-22  0:04         ` Martin v. Loewis
1999-12-22  0:15           ` Marcin Dalecki
1999-12-22  1:56             ` Martin v. Loewis
1999-12-31 23:54               ` Martin v. Loewis
1999-12-31 23:54             ` Marcin Dalecki
1999-12-22  6:57           ` Jamie Lokier
1999-12-22  7:58             ` Mark Mitchell
1999-12-31 23:54               ` Mark Mitchell
1999-12-31 23:54             ` Jamie Lokier
1999-12-31 23:54           ` Martin v. Loewis
1999-12-31 23:54         ` Jamie Lokier
1999-12-31 23:54       ` Martin v. Loewis
1999-12-31 23:54     ` Jamie Lokier
1999-12-31 23:54   ` Martin v. Loewis
1999-12-31 23:54 ` Oskar Enoksson
