public inbox for gcc@gcc.gnu.org
* abysmal code generated by gcc 3.2
@ 2002-10-21  0:56 Denys Duchier
  2002-10-21  7:37 ` Fergus Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Denys Duchier @ 2002-10-21  0:56 UTC (permalink / raw)
  To: gcc

My application is the implementation of a virtual machine for an
emulated programming language.  Switching from gcc 2.95.x to 3.2
brought a few expected pains due to the change in data layout, but the
major issue is that gcc 3.2 produces extremely poor code for my
application on x86 (also on others, but I have not measured those
personally).

Measuring just the impact on the main emulator loop (which uses the
classical threaded code technique, i.e. jumps to first class labels) I
found that the emulator was slowed down by a FACTOR of 8.27.
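For readers who have not met the technique, here is a minimal sketch of threaded code using the GNU C "labels as values" extension. It is not taken from emulate.cc: the opcodes, the `run_threaded` name and the 64-slot limit are all hypothetical, chosen only to show how each handler ends with its own indirect jump instead of returning to a central switch.

```c
#include <stddef.h>

/* Hypothetical opcodes, for illustration only (not the Oz instruction set). */
enum { OP_INC, OP_DOUBLE, OP_HALT, OP_COUNT };

#define MAX_CODE 64

/* Runs `program` (which must end in OP_HALT) and returns the accumulator,
   or -1 if the program is malformed.  Relies on the GNU C "labels as
   values" extension: &&label yields a void *, and goto *p jumps to it. */
long run_threaded(const int *program, size_t len)
{
    static void *const handlers[OP_COUNT] = { &&do_inc, &&do_double, &&do_halt };
    void *code[MAX_CODE];   /* threaded code: one handler address per slot */
    void **pc = code;       /* emulated program counter */
    long acc = 0;
    size_t i;

    if (len == 0 || len > MAX_CODE || program[len - 1] != OP_HALT)
        return -1;

    /* Translation pass: replace each opcode by the address of its handler. */
    for (i = 0; i < len; i++) {
        if (program[i] < 0 || program[i] >= OP_COUNT)
            return -1;
        code[i] = handlers[program[i]];
    }

    goto **pc;              /* dispatch the first instruction */

do_inc:
    acc += 1;
    pc++;
    goto **pc;              /* each handler ends with its own indirect jump */

do_double:
    acc *= 2;
    pc++;
    goto **pc;

do_halt:
    return acc;
}
```

Calling `run_threaded` on the sequence INC, INC, DOUBLE, INC, HALT returns 5. The point of the layout is that the hot path of every emulated instruction is a few loads and one `jmp *`, so any extra register shuffling the compiler adds before that jump is paid on every instruction.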

Looking at the generated assembly code, it is clear that the 3.2
compiler expends a lot of effort trying to keep a certain set of
values in registers.  On x86, this is a horrible policy (especially in
a threaded code interpretation loop).

Part of the problem comes from an interaction with inlining.  I turned
inlining off for a couple of non-critical functions which were
exposing values that the compiler ended up trying to keep in
registers, and I declared one variable volatile (much better results
than trying to switch off gcse).
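The message does not say how inlining was switched off or which variable was made volatile, so the following is only a sketch of one way to do both with GCC. The names `slow_path`, `status_flag`, `set_status` and `hot_loop` are hypothetical, and `__attribute__((noinline))` is a GCC extension assumed here rather than something the thread confirms was used.

```c
/* Hypothetical stand-in for a "non-critical function": the attribute asks
   GCC never to inline it, so the values it touches are not exposed to the
   register allocator of the caller. */
__attribute__((noinline))
static long slow_path(long x)
{
    return x * 2 + 1;
}

/* Hypothetical stand-in for the one variable declared volatile: every read
   goes to memory, so the compiler cannot try to keep it in a register. */
static volatile long status_flag = 0;

void set_status(long v)
{
    status_flag = v;
}

/* Hot loop: the common case stays small, the rare case is an out-of-line call. */
long hot_loop(long n)
{
    long acc = 0;
    long i;

    for (i = 0; i < n; i++) {
        if (status_flag)
            acc = slow_path(acc);   /* rare path, kept out of line */
        else
            acc += i;               /* common path */
    }
    return acc;
}
```

Whether this helps in a given emulator depends on what the inlined code was exposing; it is a workaround for register pressure, not a general speedup.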

This got me down to a slowdown of only a factor of 1.37 :-) ... measured on
basically pure emulated recursion (i.e. the speed of looping while
doing nothing else).

Which of course still sucks majorly since this is the MAIN emulator
loop (and since _every_ part of the implementation has been sizeably
slowed down... aargh!)

Here is an example of what I still cannot get rid of.  Here is the
code produced by gcc 2.95.x for the MOVEXX instruction:

#APP
         MOVEXX:
#NO_APP
        movl 4(%ebp),%edx
        movl 8(%ebp),%eax
        addl $12,%ebp
        movl (%edx),%edx
        movl %edx,(%eax)
        jmp *(%ebp)

Here is the code produced by gcc 3.2:

#APP
         MOVEXX:
#NO_APP
        movl    4(%ebp), %esi
        movl    8(%ebp), %eax
        addl    $12, %ebp               #  PC
        movl    (%esi), %ebx
        movl    _oz_heap_end, %esi      #  _oz_heap_end
        movl    %ebx, (%eax)
        movl    _oz_heap_cur, %ebx      #  _oz_heap_cur,  sPointer
        movl    480(%esp), %eax         #  CAP
        movl    am+52, %ecx             #  <variable>._currentOptVar, <anonymous>
        movl    am+28, %edx             #  <variable>.statusReg,  <anonymous>
        leal    12(%eax), %edi          #  <anonymous>
        jmp     *(%ebp)                 # * PC

To my uneducated eye, it looks like gcc is now trying very hard to
keep a bunch of values in registers.  Every emulated instruction is
like that, thus resulting in considerable overhead.  I tried to
declare _oz_heap_end and _oz_heap_cur volatile, but, curiously, that
had no effect on this particular code generation.
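Read back into C, both listings do the same useful work; the gcc 3.2 version simply adds six more instructions (five loads and one leal) before the final jump. The sketch below is a reconstruction from the assembly only, not code from emulate.cc: it assumes the instruction stream is an array of machine words in which slot 1 holds the address of the source register and slot 2 the address of the destination, and it is written as a step function (hypothetical name `movexx_step`) instead of a label so that it can be called directly.

```c
#include <stdint.h>

/* One MOVEXX step, reconstructed from the listings above.
   pc[0] : address of the handler (not used by this sketch)
   pc[1] : address of the source register       (4(%ebp) in the listing)
   pc[2] : address of the destination register  (8(%ebp) in the listing)
   Returns the program counter of the next instruction (addl $12, %ebp on
   a 32-bit target, i.e. three words). */
void **movexx_step(void **pc)
{
    uintptr_t *src = (uintptr_t *)pc[1];
    uintptr_t *dst = (uintptr_t *)pc[2];

    *dst = *src;        /* movl (%edx),%edx ; movl %edx,(%eax) */
    return pc + 3;      /* the emulator would now do: goto **pc */
}
```

Nothing in this step needs `_oz_heap_end`, `_oz_heap_cur`, `CAP` or the two `am` fields, which is why loading them here is pure overhead.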

I am at my wits' end.  Can anyone help?  (I realize that my application
is atypical.)

Cheers,

PS: the compiler options used for the emulator file are:
-fno-exceptions -O3 -pipe -fstrict-aliasing -march=pentium -mcpu=pentiumpro -fomit-frame-pointer

-- 
Dr. Denys Duchier			Denys.Duchier@ps.uni-sb.de
Forschungsbereich Programmiersysteme	(Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45	http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50			Phone: +49 681 302 5618
66041 Saarbruecken, Germany		Fax:   +49 681 302 5615


* Re: abysmal code generated by gcc 3.2
  2002-10-21  0:56 abysmal code generated by gcc 3.2 Denys Duchier
@ 2002-10-21  7:37 ` Fergus Henderson
  2002-10-21  8:48   ` Denys Duchier
  2002-10-21 12:59 ` Mike Stump
  2002-10-21 18:43 ` Denys Duchier
  2 siblings, 1 reply; 16+ messages in thread
From: Fergus Henderson @ 2002-10-21  7:37 UTC (permalink / raw)
  To: Denys Duchier; +Cc: gcc

On 21-Oct-2002, Denys Duchier <Denys.Duchier@ps.uni-sb.de> wrote:
> my application is the implementation of a virtual machine for an
> emulated programming language.  Switching from gcc 2.95.x to 3.2
> brought a few expected pains due to the change in data layout, but the
> major issue is that gcc 3.2 produces extremely poor code for my
> application on x86 (also on others, but I have not measured those
> personally).

Could you post the source code and the `.i' file (compile with -save-temps)
for the function in question, or for suitable parts of it?

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.


* Re: abysmal code generated by gcc 3.2
  2002-10-21  7:37 ` Fergus Henderson
@ 2002-10-21  8:48   ` Denys Duchier
  0 siblings, 0 replies; 16+ messages in thread
From: Denys Duchier @ 2002-10-21  8:48 UTC (permalink / raw)
  To: gcc

Fergus Henderson <fjh@cs.mu.OZ.AU> writes:

> Could you post the source code and the `.i' file (compile with -save-temps)
> for the function in question, or for suitable parts of it?

Files emulate.ii.gz and emulate.s.gz are too big to attach to this
message.  I have placed them on the web at the following URLs:

http://www.ps.uni-sb.de/~duchier/emulate.ii.gz
http://www.ps.uni-sb.de/~duchier/emulate.s.gz

Just in case, the source is also available at:

http://www.ps.uni-sb.de/~duchier/emulate.cc.gz

I am not sure what would be a _suitable_ part of it, but the MOVEXX
emulated instruction shown in my earlier email is typical.

Thanks a lot for looking into this.

Cheers,


* Re: abysmal code generated by gcc 3.2
  2002-10-21  0:56 abysmal code generated by gcc 3.2 Denys Duchier
  2002-10-21  7:37 ` Fergus Henderson
@ 2002-10-21 12:59 ` Mike Stump
  2002-10-21 15:07   ` Denys Duchier
  2002-10-21 18:43 ` Denys Duchier
  2 siblings, 1 reply; 16+ messages in thread
From: Mike Stump @ 2002-10-21 12:59 UTC (permalink / raw)
  To: Denys Duchier; +Cc: gcc

On Sunday, October 20, 2002, at 03:02 PM, Denys Duchier wrote:
> I am at my wits' end. Can anyone help?

Well, if all else fails, you can build compilers from the cvs tree, and 
binary search for when code generation changed from good to bad for 
you.  cvs co -D '300 days ago' gcc and then see if that compiler is as 
bad.  If it is not, cvs co -D '150 days ago' and try that one...  If 
you're lucky, in about 8 checkouts and rebuilds you should have it down 
to the day the codegen died for you.
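The search described above is an ordinary bisection over checkout dates. As a rough sketch, assuming the regression appeared once and never went away, it can be written as follows. The names `bisect_days`, `is_bad_fn` and `bad_if_newer_than` are hypothetical; in practice the predicate would stand for "cvs co -D, rebuild, benchmark", which is the expensive part.

```c
/* Returns nonzero if the compiler checked out `days_ago` days back
   generates bad code.  In real use this is a checkout, a rebuild and a
   benchmark run; here it is just a callback. */
typedef int (*is_bad_fn)(int days_ago, void *ctx);

/* Bisects between a checkout known to be bad (recent) and one known to be
   good (older).  Returns the oldest checkout, in days ago, that is still
   bad.  If `builds` is non-null it is incremented once per compiler built. */
int bisect_days(int known_bad, int known_good, is_bad_fn is_bad,
                void *ctx, int *builds)
{
    int lo = known_bad;     /* recent side, bad */
    int hi = known_good;    /* old side, good */

    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;

        if (builds)
            (*builds)++;
        if (is_bad(mid, ctx))
            lo = mid;       /* regression is at mid or older */
        else
            hi = mid;       /* regression is newer than mid */
    }
    return lo;
}

/* Stand-in predicate for trying the sketch out: everything newer than or
   equal to *ctx days ago is bad. */
int bad_if_newer_than(int days_ago, void *ctx)
{
    return days_ago <= *(const int *)ctx;
}
```

With today's compiler as the bad end and a 300-day-old checkout as the good end, the loop needs at most nine builds, in line with the "about 8" estimate above.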

I'd do this only after experimenting with the top of the tree (to 
ensure performance hasn't already returned for you), and experimenting 
with all the relevant compiler flags, for example, see the following 
parameters: max-inline-insns-single, max-inline-insns, 
max-inline-slope, and min-inline-insns.


* Re: abysmal code generated by gcc 3.2
  2002-10-21 12:59 ` Mike Stump
@ 2002-10-21 15:07   ` Denys Duchier
  2002-10-21 15:12     ` Fergus Henderson
  2002-10-21 15:37     ` Mike Stump
  0 siblings, 2 replies; 16+ messages in thread
From: Denys Duchier @ 2002-10-21 15:07 UTC (permalink / raw)
  To: gcc

Mike Stump <mrs@apple.com> writes:

> Well, if all else fails, you can build compilers from the cvs tree,
> and binary search for when code generation changed from good to bad
> for you.

That possibly could reveal what's at the root of the issue, but it
would not solve my problem which is to get my application to perform
well on today's distributions.  Every Linux distribution is now based
on gcc 3.2, thus I must get the Oz emulator to perform well when
compiled with it.

> I'd do this only after experimenting with the top of the tree (to
> ensure performance hasn't already returned for you),

Yes, I definitely plan to experiment with gcc out of CVS to see what I
can expect in the future.  However, for the moment (1) I must solve
the remaining issues introduced with the switch to the new ABI, (2) I
need to make it reasonably fast across the board for the current user
base.

> and experimenting
> with all the relevant compiler flags, for example, see the following
> parameters: max-inline-insns-single, max-inline-insns,
> max-inline-slope, and min-inline-insns.

Uh?  Except for max-inline-insns, I have never heard of the others.  Are
these new?  What do they mean, and are they perhaps documented in CVS?

Cheers,


* Re: abysmal code generated by gcc 3.2
  2002-10-21 15:07   ` Denys Duchier
@ 2002-10-21 15:12     ` Fergus Henderson
  2002-10-21 15:37     ` Mike Stump
  1 sibling, 0 replies; 16+ messages in thread
From: Fergus Henderson @ 2002-10-21 15:12 UTC (permalink / raw)
  To: Denys Duchier; +Cc: gcc

On 21-Oct-2002, Denys Duchier <Denys.Duchier@ps.uni-sb.de> wrote:
> Mike Stump <mrs@apple.com> writes:
> 
> > Well, if all else fails, you can build compilers from the cvs tree,
> > and binary search for when code generation changed from good to bad
> > for you.
> 
> That possibly could reveal what's at the root of the issue, but it
> would not solve my problem which is to get my application to perform
> well on today's distributions.  Every Linux distribution is now based
> on gcc 3.2, thus I must get the Oz emulator to perform well when
> compiled with it.

If you can identify a patch that is responsible for a significant
performance regression, then the release manager might be willing
to revert that patch for the next minor release (e.g. 3.2.2).
That would be likely to make its way into the distributions
reasonably soon.


* Re: abysmal code generated by gcc 3.2
  2002-10-21 15:07   ` Denys Duchier
  2002-10-21 15:12     ` Fergus Henderson
@ 2002-10-21 15:37     ` Mike Stump
  2002-10-21 16:06       ` Dale Johannesen
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Stump @ 2002-10-21 15:37 UTC (permalink / raw)
  To: Denys Duchier; +Cc: gcc

On Monday, October 21, 2002, at 11:45 AM, Denys Duchier wrote:
> Mike Stump <mrs@apple.com> writes:
>
>> Well, if all else fails, you can build compilers from the cvs tree,
>> and binary search for when code generation changed from good to bad
>> for you.
>
> That possibly could reveal what's at the root of the issue, but it
> would not solve my problem which is to get my application to perform
> well on today's distributions.  Every Linux distribution is now based
> on gcc 3.2, thus I must get the Oz emulator to perform well when
> compiled with it.


>> I'd do this only after experimenting with the top of the tree (to
>> ensure performance hasn't already returned for you),
>
> yes, I definitely plan to experiment with gcc out of CVS to see what I
> can expect in the future.  However, for the moment (1) I must solve
> the remaining issues introduced with the switch to the new ABI, (2) I
> need to make it reasonably fast across the board for the current user
> base.
>
>> and experimenting
>> with all the relevant compiler flags, for example, see the following
>> parameters: max-inline-insns-single, max-inline-insns,
>> max-inline-slope, and min-inline-insns.
>
> uh? except for max-inline-insns, I never heard of the others.  Are
> these new (and what do they mean? - is that maybe documented in CVS).

I'll assume all of these are documented in the manual; if all else
fails, you can read it, and if that fails, you can see params.def:

/* The single function inlining limit. This is the maximum size
    of a function counted in internal gcc instructions (not in
    real machine instructions) that is eligible for inlining
    by the tree inliner.
    The default value is 300.
    Only functions marked inline (or methods defined in the class
    definition for C++) are affected by this, unless you set the
    -finline-functions (included in -O3) compiler option.
    There are more restrictions to inlining: If inlined functions
    call other functions, the already inlined instructions are
    counted and once the recursive inline limit (see
    "max-inline-insns" parameter) is exceeded, the acceptable size
    gets decreased.  */
DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
	  "max-inline-insns-single",
	  "The maximum number of instructions in a single function eligible for inlining",
	  300)

/* The repeated inlining limit. After this number of instructions
    (in the internal gcc representation, not real machine instructions)
    got inlined by repeated inlining, gcc starts to decrease the maximum
    number of inlinable instructions in the tree inliner.
    This is done by a linear function, see "max-inline-slope" parameter.
    It is necessary in order to limit the compile-time resources, that
    could otherwise become very high.
    It is recommended to set this value to twice the value of the single
    function limit (set by the "max-inline-insns-single" parameter) or
    higher. The default value is 600.
    Higher values mean that more inlining is done, resulting in
    better performance of the code, at the expense of higher
    compile-time resource (time, memory) requirements and larger
    binaries.
    This parameter also controls the maximum size of functions considered
    for inlining in the RTL inliner.  */
DEFPARAM (PARAM_MAX_INLINE_INSNS,
	  "max-inline-insns",
	  "The maximum number of instructions by repeated inlining before gcc starts to throttle inlining",
	  600)

/* After the repeated inline limit has been exceeded (see
    "max-inline-insns" parameter), a linear function is used to
    decrease the size of single functions eligible for inlining.
    The slope of this linear function is given by the negative
    reciprocal value (-1/x) of this parameter.
    The default value is 32.
    This linear function is used until it falls below a minimum
    value specified by the "min-inline-insns" parameter.  */
DEFPARAM (PARAM_MAX_INLINE_SLOPE,
	  "max-inline-slope",
	  "The slope of the linear function throttling inlining after the recursive inlining limit has been reached is given by the negative reciprocal value of this parameter",
	  32)

/* When gcc has inlined so many instructions (by repeated
    inlining) that the throttling limits the inlining very much,
    inlining for very small functions is still desirable to
    achieve good runtime performance. The size of single functions
    (measured in gcc instructions) which will still be eligible for
    inlining then is given by this parameter. It defaults to 130.
    Only much later (after exceeding 128 times the recursive limit)
    inlining is cut down completely.  */
DEFPARAM (PARAM_MIN_INLINE_INSNS,
	  "min-inline-insns",
	  "The number of instructions in a single function still eligible for inlining after a lot of recursive inlining",
	  130)
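Taken together, the four comments describe a piecewise-linear size limit. The sketch below is my reading of those comments with the default values quoted above; it is not the code in tree-inline.c, and the function name `max_inlinable_size` is hypothetical.

```c
/* Default values as quoted from params.def. */
enum {
    MAX_INLINE_INSNS_SINGLE = 300,  /* limit for one function */
    MAX_INLINE_INSNS        = 600,  /* repeated-inlining limit */
    MAX_INLINE_SLOPE        = 32,   /* throttle slope is -1/32 */
    MIN_INLINE_INSNS        = 130   /* floor while throttled */
};

/* Given the number of (internal gcc) instructions already inlined, returns
   the largest function size still eligible for inlining, or 0 once
   inlining is cut off completely. */
int max_inlinable_size(int inlined_so_far)
{
    int limit;

    /* "after exceeding 128 times the recursive limit inlining is cut down
       completely" */
    if (inlined_so_far > MAX_INLINE_INSNS * 128)
        return 0;

    /* Below the repeated-inlining limit the single-function limit applies. */
    if (inlined_so_far <= MAX_INLINE_INSNS)
        return MAX_INLINE_INSNS_SINGLE;

    /* Linear throttle with slope -1/MAX_INLINE_SLOPE ... */
    limit = MAX_INLINE_INSNS_SINGLE
            - (inlined_so_far - MAX_INLINE_INSNS) / MAX_INLINE_SLOPE;

    /* ... which never drops below MIN_INLINE_INSNS. */
    return limit < MIN_INLINE_INSNS ? MIN_INLINE_INSNS : limit;
}
```

With these defaults the limit stays at 300 up to 600 inlined instructions, falls by one for every further 32, and bottoms out at 130 once 6040 instructions have been inlined. The values can be changed on the command line with `--param name=value`.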


* Re: abysmal code generated by gcc 3.2
  2002-10-21 15:37     ` Mike Stump
@ 2002-10-21 16:06       ` Dale Johannesen
  2002-10-22  6:03         ` Michael Matz
  0 siblings, 1 reply; 16+ messages in thread
From: Dale Johannesen @ 2002-10-21 16:06 UTC (permalink / raw)
  To: Mike Stump; +Cc: Dale Johannesen, Denys Duchier, gcc


On Monday, October 21, 2002, at 12:21  PM, Mike Stump wrote:
> On Monday, October 21, 2002, at 11:45 AM, Denys Duchier wrote:
>> Mike Stump <mrs@apple.com> writes:
>>> and experimenting
>>> with all the relevant compiler flags, for example, see the following
>>> parameters: max-inline-insns-single, max-inline-insns,
>>> max-inline-slope, and min-inline-insns.
>>
>> uh? except for max-inline-insns, I never heard of the others.  Are
>> these new (and what do they mean? - is that maybe documented in CVS).
>
> I'll assume all of these are documented in the manual,

No, they are not.  Perhaps the patch that added them

2002-04-27  Kurt Garloff <garloff@suse.de>

         * tree-inline.c (inlinable_function_p): Improve heuristics
         by using a smoother function to cut down allowable inlinable
         size.
         * param.def: Add parameters max-inline-insns-single,
         max-inline-slope, min-inline-insns that determine the exact
         shape of the above function.
         * param.h: Likewise.

should be reverted until this is fixed.  (-fmax-inline-insns was broken
at the same time so that it no longer works to increase the size limit.)


* Re: abysmal code generated by gcc 3.2
  2002-10-21  0:56 abysmal code generated by gcc 3.2 Denys Duchier
  2002-10-21  7:37 ` Fergus Henderson
  2002-10-21 12:59 ` Mike Stump
@ 2002-10-21 18:43 ` Denys Duchier
  2002-10-22  3:56   ` Richard Henderson
  2 siblings, 1 reply; 16+ messages in thread
From: Denys Duchier @ 2002-10-21 18:43 UTC (permalink / raw)
  To: Denys Duchier; +Cc: gcc

Compiling emulate.cc with ... -O3 -fno-inline-functions ... with gcc
3.2 is still not nearly as good as -O3 with gcc 2.95, but at least it
is in the same ball park.  After examining the assembly code some
more: a lot of it looks nicer with 3.2 than with 2.95 (once the
compiler has been stopped from behaving utterly foolishly).  So, I am
guessing that now most of the regression lies elsewhere.  After
comparing builtins.s for 2.95 and 3.2, I noticed some patterns; for
example with 2.95 I get all over the place:

	testb $3, %dl

while with 3.2 I get instead:

	testl $3, %edx

I believe this is the tag test for ref pointers, for the inlined deref
loop.  This is highly critical since pretty much everywhere we must
synchronize on data and make sure we unwind chains of deref pointers
before we can actually operate on the data.
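For orientation, the loop in both listings below corresponds roughly to the following C. This is a sketch inferred from the assembly, not the Oz source: it assumes that a word whose low two bits are both zero is a reference (a pointer to another word), that anything else is a tagged value, and that reference chains never contain a null pointer. The name `deref` and the `last_ref` out-parameter are hypothetical.

```c
#include <stdint.h>

/* Follows a chain of reference words until a tagged (non-reference) word
   is found and returns that word.  If `last_ref` is non-null it receives
   the address of the cell holding the result, or a null pointer if `word`
   was not a reference to begin with. */
uintptr_t deref(uintptr_t word, uintptr_t **last_ref)
{
    uintptr_t *ptr = 0;                 /* xorl %eax, %eax */

    while ((word & 3) == 0) {           /* testb $3,%dl  or  testl $3,%edx */
        ptr = (uintptr_t *)word;        /* movl %edx, %eax */
        word = *ptr;                    /* movl (%edx), %edx */
    }
    if (last_ref)
        *last_ref = ptr;
    return word;
}
```

Since the mask 3 fits in the low byte, testing the byte register and testing the full register set the zero flag identically; the two compilers differ only in which encoding of the test they pick.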

For example, here is the relevant part for the BIsendPort primitive
compiled with gcc 3.2:

BIsendPort:
        pushl   %ebx
        subl    $40, %esp
        movl    48(%esp), %ecx  #  _OZ_LOC,  _OZ_LOC
        movl    (%ecx), %eax    # * _OZ_LOC
        movl    (%eax), %edx    #  prt
        xorl    %eax, %eax
        testl   $3, %edx        #  prt
        jne     .L2922
        .p2align 4,,15
.L2914:
        movl    %edx, %eax      #  prt,  prtPtr
        movl    (%edx), %edx    # * prt,  prt
        testl   $3, %edx        #  prt
        je      .L2914
.L2922:

The corresponding code generated by gcc 2.95 is:

BIsendPort:
        subl $24,%esp
        pushl %ebx
        movl 32(%esp),%ecx
        movl (%ecx),%eax
        movl (%eax),%edx
        xorl %eax,%eax
        testb $3,%dl
        jne .L20218
        .p2align 4,,7
.L20219:
        movl %edx,%eax
        movl (%edx),%edx
        testb $3,%dl
        je .L20219
.L20218:

Could the difference explain part of the regression I am seeing or
should I be looking for something else?  If testl is significantly
more costly, how can I get testb back?

Cheers,


* Re: abysmal code generated by gcc 3.2
  2002-10-21 18:43 ` Denys Duchier
@ 2002-10-22  3:56   ` Richard Henderson
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2002-10-22  3:56 UTC (permalink / raw)
  To: Denys Duchier; +Cc: gcc

On Tue, Oct 22, 2002 at 01:49:47AM +0200, Denys Duchier wrote:
> 	testb $3, %dl
> 
> while with 3.2 I get instead:
> 
> 	testl $3, %edx

This will not explain any difference at all.


r~


* Re: abysmal code generated by gcc 3.2
  2002-10-21 16:06       ` Dale Johannesen
@ 2002-10-22  6:03         ` Michael Matz
  2002-10-22  8:30           ` Kurt Garloff
  2002-10-22 11:29           ` Dale Johannesen
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Matz @ 2002-10-22  6:03 UTC (permalink / raw)
  To: Dale Johannesen, Kurt Garloff; +Cc: Mike Stump, Denys Duchier, gcc

Hi,

On Mon, 21 Oct 2002, Dale Johannesen wrote:

> >> uh? except for max-inline-insns, I never heard of the others.  Are
> >> these new (and what do they mean? - is that maybe documented in CVS).
> >
> > I'll assume all of these are documented in the manual,
>
> No, they are not.  Perhaps the patch that added them
>
> 2002-04-27  Kurt Garloff <garloff@suse.de>
>
>          * tree-inline.c (inlinable_function_p): Improve heuristics
>          by using a smoother function to cut down allowable inlinable
> size.
>          * param.def: Add parameters max-inline-insns-single,
>          max-inline-slope, min-inline-insns that determine the exact
>          shape of the above function.
>          * param.h: Likewise.
>
> should be reverted until this is fixed.

A little bit too harsh I would say.  The documentation (which Kurt wrote)
just wasn't committed for some reason (I believe someone other than Kurt
actually committed the patch on Kurt's behalf).

> (-fmax-inline-insns was broken at the same time so that it no longer
> works to increase the size limit.)


Ciao,
Michael.


* Re: abysmal code generated by gcc 3.2
  2002-10-22  6:03         ` Michael Matz
@ 2002-10-22  8:30           ` Kurt Garloff
  2002-10-22 11:29           ` Dale Johannesen
  1 sibling, 0 replies; 16+ messages in thread
From: Kurt Garloff @ 2002-10-22  8:30 UTC (permalink / raw)
  To: Michael Matz; +Cc: Dale Johannesen, Mike Stump, Denys Duchier, gcc


Hi,

On Tue, Oct 22, 2002 at 10:51:47AM +0200, Michael Matz wrote:
> On Mon, 21 Oct 2002, Dale Johannesen wrote:
> 
> > >> uh? except for max-inline-insns, I never heard of the others.  Are
> > >> these new (and what do they mean? - is that maybe documented in CVS).
> > >
> > > I'll assume all of these are documented in the manual,
> >
> > No, they are not.  Perhaps the patch that added them
> >
> > 2002-04-27  Kurt Garloff <garloff@suse.de>
> >
> >          * tree-inline.c (inlinable_function_p): Improve heuristics
> >          by using a smoother function to cut down allowable inlinable
> > size.
> >          * param.def: Add parameters max-inline-insns-single,
> >          max-inline-slope, min-inline-insns that determine the exact
> >          shape of the above function.
> >          * param.h: Likewise.
> >
> > should be reverted until this is fixed.
> 
> A little bit too harsh I would say.  The documentation (which Kurt wrote)
> just wasn't committed for some reason (I believe another person than Kurt
> actually committed the patch on Kurt's behalf).

This reminds me I should finally take care of splitting the rest of my
pending patches and submit the ones containing the documentation.

Sorry,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers                        SuSE Labs
SuSE Linux AG, Nuernberg, DE                            SCSI, Security


* Re: abysmal code generated by gcc 3.2
  2002-10-22  6:03         ` Michael Matz
  2002-10-22  8:30           ` Kurt Garloff
@ 2002-10-22 11:29           ` Dale Johannesen
  1 sibling, 0 replies; 16+ messages in thread
From: Dale Johannesen @ 2002-10-22 11:29 UTC (permalink / raw)
  To: Michael Matz; +Cc: Dale Johannesen, gcc


On Tuesday, October 22, 2002, at 01:51  AM, Michael Matz wrote:
> On Mon, 21 Oct 2002, Dale Johannesen wrote:
>
>>>> uh? except for max-inline-insns, I never heard of the others.  Are
>>>> these new (and what do they mean? - is that maybe documented in CVS).
>>>
>>> I'll assume all of these are documented in the manual,
>>
>> No, they are not.  Perhaps the patch that added them
>>
>> 2002-04-27  Kurt Garloff <garloff@suse.de>
>>
>>          * tree-inline.c (inlinable_function_p): Improve heuristics
>>          by using a smoother function to cut down allowable inlinable
>> size.
>>          * param.def: Add parameters max-inline-insns-single,
>>          max-inline-slope, min-inline-insns that determine the exact
>>          shape of the above function.
>>          * param.h: Likewise.
>>
>> should be reverted until this is fixed.
>
> A little bit too harsh I would say.  The documentation (which Kurt wrote)
> just wasn't committed for some reason (I believe another person than Kurt
> actually committed the patch on Kurt's behalf).
>> (-fmax-inline-insns was broken at the same time so that it no longer
>> works to increase the size limit.)
>
Well, it is too harsh, and besides I believe the patch is generally a
good thing, but this has been broken for months and nobody seems to be
interested in fixing it.  How else can I get it to happen?  (It may not
be just documentation, depending on how you look at it; the
functionality of -fmax-inline-insns is changed, and no longer matches
its documentation.  It's not clear this change in functionality was
approved; in general a change to the documented user interface needs
quite a good reason IMO, as it can screw up people with existing
makefiles.)


* Re: abysmal code generated by gcc 3.2
@ 2002-10-21 11:01 Joe Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Joe Wilson @ 2002-10-21 11:01 UTC (permalink / raw)
  To: Denys Duchier; +Cc: gcc

My mistake.  I did not see the cross-jump in the -O3 -finline-limit case.
If you follow the jumps you have many more instructions for MOVEXX.

-O2 does produce "optimal" code, though.

How does one disable these cross-jumps (if that's the correct term) in GCC 3.2?

--- Joe Wilson <developir@yahoo.com> wrote:
> The following GCC 3.2 flags:
> 
> -S -fno-exceptions -O3 -pipe -fstrict-aliasing -march=pentium -mcpu=pentiumpro
> -fomit-frame-pointer emulate.ii -finline-limit=10000000
> 
> also produce:
> 
>          MOVEXX:
> /NO_APP 
>         movl    4(%ebp), %esi
>         movl    8(%ebp), %eax
>         movl    (%esi), %ebx
>         movl    %ebx, (%eax)
> L6530:  
>         addl    $12, %ebp
>         jmp     L6288 
> 
> Using -O2 with the default inline limit produces comparable results as well.
> 
> --- Denys Duchier <Denys.Duchier@ps.uni-sb.de> wrote: 
> > As I mentioned to Brad Lucier (pc), it seems that the poor code
> > generation is somehow triggered in connection with inlining.  IIRC (I
> > have tried so many variations) If I supply -fno-inline-functions then
> > indeed I get the code above.  This very marginally improves straight
> > emulated recursion, but degrades the rest of the emulated
> > instructions.  Overall, it's a loss.  Any further lowering of the
> > optimization level also leads to a degradation in performance.
> 



* Re: abysmal code generated by gcc 3.2
  2002-10-21 10:29 ` Denys Duchier
@ 2002-10-21 10:47   ` Joe Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Joe Wilson @ 2002-10-21 10:47 UTC (permalink / raw)
  To: Denys Duchier; +Cc: gcc

The following GCC 3.2 flags:

-S -fno-exceptions -O3 -pipe -fstrict-aliasing -march=pentium -mcpu=pentiumpro
-fomit-frame-pointer emulate.ii -finline-limit=10000000

also produce:

         MOVEXX:
/NO_APP 
        movl    4(%ebp), %esi
        movl    8(%ebp), %eax
        movl    (%esi), %ebx
        movl    %ebx, (%eax)
L6530:  
        addl    $12, %ebp
        jmp     L6288 

Using -O2 with the default inline limit produces comparable results as well.

--- Denys Duchier <Denys.Duchier@ps.uni-sb.de> wrote: 
> As I mentioned to Brad Lucier (pc), it seems that the poor code
> generation is somehow triggered in connection with inlining.  IIRC (I
> have tried so many variations) If I supply -fno-inline-functions then
> indeed I get the code above.  This very marginally improves straight
> emulated recursion, but degrades the rest of the emulated
> instructions.  Overall, it's a loss.  Any further lowering of the
> optimization level also leads to a degradation in performance.



* Re: abysmal code generated by gcc 3.2
       [not found] <20021021142058.25276.qmail@web21104.mail.yahoo.com>
@ 2002-10-21 10:29 ` Denys Duchier
  2002-10-21 10:47   ` Joe Wilson
  0 siblings, 1 reply; 16+ messages in thread
From: Denys Duchier @ 2002-10-21 10:29 UTC (permalink / raw)
  To: Joe Wilson; +Cc: gcc

Joe Wilson <developir@yahoo.com> writes:

> Hello Denys,
>
> If you compile with -O instead of -O3 you will get the following
> code for MOVEXX for GCC 3.2, which is similar to what you had for GCC 2.95:
>
> /usr/local/gcc3_2/bin/gcc -S -fno-exceptions -O -pipe -fstrict-aliasing -march=pentium
> -mcpu=pentiumpro -fomit-frame-pointer emulate.ii
>
>          MOVEXX:
> /NO_APP 
>         movl    8(%ebp), %eax
>         movl    4(%ebp), %edx
>         movl    (%edx), %edx
>         movl    %edx, (%eax)
>         addl    $12, %ebp
>         jmp     *(%ebp)
>
> I am not sure which GCC -O2 + optimization is causing the code quality regression.
> If you do find out, please share it with the GCC list.

As I mentioned to Brad Lucier (pc), it seems that the poor code
generation is somehow triggered in connection with inlining.  IIRC (I
have tried so many variations), if I supply -fno-inline-functions then
indeed I get the code above.  This very marginally improves straight
emulated recursion, but degrades the rest of the emulated
instructions.  Overall, it's a loss.  Any further lowering of the
optimization level also leads to a degradation in performance.

Cheers,

