* Re: abysmal code generated by gcc 3.2
@ 2002-10-21 11:01 Joe Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Joe Wilson @ 2002-10-21 11:01 UTC (permalink / raw)
To: Denys Duchier; +Cc: gcc
My mistake. I did not see the cross-jump in the -O3 -finline-limit case.
If you follow the jumps you have many more instructions for MOVEXX.
-O2 does produce "optimal" code, though.
How does one disable these cross-jumps (if that's the correct term) in GCC 3.2?
--- Joe Wilson <developir@yahoo.com> wrote:
> The following GCC 3.2 flags:
>
> -S -fno-exceptions -O3 -pipe -fstrict-aliasing -march=pentium -mcpu=pentiumpro
> -fomit-frame-pointer emulate.ii -finline-limit=10000000
>
> also produce:
>
> MOVEXX:
> /NO_APP
> movl 4(%ebp), %esi
> movl 8(%ebp), %eax
> movl (%esi), %ebx
> movl %ebx, (%eax)
> L6530:
> addl $12, %ebp
> jmp L6288
>
> Using -O2 with the default inline limit produces comparable results as well.
>
> --- Denys Duchier <Denys.Duchier@ps.uni-sb.de> wrote:
> > As I mentioned to Brad Lucier (pc), it seems that the poor code
> > generation is somehow triggered in connection with inlining. IIRC (I
> > have tried so many variations) If I supply -fno-inline-functions then
> > indeed I get the code above. This very marginally improves straight
> > emulated recursion, but degrades the rest of the emulated
> > instructions. Overall, it's a loss. Any further lowering of the
> > optimization level also leads to a degradation in performance.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-22 6:03 ` Michael Matz
2002-10-22 8:30 ` Kurt Garloff
@ 2002-10-22 11:29 ` Dale Johannesen
1 sibling, 0 replies; 16+ messages in thread
From: Dale Johannesen @ 2002-10-22 11:29 UTC (permalink / raw)
To: Michael Matz; +Cc: Dale Johannesen, gcc
On Tuesday, October 22, 2002, at 01:51 AM, Michael Matz wrote:
> On Mon, 21 Oct 2002, Dale Johannesen wrote:
>
>>>> uh? except for max-inline-insns, I never heard of the others. Are
>>>> these new (and what do they mean? - is that maybe documented in
>>>> CVS).
>>>
>>> I'll assume all of these are documented in the manual,
>>
>> No, they are not. Perhaps the patch that added them
>>
>> 2002-04-27 Kurt Garloff <garloff@suse.de>
>>
>> * tree-inline.c (inlinable_function_p): Improve heuristics
>> by using a smoother function to cut down allowable inlinable
>> size.
>> * param.def: Add parameters max-inline-insns-single,
>> max-inline-slope, min-inline-insns that determine the exact
>> shape of the above function.
>> * param.h: Likewise.
>>
>> should be reverted until this is fixed.
>
> A little bit too harsh I would say. The documentation (which Kurt wrote)
> just wasn't committed for some reason (I believe another person than Kurt
> actually committed the patch on Kurt's behalf).
>> (-fmax-inline-insns was broken at the same time so that it no longer
>> works to increase the size limit.)
>
Well, it is too harsh, and besides I believe the patch is generally a good
thing, but this has been broken for months and nobody seems to be interested
in fixing it. How else can I get it to happen? (It may not be just
documentation, depending on how you look at it; the functionality of
-fmax-inline-insns is changed, and no longer matches its documentation. It's
not clear this change in functionality was approved; in general a change to
the documented user interface needs quite a good reason IMO, as it can screw
up people with existing makefiles.)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-22 6:03 ` Michael Matz
@ 2002-10-22 8:30 ` Kurt Garloff
2002-10-22 11:29 ` Dale Johannesen
1 sibling, 0 replies; 16+ messages in thread
From: Kurt Garloff @ 2002-10-22 8:30 UTC (permalink / raw)
To: Michael Matz; +Cc: Dale Johannesen, Mike Stump, Denys Duchier, gcc
[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]
Hi,
On Tue, Oct 22, 2002 at 10:51:47AM +0200, Michael Matz wrote:
> On Mon, 21 Oct 2002, Dale Johannesen wrote:
>
> > >> uh? except for max-inline-insns, I never heard of the others. Are
> > >> these new (and what do they mean? - is that maybe documented in CVS).
> > >
> > > I'll assume all of these are documented in the manual,
> >
> > No, they are not. Perhaps the patch that added them
> >
> > 2002-04-27 Kurt Garloff <garloff@suse.de>
> >
> > * tree-inline.c (inlinable_function_p): Improve heuristics
> > by using a smoother function to cut down allowable inlinable
> > size.
> > * param.def: Add parameters max-inline-insns-single,
> > max-inline-slope, min-inline-insns that determine the exact
> > shape of the above function.
> > * param.h: Likewise.
> >
> > should be reverted until this is fixed.
>
> A little bit too harsh I would say. The documentation (which Kurt wrote)
> just wasn't committed for some reason (I believe another person than Kurt
> actually committed the patch on Kurt's behalf).
This reminds me I should finally take care of splitting the rest of my
pending patches and submit the ones containing the documentation.
Sorry,
--
Kurt Garloff <garloff@suse.de> Eindhoven, NL
GPG key: See mail header, key servers SuSE Labs
SuSE Linux AG, Nuernberg, DE SCSI, Security
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 16:06 ` Dale Johannesen
@ 2002-10-22 6:03 ` Michael Matz
2002-10-22 8:30 ` Kurt Garloff
2002-10-22 11:29 ` Dale Johannesen
0 siblings, 2 replies; 16+ messages in thread
From: Michael Matz @ 2002-10-22 6:03 UTC (permalink / raw)
To: Dale Johannesen, Kurt Garloff; +Cc: Mike Stump, Denys Duchier, gcc
Hi,
On Mon, 21 Oct 2002, Dale Johannesen wrote:
> >> uh? except for max-inline-insns, I never heard of the others. Are
> >> these new (and what do they mean? - is that maybe documented in CVS).
> >
> > I'll assume all of these are documented in the manual,
>
> No, they are not. Perhaps the patch that added them
>
> 2002-04-27 Kurt Garloff <garloff@suse.de>
>
> * tree-inline.c (inlinable_function_p): Improve heuristics
> by using a smoother function to cut down allowable inlinable
> size.
> * param.def: Add parameters max-inline-insns-single,
> max-inline-slope, min-inline-insns that determine the exact
> shape of the above function.
> * param.h: Likewise.
>
> should be reverted until this is fixed.
A little bit too harsh I would say. The documentation (which Kurt wrote)
just wasn't committed for some reason (I believe another person than Kurt
actually committed the patch on Kurt's behalf).
> (-fmax-inline-insns was broken at the same time so that it no longer
> works to increase the size limit.)
Ciao,
Michael.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 18:43 ` Denys Duchier
@ 2002-10-22 3:56 ` Richard Henderson
0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2002-10-22 3:56 UTC (permalink / raw)
To: Denys Duchier; +Cc: gcc
On Tue, Oct 22, 2002 at 01:49:47AM +0200, Denys Duchier wrote:
> testb $3, %dl
>
> while with 3.2 I get instead:
>
> testl $3, %edx
This will not explain any difference at all.
r~
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 0:56 Denys Duchier
2002-10-21 7:37 ` Fergus Henderson
2002-10-21 12:59 ` Mike Stump
@ 2002-10-21 18:43 ` Denys Duchier
2002-10-22 3:56 ` Richard Henderson
2 siblings, 1 reply; 16+ messages in thread
From: Denys Duchier @ 2002-10-21 18:43 UTC (permalink / raw)
To: Denys Duchier; +Cc: gcc
Compiling emulate.cc with ... -O3 -fno-inline-functions ... with gcc
3.2 is still not nearly as good as -O3 with gcc 2.95, but at least it
is in the same ball park. After examining the assembly code some
more: a lot of it looks nicer with 3.2 than with 2.95 (once the
compiler has been stopped from behaving utterly foolishly). So, I am
guessing that now most of the regression lies elsewhere. After
comparing builtins.s for 2.95 and 3.2, I noticed some patterns; for
example with 2.95 I get all over the place:
testb $3, %dl
while with 3.2 I get instead:
testl $3, %edx
I believe this is the tag test for ref pointers, for the inlined deref
loop. This is highly critical since pretty much everywhere we must
synchronize on data and make sure we unwind chains of deref pointers
before we can actually operate on the data.
For example, here is the relevant part for the BIsendPort primitive
compiled with gcc 3.2:
BIsendPort:
pushl %ebx
subl $40, %esp
movl 48(%esp), %ecx # _OZ_LOC, _OZ_LOC
movl (%ecx), %eax # * _OZ_LOC
movl (%eax), %edx # prt
xorl %eax, %eax
testl $3, %edx # prt
jne .L2922
.p2align 4,,15
.L2914:
movl %edx, %eax # prt, prtPtr
movl (%edx), %edx # * prt, prt
testl $3, %edx # prt
je .L2914
.L2922:
The corresponding code generated by gcc 2.95 is:
BIsendPort:
subl $24,%esp
pushl %ebx
movl 32(%esp),%ecx
movl (%ecx),%eax
movl (%eax),%edx
xorl %eax,%eax
testb $3,%dl
jne .L20218
.p2align 4,,7
.L20219:
movl %edx,%eax
movl (%edx),%edx
testb $3,%dl
je .L20219
.L20218:
Could the difference explain part of the regression I am seeing or
should I be looking for something else? If testl is significantly
more costly, how can I get testb back?
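For reference, the loop that both listings implement can be sketched in C. This is a reading of the assembly above, not code taken from the Oz sources: the names `term_t` and `deref` are made up for illustration, and the tag layout (a word whose low two bits are zero is a reference to another word) is an assumption inferred from the `test $3` / `je` pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged word: low two bits == 0 means "reference". */
typedef uintptr_t term_t;

/* Follow a chain of references until a non-reference word is found.
   Returns that word; *last_ref receives the address of the final cell
   in the chain (NULL if t was not a reference to begin with).
   Assumes a reference is never a null pointer. */
static term_t deref(term_t t, term_t **last_ref)
{
    term_t *last = NULL;
    while ((t & 3u) == 0) {      /* the tag test seen in both listings */
        last = (term_t *)t;      /* remember the cell we came through */
        t = *last;               /* load the next word in the chain */
    }
    if (last_ref != NULL)
        *last_ref = last;
    return t;
}
```

The mask test in the loop only looks at the low two bits of the word, which is why it can be encoded either as `testb $3, %dl` or as `testl $3, %edx`.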
Cheers,
--
Dr. Denys Duchier Denys.Duchier@ps.uni-sb.de
Forschungsbereich Programmiersysteme (Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45 http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50 Phone: +49 681 302 5618
66041 Saarbruecken, Germany Fax: +49 681 302 5615
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 15:37 ` Mike Stump
@ 2002-10-21 16:06 ` Dale Johannesen
2002-10-22 6:03 ` Michael Matz
0 siblings, 1 reply; 16+ messages in thread
From: Dale Johannesen @ 2002-10-21 16:06 UTC (permalink / raw)
To: Mike Stump; +Cc: Dale Johannesen, Denys Duchier, gcc
On Monday, October 21, 2002, at 12:21 PM, Mike Stump wrote:
> On Monday, October 21, 2002, at 11:45 AM, Denys Duchier wrote:
>> Mike Stump <mrs@apple.com> writes:
>>> and experimenting
>>> with all the relevant compiler flags, for example, see the following
>>> parameters: max-inline-insns-single, max-inline-insns,
>>> max-inline-slope, and min-inline-insns.
>>
>> uh? except for max-inline-insns, I never heard of the others. Are
>> these new (and what do they mean? - is that maybe documented in CVS).
>
> I'll assume all of these are documented in the manual,
No, they are not. Perhaps the patch that added them
2002-04-27 Kurt Garloff <garloff@suse.de>
* tree-inline.c (inlinable_function_p): Improve heuristics
by using a smoother function to cut down allowable inlinable
size.
* param.def: Add parameters max-inline-insns-single,
max-inline-slope, min-inline-insns that determine the exact
shape of the above function.
* param.h: Likewise.
should be reverted until this is fixed. (-fmax-inline-insns was broken at
the same time so that it no longer works to increase the size limit.)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 15:07 ` Denys Duchier
2002-10-21 15:12 ` Fergus Henderson
@ 2002-10-21 15:37 ` Mike Stump
2002-10-21 16:06 ` Dale Johannesen
1 sibling, 1 reply; 16+ messages in thread
From: Mike Stump @ 2002-10-21 15:37 UTC (permalink / raw)
To: Denys Duchier; +Cc: gcc
On Monday, October 21, 2002, at 11:45 AM, Denys Duchier wrote:
> Mike Stump <mrs@apple.com> writes:
>
>> Well, if all else fails, you can build compilers from the cvs tree,
>> and binary search for when code generation changed from good to bad
>> for you.
>
> That possibly could reveal what's at the root of the issue, but it
> would not solve my problem which is to get my application to perform
> well on today's distributions. Every Linux distribution is now based
> on gcc 3.2, thus I must get the Oz emulator to perform well when
> compiled with it.
>> I'd do this only after experimenting with the top of the tree (to
>> ensure performance hasn't already returned for you),
>
> yes, I definitely plan to experiment with gcc out of CVS to see what I
> can expect in the future. However, for the moment (1) I must solve
> the remaining issues introduced with the switch to the new ABI, (2) I
> need to make it reasonably fast across the board for the current user
> base.
>
>> and experimenting
>> with all the relevant compiler flags, for example, see the following
>> parameters: max-inline-insns-single, max-inline-insns,
>> max-inline-slope, and min-inline-insns.
>
> uh? except for max-inline-insns, I never heard of the others. Are
> these new (and what do they mean? - is that maybe documented in CVS).
I'll assume all of these are documented in the manual; if all else
fails, you can read it, and if that fails, you can see params.def:
/* The single function inlining limit. This is the maximum size
of a function counted in internal gcc instructions (not in
real machine instructions) that is eligible for inlining
by the tree inliner.
The default value is 300.
Only functions marked inline (or methods defined in the class
definition for C++) are affected by this, unless you set the
-finline-functions (included in -O3) compiler option.
There are more restrictions to inlining: If inlined functions
call other functions, the already inlined instructions are
counted and once the recursive inline limit (see
"max-inline-insns" parameter) is exceeded, the acceptable size
gets decreased. */
DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
"max-inline-insns-single",
"The maximum number of instructions in a single function eligible for inlining",
300)
/* The repeated inlining limit. After this number of instructions
(in the internal gcc representation, not real machine instructions)
got inlined by repeated inlining, gcc starts to decrease the maximum
number of inlinable instructions in the tree inliner.
This is done by a linear function, see "max-inline-slope" parameter.
It is necessary in order to limit the compile-time resources, that
could otherwise become very high.
It is recommended to set this value to twice the value of the single
function limit (set by the "max-inline-insns-single" parameter) or
higher. The default value is 600.
Higher values mean that more inlining is done, resulting in
better performance of the code, at the expense of higher
compile-time resource (time, memory) requirements and larger
binaries.
This parameter also controls the maximum size of functions considered
for inlining in the RTL inliner. */
DEFPARAM (PARAM_MAX_INLINE_INSNS,
"max-inline-insns",
"The maximum number of instructions by repeated inlining before gcc starts to throttle inlining",
600)
/* After the repeated inline limit has been exceeded (see
"max-inline-insns" parameter), a linear function is used to
decrease the size of single functions eligible for inlining.
The slope of this linear function is given the negative
reciprocal value (-1/x) of this parameter.
The default value is 32.
This linear function is used until it falls below a minimum
value specified by the "min-inline-insns" parameter. */
DEFPARAM (PARAM_MAX_INLINE_SLOPE,
"max-inline-slope",
"The slope of the linear function throttling inlining after the recursive inlining limit has been reached is given by the negative reciprocal value of this parameter",
32)
/* When gcc has inlined so many instructions (by repeated
inlining) that the throttling limits the inlining very much,
inlining for very small functions is still desirable to
achieve good runtime performance. The size of single functions
(measured in gcc instructions) which will still be eligible for
inlining then is given by this parameter. It defaults to 130.
Only much later (after exceeding 128 times the recursive limit)
inlining is cut down completely. */
DEFPARAM (PARAM_MIN_INLINE_INSNS,
"min-inline-insns",
"The number of instructions in a single function still eligible for inlining after a lot of recursive inlining",
130)
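Read together, the four comments describe a piecewise-linear size limit. A minimal sketch of that reading follows, using the default values quoted above. The function name `allowed_inline_size` and its single argument are hypothetical, and the actual logic in tree-inline.c may differ in detail (for instance in how already-inlined instructions are counted).

```c
/* Default values as quoted from params.def. */
enum {
    MAX_INLINE_INSNS_SINGLE = 300,  /* size limit before any throttling */
    MAX_INLINE_INSNS        = 600,  /* throttling starts beyond this */
    MAX_INLINE_SLOPE        = 32,   /* limit drops by 1 per 32 insns */
    MIN_INLINE_INSNS        = 130   /* floor for very small functions */
};

/* Largest single function (in internal gcc instructions) still eligible
   for inlining once `inlined_so_far` instructions have been inlined by
   repeated inlining.  Returns 0 when inlining is cut off completely. */
static long allowed_inline_size(long inlined_so_far)
{
    long limit;

    if (inlined_so_far > 128L * MAX_INLINE_INSNS)
        return 0;                          /* extreme case: no inlining */
    if (inlined_so_far <= MAX_INLINE_INSNS)
        return MAX_INLINE_INSNS_SINGLE;    /* not throttled yet */

    /* Linear decrease with slope -1/MAX_INLINE_SLOPE. */
    limit = MAX_INLINE_INSNS_SINGLE
            - (inlined_so_far - MAX_INLINE_INSNS) / MAX_INLINE_SLOPE;
    return limit < MIN_INLINE_INSNS ? MIN_INLINE_INSNS : limit;
}
```

Under this reading and with the defaults, the limit reaches the 130-instruction floor once 600 + 32 * 170 = 6040 instructions have been inlined.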
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 15:07 ` Denys Duchier
@ 2002-10-21 15:12 ` Fergus Henderson
2002-10-21 15:37 ` Mike Stump
1 sibling, 0 replies; 16+ messages in thread
From: Fergus Henderson @ 2002-10-21 15:12 UTC (permalink / raw)
To: Denys Duchier; +Cc: gcc
On 21-Oct-2002, Denys Duchier <Denys.Duchier@ps.uni-sb.de> wrote:
> Mike Stump <mrs@apple.com> writes:
>
> > Well, if all else fails, you can build compilers from the cvs tree,
> > and binary search for when code generation changed from good to bad
> > for you.
>
> That possibly could reveal what's at the root of the issue, but it
> would not solve my problem which is to get my application to perform
> well on today's distributions. Every Linux distribution is now based
> on gcc 3.2, thus I must get the Oz emulator to perform well when
> compiled with it.
If you can identify a patch that is responsible for a significant
performance regression, then the release manager might be willing
to revert that patch for the next minor release (e.g. 3.2.2).
That would be likely to make its way into the distributions
reasonably soon.
--
Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 12:59 ` Mike Stump
@ 2002-10-21 15:07 ` Denys Duchier
2002-10-21 15:12 ` Fergus Henderson
2002-10-21 15:37 ` Mike Stump
0 siblings, 2 replies; 16+ messages in thread
From: Denys Duchier @ 2002-10-21 15:07 UTC (permalink / raw)
To: gcc
Mike Stump <mrs@apple.com> writes:
> Well, if all else fails, you can build compilers from the cvs tree,
> and binary search for when code generation changed from good to bad
> for you.
That possibly could reveal what's at the root of the issue, but it
would not solve my problem which is to get my application to perform
well on today's distributions. Every Linux distribution is now based
on gcc 3.2, thus I must get the Oz emulator to perform well when
compiled with it.
> I'd do this only after experimenting with the top of the tree (to
> ensure performance hasn't already returned for you),
yes, I definitely plan to experiment with gcc out of CVS to see what I
can expect in the future. However, for the moment (1) I must solve
the remaining issues introduced with the switch to the new ABI, (2) I
need to make it reasonably fast across the board for the current user
base.
> and experimenting
> with all the relevant compiler flags, for example, see the following
> parameters: max-inline-insns-single, max-inline-insns,
> max-inline-slope, and min-inline-insns.
uh? except for max-inline-insns, I never heard of the others. Are
these new (and what do they mean? - is that maybe documented in CVS).
Cheers,
--
Dr. Denys Duchier Denys.Duchier@ps.uni-sb.de
Forschungsbereich Programmiersysteme (Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45 http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50 Phone: +49 681 302 5618
66041 Saarbruecken, Germany Fax: +49 681 302 5615
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 0:56 Denys Duchier
2002-10-21 7:37 ` Fergus Henderson
@ 2002-10-21 12:59 ` Mike Stump
2002-10-21 15:07 ` Denys Duchier
2002-10-21 18:43 ` Denys Duchier
2 siblings, 1 reply; 16+ messages in thread
From: Mike Stump @ 2002-10-21 12:59 UTC (permalink / raw)
To: Denys Duchier; +Cc: gcc
On Sunday, October 20, 2002, at 03:02 PM, Denys Duchier wrote:
> I am at my wits' end. Can anyone help?
Well, if all else fails, you can build compilers from the cvs tree, and
binary search for when code generation changed from good to bad for
you. cvs co -D '300 days ago' gcc and then see if that compiler is as
bad. If it is not, cvs co -D '150 days ago' and try that one... If
you're lucky, in about 8 checkouts and rebuilds you should have it down
to the day the codegen died for you.
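The search described above is an ordinary bisection over checkout dates. A minimal sketch in C, assuming a hypothetical predicate that stands in for "check out the tree as of N days ago, build it, and see whether the generated code is bad". The names `find_first_bad` and `example_is_bad` are illustrative, and day 137 is an arbitrary stand-in for the unknown date of the regression.

```c
/* Stand-in for: cvs co -D 'N days ago' gcc, rebuild, and benchmark.
   Here the regression is pretended to have landed 137 days ago. */
static int example_is_bad(int days_ago)
{
    return days_ago <= 137;
}

/* Bisect between a known-good checkout (good_days_ago, the older one)
   and a known-bad checkout (bad_days_ago, the newer one).  Returns the
   age in days of the oldest bad checkout and stores the number of
   checkouts that had to be built in *checkouts. */
static int find_first_bad(int good_days_ago, int bad_days_ago,
                          int (*is_bad)(int), int *checkouts)
{
    *checkouts = 0;
    while (good_days_ago - bad_days_ago > 1) {
        int mid = bad_days_ago + (good_days_ago - bad_days_ago) / 2;
        ++*checkouts;
        if (is_bad(mid))
            bad_days_ago = mid;     /* regression is at mid or older */
        else
            good_days_ago = mid;    /* regression is newer than mid */
    }
    return bad_days_ago;
}
```

Calling `find_first_bad(300, 0, example_is_bad, &n)` narrows a 300-day window down to a single day; a window of that size needs 8 or 9 rebuilds, consistent with the estimate above.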
I'd do this only after experimenting with the top of the tree (to
ensure performance hasn't already returned for you), and experimenting
with all the relevant compiler flags, for example, see the following
parameters: max-inline-insns-single, max-inline-insns,
max-inline-slope, and min-inline-insns.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 10:29 ` Denys Duchier
@ 2002-10-21 10:47 ` Joe Wilson
0 siblings, 0 replies; 16+ messages in thread
From: Joe Wilson @ 2002-10-21 10:47 UTC (permalink / raw)
To: Denys Duchier; +Cc: gcc
The following GCC 3.2 flags:
-S -fno-exceptions -O3 -pipe -fstrict-aliasing -march=pentium -mcpu=pentiumpro
-fomit-frame-pointer emulate.ii -finline-limit=10000000
also produce:
MOVEXX:
/NO_APP
movl 4(%ebp), %esi
movl 8(%ebp), %eax
movl (%esi), %ebx
movl %ebx, (%eax)
L6530:
addl $12, %ebp
jmp L6288
Using -O2 with the default inline limit produces comparable results as well.
--- Denys Duchier <Denys.Duchier@ps.uni-sb.de> wrote:
> As I mentioned to Brad Lucier (pc), it seems that the poor code
> generation is somehow triggered in connection with inlining. IIRC (I
> have tried so many variations) If I supply -fno-inline-functions then
> indeed I get the code above. This very marginally improves straight
> emulated recursion, but degrades the rest of the emulated
> instructions. Overall, it's a loss. Any further lowering of the
> optimization level also leads to a degradation in performance.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
[not found] <20021021142058.25276.qmail@web21104.mail.yahoo.com>
@ 2002-10-21 10:29 ` Denys Duchier
2002-10-21 10:47 ` Joe Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Denys Duchier @ 2002-10-21 10:29 UTC (permalink / raw)
To: Joe Wilson; +Cc: gcc
Joe Wilson <developir@yahoo.com> writes:
> Hello Denys,
>
> If you compile with -O instead of -O3 you will get the following
> code for MOVEXX for GCC 3.2, which is similar to what you had for GCC 2.95:
>
> /usr/local/gcc3_2/bin/gcc -S -fno-exceptions -O -pipe -fstrict-aliasing -march=pentium
> -mcpu=pentiumpro -fomit-frame-pointer emulate.ii
>
> MOVEXX:
> /NO_APP
> movl 8(%ebp), %eax
> movl 4(%ebp), %edx
> movl (%edx), %edx
> movl %edx, (%eax)
> addl $12, %ebp
> jmp *(%ebp)
>
> I am not sure which GCC -O2 + optimization is causing the code quality regression.
> If you do find out, please share it with the GCC list.
As I mentioned to Brad Lucier (pc), it seems that the poor code
generation is somehow triggered in connection with inlining. IIRC (I
have tried so many variations) If I supply -fno-inline-functions then
indeed I get the code above. This very marginally improves straight
emulated recursion, but degrades the rest of the emulated
instructions. Overall, it's a loss. Any further lowering of the
optimization level also leads to a degradation in performance.
Cheers,
--
Dr. Denys Duchier Denys.Duchier@ps.uni-sb.de
Forschungsbereich Programmiersysteme (Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45 http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50 Phone: +49 681 302 5618
66041 Saarbruecken, Germany Fax: +49 681 302 5615
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 7:37 ` Fergus Henderson
@ 2002-10-21 8:48 ` Denys Duchier
0 siblings, 0 replies; 16+ messages in thread
From: Denys Duchier @ 2002-10-21 8:48 UTC (permalink / raw)
To: gcc
Fergus Henderson <fjh@cs.mu.OZ.AU> writes:
> Could you post the source code and the `.i' file (compile with -save-temps)
> for the function in question, or for suitable parts of it?
Files emulate.ii.gz and emulate.s.gz are too big to attach to this
message. I have placed them on the web at the following URLs:
http://www.ps.uni-sb.de/~duchier/emulate.ii.gz
http://www.ps.uni-sb.de/~duchier/emulate.s.gz
Just in case, the source is also available at:
http://www.ps.uni-sb.de/~duchier/emulate.cc.gz
I am not sure what would be a _suitable_ part of it, but the MOVEXX
emulated instruction shown in my earlier email is typical.
Thanks a lot for looking into this.
Cheers,
--
Dr. Denys Duchier Denys.Duchier@ps.uni-sb.de
Forschungsbereich Programmiersysteme (Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45 http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50 Phone: +49 681 302 5618
66041 Saarbruecken, Germany Fax: +49 681 302 5615
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: abysmal code generated by gcc 3.2
2002-10-21 0:56 Denys Duchier
@ 2002-10-21 7:37 ` Fergus Henderson
2002-10-21 8:48 ` Denys Duchier
2002-10-21 12:59 ` Mike Stump
2002-10-21 18:43 ` Denys Duchier
2 siblings, 1 reply; 16+ messages in thread
From: Fergus Henderson @ 2002-10-21 7:37 UTC (permalink / raw)
To: Denys Duchier; +Cc: gcc
On 21-Oct-2002, Denys Duchier <Denys.Duchier@ps.uni-sb.de> wrote:
> my application is the implementation of a virtual machine for an
> emulated programming language. Switching from gcc 2.95.x to 3.2
> brought a few expected pains due to the change in data layout, but the
> major issue is that gcc 3.2 produces extremely poor code for my
> application on x86 (also on others, but I have not measured those
> personally).
Could you post the source code and the `.i' file (compile with -save-temps)
for the function in question, or for suitable parts of it?
--
Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.
^ permalink raw reply [flat|nested] 16+ messages in thread
* abysmal code generated by gcc 3.2
@ 2002-10-21 0:56 Denys Duchier
2002-10-21 7:37 ` Fergus Henderson
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Denys Duchier @ 2002-10-21 0:56 UTC (permalink / raw)
To: gcc
my application is the implementation of a virtual machine for an
emulated programming language. Switching from gcc 2.95.x to 3.2
brought a few expected pains due to the change in data layout, but the
major issue is that gcc 3.2 produces extremely poor code for my
application on x86 (also on others, but I have not measured those
personally).
Measuring just the impact on the main emulator loop (which uses the
classical threaded code technique, i.e. jumps to first class labels) I
found that the emulator was slowed down by a FACTOR of 8.27.
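For readers unfamiliar with the technique, here is a minimal sketch of threaded-code dispatch using GCC's labels-as-values extension (`&&label` and `goto *ptr`). It is not taken from the Oz emulator: `run_demo`, the two-instruction program, and the operand layout are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Runs a tiny threaded-code program that copies *src to *dst and then
   halts.  Returns the number of emulated instructions dispatched.
   Requires GNU C (gcc or clang) for && and computed goto. */
static int run_demo(intptr_t *src, intptr_t *dst)
{
    /* Label addresses are only meaningful inside this function, so the
       program is laid out here: opcode address, then its operands. */
    void *program[] = {
        &&op_move, src, dst,
        &&op_halt
    };
    void **pc = program;
    int executed = 0;

    goto **pc;                   /* dispatch the first instruction */

op_move:
    executed++;
    *(intptr_t *)pc[2] = *(intptr_t *)pc[1];   /* copy operand 1 to 2 */
    pc += 3;                     /* skip opcode plus two operands */
    goto **pc;                   /* jump straight to the next handler */

op_halt:
    executed++;
    return executed;
}
```

Each handler ends with its own indirect jump instead of returning to a central `switch`, so the quality of the code the compiler emits around that jump is paid for on every emulated instruction.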
Looking at the generated assembly code, it is clear that the 3.2
compiler expends a lot of effort trying to keep a certain set of
values in registers. On x86, this is a horrible policy (especially in
a threaded code interpretation loop).
Part of the problem comes from an interaction with inlining. I turned
inlining off for a couple of non-critical functions which were
exposing values that the compiler ended up trying to keep in
registers, and I declared one variable volatile (much better results
than trying to switch off gcse).
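The message does not say how inlining was switched off. One mechanism available in GNU C is the `noinline` function attribute; the sketch below combines it with a `volatile` variable. All names (`status_flag`, `handle_rare_case`, `step`) are hypothetical and only illustrate the two workarounds.

```c
/* volatile forces a fresh load at every use, so the compiler cannot
   keep the value cached in a register across emulated instructions. */
static volatile int status_flag = 0;

/* noinline keeps this rarely used helper out of line, so the values it
   works with do not become candidates for registers in the caller. */
__attribute__((noinline))
static int handle_rare_case(int x)
{
    return x * 2 + status_flag;
}

/* Hot path: only calls the helper when the flag is set. */
static int step(int x)
{
    if (status_flag)
        return handle_rare_case(x);
    return x + 1;
}
```

Whether either change helps depends on the surrounding code; as the message notes, the effect was only found by experiment.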
This got me to only a factor 1.37 slowdown :-) ... measured on
basically pure emulated recursion (i.e. the speed of looping while
doing nothing else).
Which of course still sucks majorly since this is the MAIN emulator
loop (and since _every_ part of the implementation has been sizeably
slowed down... aargh!)
Here is an example of what I still cannot get rid of. Here is the
code produced by gcc 2.95.x for the MOVEXX instruction:
#APP
MOVEXX:
#NO_APP
movl 4(%ebp),%edx
movl 8(%ebp),%eax
addl $12,%ebp
movl (%edx),%edx
movl %edx,(%eax)
jmp *(%ebp)
Here is the code produced by gcc 3.2:
#APP
MOVEXX:
#NO_APP
movl 4(%ebp), %esi
movl 8(%ebp), %eax
addl $12, %ebp # PC
movl (%esi), %ebx
movl _oz_heap_end, %esi # _oz_heap_end
movl %ebx, (%eax)
movl _oz_heap_cur, %ebx # _oz_heap_cur, sPointer
movl 480(%esp), %eax # CAP
movl am+52, %ecx # <variable>._currentOptVar, <anonymous>
movl am+28, %edx # <variable>.statusReg, <anonymous>
leal 12(%eax), %edi # <anonymous>
jmp *(%ebp) # * PC
To my uneducated eye, it looks like gcc is now trying very hard to
keep a bunch of values in registers. Every emulated instruction is
like that, thus resulting in considerable overhead. I tried to
declare _oz_heap_end and _oz_heap_cur volatile, but, curiously, that
had no effect on this particular code generation.
I am at my wits' end. Can anyone help? (I realize that my application
is atypical).
Cheers,
PS: the compiler options used for the emulator file are:
-fno-exceptions -O3 -pipe -fstrict-aliasing -march=pentium -mcpu=pentiumpro -fomit-frame-pointer
--
Dr. Denys Duchier Denys.Duchier@ps.uni-sb.de
Forschungsbereich Programmiersysteme (Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45 http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50 Phone: +49 681 302 5618
66041 Saarbruecken, Germany Fax: +49 681 302 5615
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2002-10-22 16:56 UTC | newest]
Thread overview: 16+ messages
2002-10-21 11:01 abysmal code generated by gcc 3.2 Joe Wilson
[not found] <20021021142058.25276.qmail@web21104.mail.yahoo.com>
2002-10-21 10:29 ` Denys Duchier
2002-10-21 10:47 ` Joe Wilson
-- strict thread matches above, loose matches on Subject: below --
2002-10-21 0:56 Denys Duchier
2002-10-21 7:37 ` Fergus Henderson
2002-10-21 8:48 ` Denys Duchier
2002-10-21 12:59 ` Mike Stump
2002-10-21 15:07 ` Denys Duchier
2002-10-21 15:12 ` Fergus Henderson
2002-10-21 15:37 ` Mike Stump
2002-10-21 16:06 ` Dale Johannesen
2002-10-22 6:03 ` Michael Matz
2002-10-22 8:30 ` Kurt Garloff
2002-10-22 11:29 ` Dale Johannesen
2002-10-21 18:43 ` Denys Duchier
2002-10-22 3:56 ` Richard Henderson