Slowdowns in code generated by GCC>=3.3

public inbox for gcc@gcc.gnu.org

* Slowdowns in code generated by GCC>=3.3
@ 2004-10-20 13:38 Remko Troncon
  2004-10-20 13:52 ` Steven Bosscher
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Remko Troncon @ 2004-10-20 13:38 UTC (permalink / raw)
  To: gcc

Hi,

I am a developer of a bytecode emulator for the Prolog language. With the
release of GCC-3.3, our emulator was slowed down by a factor of 3 on x86 with
-O3 turned on (we didn't measure other platforms; the optimization flag doesn't
seem to matter). We were hoping this was a temporary issue, but the situation
didn't improve in any of the newer releases :(
I don't know whether I should file this as a bug report, so I am first asking
for advice here.

I'll try to explain at a high level what happens. If this isn't sufficient,
I can try to provide some code, but it will take me some time to isolate it.
This is the situation:
- Since the program counter in our emulator is crucial, we use the
  'register' and 'asm ("bx")' hints.
- For each instruction in the bytecode, we store the address of the label
  of the code which has to be executed for the instruction. Therefore,
  the program counter always points to the address of the code to
  be executed, and after each instruction we do a
	goto  **(void **)program_counter
Previous versions of GCC keep the program counter in ebx, and do a 
jmp *(%ebx) after the instructions (as expected). The newer GCCs seem
to unnecessarily move the program counter around between registers, and
don't do the jmp*(%ebx) after each instruction, but seem to jump to a
'common' piece of code doing this jump.
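
For readers who have not seen this style of interpreter, the dispatch scheme
described above can be sketched roughly as follows. This is a minimal
illustration in GNU C, not the emulator's actual code: the three opcodes, the
names run_threaded, handlers and MAX_CODE, and the PIN_PC_IN_BX switch are
all invented for the example. The register hint is left off by default
because "bx" only names a register on x86.

```c
#include <stddef.h>

/* Hypothetical three-instruction bytecode, invented for this sketch. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Build with -DPIN_PC_IN_BX on x86 to add the explicit register hint. */
#ifdef PIN_PC_IN_BX
#define PC_REGISTER __asm__("bx")
#else
#define PC_REGISTER
#endif

#define MAX_CODE 64

/* Runs `ops` (which must end in OP_HALT) and returns the accumulator,
   or -1 if the program is malformed. */
long run_threaded(const int *ops, size_t n)
{
    /* GNU C "labels as values": one handler address per opcode. */
    static void *const handlers[] = { &&do_inc, &&do_double, &&do_halt };
    void *code[MAX_CODE];
    register void **program_counter PC_REGISTER;
    long acc = 0;
    size_t i;

    if (n == 0 || n > MAX_CODE || ops[n - 1] != OP_HALT)
        return -1;
    for (i = 0; i < n; i++) {
        if (ops[i] < OP_INC || ops[i] > OP_HALT)
            return -1;
        code[i] = handlers[ops[i]];   /* store the label address, not the opcode */
    }

    program_counter = code;
    goto **program_counter;

do_inc:
    acc += 1;
    program_counter++;
    goto **program_counter;           /* one indirect jump per handler */

do_double:
    acc *= 2;
    program_counter++;
    goto **program_counter;

do_halt:
    return acc;
}
```

Calling run_threaded on the program { OP_INC, OP_INC, OP_DOUBLE, OP_INC,
OP_HALT } returns 5. The point of the layout is that every handler ends in its
own indirect jump, which is what the older compilers emitted as jmp *(%ebx).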

Looking at the changelog of gcc-3.3, I can only deduce this has to do with
the new DFA scheduler, but of course I cannot tell for sure.

I don't know if any of this information is useful, but we could use some
pointers on where to look for what is going wrong in the code generation.
A slowdown by a factor of 3 is really a lot.

Does anyone have any ideas ?

thanks a lot,
Remko


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 13:38 Slowdowns in code generated by GCC>=3.3 Remko Troncon
@ 2004-10-20 13:52 ` Steven Bosscher
  2004-10-20 14:03   ` Remko Troncon
  2004-10-20 14:30 ` Ranjit Mathew
  2004-10-21 11:10 ` Mike Stump
  2 siblings, 1 reply; 33+ messages in thread
From: Steven Bosscher @ 2004-10-20 13:52 UTC (permalink / raw)
  To: Remko Troncon, gcc

On Wednesday 20 October 2004 14:34, Remko Troncon wrote:
> Hi,
>
> I am a developer of a bytecode emulator for the Prolog language. With the
> release of GCC-3.3, our emulator was slowed down by a factor of 3 on x86
> with -O3 turned on (we didn't measure other platforms; the optimization
> flag doesn't seem to matter).

Which x86 architecture variant?

> We were hoping this was a temporary issue,
> but the situation didn't improve in any of the newer releases :(
> I don't know whether i should file this as a bug report, so i first ask
> for advice here.

Filing a bug report is only going to be useful if you can report your
problem in a way such that we can reproduce it: test case, output of
"gcc -v", etc.  See http://gcc.gnu.org/bugs.html for the details ;-)

> I'll try to explain on a high level what happens. If this isn't sufficient,
> i can try to give some code, but this will take me some time to isolate the
> code. This is the situation:
> - Since the program counter in our emulator is very crucial, we use the
>   'register' and 'asm ("bx")' hints.

Is the program counter a global variable, or local?  And if you remove
those hints, does that make your code worse?
I would actually expect it to improve if you remove those hints.  x86
is a register starved architecture, and as the documentation mentions:

  "Defining such a register variable does not reserve the register; it
remains available for other uses in places where flow control determines
the variable's value is not live.  However, these registers are made
unavailable for use in the reload pass; excessive use of this feature
leaves the compiler too few available registers to compile certain
functions."
(see "info gcc", look for "Explicit Reg Vars")

For an architecture with basically only 6 registers, taking up just one
is probably "Excessive use" already.

> - For each instruction in the bytecode, we store the address of the label
>   of the code which has to be executed for the instruction. Therefore,
>   the program counter always contains points to an address of code to
>   be executed, and after each instruction we do a
> 	goto  **(void **)program_counter
> Previous versions of GCC keep the program counter in ebx, and do a
> jmp *(%ebx) after the instructions (as expected). The newer GCCs seem
> to unnecessarily move the program counter around between registers, and
> don't do the jmp*(%ebx) after each instruction, but seem to jump to a
> 'common' piece of code doing this jump.

Yes.  Indirect jumps are incredibly expensive at compile time, so what
the compiler does is "factor" the computed jump, i.e. given,

  goto *x;
  [ ... ]

  goto *x;
  [ ... ]

  goto *x;
  [ ... ]

factoring the computed jumps results in the following code
sequence which has a much simpler control flow graph:

  goto y;
  [ ... ]

  goto y;
  [ ... ]

  goto y;
  [ ... ]

y:
  goto *x;

The compiler is supposed to unfactor this in the basic block reordering
pass, perhaps that is not happening for your code for some reason.
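
Written out by hand in GNU C, the factored shape looks roughly like the sketch
below. It is only an illustration of the control flow, under assumptions: the
opcodes and the names run_factored and handlers are hypothetical, and the
compiler does this transformation internally rather than in the source.

```c
#include <stddef.h>

/* Hypothetical opcodes, invented for this sketch. */
enum { F_INC, F_DOUBLE, F_HALT };

/* Runs `ops` (which must end in F_HALT) and returns the accumulator,
   or -1 if the program is malformed. */
long run_factored(const int *ops, size_t n)
{
    static void *const handlers[] = { &&do_inc, &&do_double, &&do_halt };
    void *x;
    long acc = 0;
    size_t pc = 0, i;

    if (n == 0 || ops[n - 1] != F_HALT)
        return -1;
    for (i = 0; i < n; i++)
        if (ops[i] < F_INC || ops[i] > F_HALT)
            return -1;

    x = handlers[ops[pc++]];
    goto y;

do_inc:
    acc += 1;
    x = handlers[ops[pc++]];
    goto y;                 /* unfactored form would be: goto *x; */

do_double:
    acc *= 2;
    x = handlers[ops[pc++]];
    goto y;

do_halt:
    return acc;

y:
    goto *x;                /* the only indirect jump in the function */
}
```

The result is the same as with one "goto *x" per handler (the program
{ F_INC, F_DOUBLE, F_INC, F_HALT } returns 3), but every handler now reaches
the next one through the single jump at y.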

> Looking at the changelog of gcc-3.3, i can only deduce this has to do with
> the new DFA scheduler, but of course i can not tell for sure.

I can tell almost for sure that this is not the problem.  In GCC 3.3,
only the pentium has a DFA scheduler description, all other architecture
variants still use the old scheduler.  Besides, scheduling on i386 is
local list scheduling, and your problem seems to be control-flow related.

> I don't know if any of this information is useful, but we could use some
> pointers in places to look where things are going wrong in the code
> generation. The factor 3 of slowdown is really a lot.

I would first try to remove that "register ... asm (...)" junk, and try
to optimize for something more advanced than i386 (which is the default
x86 architecture, see the manual, -march=*).  If that does not help,
please file a bug report including a test case as explained on bugs.html.

Gr.
Steven


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 13:52 ` Steven Bosscher
@ 2004-10-20 14:03   ` Remko Troncon
  2004-10-20 14:21     ` Richard Guenther
  0 siblings, 1 reply; 33+ messages in thread
From: Remko Troncon @ 2004-10-20 14:03 UTC (permalink / raw)
  To: gcc

Hi,

> Which x86 architecture variant?

I measured it on a Pentium 4.

> Filing a bug report is only going to be useful if you can report your
> problem in a way such that we can reproduce it: test case, output of
> "gcc -v", etc.  See http://gcc.gnu.org/bugs.html for the details ;-)

Okay, but then i'll have to reduce the 7000 lines of code in the crucial
file to something smaller ;)

> Is the program counter a global variable, or local? 

It's local.

>  And if you remove those hints, does that make your code worse?  

Removing 'register ... asm(bx)' makes it a bit worse. Removing only
'asm(bx)' makes it worse still, although the difference in time is
not very big.

> Yes.  Indirect jumps are incredibly expensive at compile time, so what
> the compiler does is "factor" the computed jump, i.e. given,

This is indeed what happens.

> I can tell almost for sure that this is not the problem.  In GCC 3.3,
> only the pentium has a DFA scheduler description, all other
> architecture variants still use the old scheduler. 

Okay. It just seemed like the only change looking at the changelog, but
it doesn't sound like it has anything to do with it indeed. I just
didn't know exactly what it was.

> try to optimize for something more advanced than i386 (which is the
> default x86 architecture, see the manual, -march=*). 

I tried it with -march=pentium4 this time. Strangely enough, this makes
matters even worse.

thanks,
Remko


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 14:03   ` Remko Troncon
@ 2004-10-20 14:21     ` Richard Guenther
  2004-10-20 14:52       ` Steven Bosscher
  2004-10-20 15:40       ` Remko Troncon
  0 siblings, 2 replies; 33+ messages in thread
From: Richard Guenther @ 2004-10-20 14:21 UTC (permalink / raw)
  To: Remko Troncon; +Cc: gcc

On Wed, 20 Oct 2004 15:19:14 +0200, Remko Troncon
<remko.troncon@cs.kuleuven.ac.be> wrote:
> Hi,
> 
> > Which x86 architecture variant?
> 
> I measured in on a Pentium 4.
> 
> > Filing a bug report is only going to be useful if you can report your
> > problem in a way such that we can reproduce it: test case, output of
> > "gcc -v", etc.  See http://gcc.gnu.org/bugs.html for the details ;-)
> 
> Okay, but then i'll have to reduce the 7000 lines of code in the crucial
> file to something smaller ;)

This looks similar to PR15242 and related bugs.

Richard.


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 13:38 Slowdowns in code generated by GCC>=3.3 Remko Troncon
  2004-10-20 13:52 ` Steven Bosscher
@ 2004-10-20 14:30 ` Ranjit Mathew
  2004-10-21 11:10 ` Mike Stump
  2 siblings, 0 replies; 33+ messages in thread
From: Ranjit Mathew @ 2004-10-20 14:30 UTC (permalink / raw)
  To: Remko Troncon; +Cc: GCC

Remko Troncon wrote:
> 
> I am a developer of a bytecode emulator for the Prolog language. With the
> release of GCC-3.3, our emulator was slowed down by a factor of 3 on x86 with
> -O3 turned on (we didn't measure other platforms; the optimization flag doesn't
> seem to matter). We were hoping this was a temporary issue, but the situation
> didn't improve in any of the newer releases :(

FWIW, there have been other grumblings about newer GCC
versions slowing down interpreter-like code:

http://article.gmane.org/gmane.comp.java.vm.sablevm.devel/1247

Ranjit.

-- 
Ranjit Mathew          Email: rmathew AT gmail DOT com

Bangalore, INDIA.      Web: http://ranjitmathew.tripod.com/


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 14:21     ` Richard Guenther
@ 2004-10-20 14:52       ` Steven Bosscher
  2004-10-20 15:40       ` Remko Troncon
  1 sibling, 0 replies; 33+ messages in thread
From: Steven Bosscher @ 2004-10-20 14:52 UTC (permalink / raw)
  To: Richard Guenther, Remko Troncon; +Cc: gcc

On Wednesday 20 October 2004 15:26, Richard Guenther wrote:
> This looks similar to PR15242 and related bugs.

It certainly does, thanks!

Let's see if we can wiggle it into mainline if
it fixes the problem...

Gr.
Steven



* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 14:21     ` Richard Guenther
  2004-10-20 14:52       ` Steven Bosscher
@ 2004-10-20 15:40       ` Remko Troncon
  2004-10-20 16:21         ` Richard Guenther
  2004-10-20 16:29         ` Steven Bosscher
  1 sibling, 2 replies; 33+ messages in thread
From: Remko Troncon @ 2004-10-20 15:40 UTC (permalink / raw)
  To: gcc

> This looks similar to PR15242 and related bugs.

It sure does. However, I tried applying the patch to GCC-CVS, and
recompiled with -O3, but that didn't do anything to my performance :(

cheers,
Remko


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 15:40       ` Remko Troncon
@ 2004-10-20 16:21         ` Richard Guenther
  2004-10-20 16:29         ` Steven Bosscher
  1 sibling, 0 replies; 33+ messages in thread
From: Richard Guenther @ 2004-10-20 16:21 UTC (permalink / raw)
  To: Remko Troncon; +Cc: gcc

On Wed, 20 Oct 2004 16:25:16 +0200, Remko Troncon
<remko.troncon@cs.kuleuven.ac.be> wrote:
> > This looks similar to PR15242 and related bugs.
> 
> It sure does. However, I tried applying the patch to GCC-CVS, and
> recompiled with -O3, but that didn't do anything to my performance :(

I guess you need to specify -fno-crossjumping or -fno-reorder-blocks
(or both) to make it work.  See the patch chunk changing passes.c which
checks for !flag_crossjumping before doing the optimization.

Richard.


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 15:40       ` Remko Troncon
  2004-10-20 16:21         ` Richard Guenther
@ 2004-10-20 16:29         ` Steven Bosscher
  1 sibling, 0 replies; 33+ messages in thread
From: Steven Bosscher @ 2004-10-20 16:29 UTC (permalink / raw)
  To: Remko Troncon, gcc

On Wednesday 20 October 2004 16:25, Remko Troncon wrote:
> > This looks similar to PR15242 and related bugs.
>
> It sure does. However, I tried applying the patch to GCC-CVS, and
> recompiled with -O3, but that didn't do anything to my performance :(

Probably because it happens too late and you're still killed by
register pressure, or because it does not actually change anything at all
for your test case (did you look to see if the computed jump was
un-factored?).  Try -fno-crossjumping.

Gr.
Steven





* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 13:38 Slowdowns in code generated by GCC>=3.3 Remko Troncon
  2004-10-20 13:52 ` Steven Bosscher
  2004-10-20 14:30 ` Ranjit Mathew
@ 2004-10-21 11:10 ` Mike Stump
  2004-10-24  8:03   ` Remko Troncon
  2 siblings, 1 reply; 33+ messages in thread
From: Mike Stump @ 2004-10-21 11:10 UTC (permalink / raw)
  To: Remko Troncon; +Cc: gcc

On Oct 20, 2004, at 5:34 AM, Remko Troncon wrote:
> With the release of GCC-3.3, our emulator was slowed down by a factor 
> of 3 on x86

Another thought, you can binary search the compiler sources from cvs, 
compiling your application at each instance to determine the patch that 
went in that regressed performance for you.
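
The search itself is an ordinary bisection over dated snapshots. A minimal
sketch of the logic, assuming the snapshots are numbered in commit order, that
the first one is known to be fast and the last one known to be slow, and that
the slowdown never goes away once introduced. The names first_slow_snapshot,
is_slow_fn and slow_at_or_after are hypothetical; a real predicate would
check out that snapshot, build the compiler, and time the application.

```c
#include <stddef.h>

/* Hypothetical predicate: nonzero if the compiler built from snapshot
   `index` produces the slow code. */
typedef int (*is_slow_fn)(size_t index, void *ctx);

/* Returns the index of the first slow snapshot.
   Preconditions: good < bad, snapshot `good` is fast, snapshot `bad` is slow. */
size_t first_slow_snapshot(size_t good, size_t bad, is_slow_fn is_slow, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_slow(mid, ctx))
            bad = mid;      /* regression is at mid or earlier */
        else
            good = mid;     /* regression is after mid */
    }
    return bad;
}

/* Stand-in predicate for trying the sketch: pretends the regression was
   introduced at the snapshot whose index is stored in *ctx. */
int slow_at_or_after(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

With 100 snapshots this needs about seven build-and-measure rounds instead of
one per snapshot.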


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-21 11:10 ` Mike Stump
@ 2004-10-24  8:03   ` Remko Troncon
  2004-10-26 20:26     ` Mike Stump
  0 siblings, 1 reply; 33+ messages in thread
From: Remko Troncon @ 2004-10-24  8:03 UTC (permalink / raw)
  To: gcc

Hi again,

> Another thought, you can binary search the compiler sources from cvs, 
> compiling your application at each instance to determine the patch that 
> went in that regressed performance for you.

I did a search through GCC CVS to find out which patch caused our factor
3 slowdown. Apparently, it is the patch with this ChangeLog entry:

2003-02-15  Richard Henderson  <rth@redhat.com>
        * bb-reorder.c (find_traces_1_round): Don't connect easy to copy
        successors with multiple predecessors.
        (connect_traces): Try harder to copy traces of length 1.
        * function.h (struct function): Add computed_goto_common_label,
        computed_goto_common_reg.
        * function.c (free_after_compilation): Zap them.
        * stmt.c (expand_computed_goto): Use them to produce one 
        indirect branch per function.

I tried patching the files one by one (in the order above, as I assumed the
problem was in stmt.c and that that file depended on function.*), and indeed,
once 'stmt.c' is patched, the slowdown occurs.

Can something be done about this ?

thanks,
Remko


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-24  8:03   ` Remko Troncon
@ 2004-10-26 20:26     ` Mike Stump
  2004-10-26 22:27       ` Steven Bosscher
  0 siblings, 1 reply; 33+ messages in thread
From: Mike Stump @ 2004-10-26 20:26 UTC (permalink / raw)
  To: Remko Troncon; +Cc: gcc

On Oct 23, 2004, at 5:40 AM, Remko Troncon wrote:
>> Another thought, you can binary search the compiler sources from cvs,
>> compiling your application at each instance to determine the patch 
>> that
>> went in that regressed performance for you.
>
> I did a search through GCC CVS to find out which patch caused our 
> factor
> 3 slowdown. Apparently, it is the patch with this ChangeLog entry:
>
> 2003-02-15  Richard Henderson  <rth@redhat.com>
>         * bb-reorder.c (find_traces_1_round): Don't connect easy to 
> copy
>         successors with multiple predecessors.
>         (connect_traces): Try harder to copy traces of length 1.
>         * function.h (struct function): Add computed_goto_common_label,
>         computed_goto_common_reg.
>         * function.c (free_after_compilation): Zap them.
>         * stmt.c (expand_computed_goto): Use them to produce one
>         indirect branch per function.

Wait, we're not done yet, that is just the first step.  The next step 
is to find an instance of changed code generation...  It would help if 
you gprof or gcov your code, and then find a hot instance of the code 
that changed.  gcc -save-temps can be used to preserve the preprocessed 
source code and the assembly.

From there, if you can, trim down the extraneous code from the .i/.ii 
file and submit that file as a bug report, together with the flags, the 
generated .s before and after, the net effect of the change (3x 
application slowdown), the fact that it is a regression, a pointer to 
the above changelog entry, and the timings you get with and without 
that patch.


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-26 20:26     ` Mike Stump
@ 2004-10-26 22:27       ` Steven Bosscher
  2004-10-26 22:44         ` Pablo Mejia
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Bosscher @ 2004-10-26 22:27 UTC (permalink / raw)
  To: Mike Stump, Remko Troncon; +Cc: gcc, rth

On Tuesday 26 October 2004 20:57, Mike Stump wrote:
> On Oct 23, 2004, at 5:40 AM, Remko Troncon wrote:
> >> Another thought, you can binary search the compiler sources from cvs,
> >> compiling your application at each instance to determine the patch
> >> that
> >> went in that regressed performance for you.
> >
> > I did a search through GCC CVS to find out which patch caused our
> > factor
> > 3 slowdown. Apparently, it is the patch with this ChangeLog entry:
> >
> > 2003-02-15  Richard Henderson  <rth@redhat.com>
> >         * bb-reorder.c (find_traces_1_round): Don't connect easy to
> > copy
> >         successors with multiple predecessors.
> >         (connect_traces): Try harder to copy traces of length 1.
> >         * function.h (struct function): Add computed_goto_common_label,
> >         computed_goto_common_reg.
> >         * function.c (free_after_compilation): Zap them.
> >         * stmt.c (expand_computed_goto): Use them to produce one
> >         indirect branch per function.
>
> Wait, we're not done yet, that is just the first step.  The next step
> is to find an instance of changed code generation...  It would help if
> you gperf or gcov your code, and then find a hot instance of the code
> that changed.  gcc -save-temps can be used to preserve the preprocessed
> source code and the assembly.
>
>  From there, if you can, trim down the extraneous code from the .i/.ii
> file and then submit that file as a bug report, with the flags, then
> generated .s before and after, and the net effect of the change (3x
> application slowdown) and the fact it is a regression and a pointer to
> the above changelog entry and the timings you get with and without
> that.

I already know exactly what the problem is.
In fact I mentioned it even before Remko confirmed that this patch is
causing his problem.

The patch makes us "factor" computed jumps.  From cfg.texi:

@smallexample
  goto *x;
  [ ... ]

  goto *x;
  [ ... ]

  goto *x;
  [ ... ]
@end smallexample

@noindent
factoring the computed jumps results in the following code sequence
which has a much simpler flow graph:

@smallexample
  goto y;
  [ ... ]

  goto y;
  [ ... ]

  goto y;
  [ ... ]

y:
  goto *x;
@end smallexample

Now, the problem is that gcse and crossjumping may move code into the
block with the factored computed jump, so when we later try to undo
the factoring, we think it is too expensive to do that.  We end up
with lots and lots of jumps to a single computed jump that is very
difficult to predict.
To make things even worse, the expressions gcse moves out increase
register pressure by just enough to make us spill too much on ix86.
For dProlog this caused a code size increase of ~60% for the (large)
function with the computed jump.
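
Why a single shared computed jump is so much harder to predict can be seen
with a toy model. The sketch below assumes a predictor that simply remembers
the last target taken from each indirect-jump site; real hardware is more
elaborate, so the numbers only illustrate the trend. The names
count_mispredictions and NUM_OPS are hypothetical.

```c
#include <stddef.h>

enum { NUM_OPS = 4 };   /* opcodes are assumed to be 0 .. NUM_OPS-1 */

/* Counts mispredictions of a "last target" predictor over the opcode
   trace `ops`.
   shared_site != 0: all dispatches go through one jump (factored).
   shared_site == 0: each handler ends in its own jump (unfactored).
   Returns (size_t)-1 if an opcode is out of range. */
size_t count_mispredictions(const int *ops, size_t n, int shared_site)
{
    int last_target[NUM_OPS];
    size_t misses = 0;
    size_t i;

    for (i = 0; i < NUM_OPS; i++)
        last_target[i] = -1;            /* nothing predicted yet */
    for (i = 0; i < n; i++)
        if (ops[i] < 0 || ops[i] >= NUM_OPS)
            return (size_t)-1;

    for (i = 0; i + 1 < n; i++) {
        int site = shared_site ? 0 : ops[i];   /* which jump is executing */
        int target = ops[i + 1];               /* handler it jumps to */
        if (last_target[site] != target)
            misses++;
        last_target[site] = target;
    }
    return misses;
}
```

For the looping trace 0 1 2 0 1 2 0 1 2 0, the unfactored layout mispredicts
only the first 3 of the 9 dispatches, while the factored layout mispredicts
all 9, because the one shared jump never goes to the same handler twice in a
row.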

In rth's patch, bb-reorder is supposed to try harder to connect short
traces and duplicate them.  This apparently doesn't work, so Josef
Zlomek posted his patch that is linked to from the audit trail of
PR15242.
That patch fails because we clear current_function_has_computed_jump,
so we never run the unfactoring.
Fixing that uncovers the problem with gcse and crossjumping, making
the unfactoring too expensive.
GCSE is a bigger problem (it's quite unlikely that crossjumping can
merge tails for typical code with computed jumps), so I suggest we
simply disable GCSE completely if there are computed jumps in some
function.  GCSE doesn't buy us much anyway for such code.

A patch I posted to Remko fixes his problem if he disables gcse and
crossjumping by hand.  I'm working on cleaning up that patch now so
that it is acceptable for mainline.  The most important problem is
current_function_has_computed_jump.  I plan to patch tree-cfg.c to
clean up computed jumps at the tree level, and let emit_indirect_jump
set current_function_has_computed_jump.  When it is set, we never
clear it after that.  This only affects sched-rgn in the unlikely
case that we'd clean up a computed jump at the RTL level.
The rest is just Josef's patch, revamped.

I hope to find time to finish the patch later this week.

Gr.
Steven


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-26 22:27       ` Steven Bosscher
@ 2004-10-26 22:44         ` Pablo Mejia
  2004-10-26 22:47           ` Steven Bosscher
  0 siblings, 1 reply; 33+ messages in thread
From: Pablo Mejia @ 2004-10-26 22:44 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc

    Steven> Now, the problem is that gcse and crossjumping may move
    Steven> code in the block with the factored computed jump, so when
    Steven> we later try to undo the factoring, we think it is too
    Steven> expensive to do that.  We end up with lots and lots of
    Steven> jumps to a single computed jump that is very difficult to
    Steven> predict.

I just lurk on this list, so feel free to ignore this comment...

Perhaps it would be possible to mark the block with the factored
computed jump in some way.  Any pass which tries to do code motion, or
other problematic optimizations, could check that note and inhibit
code motion into the block.  That would preserve the block with the
factored computed jump so it can be un-factored later.  Perhaps then
it wouldn't be necessary to disable gcse in functions with computed
jumps.

Pablo


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-26 22:44         ` Pablo Mejia
@ 2004-10-26 22:47           ` Steven Bosscher
  0 siblings, 0 replies; 33+ messages in thread
From: Steven Bosscher @ 2004-10-26 22:47 UTC (permalink / raw)
  To: pablo, Pablo Mejia; +Cc: gcc

On Tuesday 26 October 2004 22:09, Pablo Mejia wrote:
>     Steven> Now, the problem is that gcse and crossjumping may move
>     Steven> code in the block with the factored computed jump, so when
>     Steven> we later try to undo the factoring, we think it is too
>     Steven> expensive to do that.  We end up with lots and lots of
>     Steven> jumps to a single computed jump that is very difficult to
>     Steven> predict.
>
> I just lurk on this list, so feel free to ignore this comment...
>
> Perhaps it would be possible to mark the block with the factored
> computed jump in some way.

It is cumbersome to maintain such a marking.  I tried it at
first but it's hard to not lose the mark, and it's harder to
change all passes to be aware of such a block.

Gr.
Steven


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-27 17:49                   ` Steven Bosscher
@ 2004-10-27 18:04                     ` Giovanni Bajo
  0 siblings, 0 replies; 33+ messages in thread
From: Giovanni Bajo @ 2004-10-27 18:04 UTC (permalink / raw)
  To: Steven Bosscher, Remko Troncon; +Cc: gcc

Steven Bosscher wrote:

>>>> OK, then PR 15242 is *NOT* the problem reported by the original
>>>> poster of this thread, because he *does* have a regression when
>>>> compiling his code with 3.3.
>>>
>>> If 'regression' includes performance decrease, then indeed.
>>
>> Can you please file a bug report about this then, if you hadn't
>> already? CC me on the bug report please.
>
> *sigh*
> /me points at thread which mentions PR15242

/me points at the quoted text.

15242 is not a regression, the user is seeing a regression since 3.3. So either
you should mark 15242 as a regression (which I am told would be incorrect),
or 15242, if related, does not capture the essence of the regression, which
appears (or is uncovered - it does not matter) only for versions 3.3 and above.

Giovanni Bajo


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-27 17:29                 ` Giovanni Bajo
@ 2004-10-27 17:49                   ` Steven Bosscher
  2004-10-27 18:04                     ` Giovanni Bajo
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Bosscher @ 2004-10-27 17:49 UTC (permalink / raw)
  To: Giovanni Bajo, Remko Troncon; +Cc: gcc

On Wednesday 27 October 2004 18:26, Giovanni Bajo wrote:
> Remko Troncon wrote:
> >> OK, then PR 15242 is *NOT* the problem reported by the original
> >> poster of this thread, because he *does* have a regression when
> >> compiling his code with 3.3.
> >
> > If 'regression' includes performance decrease, then indeed.
>
> Can you please file a bug report about this then, if you hadn't already? CC
> me on the bug report please.

*sigh*
/me points at thread which mentions PR15242


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 23:29               ` Remko Troncon
  2004-10-20 23:29                 ` Scott Robert Ladd
  2004-10-21  7:34                 ` Joe Buck
@ 2004-10-27 17:29                 ` Giovanni Bajo
  2004-10-27 17:49                   ` Steven Bosscher
  2 siblings, 1 reply; 33+ messages in thread
From: Giovanni Bajo @ 2004-10-27 17:29 UTC (permalink / raw)
  To: Remko Troncon; +Cc: gcc

Remko Troncon wrote:

>> OK, then PR 15242 is *NOT* the problem reported by the original
>> poster of this thread, because he *does* have a regression when
>> compiling his code with 3.3.
>
> If 'regression' includes performance decrease, then indeed.

Can you please file a bug report about this then, if you haven't already? CC me
on the bug report please.

Giovanni Bajo



* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 23:29               ` Remko Troncon
  2004-10-20 23:29                 ` Scott Robert Ladd
@ 2004-10-21  7:34                 ` Joe Buck
  2004-10-27 17:29                 ` Giovanni Bajo
  2 siblings, 0 replies; 33+ messages in thread
From: Joe Buck @ 2004-10-21  7:34 UTC (permalink / raw)
  To: Remko Troncon; +Cc: gcc

On Wed, Oct 20, 2004 at 11:22:10PM +0200, Remko Troncon wrote:
> > OK, then PR 15242 is *NOT* the problem reported by the original poster of
> > this thread, because he *does* have a regression when compiling his code
> > with 3.3.
> 
> If 'regression' includes performance decrease, then indeed.

A 3x performance decrease on a production program is most certainly a
regression.


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 23:29               ` Remko Troncon
@ 2004-10-20 23:29                 ` Scott Robert Ladd
  2004-10-21  7:34                 ` Joe Buck
  2004-10-27 17:29                 ` Giovanni Bajo
  2 siblings, 0 replies; 33+ messages in thread
From: Scott Robert Ladd @ 2004-10-20 23:29 UTC (permalink / raw)
  To: Remko Troncon; +Cc: gcc

Remko Troncon wrote:
> If 'regression' includes performance decrease, then indeed.
> 
> I tried playing with the -fno-crossjumping or -fno-reorder-blocks flags,
> but no change in performance. I didn't look at the generated code
> closely yet, because the code looked quite different than before, so i
> couldn't immediately if the jump was still factored out. It did look
> like the code was a lot more complex than with older GCCs, but I'll have
> to look at it closer tomorrow.

I'm not certain if this will help, and I don't know the nature of your 
program.

If you can isolate the offending code in a small, short-run example, you 
could run it through my Acovea program to analyze for pessimistic 
options. That might give us a clearer idea of what is causing the problem.

	http://www.coyotegulch.com/products/acovea/index.html

If this isn't something you want to do or can do, I'd be interested if anyone 
can come up with a short example that I could run through Acovea over 
the weekend.

-- 
Scott Robert Ladd
site: http://www.coyotegulch.com
blog: http://chaoticcoyote.blogspot.com


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 23:27             ` Giovanni Bajo
@ 2004-10-20 23:29               ` Remko Troncon
  2004-10-20 23:29                 ` Scott Robert Ladd
                                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Remko Troncon @ 2004-10-20 23:29 UTC (permalink / raw)
  To: gcc

> OK, then PR 15242 is *NOT* the problem reported by the original poster of
> this thread, because he *does* have a regression when compiling his code
> with 3.3.

If 'regression' includes performance decrease, then indeed.

I tried playing with the -fno-crossjumping and -fno-reorder-blocks flags,
but saw no change in performance. I didn't look at the generated code
closely yet, because the code looked quite different than before, so I
couldn't immediately tell if the jump was still factored out. It did look
like the code was a lot more complex than with older GCCs, but I'll have
to look at it more closely tomorrow.

cheers,
Remko


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 23:26           ` Andrew Pinski
@ 2004-10-20 23:27             ` Giovanni Bajo
  2004-10-20 23:29               ` Remko Troncon
  0 siblings, 1 reply; 33+ messages in thread
From: Giovanni Bajo @ 2004-10-20 23:27 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Földy Lajos, gcc

Andrew Pinski wrote:

>> We accept only bugfixes for regression on release branches, and this
>> PR is
>> not marked as a regression, right now. Given the subject of this
>> thread,
>> this could be a mistake on our side. Can anybody confirm that the
>> problem in
>> PR 15242 is a regression?
>
> This is not a regression see PR 8092 which talks about why this is not
> really a regression or a defect really.

OK, then PR 15242 is *NOT* the problem reported by the original poster of
this thread, because he *does* have a regression when compiling his code
with 3.3.

Giovanni Bajo


* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 23:21         ` Giovanni Bajo
@ 2004-10-20 23:26           ` Andrew Pinski
  2004-10-20 23:27             ` Giovanni Bajo
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Pinski @ 2004-10-20 23:26 UTC (permalink / raw)
  To: Giovanni Bajo; +Cc: Földy Lajos, gcc


On Oct 20, 2004, at 4:14 PM, Giovanni Bajo wrote:

> Földy Lajos wrote:
>
>> Is there a chance that the patch for PR15242 (if available) will go
>> into 3.4.3?
>
> We accept only bugfixes for regressions on release branches, and this
> PR is not marked as a regression right now. Given the subject of this
> thread, this could be a mistake on our side. Can anybody confirm that
> the problem in PR 15242 is a regression?

This is not a regression; see PR 8092, which discusses why this is not
really a regression, or even really a defect.

-- Pinski

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 20:14       ` Földy Lajos
  2004-10-20 20:21         ` Zack Weinberg
  2004-10-20 21:22         ` Peter Barada
@ 2004-10-20 23:21         ` Giovanni Bajo
  2004-10-20 23:26           ` Andrew Pinski
  2 siblings, 1 reply; 33+ messages in thread
From: Giovanni Bajo @ 2004-10-20 23:21 UTC (permalink / raw)
  To: Földy Lajos; +Cc: gcc

Földy Lajos wrote:

> Is there a chance that the patch for PR15242 (if available) will go into
> 3.4.3?

We accept only bugfixes for regressions on release branches, and this PR is
not marked as a regression right now. Given the subject of this thread,
this could be a mistake on our side. Can anybody confirm that the problem in
PR 15242 is a regression?

Giovanni Bajo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 21:22         ` Peter Barada
@ 2004-10-20 22:45           ` Zack Weinberg
  0 siblings, 0 replies; 33+ messages in thread
From: Zack Weinberg @ 2004-10-20 22:45 UTC (permalink / raw)
  To: Peter Barada; +Cc: foldy, gcc

Peter Barada <peter@the-baradas.com> writes:

> Wouldn't it be better to make the compiler not do
> tail-merging/crossjumping unless the savings make it worthwhile;
> i.e. don't do the jump to save fewer than N instructions (where in
> this case, it actually *increases* the instruction count by one)?

I don't think you understand the problem.  The control flow graph
generated from the original code contains a very large number of basic
blocks each of which begins and ends with an abnormal edge.  The early
phases of optimization go into a very bad performance regime when
faced with such a control flow graph -- we're talking hours to compile
one function.

The tail merge in this case is done in spite of its making the
generated code worse; its sole function is to get the early optimizers
out of the bad performance regime.  It's *supposed* to be undone at
the end.  The bug is that that isn't happening.

(An alternative would be to detect the problem basic block structure
and disable the optimizations that can't handle it; if I remember
correctly which those are, they are unlikely to do anything useful on
such a function anyway.)
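
For readers who have not seen the pattern, here is a minimal sketch of the
kind of function being discussed, assuming GCC's labels-as-values
extension. The opcodes, run_threaded and MAX_CODE are made up for
illustration and are not taken from the emulator in question. Every
handler ends in its own indirect jump, and every label is a possible
target of every one of those jumps, so the number of abnormal edges grows
with (number of handlers) x (number of labels).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set, for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT, OP_COUNT };
enum { MAX_CODE = 64 };

/* Translates `ops` into an array of handler addresses and runs it with
   direct threaded dispatch. Returns the final accumulator value. */
int run_threaded(const unsigned char *ops, size_t n)
{
    /* GCC "labels as values": one handler address per opcode. */
    static void *const labels[OP_COUNT] = { &&do_inc, &&do_dec, &&do_halt };
    void *threaded[MAX_CODE + 1];
    void **pc = threaded;
    int acc = 0;
    size_t i;

    assert(n <= MAX_CODE);
    for (i = 0; i < n; i++) {
        assert(ops[i] < OP_COUNT);
        threaded[i] = labels[ops[i]];
    }
    threaded[n] = labels[OP_HALT];   /* sentinel so the loop always stops */

    /* Same shape as "goto **(void **)program_counter": load the handler
       address the program counter points at, advance, and jump. */
#define NEXT() goto **pc++

    NEXT();
do_inc:
    acc++;
    NEXT();   /* own indirect jump: the one that gets merged */
do_dec:
    acc--;
    NEXT();
do_halt:
    return acc;
#undef NEXT
}
```

With the hypothetical program { OP_INC, OP_INC, OP_INC, OP_DEC } the
function returns 2. In the factored form, each NEXT() becomes a direct
jump to one shared block that performs the indirect jump.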

In any case, we're definitely in patches-are-welcome mode with regard
to getting this fixed.

zw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 20:14       ` Földy Lajos
  2004-10-20 20:21         ` Zack Weinberg
@ 2004-10-20 21:22         ` Peter Barada
  2004-10-20 22:45           ` Zack Weinberg
  2004-10-20 23:21         ` Giovanni Bajo
  2 siblings, 1 reply; 33+ messages in thread
From: Peter Barada @ 2004-10-20 21:22 UTC (permalink / raw)
  To: foldy; +Cc: zack, gcc


>> 
>> Nothing stops GCC from folding that asm into the common-ized jump
>> block.
>> 
>> zw
>> 
>
>I see.
>
>So the practical solution is 
>
>- generate asm listing
>- identify appropriate jumps, eg:
>
>        ...
>        movl    8(%ecx), %eax
>        jmp     .Lxxxxx
>	...
>.Lxxxxx:
>        jmp     *%eax
>
>- change "jmp .Lxxxxx" to "jmp *%eax" with your favorite text editor
>- compile the asm source

Wouldn't it be better to make the compiler not do
tail-merging/crossjumping unless the savings make it
worthwhile; i.e. don't do the jump to save fewer than N
instructions (where in this case, it actually *increases* the
instruction count by one)?

-- 
Peter Barada
peter@the-baradas.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 20:14       ` Földy Lajos
@ 2004-10-20 20:21         ` Zack Weinberg
  2004-10-20 21:22         ` Peter Barada
  2004-10-20 23:21         ` Giovanni Bajo
  2 siblings, 0 replies; 33+ messages in thread
From: Zack Weinberg @ 2004-10-20 20:21 UTC (permalink / raw)
  To: Földy Lajos; +Cc: gcc

Földy Lajos <foldy@rmki.kfki.hu> writes:

> Is there a chance that the patch for PR15242 (if available) will go
> into 3.4.3?

I can't speak to that.  From the discussion, it sounds like the bug
hasn't yet been fixed.

zw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 19:00     ` Zack Weinberg
  2004-10-20 19:05       ` Daniel Berlin
@ 2004-10-20 20:14       ` Földy Lajos
  2004-10-20 20:21         ` Zack Weinberg
                           ` (2 more replies)
  1 sibling, 3 replies; 33+ messages in thread
From: Földy Lajos @ 2004-10-20 20:14 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc


On Wed, 20 Oct 2004, Zack Weinberg wrote:

> Földy Lajos <foldy@rmki.kfki.hu> writes:
> 
> > and what about adding the real goto (which will never be executed, but
> > gcc will have "almost real" control flow):
> >
> >  	__asm__("jmp *%0" : : "a" (pc));
> > 	goto *pc;
> 
> Nothing stops GCC from folding that asm into the common-ized jump
> block.
> 
> zw
> 

I see.

So the practical solution is 

- generate asm listing
- identify appropriate jumps, eg:

        ...
        movl    8(%ecx), %eax
        jmp     .Lxxxxx
	...
.Lxxxxx:
        jmp     *%eax

- change "jmp .Lxxxxx" to "jmp *%eax" with your favorite text editor
- compile the asm source

:-)
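
The text-editor step above could be scripted. This is only a sketch under
stated assumptions: rewrite_jump is a hypothetical helper, it handles one
line at a time, and it is only safe when the register used by the shared
block (%eax in the listing above) already holds the target at every jump
site, which has to be checked by hand.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copies `line` to `out`, replacing a direct "jmp <label>" with the
   `indirect` instruction (e.g. "jmp\t*%eax"). Returns 1 if the line was
   rewritten and 0 if it was copied unchanged. */
int rewrite_jump(const char *line, const char *label,
                 const char *indirect, char *out, size_t outsz)
{
    const char *p = line;
    size_t len;

    assert(line && label && indirect && out);
    len = strlen(label);

    while (*p == ' ' || *p == '\t')
        p++;
    if (strncmp(p, "jmp", 3) == 0 && (p[3] == ' ' || p[3] == '\t')) {
        const char *target = p + 3;
        while (*target == ' ' || *target == '\t')
            target++;
        /* Match the whole operand, so ".L5" does not also match ".L55". */
        if (strncmp(target, label, len) == 0 &&
            (target[len] == '\0' || isspace((unsigned char)target[len]))) {
            snprintf(out, outsz, "\t%s\n", indirect);
            return 1;
        }
    }
    snprintf(out, outsz, "%s", line);
    return 0;
}
```

A driver would read the listing line by line, call rewrite_jump with the
label of the shared block, and write the result to a new file before
assembling it.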


Is there a chance that the patch for PR15242 (if available) will go into
3.4.3?

regards,
lajos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 19:00     ` Zack Weinberg
@ 2004-10-20 19:05       ` Daniel Berlin
  2004-10-20 20:14       ` Földy Lajos
  1 sibling, 0 replies; 33+ messages in thread
From: Daniel Berlin @ 2004-10-20 19:05 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Földy Lajos, gcc


On Oct 20, 2004, at 1:30 PM, Zack Weinberg wrote:

> Földy Lajos <foldy@rmki.kfki.hu> writes:
>
>> and what about adding the real goto (which will never be executed, but
>> gcc will have "almost real" control flow):
>>
>>  	__asm__("jmp *%0" : : "a" (pc));
>> 	goto *pc;
>
> Nothing stops GCC from folding that asm into the common-ized jump
> block.

Right.
Never try to work around the express constraints given to you by the
compiler; it will always win, and you will always lose.
:)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 18:41   ` Földy Lajos
@ 2004-10-20 19:00     ` Zack Weinberg
  2004-10-20 19:05       ` Daniel Berlin
  2004-10-20 20:14       ` Földy Lajos
  0 siblings, 2 replies; 33+ messages in thread
From: Zack Weinberg @ 2004-10-20 19:00 UTC (permalink / raw)
  To: Földy Lajos; +Cc: gcc

Földy Lajos <foldy@rmki.kfki.hu> writes:

> and what about adding the real goto (which will never be executed, but
> gcc will have "almost real" control flow):
>
>  	__asm__("jmp *%0" : : "a" (pc));
> 	goto *pc;

Nothing stops GCC from folding that asm into the common-ized jump
block.

zw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 18:09 ` Zack Weinberg
@ 2004-10-20 18:41   ` Földy Lajos
  2004-10-20 19:00     ` Zack Weinberg
  0 siblings, 1 reply; 33+ messages in thread
From: Földy Lajos @ 2004-10-20 18:41 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc


On Wed, 20 Oct 2004, Zack Weinberg wrote:

> Földy Lajos <foldy@rmki.kfki.hu> writes:
> 
> > not portable, but on i386 you can try using the good old inline assembly: 
> >
> > 	void* pc;
> > 	...
> > 	pc=&&lab;
> > 	__asm__("jmp *%0" : : "a" (pc));
> > 	...
> > 	lab:
> 
> No, you can't do this.  Asm statements cannot alter control flow.
> 
> zw
> 


and what about adding the real goto (which will never be executed, but
gcc will have "almost real" control flow):

 	__asm__("jmp *%0" : : "a" (pc));
	goto *pc;

regards,
lajos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
  2004-10-20 17:26 Földy Lajos
@ 2004-10-20 18:09 ` Zack Weinberg
  2004-10-20 18:41   ` Földy Lajos
  0 siblings, 1 reply; 33+ messages in thread
From: Zack Weinberg @ 2004-10-20 18:09 UTC (permalink / raw)
  To: Földy Lajos; +Cc: remko.troncon, gcc

Földy Lajos <foldy@rmki.kfki.hu> writes:

> not portable, but on i386 you can try using the good old inline assembly: 
>
> 	void* pc;
> 	...
> 	pc=&&lab;
> 	__asm__("jmp *%0" : : "a" (pc));
> 	...
> 	lab:

No, you can't do this.  Asm statements cannot alter control flow.
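
A sketch of the form the compiler can follow, assuming GCC's
labels-as-values extension (jump_to and its labels are hypothetical
names). With "goto *pc" the jump is an edge in the control flow graph, so
the compiler knows which labels can be reached and keeps register and
stack state consistent at each target. A jump hidden inside an asm is
invisible to it.

```c
#include <assert.h>

/* Jumps to one of two labels through a computed goto and reports which
   one was reached. */
int jump_to(int which)
{
    /* Label addresses taken with the unary && operator. */
    static void *const targets[] = { &&first, &&second };
    void *pc;

    assert(which == 0 || which == 1);
    pc = targets[which];
    goto *pc;        /* visible to the compiler, unlike a jmp in an asm */

first:
    return 10;
second:
    return 20;
}
```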

zw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Slowdowns in code generated by GCC>=3.3
@ 2004-10-20 17:26 Földy Lajos
  2004-10-20 18:09 ` Zack Weinberg
  0 siblings, 1 reply; 33+ messages in thread
From: Földy Lajos @ 2004-10-20 17:26 UTC (permalink / raw)
  To: remko.troncon; +Cc: gcc

> Hi,
> 
> I am a developer of a bytecode emulator for the Prolog language. With 
> the release of GCC-3.3, our emulator was slowed down by a factor of 3 on 
> x86 with -O3 turned on (we didn't measure other platforms; the 
> optimization flag doesn't seem to matter). We were hoping this was a 
> temporary issue, but the situation didn't improve in any of the newer 
> releases :( I don't know whether I should file this as a bug report, so
> I am asking for advice here first.
>
> I'll try to explain on a high level what happens. If this isn't 
> sufficient, i can try to give some code, but this will take me some time
> to isolate the code. This is the situation:
> - Since the program counter in our emulator is very crucial, we use the 
>   'register' and 'asm ("bx")' hints.
> - For each instruction in the bytecode, we store the address of the 
>   label of the code which has to be executed for the instruction. 
>   Therefore, the program counter always points to an address of
>   code to be executed, and after each instruction we do a 
>         goto  **(void **)program_counter
> Previous versions of GCC keep the program counter in ebx, and do a 
> jmp *(%ebx) after the instructions (as expected). The newer GCCs seem
> to unnecessarily move the program counter around between registers, and
> don't do the jmp*(%ebx) after each instruction, but seem to jump to a
> 'common' piece of code doing this jump.
>
> Looking at the changelog of gcc-3.3, i can only deduce this has to do 
> with the new DFA scheduler, but of course i can not tell for sure.
>
> I don't know if any of this information is useful, but we could use some 
> pointers in places to look where things are going wrong in the code 
> generation. The factor 3 of slowdown is really a lot.
>
> Does anyone have any ideas ?
>
> thanks a lot,
> Remko


Hi,

not portable, but on i386 you can try using the good old inline assembly: 

	void* pc;
	...
	pc=&&lab;
	__asm__("jmp *%0" : : "a" (pc));
	...
	lab:

best regards,
lajos foldy

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2004-10-27 16:50 UTC | newest]

Thread overview: 33+ messages
2004-10-20 13:38 Slowdowns in code generated by GCC>=3.3 Remko Troncon
2004-10-20 13:52 ` Steven Bosscher
2004-10-20 14:03   ` Remko Troncon
2004-10-20 14:21     ` Richard Guenther
2004-10-20 14:52       ` Steven Bosscher
2004-10-20 15:40       ` Remko Troncon
2004-10-20 16:21         ` Richard Guenther
2004-10-20 16:29         ` Steven Bosscher
2004-10-20 14:30 ` Ranjit Mathew
2004-10-21 11:10 ` Mike Stump
2004-10-24  8:03   ` Remko Troncon
2004-10-26 20:26     ` Mike Stump
2004-10-26 22:27       ` Steven Bosscher
2004-10-26 22:44         ` Pablo Mejia
2004-10-26 22:47           ` Steven Bosscher
2004-10-20 17:26 Földy Lajos
2004-10-20 18:09 ` Zack Weinberg
2004-10-20 18:41   ` Földy Lajos
2004-10-20 19:00     ` Zack Weinberg
2004-10-20 19:05       ` Daniel Berlin
2004-10-20 20:14       ` Földy Lajos
2004-10-20 20:21         ` Zack Weinberg
2004-10-20 21:22         ` Peter Barada
2004-10-20 22:45           ` Zack Weinberg
2004-10-20 23:21         ` Giovanni Bajo
2004-10-20 23:26           ` Andrew Pinski
2004-10-20 23:27             ` Giovanni Bajo
2004-10-20 23:29               ` Remko Troncon
2004-10-20 23:29                 ` Scott Robert Ladd
2004-10-21  7:34                 ` Joe Buck
2004-10-27 17:29                 ` Giovanni Bajo
2004-10-27 17:49                   ` Steven Bosscher
2004-10-27 18:04                     ` Giovanni Bajo
