[Bug optimization/15242] New: pessimization of "goto *" - anton at mips dot complang dot tuwien dot ac dot at

From: "anton at mips dot complang dot tuwien dot ac dot at" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug optimization/15242] New: pessimization of "goto *"
Date: Sat, 01 May 2004 14:27:00 -0000
Message-ID: <20040501142654.15242.anton@mips.complang.tuwien.ac.at>

This is essentially the 3.4.0 incarnation of the problem discussed at
the end of PR8092.  mmitchel suggested creating a new bug report, so
here it is:

Code example and observed behaviour
-----------------------------------

In the attached code, the fragment between I_question_branch and
J_question_branch_lp_plus_store_number contains the following code
(slightly edited for readability):

I_question_branch:   
{
  Cell * a_target;
  Bool f;
  ;
  (( a_target )=(Cell *)( ( (* (ip) )   )  )) ;
  (( f )=(Bool)( (sp[0])  )) ;
  ({ ip+=( 1 );}) ;
  sp += 1;
  {
    if (f==0) {
      ({ip=( (Xt *)a_target );  ;}) ;
      ;
      (ip++) ;
      ;
      ({asm("":"=X"(cfa));  goto **(ip-1);}) ;
    }
    ;
  }
  ;
  (ip++) ;
  ;
K_question_branch : 
  ({asm("":"=X"(cfa));  goto **(ip-1);}) ;
}
J_question_branch_lp_plus_store_number :

When the file is compiled with

gcc-3.4.0 -I../../gforth-0.6.2/engine/../arch/386 -I. -Wall -O2 \
  -fomit-frame-pointer -fforce-addr -fforce-mem -march=pentium -DHAVE_CONFIG_H \
  -DDEFAULTPATH='".:/usr/local/lib/gforth/site-forth:/usr/local/share/gforth/site-forth:/usr/local/lib/gforth/0.6.2:/usr/local/share/gforth/0.6.2"' \
  -fno-gcse -fno-strict-aliasing -fno-defer-pop -fcaller-saves -fno-inline \
  -S ef.i -o ef-3.4.0-def.s

this produces the following code:

.L20:
	movl	(%edi), %eax
	movl	(%ebp), %edx
	addl	$4, %edi
	addl	$4, %ebp
	testl	%eax, %eax
	jne	.L1073
	leal	4(%edx), %ebp
	movl	-4(%ebp), %eax
	jmp	.L1373
.L739:
... #actually the following fragment comes earlier
.L1073:
	addl	$4, %ebp
.L376:
	movl	-4(%ebp), %eax
	jmp	.L1373
... #the following fragment comes even earlier.
.L1373: 
	jmp	*%eax

What is the problem?
--------------------

The main problem here is that the code jumps to a shared "jmp *%eax"
instead of just putting "jmp *%eax" (or, in this case, preferably "jmp
*-4(%ebp)") where the jump to the shared indirect jump is.  All
indirect jumps are converted into jumps to a single shared indirect
jump in this way.

This obstructs effective usage of the BTB for branch prediction on
CPUs such as the Athlon, Opteron, and Pentium III (see below), and
most interpreters using GCC's labels-as-values extension will be
affected by this.
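
For readers unfamiliar with the technique, here is a minimal sketch of
threaded-code dispatch with GCC's labels-as-values extension.  It is
not taken from Gforth; the opcode names and the run() function are
made up for illustration.  The point is that each virtual-machine
instruction ends in its own "goto *", so that, if the compiler keeps
them separate, each one becomes a separate indirect jump with its own
BTB entry.

```c
/* Hypothetical three-instruction VM; names are illustrative only. */
enum { OP_INC, OP_DEC, OP_HALT };

long run(const int *program)
{
    /* Table of label addresses, indexed by opcode (GNU C extension). */
    static void *const dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
    const int *ip = program;
    long acc = 0;

    goto *dispatch[*ip++];          /* initial dispatch */

do_inc:
    acc++;
    goto *dispatch[*ip++];          /* this instruction's own indirect jump */
do_dec:
    acc--;
    goto *dispatch[*ip++];          /* a second, separate indirect jump */
do_halt:
    return acc;
}
```

Calling run() on the program { OP_INC, OP_INC, OP_DEC, OP_INC, OP_HALT }
returns 2.  When all the "goto *" statements are merged into one
shared jump, the hardware sees a single indirect branch with many
targets instead of one branch per VM instruction.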

It also makes it impossible for us to use dynamic superinstructions
portably (aka selective inlining, PLDI'98 p. 291).  I could eventually
find ways to work around the problems we have had with 3.2 and 3.3,
but this time I don't see a workaround (and a quick look for new
options revealed nothing).

A smaller problem is that the basic blocks are distributed throughout
the code unless we use -fno-reorder-blocks; this reduces the number of
cases where we can apply dynamic superinstructions.  For gcc-3.3,
using -fno-crossjumping suppressed the main problem, but not in
combination with -fno-reorder-blocks.  So if you fix the main problem,
it would be nice if it also worked with -fno-reorder-blocks.

How much does this cost in performance?
---------------------------------------

Here are the results of doing a

configure --enable-force-reg CC=...
make
make onebench ENGINE_FAST="./gforth-fast --dynamic"
make onebench ENGINE_FAST="./gforth-fast --no-dynamic"

for gforth-0.6.2 with various compilers (The ef.i file used for the
code example above was also created from gforth-0.6.2, with gcc-2.95
and without --enable-force-reg):

Pentium III 1000MHz; numbers are user times in seconds
          gcc-2.95.3        gcc-3.3           gcc-3.4.0
          dynamic  no-dyn   dynamic  no-dyn   no-dyn
siev      0.56     0.85     0.67     0.90     2.31
bubble    0.72     1.28     0.93     1.35     2.68
matrix    0.35     1.43     0.35     1.38     1.90
fib       0.76     1.11     0.95     1.27     2.90

Here you see a factor of 3.7-5.4 between the best result (gcc-2.95.3
dynamic) and the gcc-3.4.0 result (3.7 for bubble, 5.4 for matrix).
This slowdown has two main components:

- We cannot apply the dynamic optimizations (and I don't see a
portable way to work around this problem), resulting in a slowdown
factor of 1.5-4.

- The shared indirect jump: the jump to this indirect jump costs
something, but the more important component of the slowdown is
probably the reduction in branch prediction accuracy for the indirect
jumps; the resulting slowdown is about a factor of 1.4-2.5.

On the Athlon I have seen similar results in the past.
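
The quoted factors can be rechecked from the Pentium III table.  The
following sketch copies the user times from that table and computes
two ratios per benchmark.  Taking gcc-2.95.3 as the baseline for the
"no dynamic optimizations" component is an assumption that matches the
quoted 1.5-4 range; the function names are made up for illustration.

```c
/* User times in seconds from the Pentium III table
   (index 0 = siev, 1 = bubble, 2 = matrix, 3 = fib). */
static const double gcc295_dynamic[4] = { 0.56, 0.72, 0.35, 0.76 };
static const double gcc295_no_dyn[4]  = { 0.85, 1.28, 1.43, 1.11 };
static const double gcc340_no_dyn[4]  = { 2.31, 2.68, 1.90, 2.90 };

/* Overall slowdown: gcc-3.4.0 versus the best result (gcc-2.95.3 dynamic). */
double total_slowdown(int i)
{
    return gcc340_no_dyn[i] / gcc295_dynamic[i];
}

/* Cost of losing the dynamic optimizations, measured within gcc-2.95.3
   (assumed baseline). */
double no_dynamic_slowdown(int i)
{
    return gcc295_no_dyn[i] / gcc295_dynamic[i];
}
```

For example, total_slowdown(1) is 2.68/0.72, about 3.7, and
total_slowdown(2) is 1.90/0.35, about 5.4.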

Here are results from a Pentium 4:

Pentium 4 2.26GHz; numbers are user times in seconds
          gcc-2.95.3        gcc-3.3           gcc-3.4.0
          dynamic  no-dyn   dynamic  no-dyn   no-dyn
siev      0.24     0.48     0.31     0.47     0.50
bubble    0.30     0.78     0.36     0.77     0.78
matrix    0.19     0.94     0.17     0.92     0.96
fib       0.34     0.57     0.41     0.58     0.59

On the Pentium 4 the second component apparently does not play a big
role, probably due to the effects of the trace cache, but the first
component results in even higher slowdowns (1.7-5) than on the
Pentium III.

Do flag variations change the problem?
--------------------------------------

Adding -fno-reorder-blocks moves .L1073 right below the first "jmp
.L1373" (as it should), but the jumps to .L1373 are still there.
Adding -fno-crossjumping in either case changes essentially nothing.
Reducing the command line to

gcc-3.4.0 -Wall -O2 -fomit-frame-pointer -S ef.i -o ef-3.4.0-nofs.s

produces similar results (but the register allocation is worse).

Suggested fix
-------------

AFAIK gcc introduces the shared indirect jump for the benefit of
data-flow analysis.  In gcc-3.3 with some combinations of flags this
change was undone later (but the pass that did this was suppressed
with -fno-reorder-blocks).  In 3.4.0 the undoing is apparently
completely disabled.

One fix would be to put the undoing into a pass that is invoked after
the data-flow analysis, but before the last combining pass, and that
is independent of flags such as -fno-reorder-blocks (maybe a separate
pass?).  An alternative would be to produce the desired control-flow
graph for data-flow analysis without changing the code (i.e., without
introducing shared jumps).
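
To make the "undoing" concrete, here is a toy model of such a pass.
It is only a sketch: the block representation, the enum, and the
function name are hypothetical and have nothing to do with GCC's
actual internals.  Every direct jump to a block that contains nothing
but an indirect jump is replaced by a copy of that indirect jump.

```c
/* Toy model only: these types and names are hypothetical, not GCC's. */
enum term { FALLTHRU, JUMP, INDIRECT_JUMP };

struct block {
    int n_insns;        /* instructions before the terminator */
    enum term term;     /* how the block ends */
    int target;         /* index of the target block when term == JUMP */
};

/* Replace each "jmp <shared>" whose target is an empty block ending in
   an indirect jump with the indirect jump itself.  Returns the number
   of jumps rewritten. */
int unshare_indirect_jumps(struct block *blocks, int n)
{
    int rewritten = 0;
    for (int i = 0; i < n; i++) {
        int t = blocks[i].target;
        if (blocks[i].term != JUMP || t < 0 || t >= n)
            continue;
        if (blocks[t].n_insns == 0 && blocks[t].term == INDIRECT_JUMP) {
            blocks[i].term = INDIRECT_JUMP;  /* duplicate "jmp *%eax" here */
            blocks[i].target = -1;           /* no static target any more */
            rewritten++;
        }
    }
    return rewritten;
}
```

In the assembly shown earlier, the two "jmp .L1373" instructions would
each become "jmp *%eax" under this transformation, and .L1373 itself
would become unreachable.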

I might give implementing one of these approaches a try, but would
prefer to see it done by people who know more about gcc internals than
I do.  In case I do it, I would appreciate feedback and suggestions on
how to proceed.

- anton

-- 
           Summary: pessimization of "goto *"
           Product: gcc
           Version: 3.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: anton at mips dot complang dot tuwien dot ac dot at
                CC: bernd dot paysan at gmx dot de,gcc-bugs at gcc dot gnu
                    dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15242
