public inbox for gcc-bugs@sourceware.org
From: "anton at mips dot complang dot tuwien dot ac dot at" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug optimization/15242] New: pessimization of "goto *"
Date: Sat, 01 May 2004 14:27:00 -0000
Message-ID: <20040501142654.15242.anton@mips.complang.tuwien.ac.at>

This is essentially the 3.4.0 incarnation of the problem discussed at
the end of PR8092. mmitchel suggested creating a new bug report, so
here it is:

Code example and observed behaviour
-----------------------------------

In the attached code, the fragment between I_question_branch and
J_question_branch_lp_plus_store_number contains the following code
(slightly edited for readability):

I_question_branch:
{
Cell * a_target;
Bool f;
;
(( a_target )=(Cell *)( ( (* (ip) ) ) )) ;
(( f )=(Bool)( (sp[0]) )) ;
({ ip+=( 1 );}) ;
sp += 1;
{
if (f==0) {
    ({ip=( (Xt *)a_target ); ;}) ;
    ;
    (ip++) ;
    ;
    ({asm("":"=X"(cfa)); goto **(ip-1);}) ;
}
;
}
;
(ip++) ;
;
K_question_branch :
({asm("":"=X"(cfa)); goto **(ip-1);}) ;
}
J_question_branch_lp_plus_store_number :

When the file is compiled with

gcc-3.4.0 -I../../gforth-0.6.2/engine/../arch/386 -I. -Wall -O2 -fomit-frame-pointer -fforce-addr -fforce-mem -march=pentium -DHAVE_CONFIG_H -DDEFAULTPATH='".:/usr/local/lib/gforth/site-forth:/usr/local/share/gforth/site-forth:/usr/local/lib/gforth/0.6.2:/usr/local/share/gforth/0.6.2"' -fno-gcse -fno-strict-aliasing -fno-defer-pop -fcaller-saves -fno-inline -S ef.i -o ef-3.4.0-def.s

this produces the following code:

.L20:
        movl    (%edi), %eax
        movl    (%ebp), %edx
        addl    $4, %edi
        addl    $4, %ebp
        testl   %eax, %eax
        jne     .L1073
        leal    4(%edx), %ebp
        movl    -4(%ebp), %eax
        jmp     .L1373
.L739:
...
#actually the following fragment comes earlier
.L1073:
        addl    $4, %ebp
.L376:
        movl    -4(%ebp), %eax
        jmp     .L1373
...
#the following fragment comes even earlier.
.L1373:
        jmp     *%eax

What is the problem?
--------------------

The main problem here is that the code jumps to a shared "jmp *%eax"
instead of just putting "jmp *%eax" (or, in this case, preferably
"jmp *-4(%ebp)") where the jump to the shared indirect jump is. All
indirect jumps are converted into jumps to a single shared indirect
jump in this way.

This obstructs effective use of the BTB for branch prediction on CPUs
such as the Athlon, Opteron, and Pentium III (see below), and most
interpreters using GCC's labels-as-values extension will be affected by
this. It also makes it impossible for us to use dynamic
superinstructions portably (aka selective inlining, PLDI'98 p. 291).

I could eventually find ways to work around the problems we have had
with 3.2 and 3.3, but this time I don't see a workaround (and a quick
look for new options revealed nothing).

A smaller problem is that the basic blocks are distributed throughout
the code unless we use -fno-reorder-blocks; this reduces the number of
cases where we can apply dynamic superinstructions. For gcc-3.3, using
-fno-crossjumping suppressed the main problem, but not in combination
with -fno-reorder-blocks. So if you fix the main problem, it would be
nice if the fix also worked with -fno-reorder-blocks.

How much does this cost in performance?
---------------------------------------

Here are the results of doing a

  configure --enable-force-reg CC=...; make;
  make onebench ENGINE_FAST="./gforth-fast --dynamic";
  make onebench ENGINE_FAST="./gforth-fast --no-dynamic"

for gforth-0.6.2 with various compilers (the ef.i file used for the
code example above was also created from gforth-0.6.2, with gcc-2.95
and without --enable-force-reg):

Pentium III 1000MHz; numbers are times in seconds user time
          gcc-2.95.3         gcc-3.3       gcc-3.4.0
        dynamic  no-dyn  dynamic  no-dyn     no-dyn
siev       0.56    0.85     0.67    0.90       2.31
bubble     0.72    1.28     0.93    1.35       2.68
matrix     0.35    1.43     0.35    1.38       1.90
fib        0.76    1.11     0.95    1.27       2.90

Here you see a factor of 3.7-5.4 between the best result (gcc-2.95
dynamic) and the gcc-3.4.0 result; the largest factor is for matrix.
This slowdown has two main components:

- we cannot apply the dynamic optimizations (and I don't see a
  portable way to work around this problem), resulting in a slowdown
  factor of 1.5-4.

- the shared indirect jump: the jump to this indirect jump costs
  something, but the more important component of the slowdown is
  probably the reduction in branch prediction accuracy for the
  indirect jumps; the resulting slowdown is about a factor of 1.4-2.5.

On the Athlon I have seen similar results in the past. Here are
results from a Pentium 4:

Pentium 4 2.26GHz; numbers are times in seconds user time
          gcc-2.95.3         gcc-3.3       gcc-3.4.0
        dynamic  no-dyn  dynamic  no-dyn     no-dyn
siev       0.24    0.48     0.31    0.47       0.50
bubble     0.30    0.78     0.36    0.77       0.78
matrix     0.19    0.94     0.17    0.92       0.96
fib        0.34    0.57     0.41    0.58       0.59

On the Pentium 4 the second component apparently does not play a big
role, probably due to the effects of the trace cache, but the first
component results in even higher slowdowns (1.7-5) than on the
Pentium III.

Do flag variations change the problem?
--------------------------------------

Adding -fno-reorder-blocks moves .L1073 right below the first
"jmp .L1373" (as it should), but the jumps to .L1373 are still there.
Adding -fno-crossjumping in either case changes essentially nothing.

Reducing the command line to

gcc-3.4.0 -Wall -O2 -fomit-frame-pointer -S ef.i -o ef-3.4.0-nofs.s

produces similar results (but the register allocation is worse).

Suggested fix
-------------

AFAIK gcc introduces the shared indirect jump for the benefit of
data-flow analysis. In gcc-3.3 with some combinations of flags this
change was undone later (but the pass that did this was suppressed
with -fno-reorder-blocks). In 3.4.0 the undoing is apparently
completely disabled.

One fix would be to put the undoing into a pass that is invoked after
the data-flow analysis, but before the last combining pass, and that
is independent of flags such as -fno-reorder-blocks (maybe a separate
pass?).

An alternative would be to produce the desired control-flow graph for
data-flow analysis without changing the code (i.e., without
introducing shared jumps).

I might try implementing one of these approaches myself, but would
prefer to see it done by people who know more about gcc internals than
I do. In case I do it, I would appreciate feedback and suggestions on
how to proceed.

- anton

-- 
           Summary: pessimization of "goto *"
           Product: gcc
           Version: 3.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: anton at mips dot complang dot tuwien dot ac dot at
                CC: bernd dot paysan at gmx dot de,gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15242
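
For readers unfamiliar with the pattern at issue, here is a minimal
sketch of an interpreter written with the labels-as-values extension.
It is not Gforth's engine: the opcodes, the Insn type, and the run()
and demo() functions are hypothetical names chosen for illustration,
and it dispatches through a label table rather than storing code
addresses directly. At the source level every VM instruction ends with
its own "goto *"; the report above is about gcc-3.4.0 replacing all of
these with jumps to one shared indirect jump, so whether the separate
jumps survive in the generated code depends on the compiler version
and flags.

```c
/* Needs GCC or a compatible compiler: uses GNU C labels-as-values. */
#include <assert.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_LIT, OP_ADD, OP_QBRANCH, OP_HALT };

typedef struct { int op; long arg; } Insn;   /* hypothetical VM cell */

long run(const Insn *code)
{
    /* One label address per opcode, indexed by the enum above. */
    static void *const labels[] = {
        &&do_lit, &&do_add, &&do_qbranch, &&do_halt
    };
    long stack[16];
    long *sp = stack;            /* points one past the top of stack */
    const Insn *ip = code;

/* Each use of NEXT is a separate indirect jump in the source. */
#define NEXT goto *labels[ip->op]

    NEXT;
do_lit:                          /* push the literal operand */
    assert(sp < stack + 16);
    *sp++ = ip->arg;
    ip++;
    NEXT;
do_add:                          /* replace top two cells by their sum */
    assert(sp - stack >= 2);
    sp[-2] += sp[-1];
    sp--;
    ip++;
    NEXT;
do_qbranch:                      /* pop flag; branch to arg if it is 0 */
    assert(sp > stack);
    if (*--sp == 0)
        ip = code + ip->arg;
    else
        ip++;
    NEXT;
do_halt:                         /* return the top of stack */
    assert(sp > stack);
    return sp[-1];
#undef NEXT
}

long demo(void)
{
    static const Insn prog[] = {
        { OP_LIT, 0 },       /* 0: push flag 0 */
        { OP_QBRANCH, 4 },   /* 1: flag is 0, so continue at index 4 */
        { OP_LIT, 100 },     /* 2: skipped */
        { OP_HALT, 0 },      /* 3: skipped */
        { OP_LIT, 40 },      /* 4 */
        { OP_LIT, 2 },       /* 5 */
        { OP_ADD, 0 },       /* 6: 40 + 2 */
        { OP_HALT, 0 },      /* 7 */
    };
    return run(prog);
}
```

Compiling this with -S and counting the indirect jumps in the assembly
output is one way to check what a given compiler and flag combination
does with the four NEXT sites.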