public inbox for gcc-bugs@sourceware.org
From: "jake.stine at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/54073] [4.7 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
Date: Sat, 16 Feb 2013 19:12:00 -0000
Message-ID: <bug-54073-4-8DirZzV8pz@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-54073-4@http.gcc.gnu.org/bugzilla/>


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Jake Stine <jake.stine at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jake.stine at gmail dot com

--- Comment #16 from Jake Stine <jake.stine at gmail dot com> 2013-02-16 19:12:05 UTC ---
Hi,

I have done quite a bit of analysis on cmov performance across x86
architectures, so I will share here in case it helps:

Quick summary: Conditional moves on Intel Core/Xeon and AMD Bulldozer
architectures should probably be avoided "as a rule."

History: Conditional moves were beneficial for the Intel Pentium 4, and also
(but less so) for AMD Athlon/Phenom chips.  In the AMD Athlon/Phenom case the
performance of cmov vs cmp+branch is determined more by the alignment of the
target of the branch than by the prediction rate of the branch.  The
instruction decoders would incur penalties on certain types of unaligned branch
targets (when taken), or when decoding sequences of instructions that contained
multiple branches within a 16-byte "fetch" window (taken or not).  cmov was
sometimes handy for avoiding those.

With regard to the more current Intel Core and AMD Bulldozer/Bobcat architectures:

I have found that use of conditional moves (cmov) is only beneficial if the
branch that the move is replacing is badly mis-predicted.  In my tests, the
cmov only became clearly "optimal" when the branch was predicted correctly less
than 92% of the time, which is abysmal by modern branch predictor standards and
rarely occurs in practice.  At prediction rates above 97%, cmov is typically
slower than cmp+branch.  Inside loops that contain branches with prediction
rates approaching 100% (as is the case presented by the OP), cmov becomes a
severe performance bottleneck.  This holds true for both Core and Bulldozer. 
Bulldozer has less efficient branching than the i7, but is also severely
bottlenecked by its limited fetch/decode.  Cmov requires executing more total
instructions, and that makes Bulldozer very unhappy.
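
As a rough illustration of the kind of loop under discussion, the sketch below
writes the same wrap-around sum twice: once with a plain `if`, and once with an
explicit mask-based select.  The function names and the workload are
hypothetical; they are not taken from SciMark or from the tests described
above.  Whether the compiler emits cmp+branch or cmov for either form depends
on the GCC version, -march, and optimization flags, so inspect the generated
assembly (for example with -S) before drawing conclusions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form.  If negative inputs are rare, the condition is almost
 * always predicted correctly, which is the case where cmp+branch is
 * reported above to beat cmov.  The compiler may still if-convert it. */
int64_t sum_wrapped_branch(const int32_t *v, size_t n, int32_t modulus)
{
    assert(modulus > 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t k = v[i];
        if (k < 0)
            k += modulus;
        total += k;
    }
    return total;
}

/* Branchless form.  The select is spelled out with a mask, so every
 * iteration does the extra work regardless of how predictable the
 * condition is. */
int64_t sum_wrapped_select(const int32_t *v, size_t n, int32_t modulus)
{
    assert(modulus > 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t k = v[i];
        int32_t mask = -(int32_t)(k < 0);  /* all ones if k < 0, else 0 */
        k += modulus & mask;
        total += k;
    }
    return total;
}
```

Both functions compute the same result; only the shape handed to the compiler
differs.  Timing them on inputs with different fractions of negative values is
one way to reproduce the prediction-rate dependence described above.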

Note that my tests involved relatively simple loops that did not suffer from
the added register pressure that cmov introduces.  In practice, the prognosis
for cmov being "optimal" is even worse than what I've observed in a controlled
environment.  Furthermore, to my knowledge the status of cmov vs. branch
performance on x86 will not be changing anytime soon.  cmov will continue to be
a liability well into the next couple of architecture releases from Intel and
AMD.  Piledriver will have added fetch/decode resources but should also have a
smaller mispredict penalty, so it's doubtful cmov will gain much advantage
there either.

Therefore I would recommend setting -fno-tree-loop-if-convert for all -march
matching Intel Core and AMD Bulldozer/Bobcat families.


There is one good use-case for cmov on x86:  Mis-predicted conditions inside of
loops.  Currently there's no way to force that behavior in situations where I,
the programmer, am fully aware that the condition is chaotic/random.  A builtin
cmov or condition hint would be nice.  For now I'm forced to address those
(fortunately infrequent) situations via inline asm.
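
For readers unfamiliar with the inline asm workaround, here is a minimal sketch
of what forcing a cmov can look like with GCC extended asm.  It is not the
code used in the tests above.  The helper names `select_i64` and `count_below`
are hypothetical, the asm assumes x86-64 with the default AT&T syntax (it will
not assemble under -masm=intel), and other targets fall back to a plain
ternary that the compiler may lower however it likes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns if_true when cond is nonzero, else
 * if_false, using an explicit cmov on x86-64. */
static inline int64_t select_i64(int cond, int64_t if_true, int64_t if_false)
{
#if defined(__GNUC__) && defined(__x86_64__)
    int64_t result = if_false;
    __asm__ ("test %1, %1\n\t"      /* set ZF from cond */
             "cmovne %2, %0"        /* result = if_true when cond != 0 */
             : "+r" (result)
             : "r" (cond), "r" (if_true)
             : "cc");
    return result;
#else
    return cond ? if_true : if_false;   /* portable fallback */
#endif
}

/* Hypothetical use: count elements below a threshold in data assumed to
 * be random, so the comparison would be poorly predicted as a branch. */
static size_t count_below(const int64_t *v, size_t n, int64_t threshold)
{
    assert(v != NULL || n == 0);
    int64_t count = 0;
    for (size_t i = 0; i < n; i++)
        count = select_i64(v[i] < threshold, count + 1, count);
    return (size_t)count;
}
```

The asm block hides the select from the optimizer, so it also blocks
vectorization and other transformations of the loop; that cost is part of the
trade-off when the condition really is chaotic.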

