[Bug tree-optimization/38785] huge performance regression on EEMBC bitmnp01

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "amylaar at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/38785] huge performance regression on EEMBC bitmnp01
Date: Sat, 10 Jan 2009 16:10:00 -0000	[thread overview]
Message-ID: <20090110161021.10810.qmail@sourceware.org> (raw)
In-Reply-To: <bug-38785-5394@http.gcc.gnu.org/bugzilla/>

------- Comment #6 from amylaar at gcc dot gnu dot org  2009-01-10 16:10 -------
(In reply to comment #5)
> Joern, re. comment #4, Richi refers to my patch to enable PRE at -Os, see [1]. 
> An extension to this patch that we tested on x86 machines, is to disable PRE
> for scalar integer registers, via SMALL_REGISTER_CLASSES.  I changed
> SMALL_REGISTER_CLASSES into a target hook for this purpose, see [2]. You could
> play with this, see if you can use this to cure your problem...

This is not a problem of having inserted more expressions.  The expressions
actually go down, 7 add expressions are actually eliminated.
The problem is that this comes at the cost of addding 127 phi nodes which
are not free. It's a cascade of 64/32/16/8/4/2/1 phi nodes that are used
to compute the 7 chained adds through all the possible paths through the
6 consecutive if-blocks, and this requires 64/32/16/8/4/2/1 reg-reg copies
to implement inside these if-blocks, plus 64 unconditional constant loads
at the start.
requiring 64 + x registers for this one computation alone does cause register
allocation trouble for ARCompact, but it is in good company here, as lost of
other RISC architectures also have no more than 32 general purpose registers.
And even if you did this for a processor with lots of registers like the i960,
requiring 64 constant loads in the unconditional path and then having 127
conditional reg-reg copies is certainly worse than having 7 conditional add
operations.

I think the problem is that the algorithm, like many SSA algorithm,
has no idea of the run time cost of a phi node, and uses the lower
bound 0 as an approximation.
As you make more sophisticated algorithms to approximate the cost minimum
for a flawed cost function, you will pessimize more and more code.

Is there a way to determine when replacing one expression causes the
number of phi nodes in a dominator to increase?  I would think that
this criterion would be a possibly useful indicator of register
pressure and instruction count increase.

I haven't looked wht the ssa pre does exactly, but from the code
transformation performed I would guess that it sees one expression which
is fed by a directed acyclic graph (DAG) of phi nodes with constant leaves,
and figures it only needs to use a replacement DAG of phi nodes where the
expression is evaluated for each constant, assuming that the original DAG
will be mostly dead then.
However, what happens in the testcase is that the original DAG is still
fully live, the replacement DAG is added in parallel, and thus the number
of phi nodes from the original DAG has been doubled.

For the testcase (and for fbital, if you have access to it), it is
interesting to observe that the problem would not arise if pre had a notion
of what expressions better to leave for conditional execution.
Unfortunately, many architectures, conditional execution requires
two-address (wrt the ALU) operations, which cannot be expressed in SSA.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38785

next prev parent reply	other threads:[~2009-01-10 16:10 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-09 15:46 [Bug tree-optimization/38785] New: " amylaar at gcc dot gnu dot org
2009-01-09 15:55 ` [Bug tree-optimization/38785] " rguenth at gcc dot gnu dot org
2009-01-09 16:39 ` amylaar at gcc dot gnu dot org
2009-01-09 17:35 ` amylaar at gcc dot gnu dot org
2009-01-09 17:59 ` rguenth at gcc dot gnu dot org
2009-01-09 20:55 ` steven at gcc dot gnu dot org
2009-01-10 16:10 ` amylaar at gcc dot gnu dot org [this message]
2009-01-14 10:08 ` Joey dot ye at intel dot com
2009-01-14 10:54 ` steven at gcc dot gnu dot org
2009-01-14 18:47 ` amylaar at gcc dot gnu dot org
2009-01-14 20:51 ` rguenther at suse dot de
2009-01-14 22:06 ` amylaar at gcc dot gnu dot org
2009-01-15 11:36 ` amylaar at gcc dot gnu dot org
2009-01-20 23:02 ` steven at gcc dot gnu dot org
2009-03-04 22:58 ` amylaar at gcc dot gnu dot org
2009-03-05  0:32 ` amylaar at gcc dot gnu dot org
2009-03-31 16:08 ` [Bug tree-optimization/38785] [4.3/4.4/4.5 Regression] " jsm28 at gcc dot gnu dot org
2009-04-14  9:49 ` jakub at gcc dot gnu dot org
2009-07-15 21:12 ` steven at gcc dot gnu dot org
2009-07-15 21:35 ` steven at gcc dot gnu dot org
2009-07-23 21:51 ` drow at gcc dot gnu dot org
2009-07-23 22:23 ` stevenb dot gcc at gmail dot com
2009-08-04 12:46 ` rguenth at gcc dot gnu dot org
2010-01-29 20:33 ` amylaar at gcc dot gnu dot org
2010-02-19 12:18 ` rguenth at gcc dot gnu dot org
2010-02-19 12:29 ` amylaar at gcc dot gnu dot org
2010-02-19 13:19 ` matz at gcc dot gnu dot org
2010-02-19 14:08 ` drow at gcc dot gnu dot org
2010-02-19 23:33 ` stevenb dot gcc at gmail dot com
2010-05-22 18:28 ` [Bug tree-optimization/38785] [4.3/4.4/4.5/4.6 " rguenth at gcc dot gnu dot org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090110161021.10810.qmail@sourceware.org \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).