From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-477645-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 5571 invoked by alias); 18 Feb 2015 11:09:39 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 5486 invoked by uid 48); 18 Feb 2015 11:09:35 -0000
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/51017] [4.8/4.9/5 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
Date: Wed, 18 Feb 2015 11:09:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.6.2
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.8.5
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-51017-4-rnhfTbRA1t@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-51017-4@http.gcc.gnu.org/bugzilla/>
References: <bug-51017-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg01978.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017
--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
We do already inhibit creating loop-carried dependencies of some kind, but only
when vectorization is enabled (because it can inhibit vectorization).  But we
still PRE invariant loads:

Replaced MEM[(vtype * {ref-all})&DES_bs_all + 20528B] with prephitmp_2898 in
all uses of _1195 = MEM[(vtype * {ref-all})&DES_bs_all + 20528B] because we
know
it's {0, 0} on entry.  Note that store motion doesn't apply here because
those stores are said to alias with the MEM[(vtype * {ref-all})k_2 + 848B]
kinds (iterating DES_bs_all.KS.v - unfortunately field-sensitive points-to
analysis doesn't help here as the points-to result itself isn't
field-sensitive).
Of course without store-motion applying this kind of PRE is not really useful.
If store-motion applied it would create the same kind of problem, of course
(in this case up to 0x300(?) live registers).

One possible solution is to simply avoid this kind of "partly" store-motion,
that is converting

  for (;;)
    reg = MEM;
    MEM = fn(reg);

to

  reg = MEM;
  for (;;)
    reg = fn(reg);
    MEM = reg;

of course this is also a profitable transform.  Thus the solution might be
instead to limit register pressure in some way by somehow assessing costs
to individual transforms.  At least it seems to be too difficult for
the register allocator to re-materialize 'reg' from MEM (as it would also
need to perform sophisticated analysis to determine that, basically
undoing the PRE transform).