From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/51017] [4.8/4.9/5 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
Date: Wed, 18 Feb 2015 11:09:00 -0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.6.2
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.8.5

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #21 from Richard Biener ---
We do already inhibit creating loop-carried dependencies of some kind, but only when vectorization is enabled (because it can inhibit vectorization). But we still PRE invariant loads:

  Replaced MEM[(vtype * {ref-all})&DES_bs_all + 20528B] with prephitmp_2898 in all uses of _1195 = MEM[(vtype * {ref-all})&DES_bs_all + 20528B]

because we know it's {0, 0} on entry.

Note that store motion doesn't apply here because those stores are said to alias with the MEM[(vtype * {ref-all})k_2 + 848B] kinds (iterating DES_bs_all.KS.v - unfortunately field-sensitive points-to analysis doesn't help here as the points-to result itself isn't field-sensitive).

Of course without store-motion applying this kind of PRE is not really useful. If store-motion applied it would create the same kind of problem, of course (in this case up to 0x300(?) live registers).

One possible solution is to simply avoid this kind of "partly" store-motion, that is, converting

  for (;;) reg = MEM; MEM = fn(reg);

to

  reg = MEM; for (;;) reg = fn(reg); MEM = reg;

Of course this is also a profitable transform. Thus the solution might be instead to limit register pressure in some way by somehow assessing costs to individual transforms.

At least it seems to be too difficult for the register allocator to re-materialize 'reg' from MEM (as it would also need to perform sophisticated analysis to determine that, basically undoing the PRE transform).
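
For illustration only, here is a source-level sketch in C of the two shapes discussed above. It is not taken from the bug's testcase. It assumes that "partly" store-motion means the load is carried in a register across iterations while the store stays inside the loop, which is one reading of the flattened pseudo-code. The names fn, update_in_memory and update_carried, and the choice of four slots, are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-iteration update fn(reg):
   rotate left by one bit, then mix in the round number. */
static uint32_t fn(uint32_t reg, uint32_t round)
{
    return ((reg << 1) | (reg >> 31)) ^ round;
}

/* Shape before the transform: each value is loaded, updated and
   stored within one iteration, so 'reg' is live only briefly. */
void update_in_memory(uint32_t mem[4], uint32_t rounds)
{
    for (uint32_t r = 0; r < rounds; r++) {
        uint32_t reg;
        reg = mem[0]; mem[0] = fn(reg, r);
        reg = mem[1]; mem[1] = fn(reg, r);
        reg = mem[2]; mem[2] = fn(reg, r);
        reg = mem[3]; mem[3] = fn(reg, r);
    }
}

/* Shape after the transform (assumed reading): the loads happen once
   before the loop, the stores remain in the loop, and r0..r3 are all
   live across the whole loop body. */
void update_carried(uint32_t mem[4], uint32_t rounds)
{
    uint32_t r0 = mem[0], r1 = mem[1], r2 = mem[2], r3 = mem[3];
    for (uint32_t r = 0; r < rounds; r++) {
        r0 = fn(r0, r); mem[0] = r0;
        r1 = fn(r1, r); mem[1] = r1;
        r2 = fn(r2, r); mem[2] = r2;
        r3 = fn(r3, r); mem[3] = r3;
    }
}
```

Both functions leave the same values in mem. The second saves one load per slot per iteration, but it needs one loop-carried register per slot. With four slots that is harmless; with several hundred slots, as in the case above, the carried values can no longer all stay in registers.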