From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-248499-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 32403 invoked by alias); 27 Mar 2008 05:53:35 -0000
Received: (qmail 32324 invoked by alias); 27 Mar 2008 05:52:59 -0000
Date: Thu, 27 Mar 2008 05:53:00 -0000
Message-ID: <20080327055259.32323.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References: <bug-35695-50@http.gcc.gnu.org/bugzilla/>
Subject: [Bug target/35695] [4.3/4.4 Regression] -funroll-loops breaks inline float divide
In-Reply-To: <bug-35695-50@http.gcc.gnu.org/bugzilla/>
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "wilson at tuliptree dot org" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2008-03/txt/msg02125.txt.bz2


------- Comment #2 from wilson at tuliptree dot org  2008-03-27 05:52 -------
Subject: Re:   New: [4.3/4.4 regression] -funroll-loops
        breaks inline float divide

On Tue, 2008-03-25 at 17:29 +0000, schwab at suse dot de wrote:
> With -funroll-loops the insn that computes e = 1 - (b * y) is optimized in cse2
> to e = 0.

It seems that tree-ssa only partly unrolls and simplifies the loop.  It
isn't until the RTL loop optimization pass that we figure out that the
entire loop disappears after unrolling.  At that point, we have constant
divides 1.0/1.0, 1.0/2.0, 1.0/2.0, and 1.0/3.0.  Unfortunately, at RTL
expansion time, we already emitted long recip approx sequences.  The
second cse pass tries to propagate the FP constants into the recip
approx sequence and we get a mess.

I think the main problem here is that the reciprocal approximation
pattern is using div, which misleads the RTL optimizer into thinking
that we have a divide result when we actually don't.  Changing this to
use an UNSPEC instead seems to solve the problem, as this prevents the
cse optimization.  I just fixed the one recip pattern in div.md, but the
others should probably be fixed also.  I only tested this with a cross
compiler; I don't want to disturb the neighbors by turning my Itanium
machine on this late in the evening.

There is another problem here: we don't really need a long sequence to
compute 1.0/2.0, but fixing that is going to take some thought.
Delaying the expansion of the recip approx sequence might help, but will
probably also hurt in other cases.  We do have REG_EQUAL notes at the
end of the recip approx sequences; maybe we can do something with those,
like pre-computing the constant divide result and placing it in the
constant pool.

Jim


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35695