From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 32403 invoked by alias); 27 Mar 2008 05:53:35 -0000
Received: (qmail 32324 invoked by alias); 27 Mar 2008 05:52:59 -0000
Date: Thu, 27 Mar 2008 05:53:00 -0000
Message-ID: <20080327055259.32323.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug target/35695] [4.3/4.4 Regression] -funroll-loops breaks inline float divide
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "wilson at tuliptree dot org"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2008-03/txt/msg02125.txt.bz2

------- Comment #2 from wilson at tuliptree dot org  2008-03-27 05:52 -------
Subject: Re:   New: [4.3/4.4 regression] -funroll-loops breaks inline float divide

On Tue, 2008-03-25 at 17:29 +0000, schwab at suse dot de wrote:
> With -funroll-loops the insn that computes e = 1 - (b * y) is optimized in cse2
> to e = 0.

It seems that tree-ssa only partly unrolls and simplifies the loop.  It
isn't until the RTL loop optimization pass that we figure out that the
entire loop disappears after unrolling.  At that point, we have constant
divides 1.0/1.0, 1.0/2.0, 1.0/2.0, and 1.0/3.0.  Unfortunately, at RTL
expansion time, we already emitted long recip approx sequences.  The
second cse pass tries to propagate the FP constants into the recip
approx sequence and we get a mess.

I think the main problem here is that the reciprocal approximation
pattern is using div, which misleads the RTL optimizer into thinking
that we have a divide result when we actually don't.  Changing this to
use an UNSPEC instead seems to solve the problem, as this prevents the
cse optimization.  I just fixed the one recip pattern in div.md, but the
others should probably be fixed also.
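To make the failure mode above concrete, here is a minimal sketch in
Python.  It is not taken from the patch or from div.md.  The helper
names (approx_recip, refine, fold_as_divide) and the 8-bit accuracy of
the initial estimate are assumptions chosen for illustration, and the
loop is a generic Newton-Raphson refinement rather than the exact
Itanium sequence.  It shows why folding e = 1 - (b * y) to e = 0, on the
belief that y is the exact quotient, leaves only the rough initial
approximation as the result.

```python
import math


def approx_recip(b, bits=8):
    """Hypothetical stand-in for a hardware reciprocal approximation.

    Returns 1/b truncated to roughly `bits` significant bits.
    """
    m, e = math.frexp(1.0 / b)          # 1/b == m * 2**e, 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)


def refine(b, y, steps=3, fold_as_divide=False):
    """Newton-Raphson refinement of y toward 1/b.

    fold_as_divide models an optimizer that believes y is the exact
    result of 1/b and therefore simplifies e = 1 - b*y to e = 0.
    """
    for _ in range(steps):
        e = 0.0 if fold_as_divide else 1.0 - b * y   # residual error
        y = y + y * e                                # corrected estimate
    return y


if __name__ == "__main__":
    b = 3.0
    y0 = approx_recip(b)
    print("initial estimate :", y0)
    print("refined          :", refine(b, y0))
    print("refined, e folded:", refine(b, y0, fold_as_divide=True))
```

With e forced to 0 every correction step is a no-op, so the "refined"
value is just the initial estimate.  Keeping the approximation opaque to
the optimizer, which is what the UNSPEC change is meant to do, keeps the
residual computation intact.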
I only tested this with a cross compiler; I don't want to disturb the
neighbors by turning my Itanium machine on this late in the evening.

There is another problem here that we don't really need a long sequence
to compute 1.0/2.0, but that is going to take some thought.  Delaying
the expansion of the recip approx sequence might help, but will probably
also hurt in other cases.  We do have REG_EQUAL notes at the end of the
recip approx sequences, maybe we can do something with those, like
pre-compute the constant divide result and place it in the constant
pool.

Jim


-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35695