From: Fariborz Jahanian
To: Fariborz Jahanian
Cc: gcc@gcc.gnu.org
Subject: Re: [RFH] - Less than optimal code compiling 252.eon -O2 for x86
Date: Mon, 27 Jun 2005 19:20:00 -0000
Message-Id: <6269B965-AF17-4FBD-B3ED-B1BA96936380@apple.com>

FYI, the rtl differs between -O2 and -O1 because -O2 includes
-fforce-mem, which forces memory operands into registers so that memory
references become common sub-expressions. In this case, the constant
double float value is assigned to an xmm register, which is then used
wherever the value is needed. So I would say this behavior is as
expected, but it is not ideal for x86, where a couple of
'movl $0x0, mem' may be preferred to a single 'movsd %xmm7, mem' for
252.eon on x86-darwin.

- fariborz

On Jun 24, 2005, at 3:07 PM, Fariborz Jahanian wrote:

> A source file mrSurfaceList.cc of 252.eon produces less efficient
> code initializing instance objects to 0 at -O2 than at -O1.
> The behavior is erratic: it does not happen on all x86 platforms, and
> making the test smaller makes the problem go away. But here is what
> I found out is the cause.
>
> When the source is compiled with -O1 -march=pentium4, the 'cse' phase
> sees the following pattern initializing a 'double' with 0.
>
> (insn 18 13 19 0 (set (reg:SF 109)
>         (mem/u/i:SF (symbol_ref/u:SI ("*LC11") [flags 0x2]) [0 S4 A32])) -1 (nil)
>     (nil))
>
> (insn 19 18 20 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 20 frame)
>                 (const_int -32 [0xffffffffffffffe0])) [0 objectBox.pmin.e+16 S8 A128])
>         (float_extend:DF (reg:SF 109))) 86 {*extendsfdf2_sse} (nil)
>     (nil))
>
> Then the fold_rtx routine converts it into its reduced form, resulting
> in optimum code:
>
> (insn 19 13 21 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 20 frame)
>                 (const_int -32 [0xffffffffffffffe0])) [0 objectBox.pmin.e+16 S8 A128])
>         (const_double:DF 0.0 [0x0.0p+0])) 64 {*movdf_nointeger} (nil)
>     (nil))
>
> But when the same source is compiled with -O2 -march=pentium4, the
> 'cse' phase sees a slightly different pattern (note that
> float_extend:DF has moved):
>
> (insn 18 13 19 0 (set (reg:DF 109)
>         (float_extend:DF (mem/u/i:SF (symbol_ref/u:SI ("*LC13") [flags 0x2]) [0 S4 A32]))) -1 (nil)
>     (nil))
>
> (insn 19 18 20 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 20 frame)
>                 (const_int -32 [0xffffffffffffffe0])) [0 objectBox.pmin.e+16 S8 A128])
>         (reg:DF 109)) 64 {*movdf_nointeger} (nil)
>     (nil))
>
> This cannot be simplified by fold_rtx, resulting in less efficient
> code.
>
> The change in pattern is most likely due to the additional tree
> optimization phases that run at -O2. If so, should cse be taught to
> simplify the new rtl pattern? Or should the tree optimizer phase
> responsible for the less-than-optimal tree be tweaked to generate
> the same tree as with -O1?
>
> Thanks, fariborz
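
For readers following the thread, here is a small C++ sketch of the two facts the discussion relies on: widening a float 0.0 to double yields exactly the double constant 0.0 (the reduction fold_rtx performs at -O1), and an 8-byte store of 0.0 leaves the same bytes in memory as two 32-bit zero stores (why a couple of 'movl $0x0, mem' can replace one 'movsd'). It is not taken from 252.eon or from GCC; all function names are hypothetical, and it assumes IEEE 754 doubles, where +0.0 is the all-zero bit pattern, as on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: write 0.0 as one 8-byte double, which corresponds
// to what a single 'movsd %xmmN, mem' of the constant would store.
inline void store_zero_as_double(unsigned char *dst) {
    const double zero = 0.0;
    std::memcpy(dst, &zero, sizeof zero);
}

// Hypothetical helper: write two 32-bit zero words, which corresponds to
// a pair of 'movl $0x0, mem' stores to adjacent addresses.
inline void store_zero_as_two_words(unsigned char *dst) {
    const std::uint32_t word = 0;
    std::memcpy(dst, &word, sizeof word);
    std::memcpy(dst + sizeof word, &word, sizeof word);
}

// Returns true when both store strategies leave identical bytes behind.
// Assumption: IEEE 754 double, so +0.0 is all-zero bits.
inline bool zero_stores_match() {
    static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");
    unsigned char a[8];
    unsigned char b[8];
    std::memset(a, 0xFF, sizeof a);  // pre-fill so the stores are observable
    std::memset(b, 0xFF, sizeof b);
    store_zero_as_double(a);
    store_zero_as_two_words(b);
    return std::memcmp(a, b, sizeof a) == 0;
}

// Returns true when widening float 0.0 gives the same bits as double 0.0,
// i.e. float_extend:DF of the SF constant can be folded to const_double 0.0.
inline bool extended_float_zero_is_double_zero() {
    const float narrow = 0.0f;
    const double widened = static_cast<double>(narrow);
    const double direct = 0.0;
    return std::memcmp(&widened, &direct, sizeof direct) == 0;
}

// Convenience check that exercises both properties.
inline void check_zero_properties() {
    assert(zero_stores_match());
    assert(extended_float_zero_is_double_zero());
}
```

Calling check_zero_properties() from a test program exercises both checks. This only demonstrates that the stores are equivalent in effect; which instruction sequence is actually faster on a given x86 target is a separate question that the sketch does not address.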