From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-253841-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 1284 invoked by alias); 2 Dec 2009 22:23:16 -0000
Received: (qmail 1273 invoked by uid 22791); 2 Dec 2009 22:23:15 -0000
X-SWARE-Spam-Status: No, hits=-2.4 required=5.0 	tests=AWL,BAYES_00,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mail-ew0-f227.google.com (HELO mail-ew0-f227.google.com) (209.85.219.227)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 02 Dec 2009 22:23:11 +0000
Received: by ewy27 with SMTP id 27so855933ewy.16         for <gcc-patches@gcc.gnu.org>; Wed, 02 Dec 2009 14:23:08 -0800 (PST)
Received: by 10.213.96.226 with SMTP id i34mr1676629ebn.4.1259792588160;         Wed, 02 Dec 2009 14:23:08 -0800 (PST)
Received: from localhost (rsandifo.gotadsl.co.uk [82.133.89.107])         by mx.google.com with ESMTPS id 13sm1006288ewy.9.2009.12.02.14.23.05         (version=TLSv1/SSLv3 cipher=RC4-MD5);         Wed, 02 Dec 2009 14:23:05 -0800 (PST)
To: Michael Matz <matz@suse.de>
Mail-Followup-To: Michael Matz <matz@suse.de>,Paolo Bonzini <bonzini@gnu.org>, Richard Guenther <richard.guenther@gmail.com>, gcc-patches@gcc.gnu.org, rdsandiford@googlemail.com
Cc: Paolo Bonzini <bonzini@gnu.org>, 	  Richard Guenther <richard.guenther@gmail.com>, 	  gcc-patches@gcc.gnu.org
Subject: Re: RFC patch: invariant addresses too cheap
References: <Pine.LNX.4.64.0910191743440.15566@wotan.suse.de> 	<f865508f0910200506m1664a75et9dce10aa7fa094de@mail.gmail.com> 	<878werq41v.fsf@firetop.home> <hch59e$tpv$1@ger.gmane.org> 	<84fc9c000910310414l40824315m45ff282e98e3b640@mail.gmail.com> 	<4AEC1C84.9050606@gnu.org> <873a4zpth9.fsf@firetop.home> 	<4AEC48FF.7000808@gnu.org> <871vkiy24o.fsf@firetop.home> 	<874ooxmkf8.fsf@firetop.home> 	<f865508f0911150848o5bb0fc56ic32b6f38a363d25f@mail.gmail.com> 	<87skcfirlx.fsf@firetop.home> <87einjaski.fsf@firetop.home> 	<Pine.LNX.4.64.0912011254270.18785@wotan.suse.de>
From: Richard Sandiford <rdsandiford@googlemail.com>
Date: Wed, 02 Dec 2009 22:24:00 -0000
In-Reply-To: <Pine.LNX.4.64.0912011254270.18785@wotan.suse.de> (Michael Matz's message of "Tue\, 1 Dec 2009 13\:12\:20 +0100 \(CET\)")
Message-ID: <87pr6xrp6h.fsf@firetop.home>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2009-12/txt/msg00158.txt.bz2

Hi Michael,

Michael Matz <matz@suse.de> writes:
>> > Hmm.  Looking at the example in that PR, am I right in thinking that 
>> > the x86 backend says that the (%rdi,%rax,4) and 16(%rsp,%rax,4) 
>> > addresses have the same cost?  It looks that way from the code, and 
>> > fwprop wouldn't have made the substitution otherwise, right?
>> >
>> > If so, why do they have equal cost, given that replacing one address 
>> > with the other led to such a drastic performance drop?
>> >
>> > (I know I'm missing something here, sorry.)
>> 
>> But without understanding the answer to the question in my earlier mail,
>> it still seems to me that this whole idea of checking for cheap
>> addresses in loop-invariant.c is tackling things in the wrong way.
>
> The answer to your questions is: yes, you're right.  It is so to generate 
> better code, because as the comment in front of ix86_address_cost says, 
> it's better on x86 to generate more complex addresses than to load them 
> into registers, hence a constant offset isn't taken into account, neither 
> is a scaling.  (To tell the truth, on some microarchitectures addresses 
> that use all components are tougher on the decoder).  In the past we even 
> made more complex addresses _cheaper_ than simple ones (!), which is, well 
> ..., wrong ;-) but once generated better code.

Ah, thanks, I see now.

> Unfortunately these target knobs are more an art than a science, and 
> defining them in a logical way derivable from hardware manuals often leads 
> to suboptimal code :-/

Agreed.  I've hit that too.  And I realise that to some extent it's
inevitable.

> Having said that, I'm not sure why the answer to this question matters 
> to you: neither the PR33928 testcase, nor 410.bwaves was about (reg,reg,4) 
> vs. ofs(reg,reg,4).  The former was about ofs(reg) vs ofs2(reg) and the 
> latter was about (reg) vs (reg,reg).

Well, the point was that if fwprop2 is allowed to make the substitution,
it would solve both of the problems you list.  And that seems like
conceptually the right fix.  But as Paolo says, allowing fwprop2 to make
the substitution would regress PR30907, because of the costs issue above.
So the answer mattered because I wanted a patch that involved fwprop2 but
that didn't regress PR30907.

> FWIW I'd also say that fwprop2 would be a good place to sink invariants 
> generally (so that loop-invariant could hoist them as it wants).  But I 
> don't see how that will get rid of having to compute an absolute cheapness 
> (as fwprop2 would only want to sink the invariants into cheap places).

But fwprop.c compares "before" and "after" costs, and I was thinking it
should do the same in this case too (if allowed to).  No absolute cost
checks should be needed.
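
To make the "before"/"after" idea concrete, here is a toy model in
Python.  It is not GCC code: Address, address_cost and
should_substitute are made-up names, and the cost rule (one unit per
register, offset and scale free) is only an assumption meant to mimic
the behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Toy x86-style address of the form disp(base, index, scale)."""
    base: bool = True
    index: bool = False
    scale: int = 1
    disp: int = 0


def address_cost(addr: Address) -> int:
    """Hypothetical cost: one unit per register used.

    The constant offset and the scale are deliberately ignored,
    which is an assumption made for this sketch.
    """
    return int(addr.base) + int(addr.index)


def should_substitute(before: Address, after: Address) -> bool:
    """Relative check: accept if the new address is no more expensive."""
    return address_cost(after) <= address_cost(before)


# (%rdi,%rax,4) versus 16(%rsp,%rax,4) in this toy model
plain = Address(index=True, scale=4)
offset = Address(index=True, scale=4, disp=16)
print(should_substitute(plain, offset))
```

Under these assumed costs the two addresses compare equal, so a purely
relative check accepts the substitution.  No absolute threshold is
involved; the outcome depends only on how the two costs compare.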

Before the fix for PR30907, fwprop2 already sank invariants into cheap places.
The problem was that the x86 backend called a substitution cheap
when in fact it wasn't.  It would be nice if

  (a) fwprop could once again propagate invariants into loops and
  (b) when it did so, it could ask for a stricter cost, so that we don't
      inadvertently replace one loop instruction with a more expensive
      loop instruction (as we did in PR30907).

In other words, it sounds from your message that the current x86 costs
are mostly geared for two things:

  (1) stopping forms of CSE, hoisting, etc., from introducing unnecessary
      temporaries
  (2) allowing forward propagation within basic blocks of roughly
      the same frequency, to remove unnecessary temporaries

Which is good.  But it might be nice to have a separate mode that's
suitable for forward-propagating values into blocks of higher frequency,
along the lines of (b).  Does that sound reasonable?
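
A rough sketch of what such a separate mode might look like, again as
a toy Python model rather than GCC code.  The strict flag, the
frequency arguments and the extra charges for scale and offset are all
assumptions made for illustration; nothing here reflects an existing
target hook.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """Toy address: a base register plus optional index, scale, offset."""
    index: bool = False
    scale: int = 1
    disp: int = 0


def address_cost(addr: Address, strict: bool = False) -> int:
    """Hypothetical cost with an optional stricter mode.

    Default mode: only registers count (base plus optional index).
    Strict mode: a non-unit scale and a nonzero offset each add one
    unit, so a more complex address is never reported as free.
    """
    cost = 1 + int(addr.index)
    if strict:
        cost += int(addr.scale != 1) + int(addr.disp != 0)
    return cost


def may_propagate(old: Address, new: Address,
                  def_freq: int, use_freq: int) -> bool:
    """Compare before/after costs, using strict costs only when the
    use sits in a block of higher frequency than the definition."""
    strict = use_freq > def_freq
    return address_cost(new, strict) <= address_cost(old, strict)


old = Address(index=True, scale=4)
new = Address(index=True, scale=4, disp=16)
print(may_propagate(old, new, def_freq=100, use_freq=100))
print(may_propagate(old, new, def_freq=100, use_freq=10000))
```

With these made-up numbers the substitution is accepted between blocks
of equal frequency, as in (2), but rejected when the use is in a hotter
block, which is the behaviour (b) asks for.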

[ Having said that, although I could try to write a patch along
  those lines, I'm not sure I could test it very well. ;(  I don't
  have access to things like SPEC. ]

Richard