From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-294377-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 26733 invoked by alias); 14 Jun 2011 15:21:46 -0000
Received: (qmail 26713 invoked by uid 22791); 14 Jun 2011 15:21:44 -0000
X-SWARE-Spam-Status: No, hits=-2.4 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST
X-Spam-Check-By: sourceware.org
Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com) (74.125.82.175)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 14 Jun 2011 15:21:20 +0000
Received: by wye20 with SMTP id 20so4731977wye.20        for <multiple recipients>; Tue, 14 Jun 2011 08:21:19 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.227.55.67 with SMTP id t3mr6449842wbg.90.1308064878832; Tue, 14 Jun 2011 08:21:18 -0700 (PDT)
Received: by 10.227.28.69 with HTTP; Tue, 14 Jun 2011 08:21:18 -0700 (PDT)
In-Reply-To: <1308061098.4853.12.camel@oc2474580526.ibm.com>
References: <1307383631.4798.11.camel@L3G5336.ibm.com>	<BANLkTi=tVYvaheULX3CpXSHMmvfqeGwL9Q@mail.gmail.com>	<1307456077.4798.39.camel@L3G5336.ibm.com>	<BANLkTi=i8ApxtSKLrKSJA5MqPekkWBVmuw@mail.gmail.com>	<1307718680.2592.35.camel@gnopaine>	<BANLkTikP=+FA+FTKx0YAADsT1dyJWctC5w@mail.gmail.com>	<1308061098.4853.12.camel@oc2474580526.ibm.com>
Date: Tue, 14 Jun 2011 15:26:00 -0000
Message-ID: <BANLkTimhsgtNMVGZ2KX42Gm6u5kxKz+5NA@mail.gmail.com>
Subject: Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
From: Richard Guenther <richard.guenther@gmail.com>
To: "William J. Schmidt" <wschmidt@linux.vnet.ibm.com>
Cc: gcc-patches@gcc.gnu.org, bergner@vnet.ibm.com, dje.gcc@gmail.com, 	steven@gcc.gnu.org, law@redhat.com
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-06/txt/msg01062.txt.bz2

On Tue, Jun 14, 2011 at 4:18 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
>> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
>> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
>> >> <wschmidt@linux.vnet.ibm.com> wrote:
>> >
>> > <snip>
>> >
>> >> >> > Loss of aliasing information
>> >> >> > ============================
>> >> >> > The most serious problem I've run into is degraded performance due to poorer
>> >> >> > instruction scheduling choices.  I tracked this down to
>> >> >> > alias.c:nonoverlapping_component_refs_p.
>> >> >> >
>> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
>> >> >> > that they access different fields of the same structure.  This is done using
>> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
>> >> >> > translated into the rtx's during expand.  When address lowering is not
>> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
>> >> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
>> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
>> >> >> > reference.  Thus the aliasing machinery can no longer prove that two such
>> >> >> > field references are disjoint.
>> >> >> >
>> >> >> > This has severe consequences for performance, and has to be dealt with if
>> >> >> > address lowering is to be successful.
>> >> >> >
>> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
>> >> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
>> >> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
>> >> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
>> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
>> >> >> > essentially duplicates the behavior in the absence of address lowering.
>> >> >>
>> >> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
>> >> >> of problems.  Removing it didn't cause much degradation because we now
>> >> >> preserve points-to information.
>> >> >>
>> >> >> Originally I played with lowering all memory accesses to MEM_REFs
>> >> >> (see the old mem-ref branch), and the loss of type-based alias
>> >> >> disambiguation was indeed an issue.
>> >> >>
>> >> >> But - I definitely do not like the idea of preserving something similar
>> >> >> to TMR_ORIGINAL.  Instead we can try preserving some information
>> >> >> we derive from it.  We keep the original access type that we can use
>> >> >> for TBAA but do not retain knowledge on whether the type of the
>> >> >> MEM_REF is valid for TBAA or if it is view-converted.
>> >> >
>> >> > Yes, I really don't like what I have at the moment, either.  I put it in
>> >> > place as a stopgap to let me proceed to look for other performance
>> >> > problems.
>> >> >
>> >> > The question is how we can infer useful information for TBAA from the
>> >> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
>> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
>> >> > many cases to maintain correctness, and couldn't prove the type
>> >> > information for the important cases in SPEC I was trying to address.
>> >> >
>> >> > Unfortunately, the whole design goes down the drain if we can't find a
>> >> > way to solve the TBAA issue.  The performance degradations are too
>> >> > costly.
>> >>
>> >> If you look at what basic TBAA the alias oracle performs then it boils
>> >> down to the fact that get_alias_set for a.b.c might end up using the
>> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
>> >> of the type of a.  The tree alias-oracle extracts both alias sets, that
>> >> of the outermost valid type and that of the innermost as both are
>> >> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
>> >> only have storage for one such alias-set.  Thus my idea at some point
>> >> was to store the other one as well in some form.  It will not be
>> >> the full information (after all, the complete access path does provide
>> >> some extra information - see aliasing_component_refs_p).
>> >
>> > This is what concerns me.  TBAA information for the outer and inner
>> > components doesn't seem sufficient to provide what
>> > nonoverlapping_component_refs_p is currently able to prove.  The latter
>> > searches for a common RECORD_TYPE somewhere along the two access paths,
>> > and then disambiguates if the two associated referenced fields differ.
>> > For a simple case like "struct x { int a; int b; };", a and b have the
>> > same type and alias-set, so the alias-set information doesn't add
>> > anything.  It isn't sufficient alone for the disambiguation of x1.a =
>> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
>> >
>> > Obviously the offset is sufficient to disambiguate for this simple case
>> > with a common base type, but when the shared record types aren't at the
>> > outermost level, we can't detect whether it is.
>> >
>> > At the moment I don't see how we can avoid degradation unless we keep
>> > the full access path around somewhere, for [TARGET_]MEM_REFs built from
>> > COMPONENT_REFs.  I hope I'm wrong.
>>
>> You are not wrong.  But the question is, does it make a difference?
>>
>> Richard.
>
> Yes, it does.  This scenario occurs in 188.ammp, among others, and leads
> to a large degradation without the change.  The performance-critical
> loop in mm_fv_update_nonbon makes heavy use of indirect references to
> the ATOM structure that contains numerous float variables.  When the
> COMPONENT_REFs have been converted to MEM_REFs, the alias machinery can
> no longer disambiguate these, which constrains the scheduler.  The
> result of the poor scheduling (on PowerPC, at least) is a large increase
> in floating-point spill code in the critical loop.

As they appear in loops I wonder why IVOPTs doesn't already expose
this problem?

In general the answer to missed TBAA optimizations is of course
"make sure that PTA works", which usually means using LTO ...

I really, really don't like preserving TBAA-related information as trees.
Instead, if we really think preserving access paths is a must (I have
already significantly reduced the number of preserved paths by
introducing MEM_REFs!), then we have to find a different representation.

I suppose you can turn the AMMP case into a (small) testcase that
illustrates the issue and allows easier analysis and test of fixes?
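
A rough sketch of the kind of reduced testcase I mean is below.  The
struct layout, the field names and the function name are all made up
for illustration and are not taken from 188.ammp; the only assumption
is that the hot loop mixes loads and stores of same-typed float fields
through two pointers to the same record type:

```c
#include <stddef.h>

/* Hypothetical stand-in for ammp's ATOM: a record holding several
   float fields.  The real layout is not reproduced here.  */
typedef struct atom {
    float x, y, z;     /* position */
    float fx, fy, fz;  /* accumulated force */
    float q;           /* charge */
} ATOM;

/* Accumulate a simplified pairwise term between a1 and each entry of
   others[].  Every access is an indirect reference to a float field,
   so stores such as a2->fx and later loads such as a1->fy differ only
   in which field of ATOM they name.  */
void update_forces(ATOM *a1, ATOM **others, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ATOM *a2 = others[i];
        float dx = a2->x - a1->x;
        float dy = a2->y - a1->y;
        float dz = a2->z - a1->z;
        float k = a1->q * a2->q;

        a1->fx += k * dx;
        a2->fx -= k * dx;
        a1->fy += k * dy;
        a2->fy -= k * dy;
        a1->fz += k * dz;
        a2->fz -= k * dz;
    }
}
```

With COMPONENT_REFs in the MEM_EXPRs, the store to a2->fx and the
following load of a1->fy should be provably disjoint as different
fields of the same record, even though a1 and a2 may point to the same
object.  Once both are float accesses at a constant offset from
unrelated pointers, the alias set alone no longer separates them.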

Richard.