[Design notes, RFC] Address-lowering prototype design (PR46556)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [Design notes, RFC] Address-lowering prototype design (PR46556)
@ 2011-06-06 18:07 William J. Schmidt
  2011-06-07 10:07 ` Richard Guenther
  0 siblings, 1 reply; 11+ messages in thread
From: William J. Schmidt @ 2011-06-06 18:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: bergner, dje.gcc, richard.guenther, steven, law

It's been some time since I last posted about the address lowering issue from
PR46556 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46556).  I've had a basic
prototype in place for some time, but I ran into some issues that initially
caused performance regressions; I've had to jump through several hoops to
resolve these.  At this time, I'm now seeing decent performance improvements
from address lowering for PowerPC (about 2% overall on SPEC cpu2000).  A
couple of minor degradations still occur that I am looking into.

I thought this would be a good time to put out another RFC.  However, the code
has grown over time, and I doubt whether all you busy people are particularly
interested in reviewing a 2700-line prototype patch in detail.  Instead, I've
written up some notes on the design, some of the challenges I ran into, and my
current approach to handling those.

I'd like to first get comments from interested parties on the prototype
design, and get consensus on whether this is the appropriate overall
approach.  If so, the next step would be to decide how to go about reviewing
the code.  Since smaller patches are much easier to review, one possibility
would be for me to gate the new pass on a flag that's off by default, and
submit patches incrementally for review.  I.e., start with the bare-bones
algorithm, and gradually add in the complexities as folks have time to review
the code.  I'm quite new to the community, so any advice on how to handle
this would be very welcome.

In any case, here are my notes.  Thanks in advance for your views!

Bill

Basic algorithm
===============
A new pass is added late in the gimple phases, currently between
pass_phi_only_cprop and pass_cd_dce.  I chose this placement because the new
pass can generate a fair amount of dead code.

Address lowering walks the CFG in predominator order.  Common subexpression
elimination is performed on operations that could participate in addressing
expressions.  These operations may have been made available in the current
block, or in any dominating block.

Any statement that may contain a memory access (excluding calls) is considered
for address lowering.  The memory access is expanded into an affine
combination of trees by looking back at feeding statements, using logic
originally developed for IVopts.  The affine combination is subjected to a
number of tests to determine whether it's legal and profitable to lower the
addressing expression into a TARGET_MEM_REF (or MEM_REF, when addressing is
sufficiently simple).  If so, the conversion to a TMR is performed, again
using logic originally developed for IVopts.

The resulting memory reference is again subjected to a number of conditions
testing for legality and profitability.  If these conditions are all met, the
reference data is copied from the original expression to the TMR, and the TMR
replaces the original expression in the gimple statement.

The conversion of a memory access into a TMR generally creates a number of new
gimple statements preceding the statement under analysis.  CSE is performed on
the fly for these new statements as well, using the available expressions data
discussed earlier.  This produces naturally shared addressability for members
of the same structure, for example.

Inhibitors
==========
As mentioned, a number of conditions are checked before replacing a memory
access with a TMR.  Conditions that inhibit replacement include the following.

 * There is not yet any handling for bitfields.

 * An expression containing a direct field reference from a small nonvolatile
   structure is left alone.  It is assumed that these values will be
   register-assigned, so exposing their addresses is counterproductive.

 * I have seen odd cases in C++ code such as 

     lhs = MEM[(struct foo *) 0B].bar;

   This results in an affine combination of trees with no elements, so an
   explicit check for such empty combinations is required.

 * If the affine combination of trees introduces an ADDR_EXPR for any decl
   that is not already marked as addressable, we refuse to make the
   replacement.  Taking the address of something for the first time defeats
   downstream optimizations that would otherwise be possible.

 * create_mem_ref can occasionally produce a TMR or MEM_REF with a base
   expression of constant zero.  One way this can happen is with questionable
   coding practices, such as treating an integer paramter as a pointer.  It's
   an open question whether create_mem_ref should ever produce such a tree.
   In any case, the base expression of zero again confuses the may-aliasing
   machinery, so I refuse the replacement in this case.

 * If the original expression will be recognized as a "hidden global store" in
   tree-ssa-sink.c:is_hidden_global_store, but the replacement expression will
   not, it is possible for the dead code eliminator to remove the modified
   statement.  It seems to me this shouldn't normally happen in practice.  For
   now I detect this case and refuse to do the replacement, but I suspect a
   possible front-end or upstream-optimization problem here.  The test case
   that fails here is libstdc++-v3/testsuite/23_containers/vector/
   ext_pointer_modifiers/insert.cc.  More investigation required.

Probably no longer inhibitors
=============================
I currently am checking for some cases that I think may no longer be
necessary, following this change:

------------------------------------------------------------------------
r174282 | rguenth | 2011-05-26 08:01:48 -0500 (Thu, 26 May 2011) | 10 lines

2011-05-26  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/48702
        * tree-ssa-address.c (create_mem_ref_raw): Create MEM_REFs
        only when we know the base address is within bounds.
        * tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Do not
        assume the base address of TARGET_MEM_REFs is in bounds.

        * gcc.dg/torture/pr48702.c: New testcase.

------------------------------------------------------------------------

I am in process of checking whether removing these checks has any untoward
consequences:

 * There are several ways in which the affine combination of trees can contain
   subexpressions with negative coefficients.  This is evil and must be
   avoided, since it can result in a memory reference whose base address
   pointer lies outside the object it references.  The may-aliasing machinery
   cannot currently handle this.  So we refuse to make the replacement when a
   negative coefficient is detected.

 * Earlier induction variable optimization may also cause us to end up with a
   base address below the beginning of an object.  (Ivars are often initialized
   to the base of element -1 outside of a loop, and incremented at the top of
   the loop.  Expanding into the affine combination of trees collapses this to
   produce a base expression pointing outside the object.)  I avoid this by
   checking whether the base expression is an SSA name for an ivar that is
   directly or transitively defined by a PHI expression.

Zero-offset memory references
=============================
Following is a fairly common pattern that results from address lowering with
CSE:

      <bb 2>:
        D.2154_8 = n_2(D) * 4;
        D.2107_3 = MEM[base: p_1(D), index: D.2154_8, offset: 0B];
	D.2156_10 = p_1(D) + D.2154_8;
	D.2108_4 = MEM[(struct x *)D.2156_10 + 128B];
	D.2109_5 = MEM[(struct x *)D.2156_10 + 64B];
	foo (D.2107_3, D.2108_4, D.2109_5);
	return;

When the addressing expression consists of two tree addends and a non-zero
offset, the addends are pre-added and used as the base expression, with the
offset becoming the immediate operand in the storage op.  When the offset is
zero, the base and index are added implicitly in an indexed-form storage op.

The bias to the indexed form with a zero offset is a good choice, in absence
of knowledge of the surrounding code; it saves an instruction and a dependency
with a possible small increase in register pressure.  As can be seen here,
though, we end up with a missed commoning opportunity, since the second
instruction could become: 

       D.2107_3 = MEM[(struct x *)D.2156_10 + 0B];

Since the add into D.2156_10 is going to occur anyway, the advantage of the
bias to indexed form vanishes.  Instead, it leads to an increase in register
pressure.  This is more noticeable as expressions are CSEd across blocks. 

As the example shows, when the zero-offset memory reference is first
encountered, the sum of its base and index components may not be a known
available expression.  To find all opportunities where local CSE is possible
for these, the zero-offset expressions are recorded during the basic
algorithm.  At the end of each block, the zero-offset expressions are
post-processed.  If the opportunity exists, the base is replaced with the SSA
representative of the sum, and the index is changed to constant zero.  If
necessary, the definition of the SSA representative of the sum is moved ahead
of the zero-offset expression.  This is safe since both addends are known to
be available there.

Challenge with zero-offset memory references
============================================
Unfortunately, when I first implemented this post-processing, I didn't see any
change in code generation.  Upon investigation, it turned out that fwprop.c
was undoing this optimization in forward_propagate_and_simplify.  To
circumvent this, I ended up adding some extra code to identify zero-offset
memory references that fit the pattern of the example (only in RTL).  I
constrained fwprop.c from propagating into these as locally poor
replacements.

I recognize this might be a controversial choice, and I'm open to other
suggestions.  But I found CSEing these to be a useful optimization that makes
a difference in common situations, so I didn't want to lose it.

Loss of aliasing information
============================
The most serious problem I've run into is degraded performance due to poorer
instruction scheduling choices.  I tracked this down to 
alias.c:nonoverlapping_component_refs_p.

This code proves that two memory accesses don't overlap by attempting to prove
that they access different fields of the same structure.  This is done using
the MEM_EXPRs of the two rtx's, which record the expression trees that were
translated into the rtx's during expand.  When address lowering is not
present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
example.  However, address lowering changes the simple COMPONENT_REF into a
[TARGET_]MEM_REF that is no longer necessarily identifiable as a field
reference.  Thus the aliasing machinery can no longer prove that two such
field references are disjoint.

This has severe consequences for performance, and has to be dealt with if
address lowering is to be successful.

I've worked around this with an admittedly fragile solution; I'll discuss the
drawbacks below.  The idea is to construct a mapping from replacement mem_refs
to the original expressions that they replaced.  When a MEM_EXPR is being set
during expand, we first look up the mem_ref in the mapping.  If present, the
MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
essentially duplicates the behavior in the absence of address lowering.

This appears to work well in practice today.  However, it relies on there
being no optimizations between the address lowering pass and expand that
modify [TARGET_]MEM_REFs.  Dead store elimination may remove some, which is
fine, of course.  But if a new pass was to be added that modified operands of
mem_refs after address lowering, code generation would be quietly degraded.
We would need some specific scan tests in the test suite to check for this.

Another (less serious) problem is having to introduce a data structure that
needs to be maintained beyond the lifetime of its associated pass.  It's a bit
of a wart to have this side table that must be kept around until expand is
finished.  It's workable, but not very clean.

As an alternative, I went looking for ways to carry the original expression
along with the mem_ref in the remaining gimple phases; but I wasn't able to
find a good way to do that.  Perhaps others can suggest something.

As noted, I recognize this is a fragile solution, and I'm very interested in
other ideas for keeping the necessary information around until it's needed.

Open issues
===========
CSE of addressing expressions across blocks (or within very large blocks) can
raise register pressure and potentially introduce spill code.  We may want
some heuristics to constrain the distance across which expressions are
considered available.

I am investigating small degradations in 177.mesa, 187.facerec, 300.twolf, and
164.gzip.

Performance testing on other platforms is needed.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-06 18:07 [Design notes, RFC] Address-lowering prototype design (PR46556) William J. Schmidt
@ 2011-06-07 10:07 ` Richard Guenther
  2011-06-07 14:33   ` William J. Schmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2011-06-07 10:07 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, dje.gcc, steven, law

On Mon, Jun 6, 2011 at 8:07 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> It's been some time since I last posted about the address lowering issue from
> PR46556 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46556).  I've had a basic
> prototype in place for some time, but I ran into some issues that initially
> caused performance regressions; I've had to jump through several hoops to
> resolve these.  At this time, I'm now seeing decent performance improvements
> from address lowering for PowerPC (about 2% overall on SPEC cpu2000).  A
> couple of minor degradations still occur that I am looking into.
>
> I thought this would be a good time to put out another RFC.  However, the code
> has grown over time, and I doubt whether all you busy people are particularly
> interested in reviewing a 2700-line prototype patch in detail.  Instead, I've
> written up some notes on the design, some of the challenges I ran into, and my
> current approach to handling those.
>
> I'd like to first get comments from interested parties on the prototype
> design, and get consensus on whether this is the appropriate overall
> approach.  If so, the next step would be to decide how to go about reviewing
> the code.  Since smaller patches are much easier to review, one possibility
> would be for me to gate the new pass on a flag that's off by default, and
> submit patches incrementally for review.  I.e., start with the bare-bones
> algorithm, and gradually add in the complexities as folks have time to review
> the code.  I'm quite new to the community, so any advice on how to handle
> this would be very welcome.
>
> In any case, here are my notes.  Thanks in advance for your views!
>
> Bill
>
>
> Basic algorithm
> ===============
> A new pass is added late in the gimple phases, currently between
> pass_phi_only_cprop and pass_cd_dce.  I chose this placement because the new
> pass can generate a fair amount of dead code.
>
> Address lowering walks the CFG in predominator order.  Common subexpression
> elimination is performed on operations that could participate in addressing
> expressions.  These operations may have been made available in the current
> block, or in any dominating block.
>
> Any statement that may contain a memory access (excluding calls) is considered
> for address lowering.  The memory access is expanded into an affine
> combination of trees by looking back at feeding statements, using logic
> originally developed for IVopts.  The affine combination is subjected to a
> number of tests to determine whether it's legal and profitable to lower the
> addressing expression into a TARGET_MEM_REF (or MEM_REF, when addressing is
> sufficiently simple).  If so, the conversion to a TMR is performed, again
> using logic originally developed for IVopts.
>
> The resulting memory reference is again subjected to a number of conditions
> testing for legality and profitability.  If these conditions are all met, the
> reference data is copied from the original expression to the TMR, and the TMR
> replaces the original expression in the gimple statement.
>
> The conversion of a memory access into a TMR generally creates a number of new
> gimple statements preceding the statement under analysis.  CSE is performed on
> the fly for these new statements as well, using the available expressions data
> discussed earlier.  This produces naturally shared addressability for members
> of the same structure, for example.
>
> Inhibitors
> ==========
> As mentioned, a number of conditions are checked before replacing a memory
> access with a TMR.  Conditions that inhibit replacement include the following.
>
>  * There is not yet any handling for bitfields.
>
>  * An expression containing a direct field reference from a small nonvolatile
>   structure is left alone.  It is assumed that these values will be
>   register-assigned, so exposing their addresses is counterproductive.
>
>  * I have seen odd cases in C++ code such as
>
>     lhs = MEM[(struct foo *) 0B].bar;
>
>   This results in an affine combination of trees with no elements, so an
>   explicit check for such empty combinations is required.
>
>  * If the affine combination of trees introduces an ADDR_EXPR for any decl
>   that is not already marked as addressable, we refuse to make the
>   replacement.  Taking the address of something for the first time defeats
>   downstream optimizations that would otherwise be possible.
>
>  * create_mem_ref can occasionally produce a TMR or MEM_REF with a base
>   expression of constant zero.  One way this can happen is with questionable
>   coding practices, such as treating an integer paramter as a pointer.  It's
>   an open question whether create_mem_ref should ever produce such a tree.
>   In any case, the base expression of zero again confuses the may-aliasing
>   machinery, so I refuse the replacement in this case.
>
>  * If the original expression will be recognized as a "hidden global store" in
>   tree-ssa-sink.c:is_hidden_global_store, but the replacement expression will
>   not, it is possible for the dead code eliminator to remove the modified
>   statement.  It seems to me this shouldn't normally happen in practice.  For
>   now I detect this case and refuse to do the replacement, but I suspect a
>   possible front-end or upstream-optimization problem here.  The test case
>   that fails here is libstdc++-v3/testsuite/23_containers/vector/
>   ext_pointer_modifiers/insert.cc.  More investigation required.

That indeed sounds odd.

>
> Probably no longer inhibitors
> =============================
> I currently am checking for some cases that I think may no longer be
> necessary, following this change:
>
> ------------------------------------------------------------------------
> r174282 | rguenth | 2011-05-26 08:01:48 -0500 (Thu, 26 May 2011) | 10 lines
>
> 2011-05-26  Richard Guenther  <rguenther@suse.de>
>
>        PR tree-optimization/48702
>        * tree-ssa-address.c (create_mem_ref_raw): Create MEM_REFs
>        only when we know the base address is within bounds.
>        * tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Do not
>        assume the base address of TARGET_MEM_REFs is in bounds.
>
>        * gcc.dg/torture/pr48702.c: New testcase.
>
> ------------------------------------------------------------------------
>
> I am in process of checking whether removing these checks has any untoward
> consequences:
>
>  * There are several ways in which the affine combination of trees can contain
>   subexpressions with negative coefficients.  This is evil and must be
>   avoided, since it can result in a memory reference whose base address
>   pointer lies outside the object it references.  The may-aliasing machinery
>   cannot currently handle this.  So we refuse to make the replacement when a
>   negative coefficient is detected.
>
>  * Earlier induction variable optimization may also cause us to end up with a
>   base address below the beginning of an object.  (Ivars are often initialized
>   to the base of element -1 outside of a loop, and incremented at the top of
>   the loop.  Expanding into the affine combination of trees collapses this to
>   produce a base expression pointing outside the object.)  I avoid this by
>   checking whether the base expression is an SSA name for an ivar that is
>   directly or transitively defined by a PHI expression.
>
> Zero-offset memory references
> =============================
> Following is a fairly common pattern that results from address lowering with
> CSE:
>
>      <bb 2>:
>        D.2154_8 = n_2(D) * 4;
>        D.2107_3 = MEM[base: p_1(D), index: D.2154_8, offset: 0B];
>        D.2156_10 = p_1(D) + D.2154_8;
>        D.2108_4 = MEM[(struct x *)D.2156_10 + 128B];
>        D.2109_5 = MEM[(struct x *)D.2156_10 + 64B];
>        foo (D.2107_3, D.2108_4, D.2109_5);
>        return;
>
> When the addressing expression consists of two tree addends and a non-zero
> offset, the addends are pre-added and used as the base expression, with the
> offset becoming the immediate operand in the storage op.  When the offset is
> zero, the base and index are added implicitly in an indexed-form storage op.
>
> The bias to the indexed form with a zero offset is a good choice, in absence
> of knowledge of the surrounding code; it saves an instruction and a dependency
> with a possible small increase in register pressure.  As can be seen here,
> though, we end up with a missed commoning opportunity, since the second
> instruction could become:
>
>       D.2107_3 = MEM[(struct x *)D.2156_10 + 0B];
>
> Since the add into D.2156_10 is going to occur anyway, the advantage of the
> bias to indexed form vanishes.  Instead, it leads to an increase in register
> pressure.  This is more noticeable as expressions are CSEd across blocks.
>
> As the example shows, when the zero-offset memory reference is first
> encountered, the sum of its base and index components may not be a known
> available expression.  To find all opportunities where local CSE is possible
> for these, the zero-offset expressions are recorded during the basic
> algorithm.  At the end of each block, the zero-offset expressions are
> post-processed.  If the opportunity exists, the base is replaced with the SSA
> representative of the sum, and the index is changed to constant zero.  If
> necessary, the definition of the SSA representative of the sum is moved ahead
> of the zero-offset expression.  This is safe since both addends are known to
> be available there.
>
> Challenge with zero-offset memory references
> ============================================
> Unfortunately, when I first implemented this post-processing, I didn't see any
> change in code generation.  Upon investigation, it turned out that fwprop.c
> was undoing this optimization in forward_propagate_and_simplify.  To
> circumvent this, I ended up adding some extra code to identify zero-offset
> memory references that fit the pattern of the example (only in RTL).  I
> constrained fwprop.c from propagating into these as locally poor
> replacements.
>
> I recognize this might be a controversial choice, and I'm open to other
> suggestions.  But I found CSEing these to be a useful optimization that makes
> a difference in common situations, so I didn't want to lose it.
>
> Loss of aliasing information
> ============================
> The most serious problem I've run into is degraded performance due to poorer
> instruction scheduling choices.  I tracked this down to
> alias.c:nonoverlapping_component_refs_p.
>
> This code proves that two memory accesses don't overlap by attempting to prove
> that they access different fields of the same structure.  This is done using
> the MEM_EXPRs of the two rtx's, which record the expression trees that were
> translated into the rtx's during expand.  When address lowering is not
> present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
> example.  However, address lowering changes the simple COMPONENT_REF into a
> [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> reference.  Thus the aliasing machinery can no longer prove that two such
> field references are disjoint.
>
> This has severe consequences for performance, and has to be dealt with if
> address lowering is to be successful.
>
> I've worked around this with an admittedly fragile solution; I'll discuss the
> drawbacks below.  The idea is to construct a mapping from replacement mem_refs
> to the original expressions that they replaced.  When a MEM_EXPR is being set
> during expand, we first look up the mem_ref in the mapping.  If present, the
> MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
> essentially duplicates the behavior in the absence of address lowering.

Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
of problems.  Removing it didn't cause much degradation because we now
preserve points-to information.

Originally I played with lowering all memory accesses to MEM_REFs
(see the old mem-ref branch), and the loss of type-based alias
disambiguation was indeed an issue.

But - I definitely do not like the idea of preserving something similar
to TMR_ORIGINAL.  Instead we can try preserving some information
we derive from it.  We keep the original access type that we can use
for TBAA but do not retain knowledge on whether the type of the
MEM_REF is valid for TBAA or if it is view-converted.

> This appears to work well in practice today.  However, it relies on there
> being no optimizations between the address lowering pass and expand that
> modify [TARGET_]MEM_REFs.  Dead store elimination may remove some, which is
> fine, of course.  But if a new pass was to be added that modified operands of
> mem_refs after address lowering, code generation would be quietly degraded.
> We would need some specific scan tests in the test suite to check for this.
>
> Another (less serious) problem is having to introduce a data structure that
> needs to be maintained beyond the lifetime of its associated pass.  It's a bit
> of a wart to have this side table that must be kept around until expand is
> finished.  It's workable, but not very clean.
>
> As an alternative, I went looking for ways to carry the original expression
> along with the mem_ref in the remaining gimple phases; but I wasn't able to
> find a good way to do that.  Perhaps others can suggest something.
>
> As noted, I recognize this is a fragile solution, and I'm very interested in
> other ideas for keeping the necessary information around until it's needed.
>
> Open issues
> ===========
> CSE of addressing expressions across blocks (or within very large blocks) can
> raise register pressure and potentially introduce spill code.  We may want
> some heuristics to constrain the distance across which expressions are
> considered available.
>
> I am investigating small degradations in 177.mesa, 187.facerec, 300.twolf, and
> 164.gzip.
>
> Performance testing on other platforms is needed.
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-07 10:07 ` Richard Guenther
@ 2011-06-07 14:33   ` William J. Schmidt
  2011-06-07 14:49     ` Richard Guenther
  0 siblings, 1 reply; 11+ messages in thread
From: William J. Schmidt @ 2011-06-07 14:33 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner, dje.gcc, steven, law


On Tue, 2011-06-07 at 12:06 +0200, Richard Guenther wrote:
> On Mon, Jun 6, 2011 at 8:07 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:

<snip>

> >  * If the original expression will be recognized as a "hidden global store" in
> >   tree-ssa-sink.c:is_hidden_global_store, but the replacement expression will
> >   not, it is possible for the dead code eliminator to remove the modified
> >   statement.  It seems to me this shouldn't normally happen in practice.  For
> >   now I detect this case and refuse to do the replacement, but I suspect a
> >   possible front-end or upstream-optimization problem here.  The test case
> >   that fails here is libstdc++-v3/testsuite/23_containers/vector/
> >   ext_pointer_modifiers/insert.cc.  More investigation required.
> 
> That indeed sounds odd.

When I looked into it, the addressing expression was fairly complex
initially, with templates and inheritance involved.  The code to create
the affine tree combination was able to collapse a great deal of the
arithmetic and produce something much simpler that no longer referenced
the item that had made it look like a global store originally.  (I.e.,
buried in the expression were an (&x + a) and a -(&x + a) that cancelled
out and removed all traces of x.)  It appeared to me that this was being
done correctly, but it was a very complex expression and I might have
missed something.

The result was that essentially the entire procedure ended up going
dead.  It's possible that some information was either lost or not
provided by the front end that should have prevented that.

<snip>

> > Loss of aliasing information
> > ============================
> > The most serious problem I've run into is degraded performance due to poorer
> > instruction scheduling choices.  I tracked this down to
> > alias.c:nonoverlapping_component_refs_p.
> >
> > This code proves that two memory accesses don't overlap by attempting to prove
> > that they access different fields of the same structure.  This is done using
> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
> > translated into the rtx's during expand.  When address lowering is not
> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
> > example.  However, address lowering changes the simple COMPONENT_REF into a
> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> > reference.  Thus the aliasing machinery can no longer prove that two such
> > field references are disjoint.
> >
> > This has severe consequences for performance, and has to be dealt with if
> > address lowering is to be successful.
> >
> > I've worked around this with an admittedly fragile solution; I'll discuss the
> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
> > to the original expressions that they replaced.  When a MEM_EXPR is being set
> > during expand, we first look up the mem_ref in the mapping.  If present, the
> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
> > essentially duplicates the behavior in the absence of address lowering.
> 
> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
> of problems.  Removing it didn't cause much degradation because we now
> preserve points-to information.
> 
> Originally I played with lowering all memory accesses to MEM_REFs
> (see the old mem-ref branch), and the loss of type-based alias
> disambiguation was indeed an issue.
> 
> But - I definitely do not like the idea of preserving something similar
> to TMR_ORIGINAL.  Instead we can try preserving some information
> we derive from it.  We keep the original access type that we can use
> for TBAA but do not retain knowledge on whether the type of the
> MEM_REF is valid for TBAA or if it is view-converted.

Yes, I really don't like what I have at the moment, either.  I put it in
place as a stopgap to let me proceed to look for other performance
problems.

The question is how we can infer useful information for TBAA from the
MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
the MEM_EXPRs, but this ended up being useless; I had to constrain too
many cases to maintain correctness, and couldn't prove the type
information for the important cases in SPEC I was trying to address.

Unfortunately, the whole design goes down the drain if we can't find a
way to solve the TBAA issue.  The performance degradations are too
costly.

<snip>

Thanks,
Bill

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-07 14:33   ` William J. Schmidt
@ 2011-06-07 14:49     ` Richard Guenther
  2011-06-10 15:47       ` William J. Schmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2011-06-07 14:49 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, dje.gcc, steven, law

On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
>
> On Tue, 2011-06-07 at 12:06 +0200, Richard Guenther wrote:
>> On Mon, Jun 6, 2011 at 8:07 PM, William J. Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>
> <snip>
>
>> >  * If the original expression will be recognized as a "hidden global store" in
>> >   tree-ssa-sink.c:is_hidden_global_store, but the replacement expression will
>> >   not, it is possible for the dead code eliminator to remove the modified
>> >   statement.  It seems to me this shouldn't normally happen in practice.  For
>> >   now I detect this case and refuse to do the replacement, but I suspect a
>> >   possible front-end or upstream-optimization problem here.  The test case
>> >   that fails here is libstdc++-v3/testsuite/23_containers/vector/
>> >   ext_pointer_modifiers/insert.cc.  More investigation required.
>>
>> That indeed sounds odd.
>
> When I looked into it, the addressing expression was fairly complex
> initially, with templates and inheritance involved.  The code to create
> the affine tree combination was able to collapse a great deal of the
> arithmetic and produce something much simpler that no longer referenced
> the item that had made it look like a global store originally.  (I.e.,
> buried in the expression were an (&x + a) and a -(&x + a) that cancelled
> out and removed all traces of x.)  It appeared to me that this was being
> done correctly, but it was a very complex expression and I might have
> missed something.
>
> The result was that essentially the entire procedure ended up going
> dead.  It's possible that some information was either lost or not
> provided by the front end that should have prevented that.
>
> <snip>
>
>> > Loss of aliasing information
>> > ============================
>> > The most serious problem I've run into is degraded performance due to poorer
>> > instruction scheduling choices.  I tracked this down to
>> > alias.c:nonoverlapping_component_refs_p.
>> >
>> > This code proves that two memory accesses don't overlap by attempting to prove
>> > that they access different fields of the same structure.  This is done using
>> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
>> > translated into the rtx's during expand.  When address lowering is not
>> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
>> > example.  However, address lowering changes the simple COMPONENT_REF into a
>> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
>> > reference.  Thus the aliasing machinery can no longer prove that two such
>> > field references are disjoint.
>> >
>> > This has severe consequences for performance, and has to be dealt with if
>> > address lowering is to be successful.
>> >
>> > I've worked around this with an admittedly fragile solution; I'll discuss the
>> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
>> > to the original expressions that they replaced.  When a MEM_EXPR is being set
>> > during expand, we first look up the mem_ref in the mapping.  If present, the
>> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
>> > essentially duplicates the behavior in the absence of address lowering.
>>
>> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
>> of problems.  Removing it didn't cause much degradation because we now
>> preserve points-to information.
>>
>> Originally I played with lowering all memory accesses to MEM_REFs
>> (see the old mem-ref branch), and the loss of type-based alias
>> disambiguation was indeed an issue.
>>
>> But - I definitely do not like the idea of preserving something similar
>> to TMR_ORIGINAL.  Instead we can try preserving some information
>> we derive from it.  We keep the original access type that we can use
>> for TBAA but do not retain knowledge on whether the type of the
>> MEM_REF is valid for TBAA or if it is view-converted.
>
> Yes, I really don't like what I have at the moment, either.  I put it in
> place as a stopgap to let me proceed to look for other performance
> problems.
>
> The question is how we can infer useful information for TBAA from the
> MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
> the MEM_EXPRs, but this ended up being useless; I had to constrain too
> many cases to maintain correctness, and couldn't prove the type
> information for the important cases in SPEC I was trying to address.
>
> Unfortunately, the whole design goes down the drain if we can't find a
> way to solve the TBAA issue.  The performance degradations are too
> costly.

If you look at what basic TBAA the alias oracle performs then it boils
down to the fact that get_alias_set for a.b.c might end up using the
alias-set of the type of C but for MEM[&a + 4] it will use the alias set
of the type of a.  The tree alias-oracle extracts both alias sets, that
of the outermost valid type and that of the innermost as both are
equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
only have storage for one such alias-set.  Thus my idea at some point
was to store the other one as well in some form.  It will not be
the full information (after all, the complete access path does provide
some extra information - see aliasing_component_refs_p).

Btw, I'm looking at lowering bitfield accesses to read-modify-write
cycles and in that context also to lowering unaligned accesses
for targets that do not support them.

Richard.

> <snip>
>
> Thanks,
> Bill
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-07 14:49     ` Richard Guenther
@ 2011-06-10 15:47       ` William J. Schmidt
  2011-06-14 13:58         ` Richard Guenther
  0 siblings, 1 reply; 11+ messages in thread
From: William J. Schmidt @ 2011-06-10 15:47 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner, dje.gcc, steven, law

On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:

<snip>

> >> > Loss of aliasing information
> >> > ============================
> >> > The most serious problem I've run into is degraded performance due to poorer
> >> > instruction scheduling choices.  I tracked this down to
> >> > alias.c:nonoverlapping_component_refs_p.
> >> >
> >> > This code proves that two memory accesses don't overlap by attempting to prove
> >> > that they access different fields of the same structure.  This is done using
> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
> >> > translated into the rtx's during expand.  When address lowering is not
> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> >> > reference.  Thus the aliasing machinery can no longer prove that two such
> >> > field references are disjoint.
> >> >
> >> > This has severe consequences for performance, and has to be dealt with if
> >> > address lowering is to be successful.
> >> >
> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
> >> > essentially duplicates the behavior in the absence of address lowering.
> >>
> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
> >> of problems.  Removing it didn't cause much degradation because we now
> >> preserve points-to information.
> >>
> >> Originally I played with lowering all memory accesses to MEM_REFs
> >> (see the old mem-ref branch), and the loss of type-based alias
> >> disambiguation was indeed an issue.
> >>
> >> But - I definitely do not like the idea of preserving something similar
> >> to TMR_ORIGINAL.  Instead we can try preserving some information
> >> we derive from it.  We keep the original access type that we can use
> >> for TBAA but do not retain knowledge on whether the type of the
> >> MEM_REF is valid for TBAA or if it is view-converted.
> >
> > Yes, I really don't like what I have at the moment, either.  I put it in
> > place as a stopgap to let me proceed to look for other performance
> > problems.
> >
> > The question is how we can infer useful information for TBAA from the
> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
> > many cases to maintain correctness, and couldn't prove the type
> > information for the important cases in SPEC I was trying to address.
> >
> > Unfortunately, the whole design goes down the drain if we can't find a
> > way to solve the TBAA issue.  The performance degradations are too
> > costly.
> 
> If you look at what basic TBAA the alias oracle performs then it boils
> down to the fact that get_alias_set for a.b.c might end up using the
> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
> of the type of a.  The tree alias-oracle extracts both alias sets, that
> of the outermost valid type and that of the innermost as both are
> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
> only have storage for one such alias-set.  Thus my idea at some point
> was to store the other one as well in some form.  It will not be
> the full information (after all, the complete access path does provide
> some extra information - see aliasing_component_refs_p).

This is what concerns me.  TBAA information for the outer and inner
components doesn't seem sufficient to provide what
nonoverlapping_component_refs_p is currently able to prove.  The latter
searches for a common RECORD_TYPE somewhere along the two access paths,
and then disambiguates if the two associated referenced fields differ.
For a simple case like "struct x { int a; int b; };", a and b have the
same type and alias-set, so the alias-set information doesn't add
anything.  It isn't sufficient alone for the disambiguation of x1.a =
MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].

Obviously the offset is sufficient to disambiguate for this simple case
with a common base type, but when the shared record types aren't at the
outermost level, we can't detect whether it is.

At the moment I don't see how we can avoid degradation unless we keep
the full access path around somewhere, for [TARGET_]MEM_REFs built from
COMPONENT_REFs.  I hope I'm wrong.

> 
> Btw, I'm looking at lowering bitfield accesses to read-modify-write
> cycles and in that context also to lowering unaligned accesses
> for targets that do not support them.
> 
> Richard.
> 
> > <snip>
> >
> > Thanks,
> > Bill
> >
> >


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-10 15:47       ` William J. Schmidt
@ 2011-06-14 13:58         ` Richard Guenther
  2011-06-14 14:52           ` William J. Schmidt
  2011-06-29 20:17           ` William J. Schmidt
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Guenther @ 2011-06-14 13:58 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, dje.gcc, steven, law

On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
>> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>
> <snip>
>
>> >> > Loss of aliasing information
>> >> > ============================
>> >> > The most serious problem I've run into is degraded performance due to poorer
>> >> > instruction scheduling choices.  I tracked this down to
>> >> > alias.c:nonoverlapping_component_refs_p.
>> >> >
>> >> > This code proves that two memory accesses don't overlap by attempting to prove
>> >> > that they access different fields of the same structure.  This is done using
>> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
>> >> > translated into the rtx's during expand.  When address lowering is not
>> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
>> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
>> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
>> >> > reference.  Thus the aliasing machinery can no longer prove that two such
>> >> > field references are disjoint.
>> >> >
>> >> > This has severe consequences for performance, and has to be dealt with if
>> >> > address lowering is to be successful.
>> >> >
>> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
>> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
>> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
>> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
>> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
>> >> > essentially duplicates the behavior in the absence of address lowering.
>> >>
>> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
>> >> of problems.  Removing it didn't cause much degradation because we now
>> >> preserve points-to information.
>> >>
>> >> Originally I played with lowering all memory accesses to MEM_REFs
>> >> (see the old mem-ref branch), and the loss of type-based alias
>> >> disambiguation was indeed an issue.
>> >>
>> >> But - I definitely do not like the idea of preserving something similar
>> >> to TMR_ORIGINAL.  Instead we can try preserving some information
>> >> we derive from it.  We keep the original access type that we can use
>> >> for TBAA but do not retain knowledge on whether the type of the
>> >> MEM_REF is valid for TBAA or if it is view-converted.
>> >
>> > Yes, I really don't like what I have at the moment, either.  I put it in
>> > place as a stopgap to let me proceed to look for other performance
>> > problems.
>> >
>> > The question is how we can infer useful information for TBAA from the
>> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
>> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
>> > many cases to maintain correctness, and couldn't prove the type
>> > information for the important cases in SPEC I was trying to address.
>> >
>> > Unfortunately, the whole design goes down the drain if we can't find a
>> > way to solve the TBAA issue.  The performance degradations are too
>> > costly.
>>
>> If you look at what basic TBAA the alias oracle performs then it boils
>> down to the fact that get_alias_set for a.b.c might end up using the
>> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
>> of the type of a.  The tree alias-oracle extracts both alias sets, that
>> of the outermost valid type and that of the innermost as both are
>> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
>> only have storage for one such alias-set.  Thus my idea at some point
>> was to store the other one as well in some form.  It will not be
>> the full information (after all, the complete access path does provide
>> some extra information - see aliasing_component_refs_p).
>
> This is what concerns me.  TBAA information for the outer and inner
> components doesn't seem sufficient to provide what
> nonoverlapping_component_refs_p is currently able to prove.  The latter
> searches for a common RECORD_TYPE somewhere along the two access paths,
> and then disambiguates if the two associated referenced fields differ.
> For a simple case like "struct x { int a; int b; };", a and b have the
> same type and alias-set, so the alias-set information doesn't add
> anything.  It isn't sufficient alone for the disambiguation of x1.a =
> MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
>
> Obviously the offset is sufficient to disambiguate for this simple case
> with a common base type, but when the shared record types aren't at the
> outermost level, we can't detect whether it is.
>
> At the moment I don't see how we can avoid degradation unless we keep
> the full access path around somewhere, for [TARGET_]MEM_REFs built from
> COMPONENT_REFs.  I hope I'm wrong.

You are not wrong.  But the question is, does it make a difference?

Richard.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-14 13:58         ` Richard Guenther
@ 2011-06-14 14:52           ` William J. Schmidt
  2011-06-14 15:26             ` Richard Guenther
  2011-06-29 20:17           ` William J. Schmidt
  1 sibling, 1 reply; 11+ messages in thread
From: William J. Schmidt @ 2011-06-14 14:52 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner, dje.gcc, steven, law

On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
> >> <wschmidt@linux.vnet.ibm.com> wrote:
> >
> > <snip>
> >
> >> >> > Loss of aliasing information
> >> >> > ============================
> >> >> > The most serious problem I've run into is degraded performance due to poorer
> >> >> > instruction scheduling choices.  I tracked this down to
> >> >> > alias.c:nonoverlapping_component_refs_p.
> >> >> >
> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
> >> >> > that they access different fields of the same structure.  This is done using
> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
> >> >> > translated into the rtx's during expand.  When address lowering is not
> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
> >> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> >> >> > reference.  Thus the aliasing machinery can no longer prove that two such
> >> >> > field references are disjoint.
> >> >> >
> >> >> > This has severe consequences for performance, and has to be dealt with if
> >> >> > address lowering is to be successful.
> >> >> >
> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
> >> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
> >> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
> >> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
> >> >> > essentially duplicates the behavior in the absence of address lowering.
> >> >>
> >> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
> >> >> of problems.  Removing it didn't cause much degradation because we now
> >> >> preserve points-to information.
> >> >>
> >> >> Originally I played with lowering all memory accesses to MEM_REFs
> >> >> (see the old mem-ref branch), and the loss of type-based alias
> >> >> disambiguation was indeed an issue.
> >> >>
> >> >> But - I definitely do not like the idea of preserving something similar
> >> >> to TMR_ORIGINAL.  Instead we can try preserving some information
> >> >> we derive from it.  We keep the original access type that we can use
> >> >> for TBAA but do not retain knowledge on whether the type of the
> >> >> MEM_REF is valid for TBAA or if it is view-converted.
> >> >
> >> > Yes, I really don't like what I have at the moment, either.  I put it in
> >> > place as a stopgap to let me proceed to look for other performance
> >> > problems.
> >> >
> >> > The question is how we can infer useful information for TBAA from the
> >> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
> >> > many cases to maintain correctness, and couldn't prove the type
> >> > information for the important cases in SPEC I was trying to address.
> >> >
> >> > Unfortunately, the whole design goes down the drain if we can't find a
> >> > way to solve the TBAA issue.  The performance degradations are too
> >> > costly.
> >>
> >> If you look at what basic TBAA the alias oracle performs then it boils
> >> down to the fact that get_alias_set for a.b.c might end up using the
> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
> >> of the type of a.  The tree alias-oracle extracts both alias sets, that
> >> of the outermost valid type and that of the innermost as both are
> >> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
> >> only have storage for one such alias-set.  Thus my idea at some point
> >> was to store the other one as well in some form.  It will not be
> >> the full information (after all, the complete access path does provide
> >> some extra information - see aliasing_component_refs_p).
> >
> > This is what concerns me.  TBAA information for the outer and inner
> > components doesn't seem sufficient to provide what
> > nonoverlapping_component_refs_p is currently able to prove.  The latter
> > searches for a common RECORD_TYPE somewhere along the two access paths,
> > and then disambiguates if the two associated referenced fields differ.
> > For a simple case like "struct x { int a; int b; };", a and b have the
> > same type and alias-set, so the alias-set information doesn't add
> > anything.  It isn't sufficient alone for the disambiguation of x1.a =
> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
> >
> > Obviously the offset is sufficient to disambiguate for this simple case
> > with a common base type, but when the shared record types aren't at the
> > outermost level, we can't detect whether it is.
> >
> > At the moment I don't see how we can avoid degradation unless we keep
> > the full access path around somewhere, for [TARGET_]MEM_REFs built from
> > COMPONENT_REFs.  I hope I'm wrong.
> 
> You are not wrong.  But the question is, does it make a difference?
> 
> Richard.

Yes, it does.  This scenario occurs in 188.ammp, among others, and leads
to a large degradation without the change.  The performance-critical
loop in mm_fv_update_nonbon makes heavy use of indirect references to
the ATOM structure that contains numerous float variables.  When the
COMPONENT_REFs have been converted to MEM_REFs, the alias machinery can
no longer disambiguate these, which constrains the scheduler.  The
result of the poor scheduling (on PowerPC, at least) is a large increase
in floating-point spill code in the critical loop.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-14 14:52           ` William J. Schmidt
@ 2011-06-14 15:26             ` Richard Guenther
  2011-06-14 16:28               ` William J. Schmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2011-06-14 15:26 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, dje.gcc, steven, law

On Tue, Jun 14, 2011 at 4:18 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
>> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
>> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
>> >> <wschmidt@linux.vnet.ibm.com> wrote:
>> >
>> > <snip>
>> >
>> >> >> > Loss of aliasing information
>> >> >> > ============================
>> >> >> > The most serious problem I've run into is degraded performance due to poorer
>> >> >> > instruction scheduling choices.  I tracked this down to
>> >> >> > alias.c:nonoverlapping_component_refs_p.
>> >> >> >
>> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
>> >> >> > that they access different fields of the same structure.  This is done using
>> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
>> >> >> > translated into the rtx's during expand.  When address lowering is not
>> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
>> >> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
>> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
>> >> >> > reference.  Thus the aliasing machinery can no longer prove that two such
>> >> >> > field references are disjoint.
>> >> >> >
>> >> >> > This has severe consequences for performance, and has to be dealt with if
>> >> >> > address lowering is to be successful.
>> >> >> >
>> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
>> >> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
>> >> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
>> >> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
>> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
>> >> >> > essentially duplicates the behavior in the absence of address lowering.
>> >> >>
>> >> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
>> >> >> of problems.  Removing it didn't cause much degradation because we now
>> >> >> preserve points-to information.
>> >> >>
>> >> >> Originally I played with lowering all memory accesses to MEM_REFs
>> >> >> (see the old mem-ref branch), and the loss of type-based alias
>> >> >> disambiguation was indeed an issue.
>> >> >>
>> >> >> But - I definitely do not like the idea of preserving something similar
>> >> >> to TMR_ORIGINAL.  Instead we can try preserving some information
>> >> >> we derive from it.  We keep the original access type that we can use
>> >> >> for TBAA but do not retain knowledge on whether the type of the
>> >> >> MEM_REF is valid for TBAA or if it is view-converted.
>> >> >
>> >> > Yes, I really don't like what I have at the moment, either.  I put it in
>> >> > place as a stopgap to let me proceed to look for other performance
>> >> > problems.
>> >> >
>> >> > The question is how we can infer useful information for TBAA from the
>> >> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
>> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
>> >> > many cases to maintain correctness, and couldn't prove the type
>> >> > information for the important cases in SPEC I was trying to address.
>> >> >
>> >> > Unfortunately, the whole design goes down the drain if we can't find a
>> >> > way to solve the TBAA issue.  The performance degradations are too
>> >> > costly.
>> >>
>> >> If you look at what basic TBAA the alias oracle performs then it boils
>> >> down to the fact that get_alias_set for a.b.c might end up using the
>> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
>> >> of the type of a.  The tree alias-oracle extracts both alias sets, that
>> >> of the outermost valid type and that of the innermost as both are
>> >> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
>> >> only have storage for one such alias-set.  Thus my idea at some point
>> >> was to store the other one as well in some form.  It will not be
>> >> the full information (after all, the complete access path does provide
>> >> some extra information - see aliasing_component_refs_p).
>> >
>> > This is what concerns me.  TBAA information for the outer and inner
>> > components doesn't seem sufficient to provide what
>> > nonoverlapping_component_refs_p is currently able to prove.  The latter
>> > searches for a common RECORD_TYPE somewhere along the two access paths,
>> > and then disambiguates if the two associated referenced fields differ.
>> > For a simple case like "struct x { int a; int b; };", a and b have the
>> > same type and alias-set, so the alias-set information doesn't add
>> > anything.  It isn't sufficient alone for the disambiguation of x1.a =
>> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
>> >
>> > Obviously the offset is sufficient to disambiguate for this simple case
>> > with a common base type, but when the shared record types aren't at the
>> > outermost level, we can't detect whether it is.
>> >
>> > At the moment I don't see how we can avoid degradation unless we keep
>> > the full access path around somewhere, for [TARGET_]MEM_REFs built from
>> > COMPONENT_REFs.  I hope I'm wrong.
>>
>> You are not wrong.  But the question is, does it make a difference?
>>
>> Richard.
>
> Yes, it does.  This scenario occurs in 188.ammp, among others, and leads
> to a large degradation without the change.  The performance-critical
> loop in mm_fv_update_nonbon makes heavy use of indirect references to
> the ATOM structure that contains numerous float variables.  When the
> COMPONENT_REFs have been converted to MEM_REFs, the alias machinery can
> no longer disambiguate these, which constrains the scheduler.  The
> result of the poor scheduling (on PowerPC, at least) is a large increase
> in floating-point spill code in the critical loop.

As they appear in loops I wonder why IVOPTs doesn't already expose
this problem?

In general the answer to missed TBAA optimizations is of course
"make sure that PTA works", which usually means using LTO ...

I really really don't like preserving TBAA related information as trees.
Instead if we really think preserving access paths is a must (I have
already significantly reduced the number of preserved paths with
introducing MEM_REFs!) then we have to find a different representation.

I suppose you can turn the AMMP case into a (small) testcase that
illustrates the issue and allows easier analysis and test of fixes?

Richard.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-14 15:26             ` Richard Guenther
@ 2011-06-14 16:28               ` William J. Schmidt
  0 siblings, 0 replies; 11+ messages in thread
From: William J. Schmidt @ 2011-06-14 16:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner, dje.gcc, steven, law

On Tue, 2011-06-14 at 17:21 +0200, Richard Guenther wrote:
> On Tue, Jun 14, 2011 at 4:18 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
> >> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
> >> <wschmidt@linux.vnet.ibm.com> wrote:
> >> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
> >> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
> >> >> <wschmidt@linux.vnet.ibm.com> wrote:
> >> >
> >> > <snip>
> >> >
> >> >> >> > Loss of aliasing information
> >> >> >> > ============================
> >> >> >> > The most serious problem I've run into is degraded performance due to poorer
> >> >> >> > instruction scheduling choices.  I tracked this down to
> >> >> >> > alias.c:nonoverlapping_component_refs_p.
> >> >> >> >
> >> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
> >> >> >> > that they access different fields of the same structure.  This is done using
> >> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
> >> >> >> > translated into the rtx's during expand.  When address lowering is not
> >> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
> >> >> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
> >> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> >> >> >> > reference.  Thus the aliasing machinery can no longer prove that two such
> >> >> >> > field references are disjoint.
> >> >> >> >
> >> >> >> > This has severe consequences for performance, and has to be dealt with if
> >> >> >> > address lowering is to be successful.
> >> >> >> >
> >> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
> >> >> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
> >> >> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
> >> >> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
> >> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
> >> >> >> > essentially duplicates the behavior in the absence of address lowering.
> >> >> >>
> >> >> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
> >> >> >> of problems.  Removing it didn't cause much degradation because we now
> >> >> >> preserve points-to information.
> >> >> >>
> >> >> >> Originally I played with lowering all memory accesses to MEM_REFs
> >> >> >> (see the old mem-ref branch), and the loss of type-based alias
> >> >> >> disambiguation was indeed an issue.
> >> >> >>
> >> >> >> But - I definitely do not like the idea of preserving something similar
> >> >> >> to TMR_ORIGINAL.  Instead we can try preserving some information
> >> >> >> we derive from it.  We keep the original access type that we can use
> >> >> >> for TBAA but do not retain knowledge on whether the type of the
> >> >> >> MEM_REF is valid for TBAA or if it is view-converted.
> >> >> >
> >> >> > Yes, I really don't like what I have at the moment, either.  I put it in
> >> >> > place as a stopgap to let me proceed to look for other performance
> >> >> > problems.
> >> >> >
> >> >> > The question is how we can infer useful information for TBAA from the
> >> >> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
> >> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
> >> >> > many cases to maintain correctness, and couldn't prove the type
> >> >> > information for the important cases in SPEC I was trying to address.
> >> >> >
> >> >> > Unfortunately, the whole design goes down the drain if we can't find a
> >> >> > way to solve the TBAA issue.  The performance degradations are too
> >> >> > costly.
> >> >>
> >> >> If you look at what basic TBAA the alias oracle performs then it boils
> >> >> down to the fact that get_alias_set for a.b.c might end up using the
> >> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
> >> >> of the type of a.  The tree alias-oracle extracts both alias sets, that
> >> >> of the outermost valid type and that of the innermost as both are
> >> >> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
> >> >> only have storage for one such alias-set.  Thus my idea at some point
> >> >> was to store the other one as well in some form.  It will not be
> >> >> the full information (after all, the complete access path does provide
> >> >> some extra information - see aliasing_component_refs_p).
> >> >
> >> > This is what concerns me.  TBAA information for the outer and inner
> >> > components doesn't seem sufficient to provide what
> >> > nonoverlapping_component_refs_p is currently able to prove.  The latter
> >> > searches for a common RECORD_TYPE somewhere along the two access paths,
> >> > and then disambiguates if the two associated referenced fields differ.
> >> > For a simple case like "struct x { int a; int b; };", a and b have the
> >> > same type and alias-set, so the alias-set information doesn't add
> >> > anything.  It isn't sufficient alone for the disambiguation of x1.a =
> >> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
> >> >
> >> > Obviously the offset is sufficient to disambiguate for this simple case
> >> > with a common base type, but when the shared record types aren't at the
> >> > outermost level, we can't detect whether it is.
> >> >
> >> > At the moment I don't see how we can avoid degradation unless we keep
> >> > the full access path around somewhere, for [TARGET_]MEM_REFs built from
> >> > COMPONENT_REFs.  I hope I'm wrong.
> >>
> >> You are not wrong.  But the question is, does it make a difference?
> >>
> >> Richard.
> >
> > Yes, it does.  This scenario occurs in 188.ammp, among others, and leads
> > to a large degradation without the change.  The performance-critical
> > loop in mm_fv_update_nonbon makes heavy use of indirect references to
> > the ATOM structure that contains numerous float variables.  When the
> > COMPONENT_REFs have been converted to MEM_REFs, the alias machinery can
> > no longer disambiguate these, which constrains the scheduler.  The
> > result of the poor scheduling (on PowerPC, at least) is a large increase
> > in floating-point spill code in the critical loop.
> 
> As they appear in loops I wonder why IVOPTs doesn't already expose
> this problem?

That's a fair question.  I haven't analyzed it that deeply yet.  I
suspect that most accesses to different ATOMs are through general
pointers rather than arrays.  Any opportunities for IVOPTs may be too
obfuscated to be easily found.  If I recall correctly, this is also a
deep loop nest with some nontrivial control flow.

> 
> In general the answer to missed TBAA optimizations is of course
> "make sure that PTA works", which usually means using LTO ...
> 
> I really really don't like preserving TBAA related information as trees.
> Instead if we really think preserving access paths is a must (I have
> already significantly reduced the number of preserved paths with
> introducing MEM_REFs!) then we have to find a different representation.
> 

Right, I had been wondering whether some opportunities had been
similarly lost with the MEM_REFs introduced in IVOPTs.  

I will ponder how else we could represent the necessary information from
the access path.  One important goal is to avoid bloating the IR when
the additional information isn't needed (one of the downsides of
something like TMR_ORIGINAL).  We only need this data for (chains of)
COMPONENT_REFs converted into [TARGET_]MEM_REFs, so we don't want to
bloat every MEM_REF or TMR with additional data.

Unfortunately, because of the layout of tree_exprs, the only way to
directly add something to a MEM_REF node is in the operand array.
Although we'd rather not raise the size of all MEM_REFs, I haven't
thought of anything less wasteful.  I considered hiding the extra data
in the type system somehow, but we would need something like a separate
type variant for each MEM_REF created from a different chain of
COMPONENT_REFs.  Even if that was workable, which I doubt, it would cost
more space and complexity than adding a new operand would.  In the end,
we probably require some sort of operand that represents the hidden type
information.  I don't want to use a hackish side table like my prototype
does; the data should be directly represented in the IR for
maintainability.

> I suppose you can turn the AMMP case into a (small) testcase that
> illustrates the issue and allows easier analysis and test of fixes?
> 

I can give it the old college try. :)  I'll have to give some thought
how to detect reliably from dumps that two things either are or are not
aliased.  But I agree this is important to isolate, and I should be able
to come up with something.

Thanks,
Bill

> Richard.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-14 13:58         ` Richard Guenther
  2011-06-14 14:52           ` William J. Schmidt
@ 2011-06-29 20:17           ` William J. Schmidt
  2011-06-30 10:47             ` Richard Guenther
  1 sibling, 1 reply; 11+ messages in thread
From: William J. Schmidt @ 2011-06-29 20:17 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner

On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
> >> <wschmidt@linux.vnet.ibm.com> wrote:
> >
> > <snip>
> >
> >> >> > Loss of aliasing information
> >> >> > ============================
> >> >> > The most serious problem I've run into is degraded performance due to poorer
> >> >> > instruction scheduling choices. I tracked this down to
> >> >> > alias.c:nonoverlapping_component_refs_p.
> >> >> >
> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
> >> >> > that they access different fields of the same structure. This is done using
> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
> >> >> > translated into the rtx's during expand. When address lowering is not
> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR: x.a, for
> >> >> > example. However, address lowering changes the simple COMPONENT_REF into a
> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> >> >> > reference. Thus the aliasing machinery can no longer prove that two such
> >> >> > field references are disjoint.
> >> >> >
> >> >> > This has severe consequences for performance, and has to be dealt with if
> >> >> > address lowering is to be successful.
> >> >> >
> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
> >> >> > drawbacks below. The idea is to construct a mapping from replacement mem_refs
> >> >> > to the original expressions that they replaced. When a MEM_EXPR is being set
> >> >> > during expand, we first look up the mem_ref in the mapping. If present, the
> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref. This
> >> >> > essentially duplicates the behavior in the absence of address lowering.
> >> >>
> >> >> Ick. We had this in the past via TMR_ORIGINAL which caused all sorts
> >> >> of problems. Removing it didn't cause much degradation because we now
> >> >> preserve points-to information.
> >> >>
> >> >> Originally I played with lowering all memory accesses to MEM_REFs
> >> >> (see the old mem-ref branch), and the loss of type-based alias
> >> >> disambiguation was indeed an issue.
> >> >>
> >> >> But - I definitely do not like the idea of preserving something similar
> >> >> to TMR_ORIGINAL. Instead we can try preserving some information
> >> >> we derive from it. We keep the original access type that we can use
> >> >> for TBAA but do not retain knowledge on whether the type of the
> >> >> MEM_REF is valid for TBAA or if it is view-converted.
> >> >
> >> > Yes, I really don't like what I have at the moment, either. I put it in
> >> > place as a stopgap to let me proceed to look for other performance
> >> > problems.
> >> >
> >> > The question is how we can infer useful information for TBAA from the
> >> > MEM_REFs and TMRs. I poked at trying to identify types and offsets from
> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
> >> > many cases to maintain correctness, and couldn't prove the type
> >> > information for the important cases in SPEC I was trying to address.
> >> >
> >> > Unfortunately, the whole design goes down the drain if we can't find a
> >> > way to solve the TBAA issue. The performance degradations are too
> >> > costly.
> >>
> >> If you look at what basic TBAA the alias oracle performs then it boils
> >> down to the fact that get_alias_set for a.b.c might end up using the
> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
> >> of the type of a. The tree alias-oracle extracts both alias sets, that
> >> of the outermost valid type and that of the innermost as both are
> >> equally useful. But the MEM_REF (or TARGET_MEM_REF) tree
> >> only have storage for one such alias-set. Thus my idea at some point
> >> was to store the other one as well in some form. It will not be
> >> the full information (after all, the complete access path does provide
> >> some extra information - see aliasing_component_refs_p).
> >
> > This is what concerns me. TBAA information for the outer and inner
> > components doesn't seem sufficient to provide what
> > nonoverlapping_component_refs_p is currently able to prove. The latter
> > searches for a common RECORD_TYPE somewhere along the two access paths,
> > and then disambiguates if the two associated referenced fields differ.
> > For a simple case like "struct x { int a; int b; };", a and b have the
> > same type and alias-set, so the alias-set information doesn't add
> > anything. It isn't sufficient alone for the disambiguation of x1.a =
> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
> >
> > Obviously the offset is sufficient to disambiguate for this simple case
> > with a common base type, but when the shared record types aren't at the
> > outermost level, we can't detect whether it is.
> >
> > At the moment I don't see how we can avoid degradation unless we keep
> > the full access path around somewhere, for [TARGET_]MEM_REFs built from
> > COMPONENT_REFs. I hope I'm wrong.
> 
> You are not wrong.  But the question is, does it make a difference?
> 
After some rework and measurements, I'm going to change my answer to
this question.  The loss of same-type field disambiguation does make a
difference under limited circumstances.  But at this time, I don't think
those differences are significant enough to warrant the complexities
required to avoid them.

The changes you made a couple weeks ago to tree-ssa-address.c, and the
associated aliasing changes, caused me to remove some heuristics and add
others.  My measurements changed noticeably as a result.  At this point,
the address-lowering patch I've been working on has detrimental effects
(on powerpc64) only on 188.ammp.  As discussed before, this is caused by
the lack of field disambiguation during scheduling of a large hot loop
with high register pressure.  However, the degradation I am seeing is
now roughly 2%.

So with only 2% degradation on a single benchmark, my thought is to add
some commentary to the code about this concern, but not to attempt to
address it.  I would include the salient points from this discussion in
the commentary.

Sound reasonable?

Bill

> Richard.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Design notes, RFC] Address-lowering prototype design (PR46556)
  2011-06-29 20:17           ` William J. Schmidt
@ 2011-06-30 10:47             ` Richard Guenther
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2011-06-30 10:47 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner

On Wed, Jun 29, 2011 at 8:18 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
>> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
>> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
>> >> <wschmidt@linux.vnet.ibm.com> wrote:
>> >
>> > <snip>
>> >
>> >> >> > Loss of aliasing information
>> >> >> > ============================
>> >> >> > The most serious problem I've run into is degraded performance due to poorer
>> >> >> > instruction scheduling choices. I tracked this down to
>> >> >> > alias.c:nonoverlapping_component_refs_p.
>> >> >> >
>> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
>> >> >> > that they access different fields of the same structure. This is done using
>> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
>> >> >> > translated into the rtx's during expand. When address lowering is not
>> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR: x.a, for
>> >> >> > example. However, address lowering changes the simple COMPONENT_REF into a
>> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
>> >> >> > reference. Thus the aliasing machinery can no longer prove that two such
>> >> >> > field references are disjoint.
>> >> >> >
>> >> >> > This has severe consequences for performance, and has to be dealt with if
>> >> >> > address lowering is to be successful.
>> >> >> >
>> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
>> >> >> > drawbacks below. The idea is to construct a mapping from replacement mem_refs
>> >> >> > to the original expressions that they replaced. When a MEM_EXPR is being set
>> >> >> > during expand, we first look up the mem_ref in the mapping. If present, the
>> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref. This
>> >> >> > essentially duplicates the behavior in the absence of address lowering.
>> >> >>
>> >> >> Ick. We had this in the past via TMR_ORIGINAL which caused all sorts
>> >> >> of problems. Removing it didn't cause much degradation because we now
>> >> >> preserve points-to information.
>> >> >>
>> >> >> Originally I played with lowering all memory accesses to MEM_REFs
>> >> >> (see the old mem-ref branch), and the loss of type-based alias
>> >> >> disambiguation was indeed an issue.
>> >> >>
>> >> >> But - I definitely do not like the idea of preserving something similar
>> >> >> to TMR_ORIGINAL. Instead we can try preserving some information
>> >> >> we derive from it. We keep the original access type that we can use
>> >> >> for TBAA but do not retain knowledge on whether the type of the
>> >> >> MEM_REF is valid for TBAA or if it is view-converted.
>> >> >
>> >> > Yes, I really don't like what I have at the moment, either. I put it in
>> >> > place as a stopgap to let me proceed to look for other performance
>> >> > problems.
>> >> >
>> >> > The question is how we can infer useful information for TBAA from the
>> >> > MEM_REFs and TMRs. I poked at trying to identify types and offsets from
>> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
>> >> > many cases to maintain correctness, and couldn't prove the type
>> >> > information for the important cases in SPEC I was trying to address.
>> >> >
>> >> > Unfortunately, the whole design goes down the drain if we can't find a
>> >> > way to solve the TBAA issue. The performance degradations are too
>> >> > costly.
>> >>
>> >> If you look at what basic TBAA the alias oracle performs then it boils
>> >> down to the fact that get_alias_set for a.b.c might end up using the
>> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
>> >> of the type of a. The tree alias-oracle extracts both alias sets, that
>> >> of the outermost valid type and that of the innermost as both are
>> >> equally useful. But the MEM_REF (or TARGET_MEM_REF) tree
>> >> only have storage for one such alias-set. Thus my idea at some point
>> >> was to store the other one as well in some form. It will not be
>> >> the full information (after all, the complete access path does provide
>> >> some extra information - see aliasing_component_refs_p).
>> >
>> > This is what concerns me. TBAA information for the outer and inner
>> > components doesn't seem sufficient to provide what
>> > nonoverlapping_component_refs_p is currently able to prove. The latter
>> > searches for a common RECORD_TYPE somewhere along the two access paths,
>> > and then disambiguates if the two associated referenced fields differ.
>> > For a simple case like "struct x { int a; int b; };", a and b have the
>> > same type and alias-set, so the alias-set information doesn't add
>> > anything. It isn't sufficient alone for the disambiguation of x1.a =
>> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
>> >
>> > Obviously the offset is sufficient to disambiguate for this simple case
>> > with a common base type, but when the shared record types aren't at the
>> > outermost level, we can't detect whether it is.
>> >
>> > At the moment I don't see how we can avoid degradation unless we keep
>> > the full access path around somewhere, for [TARGET_]MEM_REFs built from
>> > COMPONENT_REFs. I hope I'm wrong.
>>
>> You are not wrong.  But the question is, does it make a difference?
>>
> After some rework and measurements, I'm going to change my answer to
> this question.  The loss of same-type field disambiguation does make a
> difference under limited circumstances.  But at this time, I don't think
> those differences are significant enough to warrant the complexities
> required to avoid them.
>
> The changes you made a couple weeks ago to tree-ssa-address.c, and the
> associated aliasing changes, caused me to remove some heuristics and add
> others.  My measurements changed noticeably as a result.  At this point,
> the address-lowering patch I've been working on has detrimental effects
> (on powerpc64) only on 188.ammp.  As discussed before, this is caused by
> the lack of field disambiguation during scheduling of a large hot loop
> with high register pressure.  However, the degradation I am seeing is
> now roughly 2%.
>
> So with only 2% degradation on a single benchmark, my thought is to add
> some commentary to the code about this concern, but not to attempt to
> address it.  I would include the salient points from this discussion in
> the commentary.
>
> Sound reasonable?

Yes.

Thanks,
Richard.

> Bill
>
>> Richard.
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-06-30 10:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-06 18:07 [Design notes, RFC] Address-lowering prototype design (PR46556) William J. Schmidt
2011-06-07 10:07 ` Richard Guenther
2011-06-07 14:33   ` William J. Schmidt
2011-06-07 14:49     ` Richard Guenther
2011-06-10 15:47       ` William J. Schmidt
2011-06-14 13:58         ` Richard Guenther
2011-06-14 14:52           ` William J. Schmidt
2011-06-14 15:26             ` Richard Guenther
2011-06-14 16:28               ` William J. Schmidt
2011-06-29 20:17           ` William J. Schmidt
2011-06-30 10:47             ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).