Re: should MEM tracking be able to optimize this?

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: should MEM tracking be able to optimize this?
@ 2001-11-05 20:23 Richard Kenner
  2001-11-06  8:56 ` Alexandre Oliva
  2001-11-06 19:54 ` Daniel Berlin
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Kenner @ 2001-11-05 20:23 UTC (permalink / raw)
  To: dan; +Cc: gcc

    Only if we are talking about interprocedural algorithms.
    Intraprocedural aliasing algorithms will never get the vast majority of 
    common cases.
    But assuming we are talking about interprocedural algorithms, you are 
    right of course.  aliasing algorithms generally are distinguished by how 
    much info they retain at a given program point, and their alias relation 
    representation.

No, that's way more complex than I mean.  The point is that there are
times (such as this case and many other similar ones) when you can
tell statically that certain types of references to field can never
conflict (because the bit positions don't overlap).  The problem is
representing this information in such a way that you can make that
statement about the references themselves.

Recall the example was:

	struct foo { float a[2048], b[2048]; }

It's obvious statically that X.a[i] and Y.b[j] can never conflict for any
expressions X and Y of type struct foo.  But there needs to be a way to
represent this.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05 20:23 should MEM tracking be able to optimize this? Richard Kenner
@ 2001-11-06  8:56 ` Alexandre Oliva
  2001-11-07  3:16   ` Zack Weinberg
  2001-11-06 19:54 ` Daniel Berlin
  1 sibling, 1 reply; 24+ messages in thread
From: Alexandre Oliva @ 2001-11-06  8:56 UTC (permalink / raw)
  To: Richard Kenner; +Cc: dan, gcc

On Nov 18, 2001, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:

> 	struct foo { float a[2048], b[2048]; }

> It's obvious statically that X.a[i] and Y.b[j] can never conflict for any
> expressions X and Y of type struct foo.  But there needs to be a way to
> represent this.

Err...  Isn't it valid to assume that &X.a[2048] == &X.b[0], which
would be the reason why both fields must be in the same alias set?

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist    *Please* write to mailing lists, not to me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05 20:23 should MEM tracking be able to optimize this? Richard Kenner
  2001-11-06  8:56 ` Alexandre Oliva
@ 2001-11-06 19:54 ` Daniel Berlin
  1 sibling, 0 replies; 24+ messages in thread
From: Daniel Berlin @ 2001-11-06 19:54 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc


On Sunday, November 18, 2001, at 07:31  AM, Richard Kenner wrote:

>     Only if we are talking about interprocedural algorithms.
>     Intraprocedural aliasing algorithms will never get the vast 
> majority of
>     common cases.
>     But assuming we are talking about interprocedural algorithms, you 
> are
>     right of course.  aliasing algorithms generally are distinguished 
> by how
>     much info they retain at a given program point, and their alias 
> relation
>     representation.
>
> No, that's way more complex than I mean.  The point is that there are
> times (such as this case and many other similar ones) when you can
> tell statically that certain types of references to field can never
> conflict (because the bit positions don't overlap).  The problem is
> representing this information in such a way that you can make that
> statement about the references themselves.
>
> Recall the example was:
>
> 	struct foo { float a[2048], b[2048]; }
>
> It's obvious statically that X.a[i] and Y.b[j] can never conflict for 
> any
> expressions X and Y of type struct foo.  But there needs to be a way to
> represent this.
>
Well, these are easy cases, to be honest.

If I remember one alias algorithm i read a while ago, you can assign 
each field a unique name.
All fields in a union are assigned the *same* name (IE you generate one 
unique name, and use it for all of them), to show that they overlap.

You could also add stride, and use it to say they can't ever conflict.

--Dan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-06  8:56 ` Alexandre Oliva
@ 2001-11-07  3:16   ` Zack Weinberg
  2001-11-07  3:19     ` Alexandre Oliva
  0 siblings, 1 reply; 24+ messages in thread
From: Zack Weinberg @ 2001-11-07  3:16 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Kenner, dan, gcc

On Sun, Nov 18, 2001 at 12:30:11PM -0200, Alexandre Oliva wrote:
> On Nov 18, 2001, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:
> 
> > 	struct foo { float a[2048], b[2048]; }
> 
> > It's obvious statically that X.a[i] and Y.b[j] can never conflict for any
> > expressions X and Y of type struct foo.  But there needs to be a way to
> > represent this.
> 
> Err...  Isn't it valid to assume that &X.a[2048] == &X.b[0], which
> would be the reason why both fields must be in the same alias set?

No.  &X.a[2048] is a legitimate pointer for purpose of comparing it
with another pointer *into the same array* (X.a) but dereferencing it,
or comparing it to a pointer into a different array (such as X.b) is
undefined behavior.

zw

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-07  3:16   ` Zack Weinberg
@ 2001-11-07  3:19     ` Alexandre Oliva
  2001-11-07  3:58       ` Zack Weinberg
  0 siblings, 1 reply; 24+ messages in thread
From: Alexandre Oliva @ 2001-11-07  3:19 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Richard Kenner, dan, gcc

On Nov 18, 2001, Zack Weinberg <zack@codesourcery.com> wrote:

> On Sun, Nov 18, 2001 at 12:30:11PM -0200, Alexandre Oliva wrote:
>> On Nov 18, 2001, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:
>> 
>> > 	struct foo { float a[2048], b[2048]; }
>> 
>> > It's obvious statically that X.a[i] and Y.b[j] can never conflict for any
>> > expressions X and Y of type struct foo.  But there needs to be a way to
>> > represent this.
>> 
>> Err...  Isn't it valid to assume that &X.a[2048] == &X.b[0], which
>> would be the reason why both fields must be in the same alias set?

> No.  &X.a[2048] is a legitimate pointer for purpose of comparing it
> with another pointer *into the same array* (X.a) but dereferencing it,
> or comparing it to a pointer into a different array (such as X.b) is
> undefined behavior.

C99 6.5.8/#5 hints that such pointers may be valid:

       If  the objects pointed to are members of the same aggregate
       object, pointers to structure members declared later compare
       greater  than  pointers  to  members declared earlier in the
       structure [...]
       In all other cases, the behavior is undefined.

Further support to my impression is given by J.2/#1:

       The   behavior   is   undefined   in   the  following
       circumstances:
[...]
         -- Pointers  that  do  not  point to the same aggregate or
            union (nor just  beyond  the  same  array  object)  are
            compared using relational operators (6.5.8).

I admit it doesn't guarantee my previously-posted assertion is true,
even though I'd expect it to be guaranteed by layout rules I can't
find in the Standard right now.  But still, it implies there is some
difference between two unrelated arrays and two arrays that are
members of the same aggregate.

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist    *Please* write to mailing lists, not to me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-07  3:19     ` Alexandre Oliva
@ 2001-11-07  3:58       ` Zack Weinberg
  2001-11-07  4:00         ` Alexandre Oliva
  0 siblings, 1 reply; 24+ messages in thread
From: Zack Weinberg @ 2001-11-07  3:58 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Richard Kenner, dan, gcc

On Sun, Nov 18, 2001 at 04:50:59PM -0200, Alexandre Oliva wrote:
> On Nov 18, 2001, Zack Weinberg <zack@codesourcery.com> wrote:
> 
> > On Sun, Nov 18, 2001 at 12:30:11PM -0200, Alexandre Oliva wrote:
> >> On Nov 18, 2001, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:
> >> 
> >> > 	struct foo { float a[2048], b[2048]; }
> >> 
> >> > It's obvious statically that X.a[i] and Y.b[j] can never conflict for any
> >> > expressions X and Y of type struct foo.  But there needs to be a way to
> >> > represent this.
> >> 
> >> Err...  Isn't it valid to assume that &X.a[2048] == &X.b[0], which
> >> would be the reason why both fields must be in the same alias set?
> 
> > No.  &X.a[2048] is a legitimate pointer for purpose of comparing it
> > with another pointer *into the same array* (X.a) but dereferencing it,
> > or comparing it to a pointer into a different array (such as X.b) is
> > undefined behavior.
> 
> C99 6.5.8/#5 hints that such pointers may be valid:
> 
>        If  the objects pointed to are members of the same aggregate
>        object, pointers to structure members declared later compare
>        greater  than  pointers  to  members declared earlier in the
>        structure [...]
>        In all other cases, the behavior is undefined.

Hmm, okay, but I'm still pretty confident that dereferencing a pointer to
&X.a[i] for i>=2048 is undefined, which is the important thing for this
optimization.

zw

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-07  3:58       ` Zack Weinberg
@ 2001-11-07  4:00         ` Alexandre Oliva
  0 siblings, 0 replies; 24+ messages in thread
From: Alexandre Oliva @ 2001-11-07  4:00 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Richard Kenner, dan, gcc

On Nov 18, 2001, Zack Weinberg <zack@codesourcery.com> wrote:

> Hmm, okay, but I'm still pretty confident that dereferencing a pointer to
> &X.a[i] for i>=2048 is undefined, which is the important thing for this
> optimization.

Hmm, indeed, 6.5.6/#8 and #9 appear to support your view.

Which gives me an interesting insight about aliasing rules.  Given:

struct foo { int x[N]; int y[N]; };
int f(struct foo *foo_p, int *z) {
  int *x = foo_p->x;
  int *y = foo_p->y;

At this point, x and z may alias each other, and y and z may alias
each other, but x and y may be assumed not to alias each other.  Cool!

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist    *Please* write to mailing lists, not to me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
@ 2001-11-08 16:50 Richard Kenner
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Kenner @ 2001-11-08 16:50 UTC (permalink / raw)
  To: dann; +Cc: gcc

    It seems that ex1->a and ex1->c are put in the alias set 0 and it
    should not be necessary. 
    Can this be avoided? 

Perhaps, but it gets into very tricky C semantics, since char * is allowed
to be used to access *any* datatype.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05 12:09 Richard Kenner
  2001-11-05 12:52 ` guerby
@ 2001-11-08 15:28 ` Dan Nicolaescu
  1 sibling, 0 replies; 24+ messages in thread
From: Dan Nicolaescu @ 2001-11-08 15:28 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner <kenner@vlsi1.ultra.nyu.edu> writes:

        Would it be possible to put the fields in different alias sets at
        tree->rtl conversion time (or whenever the alias sets are first
        computed), as we should know they don't overlap?
     
    Well there's no mechanism for fields to have alias sets and even if
    they did, the alias set for the reference would be that of "float", not
    of the field.  This is because of the addressing rules in C.

OK.

How about this example:

struct example
{
  char a;
  int  b;
  char c;
  int  d;
} *ex1;

void
foo (void)
{
  ex1->a = '1';
  ex1->b = 2;
  ex1->c = '3';
  ex1->d = 4;
}

on sparc:

foo:
	!#PROLOGUE# 0
	!#PROLOGUE# 1
	sethi	%hi(ex1), %o1
	ld	[%o1+%lo(ex1)], %o2  <- this
	mov	49, %o0
	stb	%o0, [%o2]
	ld	[%o1+%lo(ex1)], %o3  <-  is reloaded after store byte
	mov	51, %o0
	stb	%o0, [%o3+8]
	ld	[%o1+%lo(ex1)], %o2  <-  same here
	mov	2, %o0
	st	%o0, [%o3+4]
	mov	4, %o1
	retl
	st	%o1, [%o2+12]


It seems that ex1->a and ex1->c are put in the alias set 0 and it
should not be necessary. 
Can this be avoided? 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
@ 2001-11-06  3:18 Richard Kenner
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Kenner @ 2001-11-06  3:18 UTC (permalink / raw)
  To: dan; +Cc: gcc

Actually, I just saw how to do it.

Right now, there's a function that looks at a tree for the offset
computation and returns the highest power of two that it's known to
be a multiple of.  That can be replaced by a routine that returns the
highest known multiple (or we can have both routines).

We add a field to the mem attribute structure which we could call
MEM_STRIDE.  If it's nonzero, then MEM_OFFSET does not represent a
constant offset, but an assertion that the actual offset is 
MEM_OFFSET + J * MEM_STRIDE for some value of J.

Then if we have two MEMs with the same stride, we can take the offset
modulo the stride and if those (offset, size) pairs don't conflict, we know
the two MEMs can't.

Is anybody interested in doing this?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05 16:49 ` Daniel Berlin
@ 2001-11-05 17:06   ` Paolo Carlini
  0 siblings, 0 replies; 24+ messages in thread
From: Paolo Carlini @ 2001-11-05 17:06 UTC (permalink / raw)
  To: Daniel Berlin, gcc

Slightly off-topic... (C -> C++)

Hi all, hi Daniel,

what do you think of my beloved C++ testcase, which, I'm told, shows the limitations of
current C++ alias analysis? Could be perhaps dealt with at the tree-level similarly to
what Daniel is suggesting for C structs?

I'm told that many tough problems with C++ alias analysis comes from intricate
inherithance hierarchies but (as alluded to by Kenner) could'nt the simplest (sorry if I'm
too naive) cases at least be dealt with? It is'nt possible to stop early the analysis in
those cases without recursion, say: ok this class does not belong to any inheritance
hierarchy, we can deal with it (from the point of view of alias analysis) exactly as if
were a simple C struct...
(sorry if I'm being too naive)

Cheers,
Paolo.

////////////////////////

class RealMatrix {
public:

  float &index(int i, int j)
    {
      return d[i - 1 + n[0] * (j - 1)];
    }
  float index(int i, int j) const
    {
      return d[i - 1 + n[0] * (j - 1)];
    }

  int dim(int i) const { return n[i - 1]; }

private:

  float *d;
  int n[4];
};

void rmatMul(RealMatrix &t, const RealMatrix &a, const RealMatrix &b)
{
  const int M = a.dim(1), N = b.dim(2), K = b.dim(1);

  for (int j = 1; j <= N; j++)
    {
      for (int k = 1; k <= K; k++)
        {
          float temp = b.index(k, j);
          if (temp != 0.0)
            {
              for (int i = 1; i <= M; i++)
                t.index(i, j) += temp * a.index(i, k);
            }
        }
    }
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05 16:19 Richard Kenner
@ 2001-11-05 16:49 ` Daniel Berlin
  2001-11-05 17:06   ` Paolo Carlini
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Berlin @ 2001-11-05 16:49 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc


On Saturday, November 17, 2001, at 11:40  PM, Richard Kenner wrote:

>     It's not that simple to differentiate between field accesses, and
>     whether two given field accesses overlap.
>
> True in general, but you don't need to solve the general problem.  You
> just need an algorithm that tells you when it's known they *don't*.
Errr, this is still may-alias info, either way.

> And it's not hard to find such an algorithm that will do that in the
> vast majority of all the common cases.
Only if we are talking about interprocedural algorithms.
Intraprocedural aliasing algorithms will never get the vast majority of 
common cases.
But assuming we are talking about interprocedural algorithms, you are 
right of course.  aliasing algorithms generally are distinguished by how 
much info they retain at a given program point, and their alias relation 
representation.

> The tricky part isn't in determining that two fields don't overlap,
> but in all the all recordkeeping needed to have that information at
> the right place.
>
Right.
It's not actually all that tricky to do that either, it's just nobody 
had ever bothered to do it.
Of course, doing it at the RTL level would be pointless (since we've 
lost all the useful info by then), which is why i had done it at the 
tree level, and used it in tree level optimizations.
I also never bothered to try to transfer it down to the RTL level.

--Dan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
@ 2001-11-05 16:19 Richard Kenner
  2001-11-05 16:49 ` Daniel Berlin
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Kenner @ 2001-11-05 16:19 UTC (permalink / raw)
  To: dan; +Cc: gcc

    It's not that simple to differentiate between field accesses, and 
    whether two given field accesses overlap.  

True in general, but you don't need to solve the general problem.  You
just need an algorithm that tells you when it's known they *don't*.
And it's not hard to find such an algorithm that will do that in the
vast majority of all the common cases.

The tricky part isn't in determining that two fields don't overlap,
but in all the all recordkeeping needed to have that information at
the right place.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05 14:20 ` Zack Weinberg
@ 2001-11-05 15:45   ` Daniel Berlin
  0 siblings, 0 replies; 24+ messages in thread
From: Daniel Berlin @ 2001-11-05 15:45 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Dan Nicolaescu, gcc, Richard Kenner


On Saturday, November 17, 2001, at 05:16  PM, Zack Weinberg wrote:

> On Fri, Nov 16, 2001 at 12:47:30PM -0800, Dan Nicolaescu wrote:
>>
>> The following 2 functions should generate very similar assembly, right?
> ...
>
> This is purely a sticking-my-nose-in comment, but I looked into it
> briefly and it does appear to be a genuinely hard problem with our
> intermediate representation.
Well, at the RTL level, yes.
At the tree level, in the ast-optimizer branch, it's not that  difficult 
to implement.

> At the RTL level, loads and stores in calc1 look like
>
> (insn 45 43 47 (set (reg:SF 123)
>         (mem/s:SF (plus:SI (reg/f:SI 111)
>                 (reg:SI 115)) [4 A S4 A32])) -1 (nil)
>     (nil))
>
> where in calc2 they look like
>
> (insn 85 83 86 (set (reg:SF 150)
>         (mem/s:SF (plus:SI (reg/f:SI 145)
>                 (reg:SI 148)) [4 p S4 A32])) -1 (nil)
>     (nil))
>
> alias.c uses the information in square brackets to make decisions
> about whether memory refs can conflict.  [4 A S4 A32] means alias set
> 4, variable 'A', unit size 4 (bytes), alignment 32 (bits).
>
> In theory, if we had a thing we could stick into that structure that
> meant "A.p" instead of "A", that would be enough to get identical
> assembly for both loops.
No, it wouldn't.
Think of unions, for instance.
It's not that simple to differentiate between field accesses, and 
whether two given field accesses overlap.  This is partially because 
it's impossible to precisely compute may and must-alias relations in C 
(See "Undecidability of static analysis" by William Landi, or "The 
undecidability of aliasing" by G. Ramalingam).


>   The trouble is that we don't.
>
This is because you'd need an access path or something (see papers on 
non-type based alias analysis, or Muchnick's advanced compiler design 
and implementation) that can properly take into account pointer 
dereferencing, address taking, field accesses, etc. Where properly means 
"conservative enough to not get it wrong".

I've actually done an implementation of this for GCC at the tree level, 
based on an incremental interprocedural alias analysis algorithm.  This 
is one thing i needed tree serialization for.


> At the tree level there is enough information to be clear what is
> going on: the C expression "A.f[i]" becomes this tree:
>
>  <array_ref
>     arg 0 <component_ref
>         arg 0 <var_decl A>
>         arg 1 <field_decl f>
>     arg 1 <var_decl i>>
>
> (This is a brutally trimmed down version of what you'd get if you
> called debug_tree() on the expression node.)
>
> The tree expansion routines eventually call get_inner_reference()
> which transforms that into a (decl, offset) pair:
>
>  <var_decl A>
>
>  <mult_expr
>     arg 0 <var_decl i>
>     arg 1 <integer_cst 4>>
>
> If we'd asked it for A.p[i] instead, the offset would be something
> like <plus <mult <var i> <constant 4>> <constant 8192>> instead.
>
> The "variable" slot of [4 A S4 A32] is, in fact, <var_decl A> as
> returned by get_inner_reference.  Here's the rub: you might think that
> it would work to put <field_decl f> there instead, but FIELD_DECLs are
> unique to the type; all instances of a structure of that type use the
> same <field_decl f>.
>
> It might conceivably work to use <component_ref <var A> <field f>> in
> that slot.  However, I do not believe that COMPONENT_REFs are unique;
> if you refer to A.f twice you'll probably get two trees in memory, and
> that would defeat the simple address comparison currently being done
> by alias.c.
>   They could be _made_ unique, or alias.c could look more
> deeply, but that would be More Work Than I Have Time For (tm).
>
This wouldn't solve the problem of unions, etc.  You need an alias 
analysis algorithm that uses access paths or something (There are other 
ways than access paths, however most recent alias analysis algorithms 
i've seen use them, or some variant of them) that can differentiate 
between pieces of fields, etc.
These can get expensive, depending on how precise they are (and whether 
they are interprocedural or not).



> I do think that we should be able to do this optimization; perhaps
> I'll stick these notes onto the projects page for someone who wants to
> pick up the ball.
>
This optimization is known as scalar replacement of aggregates, BTW.
You need to also take care to pin down variables over function calls, 
etc, unless you use an interprocedural algorithm that can take these 
into account when computing the alias sets.

If you do in the alias analysis algorithm, it makes later elimination 
much simpler, but your alias sets bigger.


> zw

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05  4:44 Dan Nicolaescu
  2001-11-05  8:16 ` degger
@ 2001-11-05 14:20 ` Zack Weinberg
  2001-11-05 15:45   ` Daniel Berlin
  1 sibling, 1 reply; 24+ messages in thread
From: Zack Weinberg @ 2001-11-05 14:20 UTC (permalink / raw)
  To: Dan Nicolaescu; +Cc: gcc, Richard Kenner

On Fri, Nov 16, 2001 at 12:47:30PM -0800, Dan Nicolaescu wrote:
> 
> The following 2 functions should generate very similar assembly, right?
...

This is purely a sticking-my-nose-in comment, but I looked into it
briefly and it does appear to be a genuinely hard problem with our
intermediate representation.

At the RTL level, loads and stores in calc1 look like

(insn 45 43 47 (set (reg:SF 123)
        (mem/s:SF (plus:SI (reg/f:SI 111)
                (reg:SI 115)) [4 A S4 A32])) -1 (nil)
    (nil))

where in calc2 they look like

(insn 85 83 86 (set (reg:SF 150)
        (mem/s:SF (plus:SI (reg/f:SI 145)
                (reg:SI 148)) [4 p S4 A32])) -1 (nil)
    (nil))

alias.c uses the information in square brackets to make decisions
about whether memory refs can conflict.  [4 A S4 A32] means alias set
4, variable 'A', unit size 4 (bytes), alignment 32 (bits).

In theory, if we had a thing we could stick into that structure that
meant "A.p" instead of "A", that would be enough to get identical
assembly for both loops.  The trouble is that we don't.

At the tree level there is enough information to be clear what is
going on: the C expression "A.f[i]" becomes this tree:

 <array_ref
    arg 0 <component_ref
        arg 0 <var_decl A>
        arg 1 <field_decl f>
    arg 1 <var_decl i>>

(This is a brutally trimmed down version of what you'd get if you
called debug_tree() on the expression node.)

The tree expansion routines eventually call get_inner_reference()
which transforms that into a (decl, offset) pair:

 <var_decl A>

 <mult_expr
    arg 0 <var_decl i>
    arg 1 <integer_cst 4>>

If we'd asked it for A.p[i] instead, the offset would be something
like <plus <mult <var i> <constant 4>> <constant 8192>> instead.

The "variable" slot of [4 A S4 A32] is, in fact, <var_decl A> as
returned by get_inner_reference.  Here's the rub: you might think that
it would work to put <field_decl f> there instead, but FIELD_DECLs are
unique to the type; all instances of a structure of that type use the
same <field_decl f>.

It might conceivably work to use <component_ref <var A> <field f>> in
that slot.  However, I do not believe that COMPONENT_REFs are unique;
if you refer to A.f twice you'll probably get two trees in memory, and
that would defeat the simple address comparison currently being done
by alias.c.  They could be _made_ unique, or alias.c could look more
deeply, but that would be More Work Than I Have Time For (tm).

I do think that we should be able to do this optimization; perhaps
I'll stick these notes onto the projects page for someone who wants to
pick up the ball.

zw

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
@ 2001-11-05 12:55 Richard Kenner
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Kenner @ 2001-11-05 12:55 UTC (permalink / raw)
  To: guerby; +Cc: gcc

    Just curious, what about Ada?

Yes, it can be done for nonaliased fields and variables in Ada and we
in fact talked about it in the variable case.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05 12:09 Richard Kenner
@ 2001-11-05 12:52 ` guerby
  2001-11-08 15:28 ` Dan Nicolaescu
  1 sibling, 0 replies; 24+ messages in thread
From: guerby @ 2001-11-05 12:52 UTC (permalink / raw)
  To: kenner; +Cc: dann, gcc

> Well there's no mechanism for fields to have alias sets and even if
> they did, the alias set for the reference would be that of "float", not
> of the field.  This is because of the addressing rules in C.

Just curious, what about Ada?

-- 
Laurent Guerby <guerby@acm.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
@ 2001-11-05 12:09 Richard Kenner
  2001-11-05 12:52 ` guerby
  2001-11-08 15:28 ` Dan Nicolaescu
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Kenner @ 2001-11-05 12:09 UTC (permalink / raw)
  To: dann; +Cc: gcc

    Would it be possible to put the fields in different alias sets at
    tree->rtl conversion time (or whenever the alias sets are first
    computed), as we should know they don't overlap?
 
Well there's no mechanism for fields to have alias sets and even if
they did, the alias set for the reference would be that of "float", not
of the field.  This is because of the addressing rules in C.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05  5:54 Richard Kenner
@ 2001-11-05 11:34 ` Dan Nicolaescu
  0 siblings, 0 replies; 24+ messages in thread
From: Dan Nicolaescu @ 2001-11-05 11:34 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner <kenner@vlsi1.ultra.nyu.edu> writes:

        Or it's not considered to be worth the effort? 

    It's a hard problem because there's a limit to what you can keep track of.
    It's not clear it's common enough to be worth the effort.

I think this type of code might be used in the image processing
world. They pass around pointers to structures containing arrays. 

Code like this also exhibits the problem:

struct mystruct{
  float f[2048];
  float p[2048];
  float q[2048];
  float d[2048];
};

void
calc4 (struct mystruct *C)
{
  int i;
  for (i = 0; i < 1024; ++i) {
    C->f[i]  = C->p[i] + C->q[i];
    C->d[i]  = C->p[i] + C->q[i]; 
    C->f[i]  = C->f[i] + C->p[i] + C->q[i];
  }
}

Would it be possible to put the fields in different alias sets at
tree->rtl conversion time (or whenever the alias sets are first
computed), as we should know they don't overlap?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05  4:44 Dan Nicolaescu
@ 2001-11-05  8:16 ` degger
  2001-11-05 14:20 ` Zack Weinberg
  1 sibling, 0 replies; 24+ messages in thread
From: degger @ 2001-11-05  8:16 UTC (permalink / raw)
  To: dann; +Cc: gcc, kenner

On 16 Nov, Dan Nicolaescu wrote:

> On sparc-sun-solaris2.8 using -O2 I get the following assembly:
> (from a gcc updated about 1 hour ago when using -O2, just the loops
> are shown)

Highly interesting. On powerpc-gnu-linux with 2.95.3 the loop is almost
the same while the assebmly before is worse in your second case:

version1:

        li %r0,1024
        mtctr %r0
        lis %r9,A@ha
        la %r9,A@l(%r9)
.L8:
        lfs %f0,8192(%r9)
        lfs %f12,16384(%r9)
        fadds %f13,%f0,%f12
        fadds %f0,%f13,%f0
        stfs %f13,24576(%r9)
        fadds %f0,%f0,%f12
        stfs %f0,0(%r9)
        addi %r9,%r9,4
        bdnz .L8
        blr

version2:
        li %r0,1024
        lis %r9,f@ha
        mtctr %r0
        lis %r11,p@ha
        la %r7,f@l(%r9)
        la %r8,p@l(%r11)
        lis %r9,q@ha
        lis %r11,d@ha
        la %r9,q@l(%r9)
        la %r11,d@l(%r11)
        li %r10,0
.L8:
        lfsx %f0,%r10,%r8
        lfsx %f12,%r10,%r9
        fadds %f13,%f0,%f12
        fadds %f0,%f13,%f0
        stfsx %f13,%r10,%r11
        fadds %f0,%f0,%f12
        stfsx %f0,%r10,%r7
        addi %r10,%r10,4
        bdnz .L8
        blr

While with CVS it's worse in both cases and really evil for the first
one:

version1:
        li %r0,1024
        lis %r9,A@ha
        mtctr %r0
        li %r7,0
        la %r8,A@l(%r9)
.L9:
        slwi %r9,%r7,2
        addi %r7,%r7,1
        addi %r0,%r9,8192
        addi %r11,%r9,16384
        lfsx %f0,%r8,%r11
        addi %r10,%r9,24576
        lfsx %f13,%r8,%r0
        fadds %f13,%f13,%f0
        stfsx %f13,%r8,%r9
        lfsx %f13,%r8,%r11
        lfsx %f0,%r8,%r0
        fadds %f0,%f0,%f13
        stfsx %f0,%r8,%r10
        lfsx %f12,%r8,%r0
        lfsx %f0,%r8,%r9
        lfsx %f13,%r8,%r11
        fadds %f0,%f0,%f12
        fadds %f0,%f0,%f13
        stfsx %f0,%r8,%r9
        bdnz .L9
        blr

version2:
        li %r0,1024
        lis %r9,f@ha
        mtctr %r0
        lis %r11,p@ha
        la %r7,f@l(%r9)
        la %r8,p@l(%r11)
        lis %r9,q@ha
        lis %r11,d@ha
        la %r9,q@l(%r9)
        la %r11,d@l(%r11)
        li %r10,0
.L9:
        slwi %r0,%r10,2
        addi %r10,%r10,1
        lfsx %f0,%r8,%r0
        lfsx %f12,%r9,%r0
        fadds %f13,%f0,%f12
        fadds %f0,%f13,%f0
        stfsx %f13,%r11,%r0
        fadds %f0,%f0,%f12
        stfsx %f0,%r7,%r0
        bdnz .L9
        blr

Please note that each loop is called equally often and thusly the amount
of work differs quite a bit.

--
Servus,
       Daniel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
@ 2001-11-05  5:54 Richard Kenner
  2001-11-05 11:34 ` Dan Nicolaescu
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Kenner @ 2001-11-05  5:54 UTC (permalink / raw)
  To: dann; +Cc: gcc

    Do you have any plans to do anything about tracking the fields too? 

No.

    Or it's not considered to be worth the effort? 

It's a hard problem because there's a limit to what you can keep track of.
It's not clear it's common enough to be worth the effort.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: should MEM tracking be able to optimize this?
  2001-11-05  4:49 Richard Kenner
@ 2001-11-05  5:47 ` Dan Nicolaescu
  0 siblings, 0 replies; 24+ messages in thread
From: Dan Nicolaescu @ 2001-11-05  5:47 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner <kenner@vlsi1.ultra.nyu.edu> writes:

        "calc1" one has too many loads and stores...  Should MEM tracking be
        able to dissambiguate the non-dependend loads and stores in "calc1"?
    
    No, only in calc2.  The point is that it can now disambiguate when you
    have separate variables, like you do in calc2.  In calc1, what you
    have are different *fields* of the same variable and that isn't what
    is tracked.

Do you have any plans to do anything about tracking the fields too? 
Or it's not considered to be worth the effort? 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re:  should MEM tracking be able to optimize this?
@ 2001-11-05  4:49 Richard Kenner
  2001-11-05  5:47 ` Dan Nicolaescu
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Kenner @ 2001-11-05  4:49 UTC (permalink / raw)
  To: dann; +Cc: gcc

    "calc1" one has too many loads and stores...  Should MEM tracking be
    able to dissambiguate the non-dependend loads and stores in "calc1"?

No, only in calc2.  The point is that it can now disambiguate when you
have separate variables, like you do in calc2.  In calc1, what you
have are different *fields* of the same variable and that isn't what
is tracked.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* should MEM tracking be able to optimize this?
@ 2001-11-05  4:44 Dan Nicolaescu
  2001-11-05  8:16 ` degger
  2001-11-05 14:20 ` Zack Weinberg
  0 siblings, 2 replies; 24+ messages in thread
From: Dan Nicolaescu @ 2001-11-05  4:44 UTC (permalink / raw)
  To: gcc; +Cc: Richard Kenner


The following 2 functions should generate very similar assembly, right?

struct {
  float f[2048];
  float p[2048];
  float q[2048];
  float d[2048];
} A;


void
calc1 (void)
{
  int i;
  for (i = 0; i < 1024; ++i) {
    A.f[i]  = A.p[i] + A.q[i];
    A.d[i]  = A.p[i] + A.q[i]; 
    A.f[i]  = A.f[i] + A.p[i] + A.q[i];
  }
}

float f[2048];
float p[2048];
float q[2048];
float d[2048];

void
calc2 (void)
{
  int i;
  for (i = 0; i < 1024; ++i) {
    f[i]  = p[i] + q[i];
    d[i]  = p[i] + q[i]; 
    f[i]  = f[i] + p[i] + q[i];
  }
}


On sparc-sun-solaris2.8 using -O2 I get the following assembly:
(from a gcc updated about 1 hour ago when using -O2, just the loops
are shown)


calc1:
[snip]
.LL5:
	sll	%i5, 2, %i1	! i
	add	%i1, %o0, %i2
	add	%i1, %o7, %i3
	ld	[%i4+%i3], %f3	! A
	ld	[%i4+%i2], %f2	! A
	fadds	%f2, %f3, %f2
	st	%f2, [%i4+%i1]	! A
	ld	[%i4+%i3], %f2	! A
	ld	[%i4+%i2], %f3	! A
	fadds	%f3, %f2, %f3
	add	%i1, %g1, %i0
	st	%f3, [%i4+%i0]	! A
	ld	[%i4+%i2], %f4	! A
	ld	[%i4+%i1], %f2	! A
	ld	[%i4+%i3], %f3	! A
	fadds	%f2, %f4, %f2
	fadds	%f2, %f3, %f2
	add	%i5, 1, %i5	! i, i
	cmp	%i5, 1023	! i
	ble	.LL5
	st	%f2, [%i4+%i1]	! A
[snip]

calc2:
[snip]
.LL13:
	sll	%o2, 2, %o0	! i
	ld	[%o4+%o0], %f2	! p
	ld	[%o3+%o0], %f3	! q
	fadds	%f2, %f3, %f4
	fadds	%f4, %f2, %f2
	fadds	%f2, %f3, %f2
	add	%o2, 1, %o2	! i, i
	st	%f2, [%o5+%o0]	! f
	cmp	%o2, 1023	! i
	ble	.LL13
	st	%f4, [%o1+%o0]	! d



"calc1" one has too many loads and stores...  
Should MEM tracking be able to dissambiguate the non-dependend loads
and stores in "calc1"?



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2001-11-19 22:38 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-11-05 20:23 should MEM tracking be able to optimize this? Richard Kenner
2001-11-06  8:56 ` Alexandre Oliva
2001-11-07  3:16   ` Zack Weinberg
2001-11-07  3:19     ` Alexandre Oliva
2001-11-07  3:58       ` Zack Weinberg
2001-11-07  4:00         ` Alexandre Oliva
2001-11-06 19:54 ` Daniel Berlin
  -- strict thread matches above, loose matches on Subject: below --
2001-11-08 16:50 Richard Kenner
2001-11-06  3:18 Richard Kenner
2001-11-05 16:19 Richard Kenner
2001-11-05 16:49 ` Daniel Berlin
2001-11-05 17:06   ` Paolo Carlini
2001-11-05 12:55 Richard Kenner
2001-11-05 12:09 Richard Kenner
2001-11-05 12:52 ` guerby
2001-11-08 15:28 ` Dan Nicolaescu
2001-11-05  5:54 Richard Kenner
2001-11-05 11:34 ` Dan Nicolaescu
2001-11-05  4:49 Richard Kenner
2001-11-05  5:47 ` Dan Nicolaescu
2001-11-05  4:44 Dan Nicolaescu
2001-11-05  8:16 ` degger
2001-11-05 14:20 ` Zack Weinberg
2001-11-05 15:45   ` Daniel Berlin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).