Aliasing rules for unannotated SYMBOL

public inbox for gcc@gcc.gnu.org

* Aliasing rules for unannotated SYMBOL_REFs
@ 2020-01-25 14:31 Richard Sandiford
From: Richard Sandiford @ 2020-01-25 14:31 UTC
  To: gcc

TL;DR: if we have two bare SYMBOL_REFs X and Y, neither of which have an
associated source-level decl and neither of which are in an anchor block:

(Q1) can a valid byte access at X+C alias a valid byte access at Y+C?

(Q2) can a valid byte access at X+C1 alias a valid byte access at Y+C2,
     C1 != C2?

Also:

(Q3) If X has a source-level decl and Y doesn't, and neither of them are
     in an anchor block, can valid accesses based on X alias valid accesses
     based on Y?

(well, OK, that wasn't too short either...)

The reason for asking is that memrefs_conflict_p seems to have an
odd structure.  It first checks whether two addresses based on
SYMBOL_REFs refer to the same object, with a tristate result:

      int cmp = compare_base_symbol_refs (x,y);

AFAICT the return values mean:

  1: the SYMBOL_REFs are known to be equal
  0: in-range accesses based on X cannot alias in-range accesses based on Y
 -1: all other cases

If the addresses are known to be equal, we can use an offset-based check:

      /* If both decls are the same, decide by offsets.  */
      if (cmp == 1)
        return offset_overlap_p (c, xsize, ysize);

This part seems obvious enough.  But then, apart from the special case of
forced address alignment, we use an offset-based check even for cmp==-1:

      /* Assume a potential overlap for symbolic addresses that went
	 through alignment adjustments (i.e., that have negative
	 sizes), because we can't know how far they are from each
	 other.  */
      if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
	return -1;
      /* If decls are different or we know by offsets that there is no overlap,
	 we win.  */
      if (!cmp || !offset_overlap_p (c, xsize, ysize))
	return 0;

So we seem to be taking cmp==-1 to mean that although we don't know
the relationship between the symbols, it must be the case that either
(a) the symbols are equal (e.g. via aliasing) or (b) the accesses are
to non-overlapping objects.  In other words, one of the situations
described by cmp==1 or cmp==0 must be true, but we don't know which
at compile time.

This means that in practice, the answer to (Q1) appears to be "yes"
but the answer to (Q2) appears to be "no".
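The structure described above can be modelled in a few lines of C. This is a hypothetical sketch, not GCC code: plain `long` stands in for poly_int64, the function names are invented, and the assumed convention is that the Y access starts c bytes after the X access when the two symbols are equal. The final `return -1` is an assumption about the code that follows the quoted excerpt.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of an offset-based overlap test.  The X access
   covers [0, |xsize|) and the Y access covers [c, c + |ysize|).
   Assumptions: a size of 0 means "unknown", so overlap is assumed, and
   a negative (alignment-adjusted) size contributes its magnitude.  */
static bool
model_offset_overlap_p (long c, long xsize, long ysize)
{
  /* Negating LONG_MIN would overflow.  */
  assert (c != LONG_MIN && xsize != LONG_MIN && ysize != LONG_MIN);
  if (xsize == 0 || ysize == 0)
    return true;
  if (c >= 0)
    return (xsize < 0 ? -xsize : xsize) > c;
  return (ysize < 0 ? -ysize : ysize) > -c;
}

/* Model of the SYMBOL_REF vs SYMBOL_REF logic quoted above.  CMP is
   the tristate result: 1 = known equal, 0 = independent, -1 = other.
   Returns 1 (conflict), 0 (no conflict) or -1 (may conflict).  */
static int
model_symbol_conflict (int cmp, long c, long xsize, long ysize)
{
  assert (cmp >= -1 && cmp <= 1);

  /* Same symbol: decide purely by offsets.  */
  if (cmp == 1)
    return model_offset_overlap_p (c, xsize, ysize);

  /* Alignment-adjusted accesses: the distance is unknown.  */
  if (xsize < 0 || ysize < 0)
    return -1;

  /* Independent, or no overlap even if the symbols were equal.
     For cmp == -1 this treats the symbols as "equal or independent".  */
  if (!cmp || !model_offset_overlap_p (c, xsize, ysize))
    return 0;

  /* Assumed: possibly-equal symbols whose accesses overlap.  */
  return -1;
}
```

With byte accesses (sizes of 1) and cmp == -1, X+C against Y+C gives c == 0 and a "may conflict" answer, while X+3 against Y+5 gives c == 2 and a "no conflict" answer: the (Q1) yes / (Q2) no behaviour described above.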

This somewhat contradicts:

  /* In general we assume that memory locations pointed to by different labels
     may overlap in undefined ways.  */
  return -1;

at the end of compare_base_symbol_refs, which seems to be saying
that the answer to (Q2) ought to be "yes" instead.  Which is right?

In PR92294 we have a symbol X at ANCHOR+OFFSET that's preemptible.
Under the (Q1)==yes/(Q2)==no assumption, cmp==-1 means that either
(a) X = ANCHOR+OFFSET or (b) X and ANCHOR reference non-overlapping
objects.  So we should take the offset into account when doing:

      if (!cmp || !offset_overlap_p (c, xsize, ysize))
	return 0;

Let's call this FIX1.
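A hypothetical sketch of FIX1, with invented names, `long` offsets and positive sizes only; it does not try to copy GCC's code. Take the X access to be at X+cx and the other access to be at ANCHOR+ca, so that c = ca - cx is the distance the current code uses, as though X and ANCHOR were the same address. If X really is ANCHOR+OFFSET, the ANCHOR-based access starts ca - (OFFSET + cx) = c - OFFSET bytes after the X-based access, so that is the distance the offset check should see.

```c
#include <assert.h>
#include <stdbool.h>

/* Positive sizes only: X covers [0, xsize), Y covers [c, c + ysize).  */
static bool
ranges_overlap_p (long c, long xsize, long ysize)
{
  return c < xsize && c + ysize > 0;
}

/* Hypothetical model of the current cmp == -1 behaviour: offsets are
   compared as though X and ANCHOR were the same address.
   Returns 0 (no conflict) or -1 (may conflict).  */
static int
unfixed_anchor_conflict (long c, long xsize, long ysize)
{
  assert (xsize > 0 && ysize > 0);
  return ranges_overlap_p (c, xsize, ysize) ? -1 : 0;
}

/* Hypothetical FIX1: X is either ANCHOR + OFFSET or an independent
   object, so the offset check must use the distance implied by OFFSET.  */
static int
fix1_anchor_conflict (long c, long offset, long xsize, long ysize)
{
  assert (xsize > 0 && ysize > 0);
  return ranges_overlap_p (c - offset, xsize, ysize) ? -1 : 0;
}
```

For example, with OFFSET = 8, a 4-byte access at X+0 and a 4-byte access at ANCHOR+8 give c = 8. The unfixed check reports no conflict even though, if X is not preempted, both accesses hit the same bytes; the FIX1 version reports a possible conflict.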

But that then brings us to: why does memrefs_conflict_p return -1
when one symbol X has a decl and the other symbol Y doesn't, and neither
of them are block symbols?  Is the answer to (Q3) that we allow equality
but not overlap here too?  E.g. a linker script could define Y to X but
not to a region that contains X at a nonzero offset?

If so, and if one symbol X is an anchor symbol and the other Y has no
information, we have to assume that the linker script could point Y at
any decl in X's block (even if it can't point Y at a constant offset
from those decls).  So we'd need to skip the offset-based check in that
case at least, unless perhaps the block has a single decl.  Let's call
this FIX2.

FIX2 seems like a strange special case though.
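FIX2 amounts to a predicate on the kinds of the two symbols. In this hypothetical sketch the enum, the function name and the decl count are illustrative assumptions. The single-decl exception follows the "unless perhaps" above; in that case the distance would still have to come from that decl's offset within the block, as in FIX1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the two symbols being compared.  */
enum sym_kind { SYM_DECL, SYM_ANCHOR, SYM_BARE };

/* Hypothetical FIX2: may the cmp == -1 path use the offset-based
   "equal or independent" shortcut?  When one symbol is an anchor and
   the other carries no information, a linker script could point the
   bare symbol at any decl in the anchor's block, so the shortcut is
   only kept when the block holds a single decl.  */
static bool
fix2_offset_check_allowed_p (enum sym_kind x, enum sym_kind y,
                             int decls_in_block)
{
  bool anchor_vs_bare = ((x == SYM_ANCHOR && y == SYM_BARE)
                         || (x == SYM_BARE && y == SYM_ANCHOR));
  if (!anchor_vs_bare)
    return true;
  assert (decls_in_block >= 1);
  return decls_in_block == 1;
}
```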

On the other hand, if the answer to (Q2) is supposed to be "yes",
I guess we should remove the cmp==-1 offset check altogether.
Let's call this FIX3.
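Under FIX3, cmp == -1 would no longer be read as "equal or independent". A hypothetical sketch with invented names and positive sizes only (alignment-adjusted accesses would still need the existing early `return -1`):

```c
#include <assert.h>
#include <stdbool.h>

/* Positive sizes only: X covers [0, xsize), Y covers [c, c + ysize).  */
static bool
ranges_overlap_p (long c, long xsize, long ysize)
{
  return c < xsize && c + ysize > 0;
}

/* Hypothetical FIX3: only cmp == 1 uses offsets; cmp == -1 always
   means "may conflict".  Returns 1, 0 or -1 as before.  */
static int
fix3_symbol_conflict (int cmp, long c, long xsize, long ysize)
{
  assert (cmp >= -1 && cmp <= 1);
  assert (xsize > 0 && ysize > 0);
  if (cmp == 1)
    return ranges_overlap_p (c, xsize, ysize);
  if (cmp == 0)
    return 0;
  return -1;
}
```

With byte accesses at X+3 and Y+5 (c = 2) and cmp == -1, this returns -1 ("may conflict") rather than 0, which is what a "yes" answer to (Q2) requires.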

So it looks like there are several "sensible" possibilities:

  Q1  Q2  Q3  | Fixes       | Notes
  ------------+-------------+--------------------------
  yes no  yes | FIX1 + FIX2 | apparently the status quo
  yes yes yes | FIX3        |
  yes no  no  | FIX1        | (N1)
  yes yes no  | FIX3        | (N1)
  no  no  no  | other       | (N2)

(N1) the x_decl && !y_decl and !x_decl && y_decl cases in
     compare_base_symbol_refs are too conservative

(N2) several compare_base_symbol_refs cases are too conservative

Sorry for the overblown write-up.  I was just trying to capture all
the twisty corners I'd turned while working on this PR...

Thanks,
Richard


* Re: Aliasing rules for unannotated SYMBOL_REFs
@ 2020-01-27 22:16 ` Jeff Law
From: Jeff Law @ 2020-01-27 22:16 UTC
  To: Richard Sandiford, gcc

On Sat, 2020-01-25 at 09:31 +0000, Richard Sandiford wrote:
> TL;DR: if we have two bare SYMBOL_REFs X and Y, neither of which have an
> associated source-level decl and neither of which are in an anchor block:
> 
> (Q1) can a valid byte access at X+C alias a valid byte access at Y+C?
> 
> (Q2) can a valid byte access at X+C1 alias a valid byte access at Y+C2,
>      C1 != C2?
> 
> Also:
> 
> (Q3) If X has a source-level decl and Y doesn't, and neither of them are
>      in an anchor block, can valid accesses based on X alias valid accesses
>      based on Y?
So what are the cases where Y won't have a source-level decl but we
have a decl in RTL?  Anchors, other cases?


> 
> (well, OK, that wasn't too short either...)
I would have thought the answer would be "no" across the board.  But
the code clearly indicates otherwise.

Interposition clearly complicates things as do explicit aliases though.



> 
> This part seems obvious enough.  But then, apart from the special case of
> forced address alignment, we use an offset-based check even for cmp==-1:
> 
>       /* Assume a potential overlap for symbolic addresses that went
> 	 through alignment adjustments (i.e., that have negative
> 	 sizes), because we can't know how far they are from each
> 	 other.  */
>       if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
> 	return -1;
>       /* If decls are different or we know by offsets that there is no overlap,
> 	 we win.  */
>       if (!cmp || !offset_overlap_p (c, xsize, ysize))
> 	return 0;
> 
> So we seem to be taking cmp==-1 to mean that although we don't know
> the relationship between the symbols, it must be the case that either
> (a) the symbols are equal (e.g. via aliasing) or (b) the accesses are
> to non-overlapping objects.  In other words, one of the situations
> described by cmp==1 or cmp==0 must be true, but we don't know which
> at compile time.
Right.  That was the conclusion I came to.  If a SYMBOL_REF has an
alias, the alias must have the same value as the SYMBOL_REF.  So they're
either equal or there's no valid case for overlap.

> 
> This means that in practice, the answer to (Q1) appears to be "yes"
> but the answer to (Q2) appears to be "no".
That would be my understanding once aliases/interpositioning come into
play.

> 
> This somewhat contradicts:
> 
>   /* In general we assume that memory locations pointed to by different labels
>      may overlap in undefined ways.  */
>   return -1;
> 
> at the end of compare_base_symbol_refs, which seems to be saying
> that the answer to (Q2) ought to be "yes" instead.  Which is right?
I'm not sure how we could get to yes in that case.  A symbol alias or
interposition ultimately still results in two symbols having the same
final address.  Thus for a byte access if C1 != C2, then we can't have
an overlap.


> 
> In PR92294 we have a symbol X at ANCHOR+OFFSET that's preemptible.
> Under the (Q1)==yes/(Q2)==no assumption, cmp==-1 means that either
> (a) X = ANCHOR+OFFSET or (b) X and ANCHOR reference non-overlapping
> objects.  So we should take the offset into account when doing:
> 
>       if (!cmp || !offset_overlap_p (c, xsize, ysize))
> 	return 0;
> 
> Let's call this FIX1.
So this is a really interesting wrinkle.  Doesn't this change Q2 to a
yes?  In particular it changes the "invariant" that the symbols have
the same address in the event of a symbol alias or interposition.  Of
course one could ask the question of whether or not we should handle
cases with anchors specially.


> 
> But that then brings us to: why does memrefs_conflict_p return -1
> when one symbol X has a decl and the other symbol Y doesn't, and neither
> of them are block symbols?  Is the answer to (Q3) that we allow equality
> but not overlap here too?  E.g. a linker script could define Y to X but
> not to a region that contains X at a nonzero offset?
Does digging into the history provide any insights here?

I'm not sure, given the issues you've raised, that I could actually
fill out the matrix of answers without more underlying information,
i.e. when we can get symbols without source-level decls,
anchors+interposition issues, etc.

Jeff


* Re: Aliasing rules for unannotated SYMBOL_REFs
@ 2020-02-03 18:06   ` Richard Sandiford
From: Richard Sandiford @ 2020-02-03 18:06 UTC
  To: Jeff Law; +Cc: gcc

Thanks for the answer, and sorry for the slow follow-up.  Got distracted by
other things...

Jeff Law <law@redhat.com> writes:
> On Sat, 2020-01-25 at 09:31 +0000, Richard Sandiford wrote:
>> TL;DR: if we have two bare SYMBOL_REFs X and Y, neither of which have an
>> associated source-level decl and neither of which are in an anchor block:
>> 
>> (Q1) can a valid byte access at X+C alias a valid byte access at Y+C?
>> 
>> (Q2) can a valid byte access at X+C1 alias a valid byte access at Y+C2,
>>      C1 != C2?
>> 
>> Also:
>> 
>> (Q3) If X has a source-level decl and Y doesn't, and neither of them are
>>      in an anchor block, can valid accesses based on X alias valid accesses
>>      based on Y?
> So what are the cases where Y won't have a source-level decl but we
> have a decl in RTL?  Anchors, other cases?

Not really sure why I wrote "source-level" TBH.  I was really talking
about any symbol that has a SYMBOL_REF_DECL.

I think there are three "interesting" cases:

- symbols with a SYMBOL_REF_DECL
- anchor symbols
- bare symbols (i.e. everything else)

Bare symbols are hopefully rare these days.
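The three categories can be captured with a small classification helper. This is a hypothetical sketch: the struct stands in for what GCC reads from SYMBOL_REF_DECL and the anchor flag, and it assumes, following the order of the list above, that a symbol with a decl is classified as such first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SYMBOL_REF properties that matter.  */
struct sym_info
{
  bool has_decl;   /* a SYMBOL_REF_DECL is attached.  */
  bool is_anchor;  /* the symbol is a section anchor.  */
};

enum sym_kind { SYM_DECL, SYM_ANCHOR, SYM_BARE };

static enum sym_kind
classify_symbol (const struct sym_info *s)
{
  assert (s != NULL);
  if (s->has_decl)
    return SYM_DECL;
  if (s->is_anchor)
    return SYM_ANCHOR;
  /* Everything else is a "bare" symbol.  */
  return SYM_BARE;
}
```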

>> (well, OK, that wasn't too short either...)
> I would have thought the answer would be "no" across the board.  But
> the code clearly indicates otherwise.
>
> Interposition clearly complicates things as do explicit aliases though.
>
>> This part seems obvious enough.  But then, apart from the special case of
>> forced address alignment, we use an offset-based check even for cmp==-1:
>> 
>>       /* Assume a potential overlap for symbolic addresses that went
>> 	 through alignment adjustments (i.e., that have negative
>> 	 sizes), because we can't know how far they are from each
>> 	 other.  */
>>       if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
>> 	return -1;
>>       /* If decls are different or we know by offsets that there is no overlap,
>> 	 we win.  */
>>       if (!cmp || !offset_overlap_p (c, xsize, ysize))
>> 	return 0;
>> 
>> So we seem to be taking cmp==-1 to mean that although we don't know
>> the relationship between the symbols, it must be the case that either
>> (a) the symbols are equal (e.g. via aliasing) or (b) the accesses are
>> to non-overlapping objects.  In other words, one of the situations
>> described by cmp==1 or cmp==0 must be true, but we don't know which
>> at compile time.
> Right.  That was the conclusion I came to.  If a SYMBOL_REF has an
> alias, the alias must have the same value as the SYMBOL_REF.  So they're
> either equal or there's no valid case for overlap.
>
>> 
>> This means that in practice, the answer to (Q1) appears to be "yes"
>> but the answer to (Q2) appears to be "no".
> That would be my understanding once aliases/interpositioning come into
> play.
>
>> 
>> This somewhat contradicts:
>> 
>>   /* In general we assume that memory locations pointed to by different labels
>>      may overlap in undefined ways.  */
>>   return -1;
>> 
>> at the end of compare_base_symbol_refs, which seems to be saying
>> that the answer to (Q2) ought to be "yes" instead.  Which is right?
> I'm not sure how we could get to yes in that case.  A symbol alias or
> interposition ultimately still results in two symbols having the same
> final address.  Thus for a byte access if C1 != C2, then we can't have
> an overlap.

I think it's handling cases in which one symbol is a bare symbol (has no
decl and isn't an anchor).  I assumed the idea was that we could have a
decl-less SYMBOL_REF for the start of a particular section, or things
like that.

>> In PR92294 we have a symbol X at ANCHOR+OFFSET that's preemptible.
>> Under the (Q1)==yes/(Q2)==no assumption, cmp==-1 means that either
>> (a) X = ANCHOR+OFFSET or (b) X and ANCHOR reference non-overlapping
>> objects.  So we should take the offset into account when doing:
>> 
>>       if (!cmp || !offset_overlap_p (c, xsize, ysize))
>> 	return 0;
>> 
>> Let's call this FIX1.
> So this is a really interesting wrinkle.  Doesn't this change Q2 to a
> yes?  In particular it changes the "invariant" that the symbols have
> the same address in the event of a symbol alias or interposition.  Of
> course one could ask the question of whether or not we should handle
> cases with anchors specially.

This wouldn't come under Q2, since that was about symbols that aren't in
an anchor block.  I think it just means we need to generalise the three
cases that don't involve bare symbols from:

  - known equal
  - independent
  - equal or independent

to:

  - known distance apart
  - independent
  - known distance apart or independent

It's fortunate that anchors themselves can't be interposed. :-)

>> But that then brings us to: why does memrefs_conflict_p return -1
>> when one symbol X has a decl and the other symbol Y doesn't, and neither
>> of them are block symbols?  Is the answer to (Q3) that we allow equality
>> but not overlap here too?  E.g. a linker script could define Y to X but
>> not to a region that contains X at a nonzero offset?
> Does digging into the history provide any insights here?

Not that I could see.  The code in question was part of a single patch.

> I'm not sure, given the issues you've raised, that I could actually
> fill out the matrix of answers without more underlying information,
> i.e. when we can get symbols without source-level decls,
> anchors+interposition issues, etc.

OK.  In that case, I wonder whether it would be safer to have a
fourth state on top of the three above:

  - known distance apart
  - independent
  - known distance apart or independent
  - don't know

with "don't know" being anything that involves bare symbols?
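A sketch of how the four states might feed the conflict decision. It is hypothetical again: invented names, `long` offsets, positive sizes, c as the distance from the X access to the Y access when the bases coincide, and DISTANCE as Y minus X. Any pair involving a bare symbol is forced to "don't know", which always answers "may conflict".

```c
#include <assert.h>
#include <stdbool.h>

enum sym_kind { SYM_DECL, SYM_ANCHOR, SYM_BARE };

enum sym_relation
{
  REL_KNOWN_DISTANCE,
  REL_INDEPENDENT,
  REL_DISTANCE_OR_INDEPENDENT,
  REL_DONT_KNOW                /* anything involving a bare symbol.  */
};

struct sym_relationship
{
  enum sym_relation rel;
  long distance;               /* Y - X, where meaningful.  */
};

/* Positive sizes: X access covers [0, xsize), Y access [c, c + ysize).  */
static bool
ranges_overlap_p (long c, long xsize, long ysize)
{
  return c < xsize && c + ysize > 0;
}

/* Downgrade any relationship involving a bare symbol to "don't know".  */
static struct sym_relationship
apply_bare_rule (enum sym_kind x, enum sym_kind y,
                 struct sym_relationship r)
{
  if (x == SYM_BARE || y == SYM_BARE)
    {
      r.rel = REL_DONT_KNOW;
      r.distance = 0;
    }
  return r;
}

/* Returns 1 (conflict), 0 (no conflict) or -1 (may conflict).  */
static int
relationship_conflict (struct sym_relationship r, long c,
                       long xsize, long ysize)
{
  assert (xsize > 0 && ysize > 0);
  switch (r.rel)
    {
    case REL_KNOWN_DISTANCE:
      return ranges_overlap_p (c + r.distance, xsize, ysize);
    case REL_INDEPENDENT:
      return 0;
    case REL_DISTANCE_OR_INDEPENDENT:
      return ranges_overlap_p (c + r.distance, xsize, ysize) ? -1 : 0;
    case REL_DONT_KNOW:
      return -1;
    }
  return -1;
}
```

For instance, two byte accesses two bytes apart on symbols known to be equal give "no conflict", but the same query gives "may conflict" once either symbol is bare.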

Richard

