public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* DR#236 analysis
@ 2004-12-03  3:31 Joseph S. Myers
  2004-12-03  4:15 ` Daniel Berlin
  2004-12-03  7:13 ` Geoffrey Keating
  0 siblings, 2 replies; 12+ messages in thread
From: Joseph S. Myers @ 2004-12-03  3:31 UTC (permalink / raw)
  To: gcc

An analysis of C99 DR#236 (type-based alias rules in the presence of 
unions) with a possible resolution with a note "Optimizers may have 
problems here" has been posted in the posted in the post-Redmond mailing 
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1090.htm>; I'd suggest 
people trying to implement aliasing rules read it.  Any comments about 
what resolutions make sense for optimizing real code would probably best 
go to the WG14 reflector (sc22wg14 at open-std.org) though I don't know to 
what extent non-member postings are accepted.

-- 
Joseph S. Myers               http://www.srcf.ucam.org/~jsm28/gcc/
    jsm@polyomino.org.uk (personal mail)
    joseph@codesourcery.com (CodeSourcery mail)
    jsm28@gcc.gnu.org (Bugzilla assignments and CCs)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-03  3:31 DR#236 analysis Joseph S. Myers
@ 2004-12-03  4:15 ` Daniel Berlin
  2004-12-03  7:13 ` Geoffrey Keating
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Berlin @ 2004-12-03  4:15 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: gcc

On Fri, 2004-12-03 at 03:31 +0000, Joseph S. Myers wrote:
> An analysis of C99 DR#236 (type-based alias rules in the presence of 
> unions) with a possible resolution with a note "Optimizers may have 
> problems here" has been posted in the posted in the post-Redmond mailing 
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1090.htm>; I'd suggest 
> people trying to implement aliasing rules read it.  Any comments about 
> what resolutions make sense for optimizing real code would probably best 
> go to the WG14 reflector (sc22wg14 at open-std.org) though I don't know to 
> what extent non-member postings are accepted.
> 

They don't seem to be accepted (i tried).
So i'll just state my thoughts here, and if anyone wants to pass them
along, they can.

They just don't seem to want to do the right thing, which is just make
the all or nothing choice and deal with the results.

The whole idea that it is more sane to change the rules at function
boundaries seems broken, even if i like the fact that it provides more
certainty than what we have have now.  

I feel this way partially because lots of optimizations in real world
compilers cross function boundaries, and that raises a whole host of
issues with this.

The simplest example is inlining on the whole program (or single modules
if these were all in the same module, etc) could make every single piece
of example code given the same in terms of intermediate form.  So what
happens then? Are we supposed to track what the aliasing rules say the
result would have been, had it not been inlined, or apply the type based
aliasing rules to the newly created inline function to determine whether
it is now valid, even if it would originally have been invalid in the
non-inlined form?

Bright lines are better than squiggly half-lited ones, and carving out
weird exceptions only leads to carving out further weird exceptions when
you need to accomodate something sane later that happens to conflict[1].

Either decide that point a (in the intention of the aliasing model) is
more important, or point c is more important (and for an optimizing
compiler, point a is much more important), and be done with it.  Drawing
the line in the middle and cutting the baby in half helps no one. 

Part of the reason we have language committees is in part to make the
hard choices so that both programmers and compiler authors don't have to
deal with the uncertainty that results otherwise.

Lastly, I completely agree with whoever raised the point that there is
no good reason for any of this code to be valid in the first place.
Is it really that difficult to explicitly access a union field when you
want to access a union field?

--Dan
[1] This is no different than law, where most of the time in law school
classes is spent teaching you the exceptions and squiggly lines that
were carved out (most of time, with some exceptions) because of some
court originally punting on making a hard choice one way or the other,
and then it coming back to bite them in the ass later because they were
more or less stuck with it at that point.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-03  3:31 DR#236 analysis Joseph S. Myers
  2004-12-03  4:15 ` Daniel Berlin
@ 2004-12-03  7:13 ` Geoffrey Keating
  2004-12-03 14:22   ` Ian Lance Taylor
  1 sibling, 1 reply; 12+ messages in thread
From: Geoffrey Keating @ 2004-12-03  7:13 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: gcc

"Joseph S. Myers" <joseph@codesourcery.com> writes:

> An analysis of C99 DR#236 (type-based alias rules in the presence of 
> unions) with a possible resolution with a note "Optimizers may have 
> problems here" has been posted in the posted in the post-Redmond mailing 
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1090.htm>; I'd suggest 
> people trying to implement aliasing rules read it.  Any comments about 
> what resolutions make sense for optimizing real code would probably best 
> go to the WG14 reflector (sc22wg14 at open-std.org) though I don't know to 
> what extent non-member postings are accepted.

I prefer the n980 proposal instead, require a visible union.  If it's
felt that's too unhelpful for optimisers, the alternative I'd suggest,
which I like but am not sure about incompatibility with existing code,
would be to say that pointers to (or inside) fields of a union become
invalid on a store to a different field (as would happen if the other
fields were free()ed).  For malloc-ed memory, I strongly support the
idea that the first store should determine its type and the type can't
change after that.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-03  7:13 ` Geoffrey Keating
@ 2004-12-03 14:22   ` Ian Lance Taylor
  2004-12-03 20:42     ` Geoffrey Keating
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Lance Taylor @ 2004-12-03 14:22 UTC (permalink / raw)
  To: Geoffrey Keating; +Cc: Joseph S. Myers, gcc

Geoffrey Keating <geoffk@geoffk.org> writes:

> For malloc-ed memory, I strongly support the
> idea that the first store should determine its type and the type can't
> change after that.

If I understand that correctly, that would break many existing
programs, including BFD.

Ian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-03 14:22   ` Ian Lance Taylor
@ 2004-12-03 20:42     ` Geoffrey Keating
  2004-12-03 20:55       ` Ian Lance Taylor
  0 siblings, 1 reply; 12+ messages in thread
From: Geoffrey Keating @ 2004-12-03 20:42 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Joseph S. Myers, gcc

Ian Lance Taylor <ian@wasabisystems.com> writes:

> Geoffrey Keating <geoffk@geoffk.org> writes:
> 
> > For malloc-ed memory, I strongly support the
> > idea that the first store should determine its type and the type can't
> > change after that.
> 
> If I understand that correctly, that would break many existing
> programs, including BFD.

Could you give an example?  I'm having trouble of thinking of any
portable code for which this would be useful and not easily avoided.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-03 20:42     ` Geoffrey Keating
@ 2004-12-03 20:55       ` Ian Lance Taylor
  2004-12-03 21:21         ` Geoff Keating
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Lance Taylor @ 2004-12-03 20:55 UTC (permalink / raw)
  To: Geoffrey Keating; +Cc: Joseph S. Myers, gcc

Geoffrey Keating <geoffk@geoffk.org> writes:

> > > For malloc-ed memory, I strongly support the
> > > idea that the first store should determine its type and the type can't
> > > change after that.
> > 
> > If I understand that correctly, that would break many existing
> > programs, including BFD.
> 
> Could you give an example?  I'm having trouble of thinking of any
> portable code for which this would be useful and not easily avoided.

It is possible that I misunderstood your proposal.

In BFD, we have structures like this in allocated memory:

struct elf_link_hash_entry
{

  ...

  union
  {
    /* If this is a weak defined symbol from a dynamic object, this
       field points to a defined symbol with the same value, if there is
       one.  Otherwise it is NULL.  */
    struct elf_link_hash_entry *weakdef;

    /* Hash value of the name computed using the ELF hash function.
       Used part way through size_dynamic_sections, after we've finished
       with weakdefs.  */
    unsigned long elf_hash_value;
  } u;

  ...

};

Here u.weakdef is used in the first pass of the link, through
adjust_dynamic_symbol.  u.elf_hash_value is used starting at
size_dynamic_sections.

This is a simple hack to decrease linker memory usage.  It is portable
and, I believe, fully standards compliant.  It is a case in which
malloc'ed memory changes type after the first store.  Certainly it can
be avoided, but that is not the point.  I think the code is correct
and reasonable, and I think gcc should not break it.

As I say, I may have misunderstood your suggestion.

Ian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-03 20:55       ` Ian Lance Taylor
@ 2004-12-03 21:21         ` Geoff Keating
  2004-12-03 22:58           ` Ian Lance Taylor
  2004-12-05 16:39           ` Michael Veksler
  0 siblings, 2 replies; 12+ messages in thread
From: Geoff Keating @ 2004-12-03 21:21 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 2060 bytes --]


On 03/12/2004, at 12:55 PM, Ian Lance Taylor wrote:

> Geoffrey Keating <geoffk@geoffk.org> writes:
>
>>>> For malloc-ed memory, I strongly support the
>>>> idea that the first store should determine its type and the type 
>>>> can't
>>>> change after that.
>>>
>>> If I understand that correctly, that would break many existing
>>> programs, including BFD.
>>
>> Could you give an example?  I'm having trouble of thinking of any
>> portable code for which this would be useful and not easily avoided.
>
> It is possible that I misunderstood your proposal.
>
> In BFD, we have structures like this in allocated memory:
>
> struct elf_link_hash_entry
> {
>
>   ...
>
>   union
>   {
>     /* If this is a weak defined symbol from a dynamic object, this
>        field points to a defined symbol with the same value, if there 
> is
>        one.  Otherwise it is NULL.  */
>     struct elf_link_hash_entry *weakdef;
>
>     /* Hash value of the name computed using the ELF hash function.
>        Used part way through size_dynamic_sections, after we've 
> finished
>        with weakdefs.  */
>     unsigned long elf_hash_value;
>   } u;
>
>   ...
>
> };
>
> Here u.weakdef is used in the first pass of the link, through
> adjust_dynamic_symbol.  u.elf_hash_value is used starting at
> size_dynamic_sections.
>
> This is a simple hack to decrease linker memory usage.  It is portable
> and, I believe, fully standards compliant.  It is a case in which
> malloc'ed memory changes type after the first store.  Certainly it can
> be avoided, but that is not the point.  I think the code is correct
> and reasonable, and I think gcc should not break it.
>
> As I say, I may have misunderstood your suggestion.

I see.  Yes, that's not what I meant.  In this case, the type of the 
malloc-ed memory is 'struct elf_link_hash_entry', and that's what's not 
allowed to change; what I want prohibited is code like:

   void * mem = malloc (max (sizeof (int), sizeof (float));
   *(int *)mem = 1;
   *(float *)mem = 2.0;

where the top-level type of the memory changes.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 2410 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-03 21:21         ` Geoff Keating
@ 2004-12-03 22:58           ` Ian Lance Taylor
  2004-12-05 16:39           ` Michael Veksler
  1 sibling, 0 replies; 12+ messages in thread
From: Ian Lance Taylor @ 2004-12-03 22:58 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc, Joseph S. Myers

Geoff Keating <geoffk@geoffk.org> writes:

> I see.  Yes, that's not what I meant.  In this case, the type of the
> malloc-ed memory is 'struct elf_link_hash_entry', and that's what's
> not allowed to change; what I want prohibited is code like:
> 
>    void * mem = malloc (max (sizeof (int), sizeof (float));
>    *(int *)mem = 1;
>    *(float *)mem = 2.0;
> 
> where the top-level type of the memory changes.

I see.  That does seem uncommon.  One case where this happens is in
the malloc implementation itself, in which a freed pointer is treated
as a pointer to some free list structure.  But I suppose that case
would be OK, since after the call to free the programmer isn't
permitted to dereference the original pointer anyhow.

Hmmm, do we have to worry about user-level memory managers, like
obstacks?

Ian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-03 21:21         ` Geoff Keating
  2004-12-03 22:58           ` Ian Lance Taylor
@ 2004-12-05 16:39           ` Michael Veksler
  1 sibling, 0 replies; 12+ messages in thread
From: Michael Veksler @ 2004-12-05 16:39 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc, Ian Lance Taylor, Joseph S. Myers







Geoff Keating <geoffk@geoffk.org> on 03/12/2004 23:21:51:

>
> On 03/12/2004, at 12:55 PM, Ian Lance Taylor wrote:
>
> > Geoffrey Keating <geoffk@geoffk.org> writes:
> >
> >>>> For malloc-ed memory, I strongly support the
> >>>> idea that the first store should determine its type and the type
> >>>> can't
> >>>> change after that.
> >>>
...
> > In BFD, we have structures like this in allocated memory:
> >
> > struct elf_link_hash_entry
> > {
> >   ...
> >
> >   union
> >   {
....
> >     struct elf_link_hash_entry *weakdef;
...
> >     unsigned long elf_hash_value;
> >   } u;
....
> >
> > Here u.weakdef is used in the first pass of the link, through
> > adjust_dynamic_symbol.  u.elf_hash_value is used starting at
> > size_dynamic_sections.
> >
> > This is a simple hack to decrease linker memory usage.  It is portable
> > and, I believe, fully standards compliant.  It is a case in which
> > malloc'ed memory changes type after the first store.  Certainly it can
> > be avoided, but that is not the point.  I think the code is correct
> > and reasonable, and I think gcc should not break it.
> >
> > As I say, I may have misunderstood your suggestion.
>
> I see.  Yes, that's not what I meant.  In this case, the type of the
> malloc-ed memory is 'struct elf_link_hash_entry', and that's what's not
> allowed to change; what I want prohibited is code like:
>
>    void * mem = malloc (max (sizeof (int), sizeof (float));
>    *(int *)mem = 1;
>    *(float *)mem = 2.0;
>
> where the top-level type of the memory changes.

Can you elaborate?  I think that user memory allocators should
be OK with the above clarification, but this is not clear.

Consider a user defined memory manager:
      template <class T> my_allocator;
      list<some_struct, my_allocator<some_struct> > myListObject;
Now, a reasonable implementation of (for this case when size=1):
  1.  my_allocator<T>::allocate(size_t size)
  2.  my_allocator<T>::deallocate(T*ptr, size_t size)

Will allocate a chunk of pool memory, break it into chunks,
and manage a free list of
  union  free_list_element {
      free_list_element *next_free_list_element ;
      char element_memory[sizeof(T)];
      T *next_element_as_T;  // This may be needed...
  };

Everything can be done without a single casting of pointers.

Now repeating, T* p=allocate(1) and deallocate(p,1) will change the
type of the same pointer to T* and back to free_list_element.

This use should be safe (and it used to be safe).

Note that since `allocate' and `deallocate' are inline functions,
the compiler will mess things up if aliasing rules are changed
in the "wrong" direction.

Thinking more about it, I think that my example is not that safe
even with today's aliasing rules (unless 'char' comes to the rescue).
The problematic scenario is:
      T *p1=allocator.allocate(1);
      p1->a= 1;
      int val= p1->a;
      allocator.deallocate(p1, 1);

in this case, it is possible (in theory) that since p1->a is not
accessed through `union' members, aliasing rules will think
that it is not dependent on `union' stuff going on in deallocate.
This may lead to `val' being initialized with the value of
a pointer to `next_free_list_element'.

This is a total mess..... Can't aliasing rules be formulated
to allow for such a use?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-04  2:07 ` Paul Brook
@ 2004-12-04  3:14   ` Paul Schlie
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Schlie @ 2004-12-04  3:14 UTC (permalink / raw)
  To: Paul Brook, gcc




> From: Paul Brook <paul@codesourcery.com>
>> Paul Schlie wrote:
>>> Geoff Keating writes:
>>> I see. Yes, that's not what I meant. In this case, the type of the
>>> malloc-ed memory is 'struct elf_link_hash_entry', and that's what's not
>>> allowed to change; what I want prohibited is code like:
>>> 
>>>  void * mem = malloc (max (sizeof (int), sizeof (float));
>>>  *(int *)mem = 1;
>>>  *(float *)mem = 2.0;
>>> 
>>> where the top-level type of the memory changes.
>> 
>> What's wrong with conservatively assuming that a pointer may alias
>> cumulatively any object type if ever explicitly dereferences, where
>> a dereferenced union is equivalent to it's cumulative set of types.)
>> 
>>  (i.e. logical equivalent to your example, mem :: int*, float*, or void*)
>> 
>> Where then if someone wants to take advantage of potential optimizations;
>> they should use pointers conservatively, and ideally explicitly declare
>> them as not being aliased if enabled by the language.
>> 
>> As it would seem to be better to be safe than sorry, and not unnecessarily
>> restrict programmers to accomplish what they want to do, with the
>> understanding that there may be consequences to their decisions.
> 
> It's called -fno-strict-aliasing.
> 
> The scheme you suggest rapidly degenerates to this because without whole
> program analysis we have to assume that third party libraries could violate
> aliasing rules. Cases where we can show pointers don't escape aren't
> particularly interesting because we can (should be able to?) prove those
> without using type based aliasing.
> 
> Paul

Yes, but at the programmers discretion (as a consequence of the actual use
of pointers, as opposed to it being forbidden) which seems like a reasonable
tradeoff if one subscribes to the philosophy that the programmer should be
given the latitude to get the job done as desired (within reason)?

WRT library alias attributes, might it be reasonably possible to record the
function's argument alias attributes in an information section within the
compiled library which may be subsequently referenced by the compiler as
necessary? (although do understand such a thing doesn't presently exist)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
  2004-12-04  0:36 Paul Schlie
@ 2004-12-04  2:07 ` Paul Brook
  2004-12-04  3:14   ` Paul Schlie
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Brook @ 2004-12-04  2:07 UTC (permalink / raw)
  To: gcc; +Cc: Paul Schlie

On Saturday 04 December 2004 00:36, Paul Schlie wrote:
> > Geoff Keating writes:
> > I see. Yes, that's not what I meant. In this case, the type of the
> > malloc-ed memory is 'struct elf_link_hash_entry', and that's what's not
> > allowed to change; what I want prohibited is code like:
> >
> >  void * mem = malloc (max (sizeof (int), sizeof (float));
> >  *(int *)mem = 1;
> >  *(float *)mem = 2.0;
> >
> > where the top-level type of the memory changes.
>
> What's wrong with conservatively assuming that a pointer may alias
> cumulatively any object type if ever explicitly dereferences, where
> a dereferenced union is equivalent to it's cumulative set of types.)
>
>  (i.e. logical equivalent to your example, mem :: int*, float*, or void*)
>
> Where then if someone wants to take advantage of potential optimizations;
> they should use pointers conservatively, and ideally explicitly declare
> them as not being aliased if enabled by the language.
>
> As it would seem to be better to be safe than sorry, and not unnecessarily
> restrict programmers to accomplish what they want to do, with the
> understanding that there may be consequences to their decisions.

It's called -fno-strict-aliasing.

The scheme you suggest rapidly degenerates to this because without whole 
program analysis we have to assume that third party libraries could violate 
aliasing rules. Cases where we can show pointers don't escape aren't 
particularly interesting because we can (should be able to?) prove those 
without using type based aliasing.

Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: DR#236 analysis
@ 2004-12-04  0:36 Paul Schlie
  2004-12-04  2:07 ` Paul Brook
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Schlie @ 2004-12-04  0:36 UTC (permalink / raw)
  To: gcc

> Geoff Keating writes:
> I see. Yes, that's not what I meant. In this case, the type of the malloc-ed
> memory is 'struct elf_link_hash_entry', and that's what's not allowed to
> change; what I want prohibited is code like:
>
>  void * mem = malloc (max (sizeof (int), sizeof (float));
>  *(int *)mem = 1;
>  *(float *)mem = 2.0;
>
> where the top-level type of the memory changes.

What's wrong with conservatively assuming that a pointer may alias
cumulatively any object type if ever explicitly dereferences, where
a dereferenced union is equivalent to it's cumulative set of types.)

 (i.e. logical equivalent to your example, mem :: int*, float*, or void*)

Where then if someone wants to take advantage of potential optimizations;
they should use pointers conservatively, and ideally explicitly declare
them as not being aliased if enabled by the language.

As it would seem to be better to be safe than sorry, and not unnecessarily
restrict programmers to accomplish what they want to do, with the
understanding that there may be consequences to their decisions.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2004-12-05 16:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-03  3:31 DR#236 analysis Joseph S. Myers
2004-12-03  4:15 ` Daniel Berlin
2004-12-03  7:13 ` Geoffrey Keating
2004-12-03 14:22   ` Ian Lance Taylor
2004-12-03 20:42     ` Geoffrey Keating
2004-12-03 20:55       ` Ian Lance Taylor
2004-12-03 21:21         ` Geoff Keating
2004-12-03 22:58           ` Ian Lance Taylor
2004-12-05 16:39           ` Michael Veksler
2004-12-04  0:36 Paul Schlie
2004-12-04  2:07 ` Paul Brook
2004-12-04  3:14   ` Paul Schlie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).