public inbox for gcc@gcc.gnu.org
* Re: Optimization of conditional access to globals: thread-unsafe?
       [not found] <e2e108260710260634q7a291337s6e66dfa25f28b68a@mail.gmail.com>
@ 2007-10-26 14:11 ` Bart Van Assche
  2007-10-26 15:14   ` Andrew Haley
  2007-10-26 16:08   ` skaller
  0 siblings, 2 replies; 208+ messages in thread
From: Bart Van Assche @ 2007-10-26 14:11 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc, Andrew Pinski, Tomash Brechko

On 10/22/07, Andrew Haley <aph-gcc at littlepinkcloud dot COM> wrote:

> The core problem here seems to be that the "C with threads" memory
> model isn't sufficiently well-defined to make a determination
> possible.  You're assuming that you have no responsibility to mark
> shared memory protected by a mutex as volatile, but I know of nothing
> in the C standard that makes such a guarantee.  A prudent programmer
> will make conservative assumptions.

I agree that according to the C and C++ language standards, any
variable shared over threads should be declared volatile. But it is
nearly impossible to live with this requirement: this requirement
implies that for each library function that modifies data through
pointers a second version should be added that accepts a volatile
pointer instead of a regular pointer. Consider e.g. the function
snprintf(), which writes to the character buffer passed as its first
argument. When snprintf() is used to write to a buffer that is not
shared over threads, the existing snprintf() function is fine. When
however snprintf() is used to write to a buffer that is shared by two
or more threads, a version is needed of snprintf() that accepts
volatile char* as its first argument.

My opinion is that it should be possible to declare whether C/C++ code
has acquire, release or acquire+release semantics. The fact that code
has acquire semantics means that no subsequent load or store
operations may be moved in front of that code, and the fact that code
has release semantics means that no preceding load or store operations
may be moved past that code. Adding definitions for acquire and
release semantics in pthread.h would help a lot. E.g.
pthread_mutex_lock() should be declared to have acquire semantics, and
pthread_mutex_unlock() should be declared to have release semantics.
Maybe it is a good idea to add the following function attributes in
gcc: __attribute__((acquire)) and __attribute__((release)) ? A
refinement of these attributes would be to allow specifying not only
the acquire/release attributes, but also the memory locations to which
the acquire and release apply (pthreads synchronization functions
always apply to all memory locations).
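
As an illustration only: the acquire/release split described above can be
written down with the <stdatomic.h> interface that C11 later standardized.
This is a sketch of the ordering semantics, not the proposed attribute
syntax and not how pthreads is implemented; sketch_lock(), sketch_unlock()
and guarded_increment() are hypothetical names.

```c
#include <stdatomic.h>

/* Hypothetical names throughout; a sketch, not the pthreads implementation. */
static atomic_flag sketch_flag = ATOMIC_FLAG_INIT;
static int guarded_counter;            /* plain, non-volatile shared data */

/* Acquire: loads and stores that follow may not be moved before this. */
static void sketch_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&sketch_flag,
                                             memory_order_acquire))
        ;                              /* spin while the flag was already set */
}

/* Release: loads and stores that precede may not be moved after this. */
static void sketch_unlock(void)
{
    atomic_flag_clear_explicit(&sketch_flag, memory_order_release);
}

/* The counter is only touched between the acquire and the release. */
int guarded_increment(void)
{
    sketch_lock();
    int v = ++guarded_counter;
    sketch_unlock();
    return v;
}
```

Note that guarded_counter is an ordinary int: with lock and unlock carrying
acquire and release semantics, the data they protect does not have to be
declared volatile.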

I'm not inventing anything new here -- as far as I know the concepts
of acquire and release were first defined by Gharachorloo et al. in 1990
(Memory consistency and event ordering in scalable shared-memory
multiprocessors, International Symposium on Computer Architecture,
1990, http://portal.acm.org/citation.cfm?id=325102&dl=ACM&coll=GUIDE).

Bart Van Assche.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 14:11 ` Optimization of conditional access to globals: thread-unsafe? Bart Van Assche
@ 2007-10-26 15:14   ` Andrew Haley
  2007-10-26 15:18     ` Robert Dewar
  2007-10-27 12:47     ` Bart Van Assche
  2007-10-26 16:08   ` skaller
  1 sibling, 2 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-26 15:14 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: gcc, Andrew Pinski, Tomash Brechko

Bart Van Assche writes:
 > On 10/22/07, Andrew Haley <aph-gcc at littlepinkcloud dot COM> wrote:
 > 
 > > The core problem here seems to be that the "C with threads" memory
 > > model isn't sufficiently well-defined to make a determination
 > > possible.  You're assuming that you have no responsibility to mark
 > > shared memory protected by a mutex as volatile, but I know of nothing
 > > in the C standard that makes such a guarantee.  A prudent programmer
 > > will make conservative assumptions.
 > 

...

 > My opinion is that it should be possible to declare whether C/C++
 > code has acquire, release or acquire+release semantics. The fact
 > that code has acquire semantics means that no subsequent load or
 > store operations may be moved in front of that code, and the fact
 > that code has release semantics means that no preceding load or
 > store operations may be moved past that code. Adding definitions
 > for acquire and release semantics in pthread.h would help a
 > lot. E.g.  pthread_mutex_lock() should be declared to have acquire
 > semantics, and pthread_mutex_unlock() should be declared to have
 > release semantics.

Hmmm.  This is an interesting idea, but it sounds to me as though it's
somewhat at variance with what is proposed by the C++ threads working
group.  In any case, gcc will certainly implement whatever the
standards committees come up with, but that is probably two years
away.

Right now the question is whether or not gcc will produce thread-safe
code according to some memory model, rather than any specific details
about what that model should be.

IMO, we need to move rapidly towards tracking the proposed model from
the C++ threads working paper.  This would at least provide a
reasonably sane model that corresponds with most thread and kernel
programmers' understanding.

Andrew.

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 15:14   ` Andrew Haley
@ 2007-10-26 15:18     ` Robert Dewar
  2007-10-26 15:27       ` Dave Korn
  2007-10-26 16:00       ` Samuel Tardieu
  2007-10-27 12:47     ` Bart Van Assche
  1 sibling, 2 replies; 208+ messages in thread
From: Robert Dewar @ 2007-10-26 15:18 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Bart Van Assche, gcc, Andrew Pinski, Tomash Brechko

Andrew Haley wrote:

> Hmmm.  This is an interesting idea, but it sounds to me as though it's
> somewhat at variance with what is proposed by the C++ threads working
> group.  In any case, gcc will certainly implement whatever the
> standards committees come up with, but that is probably two years
> away.

One problem at the language standards level is that you can't easily
talk about loads and stores, since you are defining an as-if semantic
model, and if you make a statement about loads and stores, any other
sequence which behaves as if that sequence were obeyed is allowed. In
the absence of a notion of threads at the semantic level it's difficult
to say what you mean in a formal way. In the Ada standard, we get
around this problem by having sections called "implementation advice",
which in practice are treated as requirements, but we can use language
that is not formally sound, even though everyone knows what we mean.
Of course in Ada there is a clear notion of thread semantics, and
a clear definition of what the meaning of code is in the presence
of threads, so the specific situation discussed here is easy to
deal with (though Ada takes the view that unsynchronized shared access to
non-atomic or non-volatile data from separate threads has undefined
effects).

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 15:18     ` Robert Dewar
@ 2007-10-26 15:27       ` Dave Korn
  2007-10-26 16:28         ` skaller
  2007-10-26 17:04         ` Richard Kenner
  2007-10-26 16:00       ` Samuel Tardieu
  1 sibling, 2 replies; 208+ messages in thread
From: Dave Korn @ 2007-10-26 15:27 UTC (permalink / raw)
  To: 'Robert Dewar', 'Andrew Haley'
  Cc: 'Bart Van Assche', gcc, 'Andrew Pinski',
	'Tomash Brechko'

On 26 October 2007 16:15, Robert Dewar wrote:

> One problem at the language standards level is that you can't easily
> talk about loads and stores, since you are defining an as-if semantic
> model, and if you make a statement about loads and stores, any other
> sequence which behaves as if that sequence were obeyed is allowed. 

  Well, that's precisely the problem - specifically in the context of
memory-mapped I/O registers - that volatile was invented to solve.  It may
never have been clearly defined in the formal language of the specs, but I
thought it was pretty clear in intent: the compiler will emit exactly one
machine load/store operation for any rvalue reference/lvalue assignment
(respectively) in the source, at the exact sequence point in the generated
code corresponding to the location of the reference in the source.  Any other
variable may be accessed more or fewer times than is written, and may be
accessed at places other than exactly where the reference is written in the
source, subject only to the as-if rule.
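
A minimal C sketch of that intent, under the assumption that an ordinary
object can stand in for a device register (real code would use a fixed
hardware address). The names fake_status_reg, plain_var and the three
functions are made up for illustration.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped device register (assumption: a real driver
   would point at a fixed hardware address instead). */
static volatile uint32_t fake_status_reg;
static uint32_t plain_var;

/* Two reads are written, so two loads of the register are expected. */
uint32_t read_status_twice(void)
{
    uint32_t first  = fake_status_reg;
    uint32_t second = fake_status_reg;
    return first + second;
}

/* Two reads of a plain variable: under the as-if rule the compiler is
   free to perform a single load and reuse the value. */
uint32_t read_plain_twice(void)
{
    uint32_t first  = plain_var;
    uint32_t second = plain_var;
    return first + second;
}

/* Helper that writes the same value to both objects. */
void set_both(uint32_t v)
{
    fake_status_reg = v;
    plain_var = v;
}
```

Both readers return the same value in a single-threaded program; the
difference only shows in the generated code, where the volatile version
should keep both loads.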


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 15:18     ` Robert Dewar
  2007-10-26 15:27       ` Dave Korn
@ 2007-10-26 16:00       ` Samuel Tardieu
  2007-10-26 17:03         ` Samuel Tardieu
  2007-10-27  9:33         ` Robert Dewar
  1 sibling, 2 replies; 208+ messages in thread
From: Samuel Tardieu @ 2007-10-26 16:00 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Andrew Haley, Bart Van Assche, gcc, Andrew Pinski, Tomash Brechko

On 26/10, Robert Dewar wrote:

| Of course in Ada there is a clear notion of thread semantics, and
| a clear definition of what the meaning of code is in the presence
| of threads, so the specific situation discussed here is easy to
| deal with (though Ada takes the view that unsynchronized shared access to
| non-atomic or non-volatile data from separate threads has undefined
| effects).

In the following example, is the access to "Shared" considered
unsynchronized even though what looks like a proper lock is used
around it?


package P is

   Shared : Natural := 0;

   procedure Maybe_Increment;

end P;


package body P is

   protected Lock is
      procedure Maybe_Lock (Locked : out Boolean);
      procedure Always_Unlock;
   private
      Is_Locked : Boolean := False;
   end Lock;

   protected body Lock is

      procedure Always_Unlock is
      begin
         Is_Locked := False;
      end Always_Unlock;

      procedure Maybe_Lock (Locked : out Boolean) is
      begin
         Locked    := not Is_Locked;
         Is_Locked := True;
      end Maybe_Lock;

   end Lock;

   procedure Maybe_Increment is
      L : Boolean;
   begin
      Lock.Maybe_Lock (L);
      if L then
         Shared := Shared + 1;
      end if;
      Lock.Always_Unlock;
   end Maybe_Increment;

end P;

By naively reading the code, I would assume that if two tasks were to
call Maybe_Increment once, after completion of those tasks Shared would
contain either 1 or 2, depending on whether they both got the lock in
turn or if only one of them got it.

However, if you look at the x86 code for Maybe_Increment (-O3
-fomit-frame-pointer -fno-inline), you'll see:

 1   p__maybe_increment:
 2   .LFB11:
 3         subl    $12, %esp
 4 .LCFI6:
 5         movl    $p__lock, %eax
 6         call    p__lock__maybe_lockP
 7         cmpb    $1, %al
 8         movl    p__shared, %eax                 <=== unconditional load
 9         sbbl    $-1, %eax                       <=== conditional +1
10         movl    %eax, p__shared                 <=== unconditional store
11         movl    $p__lock, %eax
12         addl    $12, %esp
13         jmp     p__lock__always_unlockP

Note lines 8 to 10: on a multiprocessor system with both tasks running at
the same time on different processors, you can end up with Shared being
zero after the two tasks have ended (for example if the task getting the
lock runs one or two instructions ahead of the one without the lock).
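
For readers who do not read x86 assembly, here is a rough C rendering of
lines 8 to 10 next to what the source asks for. It is a sketch under the
assumption that the Boolean lock result is passed in as a parameter; the
function names and shared_value are invented for the illustration.

```c
#include <stdbool.h>

static int shared_value;               /* plays the role of P.Shared */

/* What the source says: Shared is touched only if the lock was obtained. */
void maybe_increment_as_written(bool got_lock)
{
    if (got_lock)
        shared_value = shared_value + 1;
}

/* What the listing does: load, add 0 or 1, then store unconditionally. */
void maybe_increment_as_compiled(bool got_lock)
{
    int tmp = shared_value;            /* line 8:  unconditional load  */
    tmp += got_lock ? 1 : 0;           /* line 9:  conditional +1      */
    shared_value = tmp;                /* line 10: unconditional store */
}

/* Accessor so callers can inspect the current value. */
int get_shared(void)
{
    return shared_value;
}
```

In a single thread the two versions are indistinguishable, which is why
the transformation is valid under the as-if rule. With two tasks, the
store on the got_lock == false path can write back a stale value and
erase the increment done by the task that holds the lock.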

  Sam

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 14:11 ` Optimization of conditional access to globals: thread-unsafe? Bart Van Assche
  2007-10-26 15:14   ` Andrew Haley
@ 2007-10-26 16:08   ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-26 16:08 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Andrew Haley, gcc, Andrew Pinski, Tomash Brechko


On Fri, 2007-10-26 at 16:05 +0200, Bart Van Assche wrote:
> On 10/22/07, Andrew Haley <aph-gcc at littlepinkcloud dot COM> wrote:

> I agree that according to the C and C++ language standards, any
> variable shared over threads should be declared volatile.

No, they say nothing about multi-threaded programs.

> My opinion is that it should be possible to declare whether C/C++ code
> has acquire, release or acquire+release semantics. The fact that code
> has acquire semantics means that no subsequent load or store
> operations may be moved in front of that code, and the fact that code
> has release semantics means that no preceding load or store operations
> may be moved past that code. Adding definitions for acquire and
> release semantics in pthread.h would help a lot. E.g.
> pthread_mutex_lock() should be declared to have acquire semantics, and
> pthread_mutex_unlock() should be declared to have release semantics.
> Maybe it is a good idea to add the following function attributes in
> gcc: __attribute__((acquire)) and __attribute__((release)) ? A
> refinement of these attributes would be to allow to specify not only
> the acquire/release attributes, but also the memory locations to which
> the acquire and release apply (pthreads synchronization functions
> always apply to all memory locations).

That sounds quite interesting!

But, now you should continue this idea. You are suggesting
primitives to attach to 'code' but  then only say 'functions'.
What about plain old statements? Expressions?

Now you need to specify a calculus for these properties.

I think this idea is really hot .. it's so much simpler
and more fine-grained than just having a mutex.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 15:27       ` Dave Korn
@ 2007-10-26 16:28         ` skaller
  2007-10-26 16:38           ` Michael Matz
  2007-10-26 17:04         ` Richard Kenner
  1 sibling, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-26 16:28 UTC (permalink / raw)
  To: Dave Korn
  Cc: 'Robert Dewar', 'Andrew Haley',
	'Bart Van Assche', gcc, 'Andrew Pinski',
	'Tomash Brechko'


On Fri, 2007-10-26 at 16:24 +0100, Dave Korn wrote:
> On 26 October 2007 16:15, Robert Dewar wrote:
> 
> > One problem at the language standards level is that you can't easily
> > talk about loads and stores, since you are defining an as-if semantic
> > model, and if you make a statement about loads and stores, any other
> > sequence which behaves as if that sequence were obeyed is allowed. 
> 
>   Well, that's precisely the problem - specifically in the context of
> memory-mapped I/O registers - that volatile was invented to solve.  It may
> never have been clearly defined in the formal language of the specs, but I
> thought it was pretty clear in intent: the compiler will emit exactly one
> machine load/store operation for any rvalue reference/lvalue assignment
> (respectively) in the source, at the exact sequence point in the generated
> code corresponding to the location of the reference in the source.  Any other
> variable may be accessed more or fewer times than is written, and may be
> accessed at places other than exactly where the reference is written in the
> source, subject only to the as-if rule.

Volatile semantics aren't defined, you have it backwards.

Bart hinted at the way it really works: it isn't a definition,
and it isn't a specification: volatile is part of the 
*conformance* model. Volatile accesses are *observable*.

So when the standard says of:

	int a = 1;
	int b = 2;
	printf("%d %d",a,b);

that a is initialised then b, the compiler can ignore the standard,
because there is no observable way to tell what the ordering is,
except that it has to be complete before the print happens.

If a,b above were volatile .. then the ordering is directly
observable, so the compiler is constrained to obey the
rules.

The point is -- there is no new rule here, and no definition
of what a volatile semantics is: volatile variables have
the SAME 'semantics' as any other variable. If I write:

	int a = 1;
	printf("%d", a);
	int b = 2;
	printf("%d %d", a, b);

then it is just the same as if a,b were volatile.
In fact I could put

	double x=sin(1.0);

instead of the printf .. sin is a library function.
This means a debugger with a breakpoint on the sin can be
used to bug out the compiler as non-conforming if 'a' isn't
set (or, if b IS set) .. but print statements are easier :)

Stick a user defined function in there, which itself isn't
observable .. and all bets are off again.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 16:28         ` skaller
@ 2007-10-26 16:38           ` Michael Matz
  0 siblings, 0 replies; 208+ messages in thread
From: Michael Matz @ 2007-10-26 16:38 UTC (permalink / raw)
  To: skaller; +Cc: gcc

Hi,

On Sat, 27 Oct 2007, skaller wrote:

> The point is -- there is no new rule here, and no definition
> of what a volatile semantics is: volatile variables have
> the SAME 'semantics' as any other variable. If I write:
> 
> 	int a = 1;
> 	printf("%d", a);
> 	int b = 2;
> 	printf("%d %d", a, b);
> 
> then it is just the same as if a,b were volatile.

Not at all.  As neither a nor b are global memory, printf() (or any other
function) could not access them, hence no observer could determine whether
or not 'b' is already set.  In contrast to when a and b were volatile.


Ciao,
Michael.

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 16:00       ` Samuel Tardieu
@ 2007-10-26 17:03         ` Samuel Tardieu
  2007-10-27  9:33         ` Robert Dewar
  1 sibling, 0 replies; 208+ messages in thread
From: Samuel Tardieu @ 2007-10-26 17:03 UTC (permalink / raw)
  To: gcc

>>>>> "Sam" == Samuel Tardieu <sam@rfc1149.net> writes:

Sam> In the following example, is the access to "Shared" considered
Sam> unsynchronized even though what looks like a proper lock is used
Sam> around it?

Call to Always_Unlock was incorrect in the previous example, a fixed
one exhibiting the bug is now at http://pastebin.com/f1bc7ba32

  Sam
-- 
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 15:27       ` Dave Korn
  2007-10-26 16:28         ` skaller
@ 2007-10-26 17:04         ` Richard Kenner
  1 sibling, 0 replies; 208+ messages in thread
From: Richard Kenner @ 2007-10-26 17:04 UTC (permalink / raw)
  To: dave.korn; +Cc: aph-gcc, bart.vanassche, dewar, gcc, pinskia, tomash.brechko

     I thought it was pretty clear in intent: the compiler will emit
     exactly one machine load/store operation for any rvalue
     reference/lvalue assignment (respectively) in the source, at the exact
     sequence point in the generated code corresponding to the location of
     the reference in the source.

The problem is that "one machine load/store operation" and "any rvalue"
aren't precisely-defined terms.  We've had numerous discussions on this
list before about how one might want to define them.

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 16:00       ` Samuel Tardieu
  2007-10-26 17:03         ` Samuel Tardieu
@ 2007-10-27  9:33         ` Robert Dewar
  2007-10-27 13:49           ` Florian Weimer
  1 sibling, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-27  9:33 UTC (permalink / raw)
  To: Samuel Tardieu
  Cc: Andrew Haley, Bart Van Assche, gcc, Andrew Pinski, Tomash Brechko

Samuel Tardieu wrote:
> On 26/10, Robert Dewar wrote:
> 
> | Of course in Ada there is a clear notion of thread semantics, and
> | a clear definition of what the meaning of code is in the presence
> | of threads, so the specific situation discussed here is easy to
> | deal with (though Ada takes the view that unsynchronized shared access to
> | non-atomic or non-volatile data from separate threads has undefined
> | effects).
> 
> In the following example, is the access to "Shared" considered
> unsynchronized even though what looks like a proper lock is used
> around it?

Yes, it is unsynchronized. Why would you think otherwise? (referencing
the RM). You can't adopt a naive memory model in Ada! One way to think
about things is that tasks are free to keep local copies of all global
variables, synchronizing their values only at a point of 
synchronization. Locking of this kind needs to be done with
entry barriers.
> 
> package P is
> 
>    Shared : Natural := 0;
> 
>    procedure Maybe_Increment;
> 
> end P;
> 
> 
> package body P is
> 
>    protected Lock is
>       procedure Maybe_Lock (Locked : out Boolean);
>       procedure Always_Unlock;
>    private
>       Is_Locked : Boolean := False;
>    end Lock;
> 
>    protected body Lock is
> 
>       procedure Always_Unlock is
>       begin
>          Is_Locked := False;
>       end Always_Unlock;
> 
>       procedure Maybe_Lock (Locked : out Boolean) is
>       begin
>          Locked    := not Is_Locked;
>          Is_Locked := True;
>       end Maybe_Lock;
> 
>    end Lock;
> 
>    procedure Maybe_Increment is
>       L : Boolean;
>    begin
>       Lock.Maybe_Lock (L);
>       if L then
>          Shared := Shared + 1;
>       end if;
>       Lock.Always_Unlock;
>    end Maybe_Increment;
> 
> end P;
> 
> By naively reading the code, I would assume that if two tasks were to
> call Maybe_Increment once, after completion of those tasks Shared would
> contain either 1 or 2, depending on whether they both got the lock in
> turn or if only one of them got it.
> 
> However, if you look at the x86 code for Maybe_Increment (-O3
> -fomit-frame-pointer -fno-inline), you'll see:
> 
>  1   p__maybe_increment:
>  2   .LFB11:
>  3         subl    $12, %esp
>  4 .LCFI6:
>  5         movl    $p__lock, %eax
>  6         call    p__lock__maybe_lockP
>  7         cmpb    $1, %al
>  8         movl    p__shared, %eax                 <=== unconditional load
>  9         sbbl    $-1, %eax                       <=== conditional +1
> 10         movl    %eax, p__shared                 <=== unconditional store
> 11         movl    $p__lock, %eax
> 12         addl    $12, %esp
> 13         jmp     p__lock__always_unlockP
> 
> Note lines 8 to 10: on a multiprocessor system with both tasks running at
> the same time on different processors, you can end up with Shared being
> zero after the two tasks have ended (for example if the task getting the
> lock runs one or two instructions ahead of the one without the lock).
> 
>   Sam


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 15:14   ` Andrew Haley
  2007-10-26 15:18     ` Robert Dewar
@ 2007-10-27 12:47     ` Bart Van Assche
  2007-10-27 13:07       ` Florian Weimer
  1 sibling, 1 reply; 208+ messages in thread
From: Bart Van Assche @ 2007-10-27 12:47 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc, Andrew Pinski, Tomash Brechko, Florian Weimer

> > On 10/22/07, Andrew Haley <aph-gcc at littlepinkcloud dot COM> wrote:
> >
> > > The core problem here seems to be that the "C with threads" memory
> > > model isn't sufficiently well-defined to make a determination
> > > possible.  You're assuming that you have no responsibility to mark
> > > shared memory protected by a mutex as volatile, but I know of nothing
> > > in the C standard that makes such a guarantee.  A prudent programmer
> > > will make conservative assumptions.

I'd like to return to this. Variables that are shared between two or more
threads can be classified as follows:
- static variables, global variables and dynamically allocated data.
- variables allocated on the stack.
As is known, the compiler may not reorder any access to any static
variable, global variable or dynamically allocated data with a call to
a function that is not declared inline. Variables that are allocated
on the stack of a thread can be shared with another thread only
by passing a pointer to that variable to another thread first. Passing
such a pointer inhibits reordering of accesses to shared local
variables with a call to a function that is not declared inline.

Or: a C/C++ compiler will never reorder accesses to shared variables
with function calls. So if accesses to shared variables are properly
guarded with calls to synchronization functions, it is not necessary
to declare these shared variables volatile.

There is one category of synchronization functions that has not yet
been discussed: synchronization functions that can be inlined. I
propose to require that such synchronization functions contain at
least one asm statement that specifies that it clobbers memory,
because this inhibits reordering between the inlined synchronization
function and accesses to variables.
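
A sketch of what such an inlinable synchronization function could look
like with GCC. The empty asm with a "memory" clobber is a GCC extension;
it is a compiler barrier only and emits no CPU fence. The __sync builtins
are GCC's atomic builtins; the names compiler_barrier, sketch_inline_lock,
sketch_inline_unlock and update_under_lock are hypothetical.

```c
/* GCC extension: an empty asm statement that claims to clobber memory.
   It emits no instructions, but the compiler may not move memory
   accesses across it or keep memory values cached in registers. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

static int sketch_lock_word;           /* hypothetical lock state */
static int sketch_data;                /* plain data guarded by the lock */

/* Inlinable lock. The builtin already acts as an acquire barrier; the
   explicit clobber is shown because it is what the text proposes. */
static inline void sketch_inline_lock(void)
{
    while (__sync_lock_test_and_set(&sketch_lock_word, 1))
        ;                              /* spin until the old value was 0 */
    compiler_barrier();
}

/* Inlinable unlock: barrier first, then release the lock word. */
static inline void sketch_inline_unlock(void)
{
    compiler_barrier();
    __sync_lock_release(&sketch_lock_word);
}

/* Writes the guarded data between lock and unlock, returns what it wrote. */
int update_under_lock(int v)
{
    sketch_inline_lock();
    sketch_data = v;
    int seen = sketch_data;
    sketch_inline_unlock();
    return seen;
}
```

Even after inlining, the accesses to sketch_data stay between the two
barriers, which is the property an out-of-line call to an opaque function
would otherwise provide.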

Maybe it's a good idea to add a chapter to the gcc manual about
multithreaded programming, such that gcc users who did not follow this
discussion can look up this kind of information easily ?

Bart Van Assche.

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 12:47     ` Bart Van Assche
@ 2007-10-27 13:07       ` Florian Weimer
  2007-10-27 13:16         ` Bart Van Assche
  0 siblings, 1 reply; 208+ messages in thread
From: Florian Weimer @ 2007-10-27 13:07 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Andrew Haley, gcc, Andrew Pinski, Tomash Brechko

* Bart Van Assche:

> As is known, the compiler may not reorder any access to any static
> variable, global variable or dynamically allocated data with a call to
> a function that is not declared inline.

I assume you mean "defined in another translation unit" instead of "not
declared inline".  Still, I don't think this is true as far as static
variables are concerned (both file-scoped and function-scoped ones).

There may only be a few cases where you can prove that a call to an
extern function does not access a file-scoped static variable, so doing
that optimization may not be worthwhile.  But the optimization is not
forbidden per se.

> Variables that are allocated on the stack of a thread can be
> shared with another thread only by passing a pointer to that variable
> to another thread first. Passing such a pointer inhibits reordering of
> accesses to shared local variables with a call to a function that is
> not declared inline.

And this isn't really specific to threads.

> Maybe it's a good idea to add a chapter to the gcc manual about
> multithreaded programming, such that gcc users who did not follow this
> discussion can look up this kind of information easily ?

If there's a real chance that something like a C or C++ memory model
will be standardized in the foreseeable future (three years, perhaps), it
might be unwise to set in stone a potentially conflicting set of rules
for GCC.

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 13:07       ` Florian Weimer
@ 2007-10-27 13:16         ` Bart Van Assche
  2007-10-27 13:16           ` Andrew Haley
  2007-10-27 13:34           ` Florian Weimer
  0 siblings, 2 replies; 208+ messages in thread
From: Bart Van Assche @ 2007-10-27 13:16 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Andrew Haley, gcc, Andrew Pinski, Tomash Brechko

On 10/27/07, Florian Weimer <fw@deneb.enyo.de> wrote:

> And this isn't really specific to threads.

Hello Florian,

What I was trying to explain is that it is not necessary to declare
shared variables volatile, not for any C/C++ compiler that is
compliant with the language standard. Your reply neither acknowledged
nor denied this.

Bart Van Assche.

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 13:16         ` Bart Van Assche
@ 2007-10-27 13:16           ` Andrew Haley
  2007-10-27 13:34           ` Florian Weimer
  1 sibling, 0 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-27 13:16 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Florian Weimer, gcc, Andrew Pinski, Tomash Brechko

Bart Van Assche writes:
 > On 10/27/07, Florian Weimer <fw@deneb.enyo.de> wrote:
 > 
 > > And this isn't really specific to threads.

 > What I was trying to explain is that it is not necessary to declare
 > shared variables volatile, not for any C/C++ compiler that is
 > compliant with the language standard.

Sadly, you didn't quote any language in the standard to justify such a
statement.  If you expect anyone to take you seriously you'll have to
do that.  For what it's worth, I don't believe that's what the ISO C
standard says.

In any case, this discussion is pointless: we are moving towards
tracking the proposed memory model, which solves the problem.

Andrew.

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 13:16         ` Bart Van Assche
  2007-10-27 13:16           ` Andrew Haley
@ 2007-10-27 13:34           ` Florian Weimer
  2007-10-28 13:47             ` Bart Van Assche
  1 sibling, 1 reply; 208+ messages in thread
From: Florian Weimer @ 2007-10-27 13:34 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Andrew Haley, gcc, Andrew Pinski, Tomash Brechko

* Bart Van Assche:

> On 10/27/07, Florian Weimer <fw@deneb.enyo.de> wrote:
>
>> And this isn't really specific to threads.
>
> Hello Florian,
>
> What I was trying to explain is that it is not necessary to declare
> shared variables volatile, not for any C/C++ compiler that is
> compliant with the language standard.

The point of this thread is that a compliant compiler can turn loads
into stores if the object is not volatile at the point of the load.

Anyway, not reordering across function calls is not sufficient to get
sane threading semantics (IIRC, this is also explained in detail in Hans
Boehm's paper).

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27  9:33         ` Robert Dewar
@ 2007-10-27 13:49           ` Florian Weimer
  2007-10-27 13:59             ` Samuel Tardieu
  2007-10-27 16:25             ` Robert Dewar
  0 siblings, 2 replies; 208+ messages in thread
From: Florian Weimer @ 2007-10-27 13:49 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Samuel Tardieu, Andrew Haley, Bart Van Assche, gcc,
	Andrew Pinski, Tomash Brechko

* Robert Dewar:

>> In the following example, is the access to "Shared" considered
>> unsynchronized even though what looks like a proper lock is used
>> around it?
>
> Yes, it is unsynchronized. Why would you think otherwise?

The signaling rules are dynamic, not static.  Only the code path that is
actually taken matters.  Sam's corrected code only updates Shared if the
operations in other tasks have been signaled (because of the entry_body
rule and the sequence rule).

(I can't reproduce the conditional store with my GCC 4.2 installation,
though.)

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 13:49           ` Florian Weimer
@ 2007-10-27 13:59             ` Samuel Tardieu
  2007-10-27 14:25               ` Florian Weimer
  2007-10-27 16:25             ` Robert Dewar
  1 sibling, 1 reply; 208+ messages in thread
From: Samuel Tardieu @ 2007-10-27 13:59 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Robert Dewar, Andrew Haley, Bart Van Assche, gcc, Andrew Pinski,
	Tomash Brechko

On 27/10, Florian Weimer wrote:

| (I can't reproduce the conditional store with my GCC 4.2 installation,
| though.)

You need "-O -fno-inline" to trigger it on this particular example
(you don't need "-fno-inline" if you put "Lock" in a separate package).

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 13:59             ` Samuel Tardieu
@ 2007-10-27 14:25               ` Florian Weimer
  2007-10-27 19:35                 ` Andrew Haley
  0 siblings, 1 reply; 208+ messages in thread
From: Florian Weimer @ 2007-10-27 14:25 UTC (permalink / raw)
  To: Samuel Tardieu
  Cc: Robert Dewar, Andrew Haley, Bart Van Assche, gcc, Andrew Pinski,
	Tomash Brechko

* Samuel Tardieu:

> On 27/10, Florian Weimer wrote:
>
> | (I can't reproduce the conditional store with my GCC 4.2 installation,
> | though.)
>
> You need "-O -fno-inline" to trigger it on this particular example
> (you don't need "-fno-inline" if you put "Lock" in a separate package).

Ah, thanks.  I see it now.

If not for Ada, we need to fix it for Java.  The following snippet shows
the same problem:

class C {
    static volatile boolean flag;
    static int shared;
    public void maybe_increment() {
        if (flag)
            ++shared;
    }
}

_ZN1C15maybe_incrementEJvv:
.LFB3:
	movzbl	_ZN1C4flagE(%rip), %eax
	cmpb	$1, %al
	movl	_ZN1C6sharedE(%rip), %eax
	sbbl	$-1, %eax
	movl	%eax, _ZN1C6sharedE(%rip)
	ret

And the 1.5 memory model should really, really prevent that (if not,
it's broken).
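
For readers who do not read x86 assembly fluently, the generated code
above behaves roughly like the second C function below.  This is only a
sketch: the C rendering and the two function names are assumptions made
for illustration (the variables mirror the Java snippet), not compiler
output.

```c
#include <stdbool.h>

static volatile bool flag;  /* mirrors C.flag in the Java snippet */
static int shared;          /* mirrors C.shared */

/* What the source asks for: shared is touched only when flag is set. */
void maybe_increment_as_written(void)
{
    if (flag)
        ++shared;
}

/* Roughly what the cmpb/sbbl sequence does: shared is loaded and
   stored back on every call, whether or not flag is set. */
void maybe_increment_as_compiled(void)
{
    int borrow = (flag == 0);   /* cmpb $1, %al: carry is set when flag == 0 */
    int tmp = shared;           /* movl shared, %eax */
    tmp = tmp + 1 - borrow;     /* sbbl $-1, %eax */
    shared = tmp;               /* unconditional store */
}
```

Single-threaded, the two functions are indistinguishable.  The difference
only shows when another thread writes shared between the load and the
store while flag is false: the unconditional store then puts the old
value back.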

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 13:49           ` Florian Weimer
  2007-10-27 13:59             ` Samuel Tardieu
@ 2007-10-27 16:25             ` Robert Dewar
  2007-10-27 16:43               ` Samuel Tardieu
  1 sibling, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-27 16:25 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Samuel Tardieu, Andrew Haley, Bart Van Assche, gcc,
	Andrew Pinski, Tomash Brechko

Florian Weimer wrote:
> * Robert Dewar:
> 
>>> In the following example, is the access to "Shared" considered
>>> unsynchronized even though what looks like a proper lock is used
>>> around it?
>> Yes, it is unsynchronized. Why would you think otherwise?
> 
> The signaling rules are dynamic, not static.  Only the code path that is
> actually taken matters.  Sam's corrected code only updates Shared if the
> operations in other tasks have been signaled (because of the entry_body
> rule and the sequence rule).

I don't understand.  Are we looking at the same example?  The example
from Sam that I looked at did not have an entry body, so how could the
entry body rule apply?
> 
> (I can't reproduce the conditional store with my GCC 4.2 installation,
> though.)


^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 16:25             ` Robert Dewar
@ 2007-10-27 16:43               ` Samuel Tardieu
  0 siblings, 0 replies; 208+ messages in thread
From: Samuel Tardieu @ 2007-10-27 16:43 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Florian Weimer, Andrew Haley, Bart Van Assche, gcc,
	Andrew Pinski, Tomash Brechko

On 27/10, Robert Dewar wrote:

> I don't understand.  Are we looking at the same example?  The example
> from Sam that I looked at did not have an entry body, so how could the
> entry body rule apply?

Let's look at the example (where I replaced protected procedures by
entries with a guard which is always True). As per RM95 9.10, I understand
that the calls to Maybe_Lock/Maybe_Unlock act as a synchronization point
for two tasks executing Maybe_Increment:

  http://pastebin.com/f35d673e9

This looks to me like a proper way to "synchronize the actions of two or
more tasks to allow, for example, meaningful communication by the direct
updating and reading of variables shared between the tasks" (RM citation).

However, despite the fact that the first task will get what looks like
a lock (L is True), with the current "optimization" the other one will
be able to modify the Shared variable even though the Ada code is not
supposed to read or write in its memory location (as L will be False
in the second task since the lock has not been acquired as it has been
granted to the first task).

I admit that this example may be contrived and not occur in real-life
code, but I am surprised that Ada allows the Shared variable to be
updated to its older value by a race condition in this case.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 14:25               ` Florian Weimer
@ 2007-10-27 19:35                 ` Andrew Haley
  0 siblings, 0 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-27 19:35 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Samuel Tardieu, Robert Dewar, Bart Van Assche, gcc,
	Andrew Pinski, Tomash Brechko

Florian Weimer writes:
 > * Samuel Tardieu:
 > 
 > > On 27/10, Florian Weimer wrote:
 > >
 > > | (I can't reproduce the conditional store with my GCC 4.2 installation,
 > > | though.)
 > >
 > > You need "-O -fno-inline" to trigger it on this particular example
 > > (you don't need "-fno-inline" if you put "Lock" in a separate package).
 > 
 > Ah, thanks.  I see it now.
 > 
 > If not for Ada, we need to fix it for Java.  The following snippet shows
 > the same problem:
 > 
 > class C {
 >     static volatile boolean flag;
 >     static int shared;
 >     public void maybe_increment() {
 >         if (flag)
 >             ++shared;
 >     }
 > }
 > 
 > _ZN1C15maybe_incrementEJvv:
 > .LFB3:
 > 	movzbl	_ZN1C4flagE(%rip), %eax
 > 	cmpb	$1, %al
 > 	movl	_ZN1C6sharedE(%rip), %eax
 > 	sbbl	$-1, %eax
 > 	movl	%eax, _ZN1C6sharedE(%rip)
 > 	ret
 > 
 > And the 1.5 memory model should really, really prevent that (if not,
 > it's broken).

The Java 1.5 memory model will prevent that.  I'm hoping Ian Taylor's
patch will fix this problem for Java as well as C.

Andrew.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 13:34           ` Florian Weimer
@ 2007-10-28 13:47             ` Bart Van Assche
  2007-10-28 13:53               ` Robert Dewar
                                 ` (2 more replies)
  0 siblings, 3 replies; 208+ messages in thread
From: Bart Van Assche @ 2007-10-28 13:47 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andrew Haley, gcc, Andrew Pinski, Tomash Brechko, Robert Dewar

On 10/27/07, Florian Weimer <fw@deneb.enyo.de> wrote:
>
> Anyway, not reordering across function calls is not sufficient to get
> sane threading semantics (IIRC, this is also explained in detail in Hans
> Boehm's paper).

Hello Florian,

In Hans Boehm's paper the following issues are identified:
1. Concurrent accesses of variables without explicit locking can cause
unexpected results in a multithreaded context (paragraph 4.1).
2. If non-atomic variables (e.g. one field of a bitfield) are shared
over threads, and these are not protected by explicit locking,
updating such a variable in a multithreaded context is troublesome
(paragraph 4.2).
3. If the compiler performs register promotion on a shared variable,
this can cause undesired results in a multithreaded context (paragraph
4.3)

And this thread started with:
4. If the compiler generates a store operation for an assignment
statement that is not executed, this can cause trouble in a
multithreaded context.
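
Issue (3) can be sketched in C as follows.  This is an illustration in
the spirit of the register-promotion discussion in the paper, not a copy
of it: mt, lock(), unlock() and the loop body are hypothetical names, and
the lock functions here are stand-ins that only count calls.

```c
static int mt;          /* hypothetical: nonzero when several threads run */
static int x;           /* shared variable, meant to be protected by the lock */
static int lock_calls;  /* stand-in bookkeeping, not a real mutex */

static void lock(void)   { ++lock_calls; }
static void unlock(void) { }

/* As written: when mt is set, x is only accessed between lock() and
   unlock(). */
void update_as_written(int n)
{
    for (int i = 0; i < n; ++i) {
        if (mt) lock();
        x = x + i;
        if (mt) unlock();
    }
}

/* After register promotion: x is cached in r for the whole loop and
   only synchronized with memory around the calls.  The load before the
   loop and the store after it happen without the lock being held. */
void update_promoted(int n)
{
    int r = x;                          /* load outside the lock */
    for (int i = 0; i < n; ++i) {
        if (mt) { x = r; lock(); r = x; }
        r = r + i;
        if (mt) { x = r; unlock(); r = x; }
    }
    x = r;                              /* store outside the lock */
}
```

In a single thread both functions compute the same value of x; the
promoted version differs only in touching x at points where the original
source never does.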

My opinion is that, given the importance of multithreading, it should
be documented in the gcc manual which optimizations can cause trouble
in multithreaded software (such as (3) and (4)). It should also be
documented which compiler flags must be used to disable optimizations
that cause trouble for multithreaded software. Requiring that all
thread-shared variables should be declared volatile is completely
unacceptable. We need a solution today for the combination of C/C++
and POSIX threads, we can't wait for the respective language
standardization committees to come up with a solution.

Regarding issues (1) and (2): (1) can be addressed by using
platform-specific or compiler-specific solutions, e.g. the datatype
atomic_t provided by the Linux kernel headers. And any prudent
programmer won't write code that triggers (2).

And as you may have noted, I do not agree with Hans Boehm where he
states that the combination of C/C++ with POSIX threads cannot result
in correctly working programs. I believe that the issues raised by
Hans Boehm can be solved.

Bart Van Assche.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 13:47             ` Bart Van Assche
@ 2007-10-28 13:53               ` Robert Dewar
  2007-10-28 15:03                 ` Tomash Brechko
  2007-10-28 21:19                 ` Bart Van Assche
  2007-10-28 14:18               ` Andrew Haley
  2007-10-28 15:07               ` Dave Korn
  2 siblings, 2 replies; 208+ messages in thread
From: Robert Dewar @ 2007-10-28 13:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Florian Weimer, Andrew Haley, gcc, Andrew Pinski, Tomash Brechko

Bart Van Assche wrote:

> My opinion is that, given the importance of multithreading, it should
> be documented in the gcc manual which optimizations can cause trouble
> in multithreaded software (such as (3) and (4)). It should also be
> documented which compiler flags must be used to disable optimizations
> that cause trouble for multithreaded software. Requiring that all
> thread-shared variables should be declared volatile is completely
> unacceptable. 

Why is this unacceptable?  It seems much better to me than writing
undefined stuff.

> And as you may have noted, I do not agree with Hans Boehm where he
> states that the combination of C/C++ with POSIX threads cannot result
> in correctly working programs. I believe that the issues raised by
> Hans Boehm can be solved.

Well, Hans is talking about C/C++; you are talking about some other
language, one in which programs that do not have well-defined semantics
in C or C++ do have well-defined semantics.
> 
> Bart Van Assche.


^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 13:47             ` Bart Van Assche
  2007-10-28 13:53               ` Robert Dewar
@ 2007-10-28 14:18               ` Andrew Haley
  2007-10-28 15:07               ` Dave Korn
  2 siblings, 0 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-28 14:18 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Florian Weimer, gcc, Andrew Pinski, Tomash Brechko, Robert Dewar

Bart Van Assche writes:

 > We need a solution today for the combination of C/C++ and POSIX
 > threads, we can't wait for the respective language standardization
 > committees to come up with a solution.

And, in the proposed memory model, I believe we have one.  If there is
some reason you believe the proposed memory model won't be sufficient,
then maybe we can start looking at doing something gcc local.  But it
would surely be much better to do this through the ISO TC.  They will
doubtless be glad to respond to any concerns that you have.

Andrew.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 13:53               ` Robert Dewar
@ 2007-10-28 15:03                 ` Tomash Brechko
  2007-10-28 21:19                 ` Bart Van Assche
  1 sibling, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-28 15:03 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Bart Van Assche, Florian Weimer, Andrew Haley, gcc, Andrew Pinski

On Sun, Oct 28, 2007 at 09:47:36 -0400, Robert Dewar wrote:
> Bart Van Assche wrote:
> 
> >Requiring that all thread-shared variables should be declared
> >volatile is completely unacceptable.
> 
> Why is this unacceptable?  It seems much better to me than writing
> undefined stuff.

There's a parallel thread on the Linux Kernel Mailing List.  Everyone
is advised to read it, if they have not already.  There are several good
points there:

  - the problem is not limited to the multithreaded domain: the page with
    the object could be made read-only during execution, thus

       if (! page_is_read_only)
         v = 1;

    would SIGSEGV for no apparent reason.

  - making things volatile is unacceptable from performance POV.

  - optimization in question might well turn out to be misoptimization
    for anything but microbenchmarks (read LKML for cache flush/dirty
    page issues).

  - "people knowledgeable in POSIX say that this optimization is
    bogus".  I would add that though we may say that Standard C is not
    aware of threads, POSIX _is_ aware of Standard C.  While POSIX
    failed to solve the issue by formal word, its intent is clear: to
    make POSIX Threads usable.  The compiler that claims to be POSIX
    compatible should take this into account.

  - there's also a good talk on lawyer-ish vs attached-to-reality
    approach.  I personally doubt those who continue to advise to use
    volatile are actually writing such multithreaded programs.  Most
    argue just for the fun of it.
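
The first point above can be made concrete with a small single-threaded
sketch.  The names store_v, stores and the two set_* functions are
hypothetical; store_v stands in for the machine-level store so that
stores can be counted instead of faulting on a read-only page.

```c
static int v;
static int page_is_read_only;
static int stores;              /* counts writes to v */

/* Stand-in for a store instruction targeting v. */
static void store_v(int value)
{
    v = value;
    ++stores;
}

/* As written: no store is issued when the page is read-only. */
void set_as_written(void)
{
    if (!page_is_read_only)
        store_v(1);
}

/* After the optimization under discussion: the value is selected
   first, then stored on every path, including the read-only one. */
void set_as_transformed(void)
{
    int tmp = v;                /* speculative load */
    if (!page_is_read_only)
        tmp = 1;
    store_v(tmp);               /* store happens unconditionally */
}
```

With page_is_read_only set, set_as_written() issues no store, while
set_as_transformed() still issues one that rewrites the old value; on a
page that really is read-only, that store is the one that would fault.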


> Well, Hans is talking about C/C++; you are talking about some other
> language, one in which programs that do not have well-defined semantics
> in C or C++ do have well-defined semantics.

Good thing we have this _bug_ in languages that define memory
semantics (Ada, Java), and no one yet argues that GCC should be fixed
wrt only those languages.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 13:47             ` Bart Van Assche
  2007-10-28 13:53               ` Robert Dewar
  2007-10-28 14:18               ` Andrew Haley
@ 2007-10-28 15:07               ` Dave Korn
  2007-10-28 17:29                 ` Erik Trulsson
  2 siblings, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-28 15:07 UTC (permalink / raw)
  To: 'Bart Van Assche', 'Florian Weimer'
  Cc: 'Andrew Haley', gcc, 'Andrew Pinski',
	'Tomash Brechko', 'Robert Dewar'

On 28 October 2007 13:32, Bart Van Assche wrote:

>  Requiring that all
> thread-shared variables should be declared volatile is completely
> unacceptable. 

  Any variable that may be altered by an external unpredictable asynchronous
'force majeure' must be declared volatile or the behaviour is undefined.  Your
code is simply incorrect, and you appear to be demanding that the language
standards and the compiler all be revised to make the buggy code valid.

> We need a solution today for the combination of C/C++
> and POSIX threads, we can't wait for the respective language
> standardization committees to come up with a solution.

  You already have it, but you have declared it "unacceptable" and refused to
use it without stating any clear reason.

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:29                 ` Erik Trulsson
@ 2007-10-28 17:26                   ` Robert Dewar
  2007-10-28 17:49                     ` Erik Trulsson
  2007-10-28 17:39                   ` Richard Guenther
  1 sibling, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-28 17:26 UTC (permalink / raw)
  To: Dave Korn, 'Bart Van Assche', 'Florian Weimer',
	'Andrew Haley', gcc, 'Andrew Pinski',
	'Tomash Brechko', 'Robert Dewar'

Erik Trulsson wrote:

> Unfortunately it seems that the POSIX standard for threads says that as long
> as access to a shared variable is protected by a mutex there is no need to
> use 'volatile'.

How does it say this: in some semantically precise way, or with hand
waving as in this sentence?
> 
> This means that POSIX essentially defines certain behaviours that the C
> standard left undefined.

But does it do so precisely?

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 15:07               ` Dave Korn
@ 2007-10-28 17:29                 ` Erik Trulsson
  2007-10-28 17:26                   ` Robert Dewar
  2007-10-28 17:39                   ` Richard Guenther
  0 siblings, 2 replies; 208+ messages in thread
From: Erik Trulsson @ 2007-10-28 17:29 UTC (permalink / raw)
  To: Dave Korn
  Cc: 'Bart Van Assche', 'Florian Weimer',
	'Andrew Haley', gcc, 'Andrew Pinski',
	'Tomash Brechko', 'Robert Dewar'

On Sun, Oct 28, 2007 at 03:03:46PM -0000, Dave Korn wrote:
> On 28 October 2007 13:32, Bart Van Assche wrote:
> 
> >  Requiring that all
> > thread-shared variables should be declared volatile is completely
> > unacceptable. 
> 
>   Any variable that may be altered by an external unpredictable asynchronous
> 'force majeure' must be declared volatile or the behaviour is undefined.  Your
> code is simply incorrect, and you appear to be demanding that the language
> standards and the compiler all be revised to make the buggy code valid.


Unfortunately it seems that the POSIX standard for threads says that as long
as access to a shared variable is protected by a mutex there is no need to
use 'volatile'.

This means that POSIX essentially defines certain behaviours that the C
standard left undefined.

Personally I think the POSIX standard is broken in this regard, but if
programs that are valid according to POSIX are to work correctly then it is
not sufficient for the compiler to follow the C standard.  It must also not
break any of the guarantees that POSIX makes.


> 
> > We need a solution today for the combination of C/C++
> > and POSIX threads, we can't wait for the respective language
> > standardization committees to come up with a solution.
> 
>   You already have it, but you have declared it "unacceptable" and refused to
> use it without stating any clear reason.
> 
>     cheers,
>       DaveK
> -- 
> Can't think of a witty .sigline today....
> 

-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:29                 ` Erik Trulsson
  2007-10-28 17:26                   ` Robert Dewar
@ 2007-10-28 17:39                   ` Richard Guenther
  2007-10-28 18:03                     ` Erik Trulsson
                                       ` (2 more replies)
  1 sibling, 3 replies; 208+ messages in thread
From: Richard Guenther @ 2007-10-28 17:39 UTC (permalink / raw)
  To: Dave Korn, Bart Van Assche, Florian Weimer, Andrew Haley, gcc,
	Andrew Pinski, Tomash Brechko, Robert Dewar

On 10/28/07, Erik Trulsson <ertr1013@student.uu.se> wrote:
> On Sun, Oct 28, 2007 at 03:03:46PM -0000, Dave Korn wrote:
> > On 28 October 2007 13:32, Bart Van Assche wrote:
> >
> > >  Requiring that all
> > > thread-shared variables should be declared volatile is completely
> > > unacceptable.
> >
> >   Any variable that may be altered by an external unpredictable asynchronous
> > 'force majeure' must be declared volatile or the behaviour is undefined.  Your
> > code is simply incorrect, and you appear to be demanding that the language
> > standards and the compiler all be revised to make the buggy code valid.
>
>
> Unfortunately it seems that the POSIX standard for threads says that as long
> as access to a shared variable is protected by a mutex there is no need to
> use 'volatile'.

Which is a very impractical thing to say, as it would essentially force
the compiler to assume every variable is protected by a mutex (how should
it prove otherwise?)

Richard.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:26                   ` Robert Dewar
@ 2007-10-28 17:49                     ` Erik Trulsson
  2007-10-28 18:02                       ` Andreas Schwab
  2007-10-28 18:40                       ` Dave Korn
  0 siblings, 2 replies; 208+ messages in thread
From: Erik Trulsson @ 2007-10-28 17:49 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Dave Korn, 'Bart Van Assche', 'Florian Weimer',
	'Andrew Haley', gcc, 'Andrew Pinski',
	'Tomash Brechko'

On Sun, Oct 28, 2007 at 01:10:00PM -0400, Robert Dewar wrote:
> Erik Trulsson wrote:
> 
>> Unfortunately it seems that the POSIX standard for threads says that as
>> long as access to a shared variable is protected by a mutex there is no
>> need to use 'volatile'.
>
> How does it say this: in some semantically precise way, or with hand
> waving as in this sentence?

I don't know.  I don't have access to the POSIX standard itself so I have
to depend on other people's descriptions of what POSIX says. (Thus my use of
'seems' above.)
Everything I have found seems to agree that POSIX does not require the use of
volatile though.


>> This means that POSIX essentially defines certain behaviours that the C
>> standard left undefined.
> 
> But does it do so precisely?

I doubt it.

Personally I suspect that "C + pthreads" is simply not well-defined
currently, and that almost every single program out there that uses pthreads
is depending on undefined behaviour.  Taking a hard-line stance on that
seems unlikely to be very popular or useful, though.



-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:49                     ` Erik Trulsson
@ 2007-10-28 18:02                       ` Andreas Schwab
  2007-11-04 14:33                         ` [wwwdocs] PATCH " Gerald Pfeifer
  2007-10-28 18:40                       ` Dave Korn
  1 sibling, 1 reply; 208+ messages in thread
From: Andreas Schwab @ 2007-10-28 18:02 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Dave Korn, 'Bart Van Assche', 'Florian Weimer',
	'Andrew Haley', gcc, 'Andrew Pinski',
	'Tomash Brechko'

Erik Trulsson <ertr1013@student.uu.se> writes:

> I don't have access to the POSIX standard itself

See <http://www.opengroup.org/onlinepubs/009695399/toc.htm>.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:39                   ` Richard Guenther
@ 2007-10-28 18:03                     ` Erik Trulsson
  2007-10-28 20:12                     ` skaller
  2007-10-29  9:57                     ` Andrew Haley
  2 siblings, 0 replies; 208+ messages in thread
From: Erik Trulsson @ 2007-10-28 18:03 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Dave Korn, Bart Van Assche, Florian Weimer, Andrew Haley, gcc,
	Andrew Pinski, Tomash Brechko, Robert Dewar

On Sun, Oct 28, 2007 at 06:29:44PM +0100, Richard Guenther wrote:
> On 10/28/07, Erik Trulsson <ertr1013@student.uu.se> wrote:
> > On Sun, Oct 28, 2007 at 03:03:46PM -0000, Dave Korn wrote:
> > > On 28 October 2007 13:32, Bart Van Assche wrote:
> > >
> > > >  Requiring that all
> > > > thread-shared variables should be declared volatile is completely
> > > > unacceptable.
> > >
> > >   Any variable that may be altered by an external unpredictable asynchronous
> > > 'force majeure' must be declared volatile or the behaviour is undefined.  Your
> > > code is simply incorrect, and you appear to be demanding that the language
> > > standards and the compiler all be revised to make the buggy code valid.
> >
> >
> > > Unfortunately it seems that the POSIX standard for threads says that as long
> > > as access to a shared variable is protected by a mutex there is no need to
> > > use 'volatile'.
>
> Which is a very impractical thing to say, as it would essentially force
> the compiler to assume every variable is protected by a mutex (how should
> it prove otherwise?)

Not quite, but nearly so.  There are some situations where the compiler can
prove that a variable cannot be shared - for example a variable which is
local to a function, where that function never passes the address of that
variable to any other function (and where that function itself does not
create any new threads).  In that case no other thread can know the address
of that variable and thus it cannot be shared.
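
A minimal sketch of the distinction, with hypothetical names throughout
(published is an invented global through which an address can escape):

```c
#include <stddef.h>

static int *published;  /* hypothetical: visible to every thread */

/* total never has its address taken, so no other thread can name it.
   The compiler may keep it in a register for the whole function. */
int sum_private(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += a[i];
    return total;
}

/* Here the address of total is stored in a global, so the compiler can
   no longer prove by local inspection that total is unshared. */
int sum_escaping(const int *a, int n)
{
    int total = 0;
    published = &total;         /* the address escapes */
    for (int i = 0; i < n; ++i)
        total += a[i];
    published = NULL;           /* do not leave a dangling pointer behind */
    return total;
}
```

Both functions return the same sum; only the first fits the case
described above, where sharing can be ruled out.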




-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:49                     ` Erik Trulsson
  2007-10-28 18:02                       ` Andreas Schwab
@ 2007-10-28 18:40                       ` Dave Korn
  2007-10-28 19:15                         ` Erik Trulsson
  1 sibling, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-28 18:40 UTC (permalink / raw)
  To: 'Erik Trulsson', 'Robert Dewar'
  Cc: 'Bart Van Assche', 'Florian Weimer',
	'Andrew Haley', gcc, 'Andrew Pinski',
	'Tomash Brechko'

On 28 October 2007 17:39, Erik Trulsson wrote:

> On Sun, Oct 28, 2007 at 01:10:00PM -0400, Robert Dewar wrote:
>> Erik Trulsson wrote:
>> 
>>> Unfortunately it seems that the POSIX standard for threads says that as
>>> long as access to a shared variable is protected by a mutex there is no
>>> need to use 'volatile'.
>> 
>> How does it say this: in some semantically precise way, or with hand
>> waving as in this sentence?
> 
> I don't know.  I don't have access to the POSIX standard itself so I have
> to depend on other people's descriptions of what POSIX says. (Thus my use of
> 'seems' above.)
> Everything I have found seems to agree that POSIX does not require the use of
> volatile though.

  As far as I know, there is no separate 'pthreads' spec apart from what is
defined in the Threads section (2.9) of the SUS (http://tinyurl.com/2wdq2u)
and what it says about the various pthread_ functions in the system interfaces
(http://tinyurl.com/2r7c5k) chapter.  None of that, as far as I have been able
to determine, makes any kind of claims about access to shared state or the use
of volatile.


    cheers,
      DaveK

-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 18:40                       ` Dave Korn
@ 2007-10-28 19:15                         ` Erik Trulsson
  2007-10-28 20:43                           ` skaller
  2007-10-29  5:17                           ` Ross Smith
  0 siblings, 2 replies; 208+ messages in thread
From: Erik Trulsson @ 2007-10-28 19:15 UTC (permalink / raw)
  To: Dave Korn
  Cc: 'Robert Dewar', 'Bart Van Assche',
	'Florian Weimer', 'Andrew Haley',
	gcc, 'Andrew Pinski', 'Tomash Brechko'

On Sun, Oct 28, 2007 at 06:06:17PM -0000, Dave Korn wrote:
> On 28 October 2007 17:39, Erik Trulsson wrote:
> 
> > On Sun, Oct 28, 2007 at 01:10:00PM -0400, Robert Dewar wrote:
> >> Erik Trulsson wrote:
> >> 
> >>> Unfortunately it seems that the POSIX standard for threads says that as
> >>> long as access to a shared variable is protected by a mutex there is no
> >>> need to use 'volatile'.
> >> 
> >> How does it say this: in some semantically precise way, or with hand
> >> waving as in this sentence?
> > 
> > I don't know.  I don't have access to the POSIX standard itself so I have
> > to depend on other people's descriptions of what POSIX says. (Thus my use of
> > 'seems' above.)
> > Everything I have found seems to agree that POSIX does not require the use of
> > volatile though.
> 
>   As far as I know, there is no separate 'pthreads' spec apart from what is
> defined in the Threads section (2.9) of the SUS (http://tinyurl.com/2wdq2u)
> and what it says about the various pthread_ functions in the system interfaces
> (http://tinyurl.com/2r7c5k) chapter.  None of that, as far as I have been able
> to determine, makes any kind of claims about access to shared state or the use
> of volatile.

Having just been pointed to that copy of the SUS, I must agree.  I can't
find anything in there saying anything at all about what is required to
safely share data between threads.  If that is really so it seems 'pthreads'
are even more under-specified than I thought (and I had fairly low
expectations in that regard.)
I really hope there is something I have missed.



-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:39                   ` Richard Guenther
  2007-10-28 18:03                     ` Erik Trulsson
@ 2007-10-28 20:12                     ` skaller
  2007-10-28 23:04                       ` Richard Guenther
  2007-10-29  9:57                     ` Andrew Haley
  2 siblings, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-28 20:12 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Dave Korn, Bart Van Assche, Florian Weimer, Andrew Haley, gcc,
	Andrew Pinski, Tomash Brechko, Robert Dewar


On Sun, 2007-10-28 at 18:29 +0100, Richard Guenther wrote:
> On 10/28/07, Erik Trulsson <ertr1013@student.uu.se> wrote:

> > Unfortunately it seems that the POSIX standard for threads says that as long
> > as access to a shared variable is protected by a mutex there is no need to
> > use 'volatile'.
> 
> Which is a very impractical thing to say, as it would essentially force
> the compiler to assume every variable is protected by a mutex (how should
> it prove otherwise?)

So the proof is easy: mutex ops are function calls,
assume all function calls lock or unlock.

Thus: store registers aliasing sharable variables into 
those variables on every function call.

	int x = 1;
	x = x + 1; // r0 <- x; r0++
	x = x + 1; // r0++;
	f();       // x <- r0; f();

Note: this is not well stated. There is no explicit coupling
between a given variable and a mutex.

If thread A locks Mutex MA, and B locks MB, there is no synchronisation
between these threads and sharing can fail: it has to be the same
mutex (to effect 'mutual exclusion').

When two threads are exclusive, it is safe to keep variables
in registers again (because the other thread is locked up).

OK .. hmm .. well this is the idea, but a more formal proof
would be cool.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 19:15                         ` Erik Trulsson
@ 2007-10-28 20:43                           ` skaller
  2007-10-29  5:17                           ` Ross Smith
  1 sibling, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-28 20:43 UTC (permalink / raw)
  To: Erik Trulsson
  Cc: Dave Korn, 'Robert Dewar', 'Bart Van Assche',
	'Florian Weimer', 'Andrew Haley',
	gcc, 'Andrew Pinski', 'Tomash Brechko'


On Sun, 2007-10-28 at 19:36 +0100, Erik Trulsson wrote:

> 
> Having just been pointed to that copy of the SUS, I must agree.  I can't
> find anything in there saying anything at all about what is required to
> safely share data between threads.  If that is really so it seems 'pthreads'
> are even more under-specified than I thought (and I had fairly low
> expectations in that regard.)
> I really hope there is something I have missed.

Clearly when two threads are both
running, one write to a variable means no other thread can
safely read or write it (assuming not atomic).

Mutex prevents more than one thread entering a particular
piece of code (that's defined, right?)

So the idea is clearly that once this is done, it is safe
to read or write a variable because no one else will.

Clearly the programmer must ensure no one else does.

Now, you are right: Posix does not specify that it is safe.

What you miss is that it doesn't have to: if it were
not safe, mutex would be useless, and since Posix specifies
Mutex it intends it to be useful.. so it follows that
it is safe .. no volatile required... :)

Remember Posix is an ISO Standard which codifies existing
practice and everyone makes the above assumptions.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 13:53               ` Robert Dewar
  2007-10-28 15:03                 ` Tomash Brechko
@ 2007-10-28 21:19                 ` Bart Van Assche
  2007-10-29  3:19                   ` skaller
  1 sibling, 1 reply; 208+ messages in thread
From: Bart Van Assche @ 2007-10-28 21:19 UTC (permalink / raw)
  To: Robert Dewar
  Cc: Florian Weimer, Andrew Haley, gcc, Andrew Pinski, Tomash Brechko

On 10/28/07, Robert Dewar <dewar@adacore.com> wrote:
> Bart Van Assche wrote:
>
> > My opinion is that, given the importance of multithreading, it should
> > be documented in the gcc manual which optimizations can cause trouble
> > in multithreaded software (such as (3) and (4)). It should also be
> > documented which compiler flags must be used to disable optimizations
> > that cause trouble for multithreaded software. Requiring that all
> > thread-shared variables should be declared volatile is completely
> > unacceptable.
>
> Why is this unacceptable .. seems much better to me than writing
> undefined stuff.

Requiring that all thread-shared variables must be declared volatile,
even those protected by calls to synchronization functions, implies
that all multithreaded code and all existing libraries have to be
changed. Functions like snprintf() write data to a buffer provided by
the caller. If all thread-shared variables should be declared
volatile, then a second version of snprintf() would be required with
volatile char* as the datatype of the first argument instead of char*.
Quite a hefty requirement if you ask me ...

Bart Van Assche.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 20:12                     ` skaller
@ 2007-10-28 23:04                       ` Richard Guenther
  2007-10-29  2:39                         ` skaller
  0 siblings, 1 reply; 208+ messages in thread
From: Richard Guenther @ 2007-10-28 23:04 UTC (permalink / raw)
  To: skaller
  Cc: Dave Korn, Bart Van Assche, Florian Weimer, Andrew Haley, gcc,
	Andrew Pinski, Tomash Brechko, Robert Dewar

On 10/28/07, skaller <skaller@users.sourceforge.net> wrote:
>
> On Sun, 2007-10-28 at 18:29 +0100, Richard Guenther wrote:
> > On 10/28/07, Erik Trulsson <ertr1013@student.uu.se> wrote:
>
> > > Unfortunately it seems that the POSIX standard for threads say that as long
> > > as access to a shared variable is protected by a mutex there is no need to
> > > use 'volatile'.
> >
> > > Which is a very impractical thing to say, as it essentially would force the compiler
> > to assume every variable is protected by a mutex (how should it prove
> > otherwise?)
>
> So the proof is easy: mutex ops are function calls,
> assume all function calls lock or unlock.
>
> Thus: store registers aliasing sharable variables into
> those variables on every function call.
>
>         int x = 1;
>         x = x + 1; // r0 <- x; r0++
>         x = x + 1; // r0++;
>         f();       // x <- r0; f();
>
> Note: this is not well stated. There is no explicit coupling
> between a given variable and a mutex.
>
> If thread A locks Mutex MA, and B locks MB, there is no synchronisation
> between these threads and sharing can fail: it has to be the same
> mutex (to effect 'mutual exclusion').
>
> When two threads are exclusive, it is safe to keep variables
> in registers again (because the other thread is locked up).
>
> OK .. hmm .. well this is the idea, but a more formal proof
> would be cool.

Doesn't work:

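#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
int a;

/* Touches 'a' only when the caller says it holds mx. */
void foo(bool locked)
{
  if (locked)
    a++;
}

void bar(void)
{
  pthread_mutex_lock(&mx);
  foo(true);
  pthread_mutex_unlock(&mx);
}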

you cannot do such analysis without seeing the whole program.

Richard.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 23:04                       ` Richard Guenther
@ 2007-10-29  2:39                         ` skaller
  2007-10-29  9:52                           ` Samuel Tardieu
  0 siblings, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-29  2:39 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Dave Korn, Bart Van Assche, Florian Weimer, Andrew Haley, gcc,
	Andrew Pinski, Tomash Brechko, Robert Dewar


On Sun, 2007-10-28 at 22:41 +0100, Richard Guenther wrote:

> > OK .. hmm .. well this is the idea, but a more formal proof
> > would be cool.
> 
> Doesn't work:

Of course it works.

> you cannot do such analysis without seeing the whole program.

There's no need. A mutex is assumed at each function call.
That is, registers are dumped to variables at each

	* call
	* function entry
	* function return

This means you cannot merely, say, push caller save
registers when calling a function, and you cannot leave
values in callee save registers, if the variable aliased
is sharable. 

In your example:

	int a;
	void foo(bool locked)
	{
	  if (locked)
	    a++;
	}

I see no problem, a is in memory, you can safely do

	if(!locked) goto end;
	r0 <- a; r0++; a <- r0;
	end: return;

Since 'a' here is sharable, the function can assume it
is not aliased in a register, load and increment it
and store it back.

It doesn't matter then, whether there is a mutex or not.
In fact, it doesn't matter if locked is true or false.

I also can't see anything at all is lost here.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 21:19                 ` Bart Van Assche
@ 2007-10-29  3:19                   ` skaller
  0 siblings, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-29  3:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Robert Dewar, Florian Weimer, Andrew Haley, gcc, Andrew Pinski,
	Tomash Brechko


On Sun, 2007-10-28 at 21:43 +0100, Bart Van Assche wrote:
> On 10/28/07, Robert Dewar <dewar@adacore.com> wrote:
> > Bart Van Assche wrote:
> >
> > > My opinion is that, given the importance of multithreading, it should
> > > be documented in the gcc manual which optimizations can cause trouble
> > > in multithreaded software (such as (3) and (4)). It should also be
> > > documented which compiler flags must be used to disable optimizations
> > > that cause trouble for multithreaded software. Requiring that all
> > > thread-shared variables should be declared volatile is completely
> > > unacceptable.
> >
> > Why is this unacceptable .. seems much better to me than writing
> > undefined stuff.
> 
> Requiring that all thread-shared variables must be declared volatile,
> even those protected by calls to synchronization functions, implies
> that all multithreaded code and all existing libraries have to be
> changed.

Yes, of course that is out of the question. Instead all shared
variables are treated as sharable.

This is NOT the same as volatile. Sharable variables need to
be 'de-registered' (dumped out of aliasing registers) at
function call boundaries. This is MUCH less strict than
volatile, which is at every sequence point.

If the function call is to a visible function, gcc can look in it
to see if it might fiddle a mutex, and if not, there's no need
to dump the registers. In particular, there's no synchronisation
point when a function is inlined.

IMHO the effect of this is to change the optimiser so that local
variables not addressed are preferred for lifting to registers
over globals or addressed locals.

C++ non-static members must be treated as sharable, even
if they're private, however this is not necessarily a big
deal if they're inline.

The effect seems to be that this:

	if(cond)x++;

can quite safely be replaced by

	r0 <- x;
	if(cond) r0++;
	x <- r0;

which is the topic of this discussion.

If this code mutually excludes other accesses to x, then it is
safe, and if it doesn't, then the programmer is responsible
for writing undefined behaviour, not the compiler.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 19:15                         ` Erik Trulsson
  2007-10-28 20:43                           ` skaller
@ 2007-10-29  5:17                           ` Ross Smith
  1 sibling, 0 replies; 208+ messages in thread
From: Ross Smith @ 2007-10-29  5:17 UTC (permalink / raw)
  To: Dave Korn, 'Robert Dewar', 'Bart Van Assche',
	'Florian Weimer', 'Andrew Haley',
	gcc, 'Andrew Pinski', 'Tomash Brechko'

Erik Trulsson wrote:
> On Sun, Oct 28, 2007 at 06:06:17PM -0000, Dave Korn wrote:
>>
>>   As far as I know, there is no separate 'pthreads' spec apart from what is
>> defined in the Threads section (2.9) of the SUS (http://tinyurl.com/2wdq2u)
>> and what it says about the various pthread_ functions in the system interfaces
>> (http://tinyurl.com/2r7c5k) chapter.  None of that, as far as I have been able
>> to determine, makes any kind of claims about access to shared state or the use
>> of volatile.
> 
> Having just been pointed to that copy of the SUS, I must agree.  I can't
> find anything in there saying anything at all about what is required to
> safely share data between threads.  If that is really so it seems 'pthreads'
> are even more under-specified than I thought (and I had fairly low
> expectations in that regard.)
> I really hope there is something I have missed.

I think the relevant part is here:
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_10

[begin quote]

4.10 Memory Synchronization

Applications shall ensure that access to any memory location by more 
than one thread of control (threads or processes) is restricted such 
that no thread of control can read or modify a memory location while 
another thread of control may be modifying it. Such access is restricted 
using functions that synchronize thread execution and also synchronize 
memory with respect to other threads. The following functions 
synchronize memory with respect to other threads:

fork()
pthread_barrier_wait()
pthread_cond_broadcast()
pthread_cond_signal()
pthread_cond_timedwait()
pthread_cond_wait()
pthread_create()
pthread_join()
pthread_mutex_lock()
pthread_mutex_timedlock()
pthread_mutex_trylock()
pthread_mutex_unlock()
pthread_spin_lock()
pthread_spin_trylock()
pthread_spin_unlock()
pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_tryrdlock()
pthread_rwlock_trywrlock()
pthread_rwlock_unlock()
pthread_rwlock_wrlock()
sem_post()
sem_trywait()
sem_wait()
wait()
waitpid()

The pthread_once() function shall synchronize memory for the first call 
in each thread for a given pthread_once_t object.

Unless explicitly stated otherwise, if one of the above functions 
returns an error, it is unspecified whether the invocation causes memory 
to be synchronized.

Applications may allow more than one thread of control to read a memory 
location simultaneously.

[end quote]
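
As a rough illustration of what 4.10 asks of the application, here is a
minimal sketch in which every access to a shared counter sits between
pthread_mutex_lock() and pthread_mutex_unlock(), both of which are on the
list above. The names counter, counter_lock, worker and run_workers are
made up for this example, and the counter is deliberately not volatile.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;            /* shared between threads, not volatile */

/* Each worker increments the shared counter 'iters' times,
   taking the mutex around every single access. */
static void *worker(void *arg)
{
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts up to MAX_THREADS workers, waits for them, returns the total. */
long run_workers(int nthreads, long iters)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    pthread_mutex_lock(&counter_lock);
    counter = 0;
    pthread_mutex_unlock(&counter_lock);

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, worker, &iters) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    /* pthread_join() is also on the list of synchronizing functions,
       so reading the counter here without the lock is allowed. */
    return counter;
}
```

With four workers doing 10000 increments each, run_workers(4, 10000)
should come back with 40000, provided the compiler does not add stores
of its own outside the locked region.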


-- Ross Smith


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:39                         ` skaller
@ 2007-10-29  9:52                           ` Samuel Tardieu
  2007-10-29 11:24                             ` skaller
  0 siblings, 1 reply; 208+ messages in thread
From: Samuel Tardieu @ 2007-10-29  9:52 UTC (permalink / raw)
  To: skaller
  Cc: Richard Guenther, Dave Korn, Bart Van Assche, Florian Weimer,
	Andrew Haley, gcc, Andrew Pinski, Tomash Brechko, Robert Dewar

>>>>> "skaller" == skaller  <skaller@users.sourceforge.net> writes:

skaller> Since 'a' here is sharable, the function can assume it is not
skaller> aliased in a register, load and increment it and store it
skaller> back.

skaller> It doesn't matter then, whether there is a mutex or not.  In
skaller> fact, it doesn't matter if locked is true or false.

skaller> I also can't see anything at all is lost here.

Back to the beginning of this thread. The problem is that even when locked
is false, the compiler still stores to 'a' (writing back the value it just read).

Look at the generated code for this example with the current
compiler. It is compiled as if it had been written as:

        int a;
        void foo(bool locked)
        {
          a -= locked ? -1 : 0;
        }

with (x86):

        movl    a, %eax
        cmpl    $1, 4(%esp)
        sbbl    $-1, %eax
        movl    %eax, a

There is clearly a race condition if you have multiple threads
executing this code even if only one thread has "locked" being true.
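
To make the difference visible without reading assembly, here is a small
sketch that renders both forms in C. foo_source is the function as
written; foo_as_compiled is a hypothetical name for my hand translation of
the four instructions above, not something the compiler emits under that
name.

```c
#include <stdbool.h>

int a;

/* The function as written: 'a' is only touched when locked is true. */
void foo_source(bool locked)
{
    if (locked)
        a++;
}

/* Hand translation of the generated x86 code: load 'a', adjust the
   register depending on 'locked', then store unconditionally. */
void foo_as_compiled(bool locked)
{
    int r0 = a;              /* movl a, %eax                 */
    r0 += locked ? 1 : 0;    /* cmpl $1, 4(%esp); sbbl $-1   */
    a = r0;                  /* movl %eax, a   (always runs) */
}
```

In a single thread the two are indistinguishable. With threads, a caller
passing false to foo_as_compiled still performs a load and a store of 'a',
and that store can overwrite an increment made in between by the thread
that does hold the lock.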

  Sam
-- 
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:39                   ` Richard Guenther
  2007-10-28 18:03                     ` Erik Trulsson
  2007-10-28 20:12                     ` skaller
@ 2007-10-29  9:57                     ` Andrew Haley
  2 siblings, 0 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-29  9:57 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

Richard Guenther writes:
 > On 10/28/07, Erik Trulsson <ertr1013@student.uu.se> wrote:

 > > Unfortunately it seems that the POSIX standard for threads say that as long
 > > as access to a shared variable is protected by a mutex there is no need to
 > > use 'volatile'.
 > 
 > > Which is a very impractical thing to say, as it essentially would force the compiler
 > to assume every variable is protected by a mutex (how should it prove
 > otherwise?)

It's not as bad as you might think.  

The compiler consequences of adopting the N2334 memory model proposal
are summarized here:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html

Andrew.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  9:52                           ` Samuel Tardieu
@ 2007-10-29 11:24                             ` skaller
  2007-10-29 13:57                               ` Darryl Miles
  0 siblings, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-29 11:24 UTC (permalink / raw)
  To: Samuel Tardieu
  Cc: Richard Guenther, Dave Korn, Bart Van Assche, Florian Weimer,
	Andrew Haley, gcc, Andrew Pinski, Tomash Brechko, Robert Dewar


On Mon, 2007-10-29 at 10:37 +0100, Samuel Tardieu wrote:
> >>>>> "skaller" == skaller  <skaller@users.sourceforge.net> writes:

> with (x86):
> 
>         movl    a, %eax
>         cmpl    $1, 4(%esp)
>         sbbl    $-1, %eax
>         movl    %eax, a
> 
> There is clearly a race condition if you have multiple threads
> executing this code even if only one thread has "locked" being true.

Ah .. ok I think I finally see. Thanks! The code ensures
well-definedness by checking the establishment of the
required invariant and dynamically choosing whether or not
to do the access on that basis .. and the optimisation
above defeats that by lifting the access out of the
conditional.

In the single threaded case the lift works because it
relies on sequential access, which is the only possibility
for a single thread.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 11:24                             ` skaller
@ 2007-10-29 13:57                               ` Darryl Miles
  0 siblings, 0 replies; 208+ messages in thread
From: Darryl Miles @ 2007-10-29 13:57 UTC (permalink / raw)
  To: skaller
  Cc: Samuel Tardieu, Richard Guenther, Dave Korn, Bart Van Assche,
	Florian Weimer, Andrew Haley, gcc, Andrew Pinski, Tomash Brechko,
	Robert Dewar

skaller wrote:
> Ah .. ok I think I finally see. Thanks! The code ensures
> well-definedness by checking the establishment of the
> required invariant and dynamically choosing whether or not
> to do the access on that basis .. and the optimisation
> above defeats that by lifting the access out of the
> conditional.
> 
> In the single threaded case the lift works because it
> relies on sequential access, which is the only possibility
> for a single thread.


But this is clearly not a similar case.  There is a clear 
read-modify-write cycle taking place (-= operator), and you describe the 
problem in a way that a decrement with the value of zero is allowed.

The problem domain of atomic read-modify-write is not the same as that of 
atomic assignment, which is the basis of the original issue.  Moreover, 
the original issue was a write access to a variable where none was 
described in the code for that given circumstance.


Along the lines of my first post to this thread, if you want atomic 
read-modify-write then you are going to have to create your 
atomic_int_dec(int *intptr) function, or atomic_int_sub(int *intptr, int 
value), which makes use of IA32 CPU lock prefix instructions.  But for 
many other platforms (almost all RISC) you are going to have to obtain a 
mutex lock then perform a 'load from memory to register', 'subtract 
value', 'store to memory from register'.


Darryl


* [wwwdocs] PATCH Re: Optimization of conditional access to globals:  thread-unsafe?
  2007-10-28 18:02                       ` Andreas Schwab
@ 2007-11-04 14:33                         ` Gerald Pfeifer
  2007-11-04 23:49                           ` Kai Henningsen
  0 siblings, 1 reply; 208+ messages in thread
From: Gerald Pfeifer @ 2007-11-04 14:33 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Robert Dewar, Dave Korn, Bart Van Assche, Florian Weimer,
	Andrew Haley, gcc, gcc-patches, Andrew Pinski, Tomash Brechko

On Sun, 28 Oct 2007, Andreas Schwab wrote:
>> I don't have access to the POSIX standard itself
> See <http://www.opengroup.org/onlinepubs/009695399/toc.htm>.

Now added to our "Links and Selected Readings" page; thanks for the
pointer, Andreas!

Gerald

Index: readings.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.167
diff -u -3 -p -r1.167 readings.html
--- readings.html	1 Nov 2007 23:16:55 -0000	1.167
+++ readings.html	4 Nov 2007 12:55:46 -0000
@@ -604,7 +604,8 @@ papers, hot list pages, potential softwa
   "Building an Optimizing Compiler"</a>.  300pp. ISBN: 1-55558-179-X.</li>
 
   <li><a href="http://www.opengroup.org/">The Open Group</a> has quite a bit
-  on POSIX etc.</li>
+  on <a href="http://www.opengroup.org/onlinepubs/009695399/toc.htm">POSIX
+  and friends</a>.</li>
                                                           
   <li><a href="http://www.unicode.org">Unicode</a> and <a
   href="http://www.unicode.org/unicode/reports/tr15/">Unicode Normalization


* Re: [wwwdocs] PATCH Re: Optimization of conditional access to globals:  thread-unsafe?
  2007-11-04 14:33                         ` [wwwdocs] PATCH " Gerald Pfeifer
@ 2007-11-04 23:49                           ` Kai Henningsen
  0 siblings, 0 replies; 208+ messages in thread
From: Kai Henningsen @ 2007-11-04 23:49 UTC (permalink / raw)
  To: gcc

On Sun, Nov 04, 2007 at 02:04:21PM +0100, Gerald Pfeifer wrote:
> On Sun, 28 Oct 2007, Andreas Schwab wrote:
> >> I don't have access to the POSIX standard itself
> > See <http://www.opengroup.org/onlinepubs/009695399/toc.htm>.
> 
> Now added to our "Links and Selected Readings" page; thanks for the
> pointer, Andreas!

While you're at it, you might want to add a link to the Austin Group
which is developing that standard. Access to the documents needs only
membership in their mailing list.

It's at http://www.opengroup.org/austin/.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-31 22:43                     ` Jason Merrill
@ 2007-10-31 22:50                       ` Jason Merrill
  0 siblings, 0 replies; 208+ messages in thread
From: Jason Merrill @ 2007-10-31 22:50 UTC (permalink / raw)
  To: gcc; +Cc: gcc-patches

Ian Lance Taylor wrote:
> It appears that the draft C++0x memory model prohibits speculative
> stores.
> 
> Therefore I now think we should aim toward prohibiting them
> unconditionally.

I agree, or perhaps unless the user specifies a flag like 
-fthread-unsafe-opts or something.

> That memory model is just a draft.

It was voted into the C++ standard working paper at the last meeting. 
And the C committee has expressed interest in adopting it, or something 
similar, as well.

Jason


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 22:38                   ` Ian Lance Taylor
  2007-10-26 22:46                     ` Jonathan Wakely
  2007-10-26 22:56                     ` Diego Novillo
@ 2007-10-31 22:43                     ` Jason Merrill
  2007-10-31 22:50                       ` Jason Merrill
  2 siblings, 1 reply; 208+ messages in thread
From: Jason Merrill @ 2007-10-31 22:43 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Diego Novillo, Michael Matz, gcc, gcc-patches

Ian Lance Taylor wrote:
> It appears that the draft C++0x memory model prohibits speculative
> stores.
> 
> Therefore I now think we should aim toward prohibiting them
> unconditionally.

I agree, or perhaps unless the user specifies a flag like 
-fthread-unsafe-opts or something.

> That memory model is just a draft.

It was voted into the C++ standard working paper at the last meeting. 
And the C committee has expressed interest in adopting it, or something 
similar, as well.

Jason


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30 15:27                                                               ` Tomash Brechko
@ 2007-10-31  2:21                                                                 ` Eric Botcazou
  0 siblings, 0 replies; 208+ messages in thread
From: Eric Botcazou @ 2007-10-31  2:21 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

> Please read the _description_ that comes along with the code example.

I did.

> Anyways, the patch is there.

The one for ifcvt.c, yes; more will be needed though, see for example
  http://gcc.gnu.org/ml/gcc/2007-10/msg00754.html

-- 
Eric Botcazou


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30 17:05                               ` Ian Lance Taylor
@ 2007-10-30 22:01                                 ` Tomash Brechko
  0 siblings, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30 22:01 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

On Tue, Oct 30, 2007 at 09:49:00 -0700, Ian Lance Taylor wrote:
> I don't know which suggestion you are referring to.  The patch I wrote
> will retain the optimization in the case where the memory location is
> unconditionally written later in the function.  This is most relevant
> in that the optimization can take place in a loop, if somewhere after
> the loop the memory location is unconditionally written.

OK, thanks for the description, I just couldn't build GCC after the update
to see what the result looks like.  And a big Thank You for the patch!


-- 
   Tomash Brechko


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30 16:17                             ` Tomash Brechko
@ 2007-10-30 17:05                               ` Ian Lance Taylor
  2007-10-30 22:01                                 ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Ian Lance Taylor @ 2007-10-30 17:05 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

Tomash Brechko <tomash.brechko@gmail.com> writes:

> > >   if (condition) {
> > >     *p = value;
> > >     membarrier();
> > >   } else {
> > >     membarrier();
> > >   }
> > > 
> > > But this is the same as
> > > 
> > >   if (condition)
> > >     *p = value;
> > >   membarrier();
> > 
> > No, it isn't.  If membarrier is not a general function call, then it
> > has to be a magic function.  In gcc it is implemented using a volatile
> > asm.
> 
> I didn't get your point, but probably you didn't get my either.  I was
> talking about memory barriers as a whole, not a particular
> implementation in GCC.  And my point is that you are free to inject
> them wherever you like.  This will affect performance, but not
> correctness.  Hence you can't be sure membarrier() won't be moved from
> the condition.

My point is that for a memory barrier to work at all, it has to be
magic.  And if it is magic, then it can not be moved from the
condition.

To put it another way, if you can move the memory barrier from the
condition, then it is not a memory barrier after all.


> > Note that I've committed my patch to avoid speculative stores to all
> > active branches, so this particular case should be a non-issue going
> > forward.  However, we all are going to have to take a careful look at
> > gcc to make sure that it generally conforms to the C++0x memory model.
> 
> I'm not against ending this discussion.  As I understand the patch
> (and I don't grok GCC internals), it fixes both read-only memory case,
> and race case.  But it doesn't try to preserve the optimization in the
> form that was suggested by Michael Matz (i.e. to use pointer to dummy
> object on the stack), right?

I don't know which suggestion you are referring to.  The patch I wrote
will retain the optimization in the case where the memory location is
unconditionally written later in the function.  This is most relevant
in that the optimization can take place in a loop, if somewhere after
the loop the memory location is unconditionally written.
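
To illustrate the shape of code I mean, here is a made-up example (the
names total and sum_positive_then_scale are hypothetical, and whether the
patch handles exactly this form is an assumption on my part). The store in
the loop is conditional, but the store after the loop happens on every
path, so keeping total in a register across the loop adds no write that
the source did not already perform.

```c
int total;   /* global that other code may also look at */

void sum_positive_then_scale(const int *v, int n, int scale)
{
    for (int i = 0; i < n; i++)
        if (v[i] > 0)
            total += v[i];   /* conditional store inside the loop */

    total *= scale;          /* unconditional store after the loop */
}
```

Take away the final statement and the function no longer writes total
when no element is positive, so an unconditional store would be a new,
speculative one.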

Ian


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30 14:50                           ` Ian Lance Taylor
@ 2007-10-30 16:17                             ` Tomash Brechko
  2007-10-30 17:05                               ` Ian Lance Taylor
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30 16:17 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

On Tue, Oct 30, 2007 at 07:50:04 -0700, Ian Lance Taylor wrote:
> Tomash Brechko <tomash.brechko@gmail.com> writes:
> 
> > Even if we put aside the fact that there's no such membarrier()
> > equivalent in POSIX bindings, this won't help.
> 
> In POSIX, any mutex function must be a membarrier.  For example, on
> x86, mutex lock and unlock more or less have to execute the mfence
> instruction.  If they don't, the program can see inconsistent data
> structures despite the mutex operations.

Yes, but you don't imply I should write

  if (condition) {
    *p = value;
    pthread_mutex_lock(&dummy);
    pthread_mutex_unlock(&dummy);
  }

just to trigger it.


> >   if (condition) {
> >     *p = value;
> >     membarrier();
> >   } else {
> >     membarrier();
> >   }
> > 
> > But this is the same as
> > 
> >   if (condition)
> >     *p = value;
> >   membarrier();
> 
> No, it isn't.  If membarrier is not a general function call, then it
> has to be a magic function.  In gcc it is implemented using a volatile
> asm.

I didn't get your point, but probably you didn't get mine either.  I was
talking about memory barriers as a whole, not a particular
implementation in GCC.  And my point is that you are free to inject
them wherever you like.  This will affect performance, but not
correctness.  Hence you can't be sure membarrier() won't be moved from
the condition.


> Note that I've committed my patch to avoid speculative stores to all
> active branches, so this particular case should be a non-issue going
> forward.  However, we all are going to have to take a careful look at
> gcc to make sure that it generally conforms to the C++0x memory model.

I'm not against ending this discussion.  As I understand the patch
(and I don't grok GCC internals), it fixes both the read-only memory case
and the race case.  But it doesn't try to preserve the optimization in the
form that was suggested by Michael Matz (i.e. to use pointer to dummy
object on the stack), right?


-- 
   Tomash Brechko


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30 14:48                                                             ` Eric Botcazou
@ 2007-10-30 15:27                                                               ` Tomash Brechko
  2007-10-31  2:21                                                                 ` Eric Botcazou
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30 15:27 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Tue, Oct 30, 2007 at 15:33:56 +0100, Eric Botcazou wrote:
> We're not talking about locks, see the example you gave in your
> first message.

Please read the _description_ that comes along with the code example.

Anyways, the patch is there.


-- 
   Tomash Brechko


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30 10:28                         ` Tomash Brechko
@ 2007-10-30 14:50                           ` Ian Lance Taylor
  2007-10-30 16:17                             ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Ian Lance Taylor @ 2007-10-30 14:50 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

Tomash Brechko <tomash.brechko@gmail.com> writes:

> Even if we put aside the fact that there's no such membarrier()
> equivalent in POSIX bindings, this won't help.

In POSIX, any mutex function must be a membarrier.  For example, on
x86, mutex lock and unlock more or less have to execute the mfence
instruction.  If they don't, the program can see inconsistent data
structures despite the mutex operations.


>   if (condition) {
>     *p = value;
>     membarrier();
>   } else {
>     membarrier();
>   }
> 
> But this is the same as
> 
>   if (condition)
>     *p = value;
>   membarrier();

No, it isn't.  If membarrier is not a general function call, then it
has to be a magic function.  In gcc it is implemented using a volatile
asm.
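
As a sketch of what "a volatile asm" means here: membarrier() in this
thread is a placeholder name, so the macro below uses its own hypothetical
name, compiler_barrier, and store_if is likewise made up. Note that this
form only constrains the compiler; it emits no fence instruction, and a
hardware barrier would need something like GCC's __sync_synchronize()
in addition.

```c
/* Empty volatile asm with a "memory" clobber.  GCC will not delete it,
   and must assume memory may be read or written at this point, so it
   cannot keep memory values cached in registers across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* The first form from the quoted text, with the barrier in both arms. */
void store_if(int condition, int *p, int value)
{
    if (condition) {
        *p = value;
        compiler_barrier();
    } else {
        compiler_barrier();
    }
}
```

Because the asm is opaque to the optimizers, the compiler has to treat it
as a point where every memory location might be observed.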


Note that I've committed my patch to avoid speculative stores to all
active branches, so this particular case should be a non-issue going
forward.  However, we all are going to have to take a careful look at
gcc to make sure that it generally conforms to the C++0x memory model.

Ian

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30  9:04                                                           ` Tomash Brechko
@ 2007-10-30 14:48                                                             ` Eric Botcazou
  2007-10-30 15:27                                                               ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Eric Botcazou @ 2007-10-30 14:48 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

> volatile can be properly used _only_ if you also assume atomicity and
> cache-coherence, and this is beyond POSIX.  But anyway, I'm proving
> the opposite: when you use POSIX locks, you don't have to use
> volatile, that's it.

We're not talking about locks, see the example you gave in your first message.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:09                       ` Michael Matz
  2007-10-29 15:16                         ` Darryl Miles
  2007-10-29 15:16                         ` Mark Mielke
@ 2007-10-30 10:28                         ` Tomash Brechko
  2007-10-30 14:50                           ` Ian Lance Taylor
  2 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30 10:28 UTC (permalink / raw)
  To: gcc

I'd like to answer one last argument, mostly for the sake of the curious
reader, because Michael himself has agreed with (at least part of)
the point.


On Mon, Oct 29, 2007 at 16:00:18 +0100, Michael Matz wrote:
> The issue is, that people want to write this:
> 
>   if (condition)
>     *p = value;
> 
> (i.e. without any synchronization primitive or in fact anything else after 
> the store in the control region) and expect that the store indeed only 
> happens in that control region.  And this expectation is misguided.  Had 
> they written it like:
> 
>   if (condition) {
>     *p = value;
>     membarrier();
>   }
> 
> it would have worked just fine.

Even if we put aside the fact that there's no such membarrier()
equivalent in POSIX bindings, this won't help.

First of all, let's note that you can't break the program by making it
_more_ ordered.  Indeed, program correctness doesn't depend on some
particular reordering (you can't predict it anyway), it depends only
on some particular ordering.  So we can rewrite

  if (condition) {
    *p = value;
    membarrier();
  }

as

  if (condition) {
    *p = value;
    membarrier();
  } else {
    membarrier();
  }

But this is the same as

  if (condition)
    *p = value;
  membarrier();

and we are back to the start: the store could be moved outside the
condition.  In general the following would work

  if (condition) {
    *p = value;
    opaque_function();
  }

because GCC has to assume that the call may access any memory, thus
store to *p can't be moved outside of the condition, because the call
itself can't be moved outside.  But such a construction can't be the
requirement for threaded programming.


In the original example there _were_ synchronization primitives
already, the complete piece is:


  if (condition)
    pthread_mutex_lock(&mutex);

  ...

  if (condition)
    *p = value;

  ...

  if (condition)
    pthread_mutex_unlock(&mutex);


and POSIX doesn't require any additional ordering between lock() and
unlock().  When condition is false, any speculative store to *p is
bogus, because any condition is potentially a 'lock acquired'
condition (or 'not read-only' condition).  And it was shown that the
volatile qualifier can't be applied in the general case.
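
For the curious reader, here is a minimal sketch of the transformation
under discussion.  It is hand-written C, not actual GCC output, and the
function names are made up for illustration:

```c
/* What the programmer wrote: *p is touched only when condition holds.  */
void store_conditional(int condition, int *p, int value)
{
  if (condition)
    *p = value;
}

/* Hand-written equivalent of a speculative store: *p is read and
   written back unconditionally.  In a single thread the result is the
   same, but when condition is false this still writes to *p, which can
   race with another thread that holds the mutex, or fault if the page
   is read-only.  */
void store_speculative(int condition, int *p, int value)
{
  int tmp = *p;
  if (condition)
    tmp = value;
  *p = tmp;
}
```

Both functions leave the same value in *p when run alone; the
difference is only visible to another thread or to the MMU.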


But perhaps I'm the only one who is still unsure about the outcome of
this discussion :).


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30  8:29                                                         ` Eric Botcazou
@ 2007-10-30  9:04                                                           ` Tomash Brechko
  2007-10-30 14:48                                                             ` Eric Botcazou
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30  9:04 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Tue, Oct 30, 2007 at 09:20:28 +0100, Eric Botcazou wrote:
> No, I just wanted to point out that "volatile" has a well-defined semantics 
> and can be properly used for shared accesses.  In other words, it's not all
> or nothing like your earlier message[*] seemed to imply.
> 
> [*] http://gcc.gnu.org/ml/gcc/2007-10/msg00663.html

I didn't get your point.  Sure volatile can be used _along_ with
shared data.  But we can't say it _has_ to be used _for_ shared data.
I.e. if you require all shared data to be volatile, you can't pass
pointer to such data to any function without casting away the
qualifier.

volatile can be properly used _only_ if you also assume atomicity and
cache-coherence, and this is beyond POSIX.  But anyway, I'm proving
the opposite: when you use POSIX locks, you don't have to use
volatile, that's it.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30  8:20                                                       ` Tomash Brechko
@ 2007-10-30  8:29                                                         ` Eric Botcazou
  2007-10-30  9:04                                                           ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Eric Botcazou @ 2007-10-30  8:29 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

> Frankly, you realise the consequences of volatile access, you have
> this comment:
>
>   /* Avoid reading __gthread_active twice on the main code path.  */
>   int __gthread_active_latest_value = __gthread_active;
>
>
> Now, do you really believe that every multithreaded program should use
> volatile, and then should copy shared data to temporary storage, just
> because volatile is such a hammer?

No, I just wanted to point out that "volatile" has a well-defined semantics 
and can be properly used for shared accesses.  In other words, it's not all
or nothing like your earlier message[*] seemed to imply.

[*] http://gcc.gnu.org/ml/gcc/2007-10/msg00663.html

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30  8:03                                                     ` Tomash Brechko
@ 2007-10-30  8:20                                                       ` Tomash Brechko
  2007-10-30  8:29                                                         ` Eric Botcazou
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30  8:20 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Tue, Oct 30, 2007 at 10:59:24 +0300, Tomash Brechko wrote:
> On Tue, Oct 30, 2007 at 08:56:08 +0100, Eric Botcazou wrote:
> > > The use doesn't become proper simply because it appears in the code,
> > > even if in the code of GCC.  volatile might be used there for
> > > completely different reasons.
> > 
> > No, I put it there for this purpose.
> 
> Then you could remove it, if not for unlocked access.

Frankly, you realise the consequences of volatile access, you have
this comment:

  /* Avoid reading __gthread_active twice on the main code path.  */
  int __gthread_active_latest_value = __gthread_active;


Now, do you really believe that every multithreaded program should use
volatile, and then should copy shared data to temporary storage, just
because volatile is such a hammer?  You may have to, with current
compilers, but that's not what was supposed by POSIX.

-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30  7:59                                                   ` Eric Botcazou
@ 2007-10-30  8:03                                                     ` Tomash Brechko
  2007-10-30  8:20                                                       ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30  8:03 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Tue, Oct 30, 2007 at 08:56:08 +0100, Eric Botcazou wrote:
> > The use doesn't become proper simply because it appears in the code,
> > even if in the code of GCC.  volatile might be used there for
> > completely different reasons.
> 
> No, I put it there for this purpose.

Then you could remove it, if not for unlocked access.

-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30  7:48                                                 ` Tomash Brechko
  2007-10-30  7:55                                                   ` Tomash Brechko
@ 2007-10-30  7:59                                                   ` Eric Botcazou
  2007-10-30  8:03                                                     ` Tomash Brechko
  1 sibling, 1 reply; 208+ messages in thread
From: Eric Botcazou @ 2007-10-30  7:59 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

> The use doesn't become proper simply because it appears in the code,
> even if in the code of GCC.  volatile might be used there for
> completely different reasons.

No, I put it there for this purpose.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30  7:48                                                 ` Tomash Brechko
@ 2007-10-30  7:55                                                   ` Tomash Brechko
  2007-10-30  7:59                                                   ` Eric Botcazou
  1 sibling, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30  7:55 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

I accidentally removed the essential line, it should be:

On Tue, Oct 30, 2007 at 10:44:52 +0300, Tomash Brechko wrote:
>   static volatile int __gthread_active = -1;
> 
>   ...

      int __gthread_active_latest_value = __gthread_active;

>     /* This test is not protected to avoid taking a lock on the main code
>        path so every update of __gthread_active in a threaded program must
>        be atomic with regard to the result of the test.  */
>     if (__builtin_expect (__gthread_active_latest_value < 0, 0))
>       {
>         ...


But you knew it already ;).


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 22:04                                               ` Eric Botcazou
@ 2007-10-30  7:48                                                 ` Tomash Brechko
  2007-10-30  7:55                                                   ` Tomash Brechko
  2007-10-30  7:59                                                   ` Eric Botcazou
  0 siblings, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-30  7:48 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Mon, Oct 29, 2007 at 22:30:20 +0100, Eric Botcazou wrote:
> See gcc/gthr-posix.h for a proper use of "volatile" for a shared access.

It was already shown that you can't use volatile in the general case,
because you can't pass such data to any function.  See the mail of
Bart Van Assche.

The use doesn't become proper simply because it appears in the code,
even if in the code of GCC.  volatile might be used there for
completely different reasons.  Consider this comment:

  static volatile int __gthread_active = -1;

  ...

    /* This test is not protected to avoid taking a lock on the main code
       path so every update of __gthread_active in a threaded program must
       be atomic with regard to the result of the test.  */
    if (__builtin_expect (__gthread_active_latest_value < 0, 0))
      {
        ...


volatile + atomic update + cache-coherent system will indeed give you
the correct result, but such use is not POSIX-compliant, and I mostly
talk about POSIX Threads.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-30  1:40                     ` Robert Dewar
@ 2007-10-30  6:37                       ` Eric Botcazou
  0 siblings, 0 replies; 208+ messages in thread
From: Eric Botcazou @ 2007-10-30  6:37 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc, skaller, Andi Kleen

> Note also that excessive inlining often is a loss due to
> increase in icache pressure. In Ada it is the style to
> carefully mark inlinable routines with pragma Inline, and
> we often find in Ada that use of -O3, which activates
> automatic inlining, going beyond what the programmer has
> asked for, is often an overall loss.

That's a little outdated though, 4.x behaves differently.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 22:07                   ` Robert Dewar
@ 2007-10-30  1:40                     ` Robert Dewar
  2007-10-30  6:37                       ` Eric Botcazou
  0 siblings, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-30  1:40 UTC (permalink / raw)
  To: skaller; +Cc: Andi Kleen, gcc

Robert Dewar wrote:

> Yes, of course! unrolling loops is often an overall loss

Note also that excessive inlining often is a loss due to
increase in icache pressure. In Ada it is the style to
carefully mark inlinable routines with pragma Inline, and
we often find in Ada that use of -O3, which activates
automatic inlining, going beyond what the programmer has
asked for, is often an overall loss.



^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 21:29                 ` skaller
@ 2007-10-29 22:07                   ` Robert Dewar
  2007-10-30  1:40                     ` Robert Dewar
  0 siblings, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-29 22:07 UTC (permalink / raw)
  To: skaller; +Cc: Andi Kleen, gcc

skaller wrote:
> On Mon, 2007-10-29 at 21:03 +0100, Andi Kleen wrote:
>> On Mon, Oct 29, 2007 at 03:51:27PM -0400, Robert Dewar wrote:
>>> Sure, well nearly every optimization has some case where it is a 
>>> pessimization (one interesting thing that happens is that if you
>>> change the length of generated code in *any* way you may be unlucky
>>> and cause a systematic instruction cache miss in a loop, inlining
>> icache misses are hard in general and agreed the compiler cannot
>> do too much about them (except for trying not to generate too bloated
>> code in general)
> 
> BTW: doesn't this suggest unrolling loops and recursions is
> potentially expensive?

Yes, of course! unrolling loops is often an overall loss
> 


^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  9:04                                             ` Tomash Brechko
  2007-10-29  9:12                                               ` Tomash Brechko
@ 2007-10-29 22:04                                               ` Eric Botcazou
  2007-10-30  7:48                                                 ` Tomash Brechko
  1 sibling, 1 reply; 208+ messages in thread
From: Eric Botcazou @ 2007-10-29 22:04 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

> But shouldn't we formally define "define" first? :)

Note that I wrote "more or less formally".  The definition of "volatile" in 
the ISO C standard falls into this category and I'd personally trust it more 
than whatever -fno-speculative-store option.

See gcc/gthr-posix.h for a proper use of "volatile" for a shared access.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 20:10               ` Andi Kleen
  2007-10-29 20:19                 ` Robert Dewar
@ 2007-10-29 21:29                 ` skaller
  2007-10-29 22:07                   ` Robert Dewar
  1 sibling, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-29 21:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Robert Dewar, gcc


On Mon, 2007-10-29 at 21:03 +0100, Andi Kleen wrote:
> On Mon, Oct 29, 2007 at 03:51:27PM -0400, Robert Dewar wrote:
> > Sure, well nearly every optimization has some case where it is a 
> > pessimization (one interesting thing that happens is that if you
> > change the length of generated code in *any* way you may be unlucky
> > and cause a systematic instruction cache miss in a loop, inlining
> 
> icache misses are hard in general and agreed the compiler cannot
> do too much about them (except for trying not to generate too bloated
> code in general)

BTW: doesn't this suggest unrolling loops and recursions is
potentially expensive?

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 20:59                                   ` Michael Matz
@ 2007-10-29 21:14                                     ` Tomash Brechko
  0 siblings, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29 21:14 UTC (permalink / raw)
  To: Michael Matz; +Cc: Duncan Sands, gcc, David Miller, dave.korn

On Mon, Oct 29, 2007 at 21:52:19 +0100, Michael Matz wrote:
> It is safe if there's another dominating store outside the control region.  
> Apart from that I reluctantly agree (i.e. it's not enough if there's any 
> dominating access through the pointer in question, it must be a store).

Thank you!  I had almost started to think I was losing ground for my
claims :).


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 20:52                                 ` Tomash Brechko
@ 2007-10-29 20:59                                   ` Michael Matz
  2007-10-29 21:14                                     ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Michael Matz @ 2007-10-29 20:59 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: Duncan Sands, gcc, David Miller, dave.korn

Hi,

On Mon, 29 Oct 2007, Tomash Brechko wrote:

> Still, I believe the example proves the general idea.  It shows that 
> speculative store is never safe, because every 'if' may be an 'if not 
> read-only'-one.

It is safe if there's another dominating store outside the control region.  
Apart from that I reluctantly agree (i.e. it's not enough if there's any 
dominating access through the pointer in question, it must be a store).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 19:43                               ` Duncan Sands
  2007-10-29 20:03                                 ` Jack Lloyd
@ 2007-10-29 20:52                                 ` Tomash Brechko
  2007-10-29 20:59                                   ` Michael Matz
  1 sibling, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29 20:52 UTC (permalink / raw)
  To: Duncan Sands; +Cc: gcc, Michael Matz, David Miller, dave.korn

On Mon, Oct 29, 2007 at 20:37:52 +0100, Duncan Sands wrote:
> I don't see this with gcc 4.1 or 4.2.  Just a data point.

Yes, thanks for pointing this out.  It fails with gcc (GCC) 4.3.0 20071021
(experimental) though.  It turns out that GCC 4.2 and below don't do
this optimization for pointers (even when known to be non-null).
Formally, POSIX requires mprotect() to work only on mmap()'ed regions,
which are accessed through pointers.  Technically you can make any
page read-only, including the one that holds globals, but this won't
pass GCC lawyers.

Still, I believe the example proves the general idea.  It shows that
speculative store is never safe, because every 'if' may be an 'if not
read-only'-one.  And if optimization is not being performed, then it's
only for good: the program is thread-safe, and disabling optimization
for other cases won't affect performance of pointer case.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 20:10               ` Andi Kleen
@ 2007-10-29 20:19                 ` Robert Dewar
  2007-10-29 21:29                 ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: Robert Dewar @ 2007-10-29 20:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc

Andi Kleen wrote:
> On Mon, Oct 29, 2007 at 03:51:27PM -0400, Robert Dewar wrote:
>> Sure, well nearly every optimization has some case where it is a 
>> pessimization (one interesting thing that happens is that if you
>> change the length of generated code in *any* way you may be unlucky
>> and cause a systematic instruction cache miss in a loop, inlining
> 
> icache misses are hard in general and agreed the compiler cannot
> do too much about them (except for trying not to generate too bloated
> code in general)
> 
> But adding gratuitous dcache misses is a completely different thing.
> That is something the compiler has control over and should just
> not do.

Generally sounds right, though there are cases where this is OK (I
think, for example, that optimizations which prime the explicit
pipeline of the i860 may generate an extra dcache miss, but outside
the loop, and conceivably that could be worthwhile).
> 
> -Andi


^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 20:00             ` Robert Dewar
@ 2007-10-29 20:10               ` Andi Kleen
  2007-10-29 20:19                 ` Robert Dewar
  2007-10-29 21:29                 ` skaller
  0 siblings, 2 replies; 208+ messages in thread
From: Andi Kleen @ 2007-10-29 20:10 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Andi Kleen, gcc

On Mon, Oct 29, 2007 at 03:51:27PM -0400, Robert Dewar wrote:
> Sure, well nearly every optimization has some case where it is a 
> pessimization (one interesting thing that happens is that if you
> change the length of generated code in *any* way you may be unlucky
> and cause a systematic instruction cache miss in a loop, inlining

icache misses are hard in general and agreed the compiler cannot
do too much about them (except for trying not to generate too bloated
code in general)

But adding gratuitous dcache misses is a completely different thing.
That is something the compiler has control over and should just
not do.

-Andi

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 19:43                               ` Duncan Sands
@ 2007-10-29 20:03                                 ` Jack Lloyd
  2007-10-29 20:52                                 ` Tomash Brechko
  1 sibling, 0 replies; 208+ messages in thread
From: Jack Lloyd @ 2007-10-29 20:03 UTC (permalink / raw)
  To: gcc

On Mon, Oct 29, 2007 at 08:37:52PM +0100, Duncan Sands wrote:
> Hi Tomash,
> 
> >   moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O0 mmap.c -o mmap
> >   moonlight:/tmp$ ./mmap
> >   GCC is the best compiler ever!
> >   moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O1 mmap.c -o mmap
> >   moonlight:/tmp$ ./mmap
> >   Segmentation fault
> 
> I don't see this with gcc 4.1 or 4.2.  Just a data point.

I tried this and didn't see any problems with 4.1.1 20070105 and 4.3.0
20070907 (Linux/amd64) with or without optimization (-O0, -O2, -O2
-ftree-vectorize). I thought the relevant optimization pass went in to
gcc 3.4?

-Jack

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 19:51           ` Andi Kleen
@ 2007-10-29 20:00             ` Robert Dewar
  2007-10-29 20:10               ` Andi Kleen
  0 siblings, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-29 20:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc

Andi Kleen wrote:
> Robert Dewar <dewar@adacore.com> writes:
>> a) the standard allows the optimization (or rather does not forbid it)
> 
> Assuming it is an optimization. See http://gcc.gnu.org/ml/gcc/2007-10/msg00607.html
> for a counter example. In general cache misses are so costly that anything
> that risks introducing more is in general a bad idea.

Sure, well nearly every optimization has some case where it is a 
pessimization (one interesting thing that happens is that if you
change the length of generated code in *any* way you may be unlucky
and cause a systematic instruction cache miss in a loop, inlining
can do this sometimes, for example). But indeed, it is important
to evaluate a proposed optimization over a reasonable body of
tests, rather than going into the mode "look, I found this test
that speeds up 32.7% isn't that great? we definitely should put
this optimization in".

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
       [not found]         ` <47260E97.4020309@adacore.com.suse.lists.egcs>
@ 2007-10-29 19:51           ` Andi Kleen
  2007-10-29 20:00             ` Robert Dewar
  0 siblings, 1 reply; 208+ messages in thread
From: Andi Kleen @ 2007-10-29 19:51 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc

Robert Dewar <dewar@adacore.com> writes:
>
> a) the standard allows the optimization (or rather does not forbid it)

Assuming it is an optimization. See http://gcc.gnu.org/ml/gcc/2007-10/msg00607.html
for a counter example. In general cache misses are so costly that anything
that risks introducing more is in general a bad idea.

-Andi

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 16:20                             ` Tomash Brechko
  2007-10-29 16:32                               ` Tomash Brechko
@ 2007-10-29 19:43                               ` Duncan Sands
  2007-10-29 20:03                                 ` Jack Lloyd
  2007-10-29 20:52                                 ` Tomash Brechko
  1 sibling, 2 replies; 208+ messages in thread
From: Duncan Sands @ 2007-10-29 19:43 UTC (permalink / raw)
  To: gcc; +Cc: Tomash Brechko, Michael Matz, David Miller, dave.korn

Hi Tomash,

>   moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O0 mmap.c -o mmap
>   moonlight:/tmp$ ./mmap
>   GCC is the best compiler ever!
>   moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O1 mmap.c -o mmap
>   moonlight:/tmp$ ./mmap
>   Segmentation fault

I don't see this with gcc 4.1 or 4.2.  Just a data point.

Ciao,

Duncan.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 12:08                               ` Darryl Miles
  2007-10-29 12:14                                 ` Robert Dewar
@ 2007-10-29 17:04                                 ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-29 17:04 UTC (permalink / raw)
  To: Darryl Miles; +Cc: David Miller, dave.korn, tomash.brechko, matz, gcc


On Mon, 2007-10-29 at 11:57 +0000, Darryl Miles wrote:

> This then leads into the question.  Is a pointer allowed to be invalid.

A variable of type X need not contain an X in C, however the
effect of any operation including assignment, copying,
or passing is undefined if the value isn't valid.

Valid pointers are NULL or a pointer to, into, or one
past the end of any individual storage block.

However most architectures/ABI's work fine copying invalid
pointers.

The rule is partly due to Intel segmented memory,
allowing pointers to be passed in Segment:offset register
pairs, which in protected mode requires a valid 
segment descriptor.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 16:29                         ` Joe Buck
@ 2007-10-29 16:53                           ` Robert Dewar
  0 siblings, 0 replies; 208+ messages in thread
From: Robert Dewar @ 2007-10-29 16:53 UTC (permalink / raw)
  To: Joe Buck; +Cc: David Miller, dave.korn, matz, gcc

Joe Buck wrote:
> From: "Dave Korn" <dave.korn@artimi.com>
>>>   Better write your own compiler then.
> 
> On Sun, Oct 28, 2007 at 06:34:01PM -0700, David Miller wrote:
>> If this becomes the common attitude of GCC developers, you can pretty
> much guarantee this will drive people to work on LLVM and other
>> alternative compiler code bases.
> 
> The primary job of GCC is to serve as the compiler for the GNU/Linux
> system.  This trumps strict, anal-retentive standards conformance,
> though ideally it should come as close as possible to achieving both.

Well actually I think "strict anal-retentive standards conformance" is
never trumped. The compiler MUST be standards conforming. However, this
issue has nothing to do with conforming or not conforming to the 
standard in the sense that no one is suggesting non-conformance.

The issue is whether to take optional advantage of freedoms granted
by the standard in performing optimizations, and that's another matter
entirely. Optimization is not a matter of conformance (the standard
allows but does not require optimization). In considering an 
optimization we must meet two criteria

a) the standard allows the optimization (or rather does not forbid it)

b) the optimization is pragmatically useful (the gains exceed the 
losses). The fact that an optimization is allowed by the standard is
NOT of itself an argument in favor of doing the optimization

> There are cases where we need to depart from the standard somewhat,
> because it doesn't reflect existing practice.  Aggressively optimizing
> on the basis that overflow of ints is undefined would, for example,
> break GCC itself, as well as many Unix and Linux programs.  This makes
> the job more difficult.

It is not "depart[ing] from the standard" to decline to "aggressively
optimize on the basis that overflow of ints is undefined". The standard
permits this, but does not require it, or even suggest it, it merely
enables it, so that we can consider whether it is a good idea.

> Unfortunately, too many GCC developers believed what they were told in
> their university compiler courses: that the language standard is the
> contract.  But that is only part of the story.

The language standard is a contract, but there is a big difference 
between a required performance element of a contract and something
that is allowed. If you get a mortgage from a bank, it is unlikely
that there is anything in this contract that forbids you from
painting the outside of your house purple with yellow stripes, but
that does not mean it is a good idea to do so.

What I think is a possible fault among *some* GCC developers, certainly
not all, is the notion that if an optimization is allowed, and if it
improves performance of at least some programs, then it is automatically
a good idea to do the optimization.

A similar situation arises in Ada with the question of whether the
compiler should assume that subtypes are in range, e.g. if you say

   X : Integer range 1 .. 10;

is the compiler allowed to assume X is in range (answer yes), should
it do so? (answer, many including me at AdaCore think the answer is
a thundering no, but others at AdaCore are equally adamant that the
answer should be yes, so this is unresolved, currently the compiler
does make this assumption).


^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  7:43                             ` David Miller
  2007-10-29 12:08                               ` Darryl Miles
@ 2007-10-29 16:47                               ` Joe Buck
  1 sibling, 0 replies; 208+ messages in thread
From: Joe Buck @ 2007-10-29 16:47 UTC (permalink / raw)
  To: David Miller; +Cc: darryl-mailinglists, dave.korn, tomash.brechko, matz, gcc

On Sun, Oct 28, 2007 at 10:17:38PM -0700, David Miller wrote:
> From: Darryl Miles <darryl-mailinglists@netbauds.net>
> Date: Mon, 29 Oct 2007 04:53:49 +0000
> 
> > What are the issues with "speculative loads" ?
> 
> The conditional might be protecting whether the pointer is valid and
> can be dereferenced at all.
> 
> int *counter;
> 
> void foo(int counter_is_valid)
> {
> 	if (counter_is_valid)
> 		(*counter)++;
> }

GCC will never do a speculative access in this case.  That's because
"counter_is_valid" might really mean "counter_is_non_null".

It seems that the original issue can only occur if there is a direct
write to a global, not a write through a pointer.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 16:20                             ` Tomash Brechko
@ 2007-10-29 16:32                               ` Tomash Brechko
  2007-10-29 19:43                               ` Duncan Sands
  1 sibling, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29 16:32 UTC (permalink / raw)
  To: gcc

On Mon, Oct 29, 2007 at 19:20:25 +0300, Tomash Brechko wrote:
> Good reasoning, and that's exactly what some of us are asking for.
> Please see the example:

At higher optimization levels GCC may inline f(), or not call it at
all, so below is a more complete case:


#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>


int
f(int read_only, int a[]) __attribute__((__noinline__));


int
f(int read_only, int a[])
{
  int res = a[0];

  if (! read_only)
    a[0] = 1;
  
  return res;
}


int
main(void)
{
  int res = 0;

  const long page_size = sysconf(_SC_PAGESIZE);

  int *a1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  int *a2 = mmap(NULL, page_size, PROT_READ,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  res += f(0, a1);
  res += f(1, a2);

  fputs("GCC is the best compiler ever!\n", stdout);

  return res;
}



-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  1:37                       ` David Miller
  2007-10-29  3:22                         ` skaller
  2007-10-29 11:54                         ` Robert Dewar
@ 2007-10-29 16:29                         ` Joe Buck
  2007-10-29 16:53                           ` Robert Dewar
  2 siblings, 1 reply; 208+ messages in thread
From: Joe Buck @ 2007-10-29 16:29 UTC (permalink / raw)
  To: David Miller; +Cc: dave.korn, matz, gcc


From: "Dave Korn" <dave.korn@artimi.com>
> >   Better write your own compiler then.

On Sun, Oct 28, 2007 at 06:34:01PM -0700, David Miller wrote:
> If this becomes the common attitude of GCC developers, you can pretty
> much guarantee this will drive people to work on LLVM and other
> alternative compiler code bases.

The primary job of GCC is to serve as the compiler for the GNU/Linux
system.  This trumps strict, anal-retentive standards conformance,
though ideally it should come as close as possible to achieving both.

There are cases where we need to depart from the standard somewhat,
because it doesn't reflect existing practice.  Aggressively optimizing
on the basis that overflow of ints is undefined would, for example,
break GCC itself, as well as many Unix and Linux programs.  This makes
the job more difficult.

Unfortunately, too many GCC developers believed what they were told in
their university compiler courses: that the language standard is the
contract.  But that is only part of the story.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:00                           ` Michael Matz
@ 2007-10-29 16:20                             ` Tomash Brechko
  2007-10-29 16:32                               ` Tomash Brechko
  2007-10-29 19:43                               ` Duncan Sands
  0 siblings, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29 16:20 UTC (permalink / raw)
  To: Michael Matz; +Cc: David Miller, dave.korn, gcc

On Mon, Oct 29, 2007 at 15:53:56 +0100, Michael Matz wrote:
> No it won't, because without further information GCC can't know that a 
> memory access won't trap.  Ergo it will not move it out of its control 
> region, exactly because it would potentially introduce traps where there 
> were none before.

Good reasoning, and that's exactly what some of us are asking for.
Please see the example:


  #include <sys/mman.h>
  #include <unistd.h>
  #include <stdio.h>


  int
  f(int read_only, int a[])
  {
    int res = a[0];

    if (! read_only)
      a[0] = 1;
  
    return res;
  }


  int
  main(void)
  {
    const long page_size = sysconf(_SC_PAGESIZE);

    int *a1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int *a2 = mmap(NULL, page_size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    f(0, a1);
    f(1, a2);

    fputs("GCC is the best compiler ever!\n", stdout);
  }


It gives:

  moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O0 mmap.c -o mmap
  moonlight:/tmp$ ./mmap
  GCC is the best compiler ever!
  moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O1 mmap.c -o mmap
  moonlight:/tmp$ ./mmap
  Segmentation fault


:-/


The discussion is not pointless, just please try to understand what
other people are trying to say.  No one is stupid; we are all just not on
the same page yet.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:24                           ` Michael Matz
@ 2007-10-29 15:40                             ` Darryl Miles
  0 siblings, 0 replies; 208+ messages in thread
From: Darryl Miles @ 2007-10-29 15:40 UTC (permalink / raw)
  To: Michael Matz; +Cc: Mark Mielke, David Miller, gcc

Michael Matz wrote:
>> Don't you need the barrier before?  This is to ensure the condition
>> test has completed before the assignment expression is processed.
>>
>> if(condition) {
>>  somebarrier();
>>  *p = value;
>> }
>>
>> The issue is not that the store is done too late, but that a 
>> write-access is done too early.
> 
> No.  The initial cause for this needless thread was that a store was moved 
> down, out of its control region.  Of course it doesn't help when people 
> keep shifting their point of focus in such discussions.  Now it already 
> moved to fear that GCC would somehow introduce new traps.  Without the 
> people discussing about that fear even bothering to check if that really 
> happens :-(

No, the initial problem was that the store was done when the code 
execution path clearly indicates no store should be performed.  The 
store was a rewrite of the same, existing value in *p.
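
As a rough illustration of that description, the two functions below model
the source as written and the effect of the transformation. This is a
hand-written sketch, not actual GCC output, and the function names are
invented here.

```c
/* What the source says: a store happens only when cond is true. */
void store_if(int cond, int *p, int value)
{
  if (cond)
    *p = value;
}

/* Hand-written model of the transformed code: the store is now
   unconditional, and when cond is false it writes back the value
   that was just loaded. */
void store_if_transformed(int cond, int *p, int value)
{
  int tmp = *p;     /* load on every path */
  if (cond)
    tmp = value;
  *p = tmp;         /* store on every path */
}
```

In a single thread with writable memory the two are indistinguishable.
They differ when another thread updates *p between the load and the store
(that update is lost), or when *p lives in a read-only page (the store
traps).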

The optimizer tried to interleave the compare/test with the load from 
memory.  By inserting the barrier between the test and assignment that 
would stop that interleave from taking place, since it can't optimize 
across the barrier, it must perform the test and branch first, before it 
stores to memory.

It may optionally interleave the 'load from memory into register for 
"value" variable'.  This would be a speculative load and this would 
be safe, as the value of 'value' may go unused (thrown away) if the 
branch is taken to skip the store to *p.


Now the original case was shown as a simple function with just the

if(condition) {
  *v = 1;
}

I would agree with you that a barrier() afterwards would be needed if 
there was any statement beyond that close brace of the test within the 
same function.  This is to ensure the store is not deferred past a later 
access made via another alias to the same memory, which the 
compiler could not see at compile time.

But there isn't; there is a function return, which does the trick nicely.

From a purist perspective this makes it:

void foo(int value) {
  if(condition) {
   somebarrier();
   *v = value;
   somebarrier();
  }

  // more statements here that may access *v
  // if you don't have any statements here, then you can omit the 2nd
  // somebarrier() call

  return;
}


Darryl

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:35                               ` Michael Matz
@ 2007-10-29 15:40                                 ` Robert Dewar
  0 siblings, 0 replies; 208+ messages in thread
From: Robert Dewar @ 2007-10-29 15:40 UTC (permalink / raw)
  To: Michael Matz; +Cc: David Miller, dave.korn, gcc

Michael Matz wrote:

> 456.hmmer is not a small benchmark, but a real world scientific 
> application for protein sequence analysis using hidden markov models.  It 
> just so happens that it also is a standardized benchmark in cpu2006.

A single data point is not data in the sense I refer to. What you want
is cumulated data over a significant body of real code. That's not
normally a requirement for optimizations, but to me it is for one
like this that is disruptive.

The fact that you have an optimization that can help in an individual
case *may* be justification for making it an option, but it is 
definitely not justification for making it the default when it
is disruptive.
> 
> 
> Ciao,
> Michael.


^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:34                             ` Robert Dewar
@ 2007-10-29 15:35                               ` Michael Matz
  2007-10-29 15:40                                 ` Robert Dewar
  0 siblings, 1 reply; 208+ messages in thread
From: Michael Matz @ 2007-10-29 15:35 UTC (permalink / raw)
  To: Robert Dewar; +Cc: David Miller, dave.korn, gcc

Hi,

On Mon, 29 Oct 2007, Robert Dewar wrote:

> Well perhaps some emails got lost, but to be clear what I am looking for 
> is actual data on real live large scale applications that show this 
> optimization having a significant effect. I have not seen that. Yes, I 
> understand that individual small benchmarks might be affected, but I 
> never find that convincing.

456.hmmer is not a small benchmark, but a real world scientific 
application for protein sequence analysis using hidden markov models.  It 
just so happens that it also is a standardized benchmark in cpu2006.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:21                           ` Michael Matz
@ 2007-10-29 15:34                             ` Robert Dewar
  2007-10-29 15:35                               ` Michael Matz
  0 siblings, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-29 15:34 UTC (permalink / raw)
  To: Michael Matz; +Cc: David Miller, dave.korn, gcc

Michael Matz wrote:
> Hi,
> 
> On Mon, 29 Oct 2007, Robert Dewar wrote:
> 
>> One thing that seems missing from this thread is any quantitative 
>> analysis of the value of this optimization.
> 
> Please read my mails carefully.

Well perhaps some emails got lost, but to be clear what I am looking
for is actual data on real live large scale applications that show
this optimization having a significant effect. I have not seen that.
Yes, I understand that individual small benchmarks might be affected,
but I never find that convincing.
> 
> 
> Ciao,
> Michael.


^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:16                         ` Darryl Miles
@ 2007-10-29 15:24                           ` Michael Matz
  2007-10-29 15:40                             ` Darryl Miles
  0 siblings, 1 reply; 208+ messages in thread
From: Michael Matz @ 2007-10-29 15:24 UTC (permalink / raw)
  To: Darryl Miles; +Cc: Mark Mielke, David Miller, gcc

Hi,

On Mon, 29 Oct 2007, Darryl Miles wrote:

> Don't you need the barrier before?  This is to ensure the condition
> test has completed before the assignment expression is processed.
> 
> if(condition) {
>  somebarrier();
>  *p = value;
> }
> 
> The issue is not that the store is done too late, but that a 
> write-access is done too early.

No.  The initial cause for this needless thread was that a store was moved 
down, out of its control region.  Of course it doesn't help when people 
keep shifting their point of focus in such discussions.  Now it already 
moved to fear that GCC would somehow introduce new traps.  Without the 
people discussing about that fear even bothering to check if that really 
happens :-(


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 11:54                         ` Robert Dewar
@ 2007-10-29 15:21                           ` Michael Matz
  2007-10-29 15:34                             ` Robert Dewar
  0 siblings, 1 reply; 208+ messages in thread
From: Michael Matz @ 2007-10-29 15:21 UTC (permalink / raw)
  To: Robert Dewar; +Cc: David Miller, dave.korn, gcc

Hi,

On Mon, 29 Oct 2007, Robert Dewar wrote:

> One thing that seems missing from this thread is any quantitative 
> analysis of the value of this optimization.

Please read my mails carefully.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:09                       ` Michael Matz
@ 2007-10-29 15:16                         ` Darryl Miles
  2007-10-29 15:24                           ` Michael Matz
  2007-10-29 15:16                         ` Mark Mielke
  2007-10-30 10:28                         ` Tomash Brechko
  2 siblings, 1 reply; 208+ messages in thread
From: Darryl Miles @ 2007-10-29 15:16 UTC (permalink / raw)
  To: Michael Matz; +Cc: Mark Mielke, David Miller, gcc

Michael Matz wrote:
>   if (condition)
>     *p = value;
> 
> (i.e. without any synchronization primitive or in fact anything else after 
> the store in the control region) and expect that the store indeed only 
> happens in that control region.  And this expectation is misguided.  Had 
> they written it like:
> 
>   if (condition) {
>     *p = value;
>     membarrier();
>   }
> 
> it would have worked just fine.


Don't you need the barrier before?  This is to ensure the condition 
test has completed before the assignment expression is processed.

if(condition) {
  somebarrier();
  *p = value;
}

The issue is not that the store is done too late, but that a 
write-access is done too early.


Darryl

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 15:09                       ` Michael Matz
  2007-10-29 15:16                         ` Darryl Miles
@ 2007-10-29 15:16                         ` Mark Mielke
  2007-10-30 10:28                         ` Tomash Brechko
  2 siblings, 0 replies; 208+ messages in thread
From: Mark Mielke @ 2007-10-29 15:16 UTC (permalink / raw)
  To: Michael Matz; +Cc: David Miller, gcc

Michael Matz wrote:
> Yes, and of course GCC doesn't move stores or loads over function calls.  
> That's not the issue at all.  The issue is that people want to write 
> this:
>   if (condition)
>     *p = value;
> (i.e. without any synchronization primitive or in fact anything else after 
> the store in the control region) and expect that the store indeed only 
> happens in that control region.  And this expectation is misguided.
If this is correct (condition makes no function calls and no volatiles 
are used), then I understand and agree.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  4:32                             ` David Miller
  2007-10-29  4:54                               ` skaller
@ 2007-10-29 15:14                               ` Michael Matz
  1 sibling, 0 replies; 208+ messages in thread
From: Michael Matz @ 2007-10-29 15:14 UTC (permalink / raw)
  To: David Miller; +Cc: skaller, dave.korn, tomash.brechko, gcc

Hi,

On Sun, 28 Oct 2007, David Miller wrote:

> the program would have allowed them to occur, otherwise you risk
> taking exceptions.
> 
> Do you really think that:
> 
> 	the_pointer_is_valid = func(potentially_bad_pointer);
> 	if (the_pointer_is_valid)
> 		(*potentially_bad_pointer)++;
> 
> should generate any memory accesses when 'the_pointer_is_valid'
> evaluates to false?

No, it of course should not.  And, surprise, it doesn't.  But in the 
following example the store can be moved out of its control region:

	the_pointer_is_valid = func(potentially_bad_pointer);
	*potentially_bad_pointer = 0;  // Oops, unchecked access
	if (the_pointer_is_valid)
		(*potentially_bad_pointer)++;

Due to the unchecked access above, GCC is entitled to assume that the 
second access will not trap either, hence it can be moved down.

> And yet this is just another form of our original "threading" example:
> 
> 	if (pthread_mutex_trylock(lock) == 0)
> 		(*counter)++;

As GCC can't know that *counter will not trap, this is maybe the same code 
as your first example, but like there GCC won't do anything interesting.

> Only if you can prove that the program would access said memory with 
> said kind of access (read or write) can you legally speculate.

Good, we agree.  Where does GCC currently break this?


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:23                     ` Mark Mielke
@ 2007-10-29 15:09                       ` Michael Matz
  2007-10-29 15:16                         ` Darryl Miles
                                           ` (2 more replies)
  0 siblings, 3 replies; 208+ messages in thread
From: Michael Matz @ 2007-10-29 15:09 UTC (permalink / raw)
  To: Mark Mielke; +Cc: David Miller, gcc

Hi,

On Sun, 28 Oct 2007, Mark Mielke wrote:

> are not like this. Most uses require only loose ordering. The loose 
> ordering is provided by a mutex or other synchronization primitive. As 
> any function call might call a synchronization primitive, this would 
> mean that any function call should ensure that all scheduled reads or 
> writes to shared data before the function is called, be performed before 
> the function is called. Similarly, all such data may have changed by the 
> time the function returns. Unless the function can be proven to have no 
> effect (global optimization analysis? function inlining?), this is 
> expected behavior.

Yes, and of course GCC doesn't move stores or loads over function calls.  
That's not the issue at all.  The issue is that people want to write 
this:

  if (condition)
    *p = value;

(i.e. without any synchronization primitive or in fact anything else after 
the store in the control region) and expect that the store indeed only 
happens in that control region.  And this expectation is misguided.  Had 
they written it like:

  if (condition) {
    *p = value;
    membarrier();
  }

it would have worked just fine.
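
For readers unfamiliar with the idiom, membarrier() above is not a standard
function. One common GCC-specific way to spell a compiler-level barrier is
an empty asm statement with a "memory" clobber. The sketch below assumes
that spelling; shared_flag and set_flag_if are hypothetical names.

```c
/* GCC-specific compiler barrier: an empty asm with a "memory" clobber.
   It emits no instructions; it only tells the compiler that memory may
   be read or written at this point.  It is not a hardware fence. */
#define membarrier() __asm__ __volatile__("" ::: "memory")

int shared_flag;   /* hypothetical shared global */

/* The store stays inside the control region, followed by the barrier,
   following the pattern shown above. */
void set_flag_if(int condition, int value)
{
  if (condition) {
    shared_flag = value;
    membarrier();
  }
}
```

Where ordering against other CPUs is needed as well, GCC's
__sync_synchronize() builtin issues a full hardware barrier.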


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:05                         ` David Miller
                                             ` (2 preceding siblings ...)
  2007-10-29  5:08                           ` Darryl Miles
@ 2007-10-29 15:00                           ` Michael Matz
  2007-10-29 16:20                             ` Tomash Brechko
  3 siblings, 1 reply; 208+ messages in thread
From: Michael Matz @ 2007-10-29 15:00 UTC (permalink / raw)
  To: David Miller; +Cc: dave.korn, tomash.brechko, gcc

Hi,

On Sun, 28 Oct 2007, David Miller wrote:

> The compiler simply cannot speculatively load or store to variables with 
> global visibility.
> 
> Suggesting volatile is totally impractical and in fact overkill.
> 
> Even basic correct single-threaded UNIX programs are broken by these 
> speculative stores.  If I use a conditional test to protect access to 
> memory mmap()'d with a read-only attribute, GCC's optimization will 
> cause write-protection exceptions.

No it won't, because without further information GCC can't know that a 
memory access won't trap.  Ergo it will not move it out of its control 
region, exactly because it would potentially introduce traps where there 
were none before.

It also will not blindly do speculative loads and stores as you suggested 
above, please get your facts straight to not muddy the water.  It will for 
instance not move stores over memory barriers.  It will not move them over 
(non-const) function calls.  These guarantees are completely sufficient to 
write thread-safe code, e.g. by including a mem barrier after a store to a 
shared global (inside the control region still).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 12:18                     ` Tomash Brechko
@ 2007-10-29 14:12                       ` Andi Kleen
  0 siblings, 0 replies; 208+ messages in thread
From: Andi Kleen @ 2007-10-29 14:12 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: Andi Kleen, gcc

On Mon, Oct 29, 2007 at 03:14:06PM +0300, Tomash Brechko wrote:
> On Mon, Oct 29, 2007 at 12:54:22 +0100, Andi Kleen wrote:
> > See http://gcc.gnu.org/ml/gcc/2007-10/msg00607.html for a test case
> > that shows where it can go horribly wrong (optimized code significantly 
> > slower than unoptimized code).  Admittedly it is a constructed
> > one, but I don't think it is that unrealistic.
> 
> Thanks.  I had to change %Lu to %lu, and the example shows the point
> when run multiple times.

Sorry that was a glibc'ism. The correct C99 specifier would be %llu 
or %qu for traditional BSD.
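
A short sketch of the portable spellings, for reference. The linked test
case is not reproduced here, so the type it actually prints is an
assumption; the helper names are invented for illustration.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format an unsigned long long with the C99 "ll" length modifier. */
int format_ull(char *buf, size_t size, unsigned long long v)
{
  return snprintf(buf, size, "%llu", v);
}

/* Format a fixed-width uint64_t via the PRIu64 macro, which expands
   to the right length modifier for the platform. */
int format_u64(char *buf, size_t size, uint64_t v)
{
  return snprintf(buf, size, "%" PRIu64, v);
}
```

Note that %lu is only correct when the argument really is an unsigned long.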

-Andi

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 11:57                   ` Andi Kleen
@ 2007-10-29 12:18                     ` Tomash Brechko
  2007-10-29 14:12                       ` Andi Kleen
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29 12:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc

On Mon, Oct 29, 2007 at 12:54:22 +0100, Andi Kleen wrote:
> See http://gcc.gnu.org/ml/gcc/2007-10/msg00607.html for a test case
> that shows where it can go horribly wrong (optimized code significantly 
> slower than unoptimized code).  Admittedly it is a constructed
> one, but I don't think it is that unrealistic.

Thanks.  I had to change %Lu to %lu, and the example shows the point
when run multiple times.



-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29 12:08                               ` Darryl Miles
@ 2007-10-29 12:14                                 ` Robert Dewar
  2007-10-29 17:04                                 ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: Robert Dewar @ 2007-10-29 12:14 UTC (permalink / raw)
  To: Darryl Miles; +Cc: David Miller, dave.korn, tomash.brechko, matz, gcc

Darryl Miles wrote:

> This then leads into the question: is a pointer allowed to be invalid?
> 
> I'm sure I have read a comment on this before, along the line of the 
> spec says it must be valid or a certain number of other values (like 
> zero or one past being valid).  But I can not cite chapter and verse if 
> this is true.

Well most certainly it can be null, and the code sequence in question
could just be testing a pointer for being null, that's certainly very
common code, and in fact I don't see why this won't cause trouble???

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  7:43                             ` David Miller
@ 2007-10-29 12:08                               ` Darryl Miles
  2007-10-29 12:14                                 ` Robert Dewar
  2007-10-29 17:04                                 ` skaller
  2007-10-29 16:47                               ` Joe Buck
  1 sibling, 2 replies; 208+ messages in thread
From: Darryl Miles @ 2007-10-29 12:08 UTC (permalink / raw)
  To: David Miller; +Cc: dave.korn, tomash.brechko, matz, gcc

David Miller wrote:
> From: Darryl Miles <darryl-mailinglists@netbauds.net>
> Date: Mon, 29 Oct 2007 04:53:49 +0000
> 
>> What are the issues with "speculative loads" ?
> 
> The conditional might be protecting whether the pointer is valid and
> can be dereferenced at all.

This then leads into the question: is a pointer allowed to be invalid?

I'm sure I have read a comment on this before, along the line of the 
spec says it must be valid or a certain number of other values (like 
zero or one past being valid).  But I can not cite chapter and verse if 
this is true.

I would agree however (before you say) that 'counter' by itself is just 
a variable, and it is only when execution allows it to be dereferenced 
that the issues about its validity come into play.  This is practical 
common law usage.


> And in another module that GCC can't see when compiling foo():

I agree, any external symbols might not even be in the 'C' language, but 
those symbols do conform to the ABI, and the unwritten rules of what the 
value represents are implied when you assign an equivalent type 'int' to 
it in a C variable declaration.


Darryl

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
       [not found]                 ` <20071028141821.GA4898@moonlight.home.suse.lists.egcs>
@ 2007-10-29 11:57                   ` Andi Kleen
  2007-10-29 12:18                     ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Andi Kleen @ 2007-10-29 11:57 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

Tomash Brechko <tomash.brechko@gmail.com> writes:


>   - optimization in question might well turn out to be misoptimization
>     for anything but microbenchmarks (read LKML for cache flush/dirty
>     page issues).

See http://gcc.gnu.org/ml/gcc/2007-10/msg00607.html for a test case
that shows where it can go horribly wrong (optimized code significantly 
slower than unoptimized code).  Admittedly it is a constructed
one, but I don't think it is that unrealistic.

>   - there's also a good talk on lawyer-ish vs attached-to-reality
>     approach.  I personally doubt those who continue to advise to use
>     volatile are actually writing such multithreaded programs.  Most
>     argue just for the fun of it.

Also they don't volunteer to audit multi-million LOC code bases to add
volatile everywhere. That has to be always taken into account.  For
the compiler it is a relatively simple localized change and then the
computer does all the work. For the compiled programs
auditing/changing this would be a huge effort done by humans.

-Andi

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  1:37                       ` David Miller
  2007-10-29  3:22                         ` skaller
@ 2007-10-29 11:54                         ` Robert Dewar
  2007-10-29 15:21                           ` Michael Matz
  2007-10-29 16:29                         ` Joe Buck
  2 siblings, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-29 11:54 UTC (permalink / raw)
  To: David Miller; +Cc: dave.korn, matz, gcc

One thing that seems missing from this thread is any quantitative
analysis of the value of this optimization. Although justifiable
from a formal point of view, whenever an optimization causes
difficulties for a significant number of users, it is important
to justify the optimization with clear data that it is really
valuable, otherwise it should be off by default.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  9:12                                               ` Tomash Brechko
@ 2007-10-29  9:35                                                 ` Tomash Brechko
  0 siblings, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  9:35 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Mon, Oct 29, 2007 at 12:04:14 +0300, Tomash Brechko wrote:
> Rather, "...before it released the mutex, and we acquired the same
> mutex".  But it may be the same thread actually, so "final value" is
> the value that is seen by the thread at the beginning of exclusive
> access to the object.  It is "final" wrt previous exclusive access to
> this object.

Note that this doesn't require the value to actually _be_ in the
memory, only to be observed as if it is there.  That's the power of
POSIX Threads, and that's why memory barriers, not cache flushes, are
behind pthread_mutex_lock() and friends.
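
A minimal sketch of the usage pattern being discussed, with hypothetical
names: the shared object is touched only between lock and unlock, and the
lock and unlock calls are the points at which its value must be observable
to other threads.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;   /* shared; accessed only while holding counter_lock */

/* Add n to the shared counter under exclusive access. */
void counter_add(int n)
{
  pthread_mutex_lock(&counter_lock);
  counter += n;
  pthread_mutex_unlock(&counter_lock);
}

/* Read the value left by the previous exclusive access. */
int counter_get(void)
{
  int v;

  pthread_mutex_lock(&counter_lock);
  v = counter;
  pthread_mutex_unlock(&counter_lock);
  return v;
}
```

Nothing here forces the value out to main memory at any particular moment;
the only requirement is that the next holder of the mutex sees it.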


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  9:04                                             ` Tomash Brechko
@ 2007-10-29  9:12                                               ` Tomash Brechko
  2007-10-29  9:35                                                 ` Tomash Brechko
  2007-10-29 22:04                                               ` Eric Botcazou
  1 sibling, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  9:12 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Mon, Oct 29, 2007 at 11:55:25 +0300, Tomash Brechko wrote:
> OK, formally there's no "final" value from current thread's POV, only
> the "current" value.  "Final" only matters from other thread's POV,
> like "this is the last value that was produced by another thread
> before it released the mutex".

Rather, "...before it released the mutex, and we acquired the same
mutex".  But it may be the same thread actually, so "final value" is
the value that is seen by the thread at the beginning of exclusive
access to the object.  It is "final" wrt previous exclusive access to
this object.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:55                                           ` Eric Botcazou
@ 2007-10-29  9:04                                             ` Tomash Brechko
  2007-10-29  9:12                                               ` Tomash Brechko
  2007-10-29 22:04                                               ` Eric Botcazou
  0 siblings, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  9:04 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Mon, Oct 29, 2007 at 09:50:16 +0100, Eric Botcazou wrote:
> Right, so please define more or less formally what the "final value" is from 
> the viewpoint of the current thread, this is the crux of the matter.

OK, formally there's no "final" value from current thread's POV, only
the "current" value.  "Final" only matters from other thread's POV,
like "this is the last value that was produced by another thread
before it released the mutex".

But shouldn't we formally define "define" first? :)


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:49                                         ` Tomash Brechko
@ 2007-10-29  8:55                                           ` Eric Botcazou
  2007-10-29  9:04                                             ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Eric Botcazou @ 2007-10-29  8:55 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

> And because your next question will be "how the compiler will know the
> corresponding mutex", the answer is: it can't, that's why "opaque
> function" rules come to play.

Right, so please define more or less formally what the "final value" is from 
the viewpoint of the current thread, this is the crux of the matter.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:44                                       ` Tomash Brechko
@ 2007-10-29  8:49                                         ` Tomash Brechko
  2007-10-29  8:55                                           ` Eric Botcazou
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  8:49 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Mon, Oct 29, 2007 at 11:42:10 +0300, Tomash Brechko wrote:
> It means that the current thread is free to cache the value in the
> register as long as it knows no other thread can access it (i.e. as
> long as it holds corresponding mutex).

And because your next question will be "how the compiler will know the
corresponding mutex", the answer is: it can't, that's why "opaque
function" rules come to play.
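
A minimal sketch of that rule, assuming nothing beyond POSIX Threads
(the names total_lock, shared_total, add_range and run_two_workers are
made up for illustration).  Between the two opaque calls the compiler
may keep the running sum in a register, but it has to load
shared_total after pthread_mutex_lock() returns and store it back
before pthread_mutex_unlock() is called:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;              /* plain long, no volatile */

/* Adds 1 + 2 + ... + n to shared_total while holding the mutex. */
static void add_range(long n)
{
    long cached;
    long i;

    pthread_mutex_lock(&total_lock);   /* opaque call: reload after it */
    cached = shared_total;
    for (i = 1; i <= n; ++i)
        cached += i;                   /* free to stay in a register */
    shared_total = cached;             /* stored before the next opaque call */
    pthread_mutex_unlock(&total_lock);
}

static void *worker(void *arg)
{
    add_range(*(const long *)arg);
    return NULL;
}

/* Runs two workers and returns the total observed after joining both. */
static long run_two_workers(long n)
{
    pthread_t a, b;
    int rc;

    shared_total = 0;
    rc = pthread_create(&a, NULL, worker, &n);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, &n);
    assert(rc == 0);
    rc = pthread_join(a, NULL);
    assert(rc == 0);
    rc = pthread_join(b, NULL);
    assert(rc == 0);
    (void)rc;
    return shared_total;
}
```

Whichever worker takes the mutex second starts from the "final value"
left by the first one, which is the only value it ever needs to see.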


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:42                                     ` Eric Botcazou
@ 2007-10-29  8:44                                       ` Tomash Brechko
  2007-10-29  8:49                                         ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  8:44 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Mon, Oct 29, 2007 at 09:31:13 +0100, Eric Botcazou wrote:
> > The value that will be seen by other threads after they synchronize
> > memory (with pthread_mutex_lock(), for instance).
> 
> What does it mean from the viewpoint of the current thread?

It means that the current thread is free to cache the value in the
register as long as it knows no other thread can access it (i.e. as
long as it holds corresponding mutex).


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:30                                   ` Tomash Brechko
@ 2007-10-29  8:42                                     ` Eric Botcazou
  2007-10-29  8:44                                       ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Eric Botcazou @ 2007-10-29  8:42 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

> The value that will be seen by other threads after they synchronize
> memory (with pthread_mutex_lock(), for instance).

What does it mean from the viewpoint of the current thread?

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:21                                 ` Eric Botcazou
@ 2007-10-29  8:30                                   ` Tomash Brechko
  2007-10-29  8:42                                     ` Eric Botcazou
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  8:30 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

On Mon, Oct 29, 2007 at 09:12:09 +0100, Eric Botcazou wrote:
> Define "final value".

The value that will be seen by other threads after they synchronize
memory (with pthread_mutex_lock(), for instance).


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:11                                 ` Andrew Pinski
@ 2007-10-29  8:22                                   ` Tomash Brechko
  0 siblings, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  8:22 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc

On Mon, Oct 29, 2007 at 01:08:22 -0700, Andrew Pinski wrote:
> On 10/29/07, Tomash Brechko <tomash.brechko@gmail.com> wrote:
> > But if *v is simply shared, do all stores to it matter?  No, only the
> > final value is relevant.
> 
> Actually it depends, it might matter.  If you have a loop checking the
> value of *v on a different thread and it does not change until this
> loop is done, then you end up with a wrong wait.  This is the same as
> what volatile is really for, where it will change outside of the
> current thread.

Such a program would be incorrect wrt POSIX Threads: you shouldn't read
an object that may be modified by another thread without synchronizing
with it.  Such a "wait" loop is always wrong wrt POSIX Threads.
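
For comparison, a sketch of how such a wait is usually written with
POSIX Threads: a condition variable and a predicate that is only ever
read or written under the mutex, instead of polling a plain variable.
The names (value, changed, publish_and_wait) are illustrative only:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;
static int value;                      /* plain int, protected by lock */
static int seen;                       /* what the waiter observed */

static void *waiter(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    while (value == 0)                 /* re-check: wakeups may be spurious */
        pthread_cond_wait(&changed, &lock);
    seen = value;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Publishes a non-zero v to the waiter and returns what it saw. */
static int publish_and_wait(int v)
{
    pthread_t t;
    int rc;

    value = 0;
    seen = 0;
    rc = pthread_create(&t, NULL, waiter, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&lock);
    value = v;                         /* modified only while holding lock */
    pthread_cond_signal(&changed);
    pthread_mutex_unlock(&lock);

    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}
```

No volatile is involved: every access to value sits between calls that
synchronize memory.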


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:08                               ` Tomash Brechko
  2007-10-29  8:11                                 ` Andrew Pinski
@ 2007-10-29  8:21                                 ` Eric Botcazou
  2007-10-29  8:30                                   ` Tomash Brechko
  1 sibling, 1 reply; 208+ messages in thread
From: Eric Botcazou @ 2007-10-29  8:21 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

>   volatile int *v = (int *) 0xdeadbeef;
>
>   void
>   f()
>   {
>     int i;
>     for (i = 0; i < N; ++i)
>       *v = 1;
>   }
>
>
> _all_ N stores matter.  Why?  Because v may point to the device I/O
> port, and the device may _count_ those writes among other things.
>
> But if *v is simply shared, do all stores to it matter?  No, only the
> final value is relevant.

Define "final value".

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:08                               ` Tomash Brechko
@ 2007-10-29  8:11                                 ` Andrew Pinski
  2007-10-29  8:22                                   ` Tomash Brechko
  2007-10-29  8:21                                 ` Eric Botcazou
  1 sibling, 1 reply; 208+ messages in thread
From: Andrew Pinski @ 2007-10-29  8:11 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

On 10/29/07, Tomash Brechko <tomash.brechko@gmail.com> wrote:
> But if *v is simply shared, do all stores to it matter?  No, only the
> final value is relevant.

Actually it depends, it might matter.  If you have a loop checking the
value of *v on a different thread and it does not change until this
loop is done, then you end up with a wrong wait.  This is the same as
what volatile is really for, where it will change outside of the
current thread.

-- Pinski

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  8:03                             ` Tomash Brechko
@ 2007-10-29  8:08                               ` Tomash Brechko
  2007-10-29  8:11                                 ` Andrew Pinski
  2007-10-29  8:21                                 ` Eric Botcazou
  0 siblings, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  8:08 UTC (permalink / raw)
  To: gcc

On Mon, Oct 29, 2007 at 10:43:13 +0300, Tomash Brechko wrote:
> I think most pro-volatile people didn't understand the meaning of
> several papers on the Internet that say you have to use volatile.

And some don't understand the true purposes of volatile itself.  In
the code below


  volatile int *v = (int *) 0xdeadbeef;

  void
  f()
  {
    int i;
    for (i = 0; i < N; ++i)
      *v = 1;
  }


_all_ N stores matter.  Why?  Because v may point to the device I/O
port, and the device may _count_ those writes among other things.

But if *v is simply shared, do all stores to it matter?  No, only the
final value is relevant.

That's why -fno-speculative-store will never be equal to volatile, and
that's why it is needed as a replacement for the current volatile hammer.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:54                           ` Dave Korn
                                               ` (2 preceding siblings ...)
  2007-10-29  4:35                             ` Mark Mielke
@ 2007-10-29  8:03                             ` Tomash Brechko
  2007-10-29  8:08                               ` Tomash Brechko
  3 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-29  8:03 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'David Miller', matz, gcc

On Mon, Oct 29, 2007 at 02:39:15 -0000, Dave Korn wrote:
>   BTW, you and Tomash should get your stories in synch.  He says speculative
> loads are ok, just no stores, and wants a kind of half-volatile flag that
> would only suppress stores.  I think you're already looking one step further
> down the road than he is and have realised that speculative loads will give
> you problems too.

You don't do your homework.  This pointer
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html
(which was already posted in this thread) explains the matter, see the
"Speculative code motion involving loads" section.  So both David and
I are correct.


But curious, Bart already tried _several times_ to explain why using
volatile is not an option, but his arguments seem to be too
"inconvenient" to be considered.  Let me repeat: suppose we agree that
every shared data should be annotated as volatile.  So if I want to
share dynamic data, I have to write


   _volatile_ data_type *pdata = malloc(size);


But how to use this data?  There are not many library functions that
accept pointer to volatile (and casting the qualifier away will bring
us back to the start).  Should every library function have 2^n copies
where different combinations of parameters are annotated as volatile?
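
To illustrate, a sketch (all names hypothetical) of how a shared
buffer is commonly handled without volatile: the buffer is a plain
char array, every access is bracketed by the same mutex, and the
ordinary snprintf(), strlen() and memcpy() are used as they are:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static char shared_buf[64];            /* shared, deliberately not volatile */

/* Formats into the shared buffer; returns what snprintf() returned. */
static int format_shared(int id)
{
    int n;

    pthread_mutex_lock(&buf_lock);
    n = snprintf(shared_buf, sizeof shared_buf, "worker %d", id);
    pthread_mutex_unlock(&buf_lock);
    return n;
}

/* Copies the shared string into out (out_size must be at least 1). */
static size_t copy_shared(char *out, size_t out_size)
{
    size_t n;

    assert(out_size > 0);
    pthread_mutex_lock(&buf_lock);
    n = strlen(shared_buf);
    if (n >= out_size)
        n = out_size - 1;              /* truncate to fit */
    memcpy(out, shared_buf, n);
    out[n] = '\0';
    pthread_mutex_unlock(&buf_lock);
    return n;
}
```

Had shared_buf been declared volatile, none of these three library
calls would accept it without a cast.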


I think most pro-volatile people didn't understand the meaning of
several papers on the Internet that say you have to use volatile.
Those papers never meant to say that volatile is a proper way to use
shared data with POSIX threads, rather that because the compilers are
made the way they are you have to use volatile for now to overcome
compiler thread-unawareness.


David R. Butenhof was a member of the POSIX.1c (POSIX Threads)
committee.  In his book, "Programming with POSIX Threads", there are
no volatiles at all.  Of course one can say he didn't grok C, or even
POSIX, or POSIX Threads.  But it shows the intent, at least how he
felt it.

And this is the way to go: in a sane world standards follow reality,
not the other way around.  And they will, that's why the work of Hans
Boehm is there.  As was already mentioned in this thread, while his
proposal is not final yet, most of the work is being done on atomics,
so it is highly unlikely that the "no-speculative-stores-please"
requirement will change.



-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  5:08                           ` Darryl Miles
@ 2007-10-29  7:43                             ` David Miller
  2007-10-29 12:08                               ` Darryl Miles
  2007-10-29 16:47                               ` Joe Buck
  0 siblings, 2 replies; 208+ messages in thread
From: David Miller @ 2007-10-29  7:43 UTC (permalink / raw)
  To: darryl-mailinglists; +Cc: dave.korn, tomash.brechko, matz, gcc

From: Darryl Miles <darryl-mailinglists@netbauds.net>
Date: Mon, 29 Oct 2007 04:53:49 +0000

> What are the issues with "speculative loads" ?

The conditional might be protecting whether the pointer is valid and
can be dereferenced at all.

int *counter;

void foo(int counter_is_valid)
{
	if (counter_is_valid)
		(*counter)++;
}

And in another module that GCC can't see when compiling foo():

extern int *counter;

int main(void)
{
	int a = 0;

	foo(0);
	counter = &a;
	foo(1);

	return 0;
}

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:05                         ` David Miller
  2007-10-29  2:54                           ` Dave Korn
  2007-10-29  3:32                           ` skaller
@ 2007-10-29  5:08                           ` Darryl Miles
  2007-10-29  7:43                             ` David Miller
  2007-10-29 15:00                           ` Michael Matz
  3 siblings, 1 reply; 208+ messages in thread
From: Darryl Miles @ 2007-10-29  5:08 UTC (permalink / raw)
  To: David Miller; +Cc: dave.korn, tomash.brechko, matz, gcc

David Miller wrote:
> The compiler simply cannot speculatively load or store to variables
> with global visibility.

s/with global visibility/with visibility outside the scope of the 
functional unit the compiler is able to see at compile time/

Which basically means the compiler is king for doing these tricks with 
CPU registers, areas of the stack and inlined functional units in which 
it can be 100% sure about its access to this data.


What are the issues with "speculative loads"?  Is there such a thing as
a write-only page on any system GCC targets?  For general usage
the x86 concept of read-only or read-write fits well, which means that
speculative loads are usually a safe optimization.

But I'd be all for a way to allow/disallow each optimization
independently (this gives the developer more choice in the matter), with
"speculative loads" enabled by default and "speculative stores" disabled
by default for any multi-threaded code.

As per my other posting, I'd like the ability to use
__attribute__((disallow_speculative_load,disallow_speculative_store)) or
__attribute__((allow_speculative_load,allow_speculative_store)) to
pin the issue, with -fdisallow-speculative-load,
-fallow-speculative-load etc. setting the defaults for the entire file
being compiled.


Darryl

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  4:32                             ` David Miller
@ 2007-10-29  4:54                               ` skaller
  2007-10-29 15:14                               ` Michael Matz
  1 sibling, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-29  4:54 UTC (permalink / raw)
  To: David Miller; +Cc: dave.korn, tomash.brechko, matz, gcc


On Sun, 2007-10-28 at 20:32 -0700, David Miller wrote:
> From: skaller <skaller@users.sourceforge.net>

> > That is the programmer's fault, they should have accessed the
> > variable using a const. Failing to do so gives the compiler
> > permission to write speculatively.
> 
> I do not agree with you.

Yeah, on consideration you're probably right.

> It is perfectly legal to use read-only protection to implement
> things like efficient garbage collection scans.

Yes. And I'm wrong about 'const': my way you'd have to make it
const and cast to non-const to prevent speculative writes.
That's unworkable ..

> It's not even write exceptions, what about the pointer being
> valid at all?

That's a different case. 

Gcc already provides a way to do this on say AMD64,
using __builtin_prefetch. That instruction is perfectly
legal on an invalid address. Yeah I know this isn't a
complete load (the actual load into a register has to
be done as well).

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:54                           ` Dave Korn
  2007-10-29  3:04                             ` David Miller
  2007-10-29  3:08                             ` David Miller
@ 2007-10-29  4:35                             ` Mark Mielke
  2007-10-29  8:03                             ` Tomash Brechko
  3 siblings, 0 replies; 208+ messages in thread
From: Mark Mielke @ 2007-10-29  4:35 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'David Miller', tomash.brechko, matz, gcc

Dave Korn wrote:
> On 29 October 2007 01:38, David Miller wrote:
>
>   
>   You'll be back.  Next week, you'll discover a corner case where caching a
> shared variable in a register can be a bad thing when one thread uses locks
> and the other doesn't, and you'll be back to demand that optimisation is
> removed as well.
>   
Why would David ask for something so unreasonable?

Why do you believe that the use of mutex to synchronize access to a 
shared resource, without the use of volatile on the shared resource 
being accessed, as documented in countless real life examples, is 
unreasonable or incorrect?

I do not understand your position.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  3:32                           ` skaller
@ 2007-10-29  4:32                             ` David Miller
  2007-10-29  4:54                               ` skaller
  2007-10-29 15:14                               ` Michael Matz
  0 siblings, 2 replies; 208+ messages in thread
From: David Miller @ 2007-10-29  4:32 UTC (permalink / raw)
  To: skaller; +Cc: dave.korn, tomash.brechko, matz, gcc

From: skaller <skaller@users.sourceforge.net>
Date: Mon, 29 Oct 2007 14:21:10 +1100

> 
> On Sun, 2007-10-28 at 18:37 -0700, David Miller wrote:
> > Even basic correct single-threaded UNIX programs are broken by these
> > speculative stores.  If I use a conditional test to protect access to
> > memory mmap()'d with a read-only attribute, GCC's optimization will
> > cause write-protection exceptions.
> 
> That is the programmer's fault, they should have accessed the
> variable using a const. Failing to do so gives the compiler
> permission to write speculatively.

I do not agree with you.

It is perfectly legal to use read-only protection to implement
things like efficient garbage collection scans.

It's not even write exceptions, what about the pointer being
valid at all?

Memory accesses really are special.  You can only execute them when
the program would have allowed them to occur, otherwise you risk
taking exceptions.

Do you really think that:

	the_pointer_is_valid = func(potentially_bad_pointer);
	if (the_pointer_is_valid)
		(*potentially_bad_pointer)++;

should generate any memory accesses when 'the_pointer_is_valid'
evaluates to false?

And yet this is just another form of our original "threading" example:

	if (pthread_mutex_trylock(lock) == 0)
		(*counter)++;

It shows that memory accesses are a fundamental issue.

Only if you can prove that the program would access said memory with
said kind of access (read or write) can you legally speculate.

Happily it seems that for the cases where it helps code generation
substantially, this precondition is true.
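
To make the transformation concrete, here is a hand-written model (not
actual GCC output; both function names are made up) of a conditional
increment as written and after the load and store have been made
unconditional.  In a single thread with valid, writable memory the two
give the same result, which is why the rewrite looks harmless:

```c
#include <assert.h>

/* As written: touches *counter only when cond is true. */
static void inc_if(int *counter, int cond)
{
    if (cond)
        (*counter)++;
}

/* Model of the speculated form: loads and stores unconditionally. */
static void inc_if_speculated(int *counter, int cond)
{
    int old_value = *counter;          /* load even when cond is false */
    int new_value = old_value + 1;

    assert(counter != 0);              /* the as-written form never needs this */
    *counter = cond ? new_value : old_value;   /* store even when cond is false */
}
```

With cond false, the second form still dereferences and writes through
counter, so it faults on an invalid or read-only address and can
overwrite an update made by whoever legitimately holds the lock.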

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:05                         ` David Miller
  2007-10-29  2:54                           ` Dave Korn
@ 2007-10-29  3:32                           ` skaller
  2007-10-29  4:32                             ` David Miller
  2007-10-29  5:08                           ` Darryl Miles
  2007-10-29 15:00                           ` Michael Matz
  3 siblings, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-29  3:32 UTC (permalink / raw)
  To: David Miller; +Cc: dave.korn, tomash.brechko, matz, gcc


On Sun, 2007-10-28 at 18:37 -0700, David Miller wrote:
> From: "Dave Korn" <dave.korn@artimi.com>
> Date: Mon, 29 Oct 2007 01:16:07 -0000

> The compiler simply cannot speculatively load or store to variables
> with global visibility.

I think it can.

> Suggesting volatile is totally impractical and in fact overkill.
> 
> Even basic correct single-threaded UNIX programs are broken by these
> speculative stores.  If I use a conditional test to protect access to
> memory mmap()'d with a read-only attribute, GCC's optimization will
> cause write-protection exceptions.

That is the programmer's fault, they should have accessed the
variable using a const. Failing to do so gives the compiler
permission to write speculatively.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  1:37                       ` David Miller
@ 2007-10-29  3:22                         ` skaller
  2007-10-29 11:54                         ` Robert Dewar
  2007-10-29 16:29                         ` Joe Buck
  2 siblings, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-29  3:22 UTC (permalink / raw)
  To: David Miller; +Cc: dave.korn, matz, gcc


On Sun, 2007-10-28 at 18:34 -0700, David Miller wrote:

> More importantly, you cannot break things on people out of mere
> convenience.

If you want a case of this .. it's the ill-considered strict aliasing
rules in C. WG14 seems to think C had a strong enough type system
to make this rule, but it does not. So gcc provides a switch
to turn it off.

This is actually a bit annoying, because the granularity is not
so sweet: I'd be happy with floating point aliasing to be strict,
but not integers and *definitely* not pointers.

C is too brain dead for strict aliasing: it could break many
memory management codes which, for example, alias memory
with pointer to void* for alignment purposes, or intptr_t 
for bit fiddling. Or code like this which I write:

	struct X { int x; } x;
	struct Y { int y[1]; } y;
	Y *py  = (Y*)(void*)&x;
	X *px = (X*)(void*)&y;

[My Felix compiler does this cast systematically and deliberately]

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:54                           ` Dave Korn
  2007-10-29  3:04                             ` David Miller
@ 2007-10-29  3:08                             ` David Miller
  2007-10-29  4:35                             ` Mark Mielke
  2007-10-29  8:03                             ` Tomash Brechko
  3 siblings, 0 replies; 208+ messages in thread
From: David Miller @ 2007-10-29  3:08 UTC (permalink / raw)
  To: dave.korn; +Cc: tomash.brechko, matz, gcc

From: "Dave Korn" <dave.korn@artimi.com>
Date: Mon, 29 Oct 2007 02:39:15 -0000

> On 29 October 2007 01:38, David Miller wrote:
> 
> > Even basic correct single-threaded UNIX programs are broken by these
> > speculative stores.  If I use a conditional test to protect access to
> > memory mmap()'d with a read-only attribute, GCC's optimization will
> > cause write-protection exceptions.
> 
>   Hmm, that's a far more substantial argument.  It raises the question: is the
> compiler entitled to assume that a non-const pointer always points to
> non-const data?

Using mprotect() to mark pages of garbage collection memory read-only
in the compiler in order to speed up GC sweeps done during compilation
has been suggested at times in the past.  The idea is that pages
marked read-only are elided from the GC scan lists (their state
remains the same if nobody writes to them) and to trap write access
exceptions via a signal handler, which puts back the write capability
for that page, and adds the page to the GC scan lists before returning
from the signal handler.

If GCC ever used this kind of technique, we can then proclaim with joy
that even GCC is not a properly written C program!

To me it's pretty clear that speculative stores have to be done with
extreme care, if at all.  Right now we know of many real life every
day examples that break because of them: threaded programs, OS
kernels, programs using signal handlers, and anything using
mprotect() in sophisticated ways such as garbage collectors.
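
A rough sketch of that write-barrier technique for a single page,
assuming Linux-like behaviour: a store to a PROT_READ page raises
SIGSEGV, and calling mprotect() from the handler works in practice
even though POSIX does not list it as async-signal-safe.  All names
(gc_page, gc_track_page, gc_store, ...) are hypothetical:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE                /* MAP_ANONYMOUS, sigaction on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *gc_page;                        /* the one tracked page */
static size_t gc_page_size;
static volatile sig_atomic_t gc_page_dirty;  /* set by the fault handler */

/* Write barrier: the first store to the read-only page lands here. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)gc_page;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + gc_page_size)
        _exit(1);                            /* a genuine crash, not our page */
    gc_page_dirty = 1;                       /* back on the GC scan list */
    if (mprotect(gc_page, gc_page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    /* Returning restarts the faulting store, which now succeeds. */
}

/* Maps one page, installs the handler, then write-protects the page. */
static int gc_track_page(void)
{
    struct sigaction sa;

    gc_page_size = (size_t)sysconf(_SC_PAGESIZE);
    gc_page = mmap(NULL, gc_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (gc_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    gc_page_dirty = 0;
    return mprotect(gc_page, gc_page_size, PROT_READ);
}

/* One store into the page.  The volatile cast only keeps this demo's
   store ordered with respect to reads of gc_page_dirty. */
static void gc_store(size_t offset, char value)
{
    assert(offset < gc_page_size);
    *(volatile char *)(gc_page + offset) = value;
}
```

A store the program never asked for would mark the page dirty for
nothing at best; without the handler installed it would simply crash.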

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:54                           ` Dave Korn
@ 2007-10-29  3:04                             ` David Miller
  2007-10-29  3:08                             ` David Miller
                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 208+ messages in thread
From: David Miller @ 2007-10-29  3:04 UTC (permalink / raw)
  To: dave.korn; +Cc: tomash.brechko, matz, gcc

From: "Dave Korn" <dave.korn@artimi.com>
Date: Mon, 29 Oct 2007 02:39:15 -0000

>   BTW, you and Tomash should get your stories in synch.  He says
> speculative loads are ok, just no stores, and wants a kind of
> half-volatile flag that would only suppress stores.  I think you're
> already looking one step further down the road than he is and have
> realised that speculative loads will give you problems too.

Probably speculative loads are OK, as long as calls to functions
whose complete implementation the compiler cannot see form an
implicit boundary (i.e. any memory might be modified), which it
happily does already.

In what cases those speculative loads are profitable is another
matter, given how expensive cache misses are compared to mispredicted
branches.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  2:05                         ` David Miller
@ 2007-10-29  2:54                           ` Dave Korn
  2007-10-29  3:04                             ` David Miller
                                               ` (3 more replies)
  2007-10-29  3:32                           ` skaller
                                             ` (2 subsequent siblings)
  3 siblings, 4 replies; 208+ messages in thread
From: Dave Korn @ 2007-10-29  2:54 UTC (permalink / raw)
  To: 'David Miller'; +Cc: tomash.brechko, matz, gcc

On 29 October 2007 01:38, David Miller wrote:

> From: "Dave Korn" <dave.korn@artimi.com>
> Date: Mon, 29 Oct 2007 01:16:07 -0000
> 
>>   Thing is, if you disable all optimisations that are potentially
>> unsafe in the presence of threads, won't you just get the same
>> effect as if you had used volatile anyway, only on every single
>> variable in the program instead of just the ones the programmer has
>> designated as sensitive?
> 
> This is not really what is being suggested at all.
> 
> The compiler simply cannot speculatively load or store to variables
> with global visibility.

  You'll be back.  Next week, you'll discover a corner case where caching a
shared variable in a register can be a bad thing when one thread uses locks
and the other doesn't, and you'll be back to demand that optimisation is
removed as well.

  BTW, you and Tomash should get your stories in synch.  He says speculative
loads are ok, just no stores, and wants a kind of half-volatile flag that
would only suppress stores.  I think you're already looking one step further
down the road than he is and have realised that speculative loads will give
you problems too.
 
> Suggesting volatile is totally impractical and in fact overkill.

  I keep hearing this claim, but nobody says why.  What /else/ does it do that
isn't necessary for correctness in this (or other) case(s)?

> Even basic correct single-threaded UNIX programs are broken by these
> speculative stores.  If I use a conditional test to protect access to
> memory mmap()'d with a read-only attribute, GCC's optimization will
> cause write-protection exceptions.

  Hmm, that's a far more substantial argument.  It raises the question: is the
compiler entitled to assume that a non-const pointer always points to
non-const data?

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  1:01                   ` David Miller
@ 2007-10-29  2:23                     ` Mark Mielke
  2007-10-29 15:09                       ` Michael Matz
  0 siblings, 1 reply; 208+ messages in thread
From: Mark Mielke @ 2007-10-29  2:23 UTC (permalink / raw)
  To: David Miller; +Cc: matz, gcc

David Miller wrote:
> From: Michael Matz <matz@suse.de>
> Date: Sun, 28 Oct 2007 18:08:23 +0100 (CET)
>   
>> I mean who am I to demand that people write correct code, 
>> I must be insane.
>>     
>
> Correctness is defined by pervasive common usage as much as it
> is by paper standards.
>   
Reading this thread, I find myself confused. GCC is used regularly for 
both multi-threaded and single-threaded code. It is impractical to 
require all variables that may be shared between threads to be declared 
volatile. Worse, I find myself suspecting it may be impossible. Any 
particular library may be used from a multi-threaded context or a 
single-threaded context, with a very common belief that the access can 
be protected by wrapping all accesses to the thread-unsafe resource with 
a mutex. Are some people here really suggesting that all variables 
everywhere be declared volatile?

I remain unconvinced that declaring these shared variables "volatile" is 
correct. Certainly, if the ordering of reads and writes must be 
carefully controlled completely by the programmer, volatile should be 
used. Most uses are not like this. Most uses require only loose ordering.
The loose ordering is provided by a mutex or other synchronization
primitive. As any function call might call a synchronization primitive, 
this would mean that any function call should ensure that all scheduled 
reads or writes to shared data before the function is called, be 
performed before the function is called. Similarly, all such data may 
have changed by the time the function returns. Unless the function can 
be proven to have no effect (global optimization analysis? function 
inlining?), this is expected behavior.

Am I stating the obvious? Is this an unreasonable expectation for some 
reason? Do I not understand the issue?

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  1:29                       ` Dave Korn
@ 2007-10-29  2:05                         ` David Miller
  2007-10-29  2:54                           ` Dave Korn
                                             ` (3 more replies)
  0 siblings, 4 replies; 208+ messages in thread
From: David Miller @ 2007-10-29  2:05 UTC (permalink / raw)
  To: dave.korn; +Cc: tomash.brechko, matz, gcc

From: "Dave Korn" <dave.korn@artimi.com>
Date: Mon, 29 Oct 2007 01:16:07 -0000

>   Thing is, if you disable all optimisations that are potentially
> unsafe in the presence of threads, won't you just get the same
> effect as if you had used volatile anyway, only on every single
> variable in the program instead of just the ones the programmer has
> designated as sensitive?

This is not really what is being suggested at all.

The compiler simply cannot speculatively load or store to variables
with global visibility.

Suggesting volatile is totally impractical and in fact overkill.

Even basic correct single-threaded UNIX programs are broken by these
speculative stores.  If I use a conditional test to protect access to
memory mmap()'d with a read-only attribute, GCC's optimization will
cause write-protection exceptions.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  1:16                     ` Dave Korn
@ 2007-10-29  1:37                       ` David Miller
  2007-10-29  3:22                         ` skaller
                                           ` (2 more replies)
  0 siblings, 3 replies; 208+ messages in thread
From: David Miller @ 2007-10-29  1:37 UTC (permalink / raw)
  To: dave.korn; +Cc: matz, gcc

From: "Dave Korn" <dave.korn@artimi.com>
Date: Mon, 29 Oct 2007 01:05:06 -0000

>   "My way is right and everyone else's is wrong".

I didn't say that.  I said that what users do on a broad scale is an
important consideration that often trumps paper standards.  And yes,
users as well as the implementors themselves do in fact get to be a
part of making that determination.

Standards are also not infallible laws that should be followed
blindly.

More importantly, you cannot break things on people out of mere
convenience.

The paper standards don't matter if that's not what people actually
do.  Nobody marks all of their thread and signal accessed shared
variables as volatile, and telling them to do so does not solve the
problem.  Rather, it just infuriates those users.

Find me one OS kernel code base written in the C language that marks
all lock protected variables as volatile?  And no you cannot cop out
from this obvious example merely by saying that none of them are truly
written in the "C language."

Again, standards should be strongly questioned when they do not
acknowledge and co-exist with widespread existing practice.

>   Better write your own compiler then.

If this becomes the common attitude of GCC developers, you can pretty
much guarantee this will drive people to work on LLVM and other
alternative compiler code bases.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 18:06                     ` Tomash Brechko
  2007-10-28 18:43                       ` Tomash Brechko
@ 2007-10-29  1:29                       ` Dave Korn
  2007-10-29  2:05                         ` David Miller
  1 sibling, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-29  1:29 UTC (permalink / raw)
  To: 'Tomash Brechko', 'Michael Matz'; +Cc: gcc

On 28 October 2007 18:03, Tomash Brechko wrote:

> You got my intent all wrong.  Performance matters for both sides.  And
> currently the only option for multithreaded programs is to use
> volatile, which _greatly_ hurts performance.
> 
> What I was trying to say, is that it would be nice to have
> -fno-thread-unsafe-optimization option.  And I was trying to say that
> when you _enable_ this option, the performance won't be hurt much,
> while the program will become thread-safe.  

  Thing is, if you disable all optimisations that are potentially unsafe in
the presence of threads, won't you just get the same effect as if you had used
volatile anyway, only on every single variable in the program instead of just
the ones the programmer has designated as sensitive?

> I never even said that
> this option should be the default (though it would be reasonable for
> -pthread or -fopenmp).  

  Well, as I said before, I'm not going to complain about some optional flag
that limits the compiler's optimisations, but I think that what's going to
happen is that you're going to find another race condition caused by another
optimiser, and want to add disabling that optimisation as well to this new
flag, and there's going to be a long process of repeated cycles of this, until
what you end up with is a flag that has the exact same effect as volatile.

> But there are obviously people who think
> there's no need in such option whatsoever, because "threaded code is
> broken by definition, and I don't write it anyway".
> 
> Even if multithreading is of no immediate concern for you, it will
> become one tomorrow, when you decide to run your loop on all 1024 cores
> your cell phone provides.  

  This is smug patronising nonsense.  There are a lot of people around here
who've been writing multithreaded code for decades.  Not to mention
multi-processor code.  In both symmetric and asymmetric setups.  With and
without all kinds of coherent and non-coherent caching considerations to take
into account.  Please try not to claim that you are some kind of far sighted
visionary preaching the future to a bunch of ignorami; we've all been doing it
for years, and we do have a pretty good grasp of the concepts.

> So you can't argue that this option
> wouldn't be nice to have, no?

  I haven't needed it yet.  I mark my volatile variables as volatile, rather
than expecting the compiler to treat all variables the same indiscriminately.
You're right that volatile doesn't solve the whole problem; you still have to
write correctly threadsafe code, you still have to use locks, and you still
have to be aware of the huge complexities of lock-free code, but what it does
give you is the essential tool you need from the compiler for that: a strict
one-to-one relationship between the loads and stores in your high level source
and those in the emitted assembler.

> And as I understood this discussion, there will be such option in GCC
> in the nearest future.

  Well, as I don't mind confirming once more, if you really want
-fshoot-yourself-in-the-foot, you're welcome to it.

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-29  1:05                   ` David Miller
@ 2007-10-29  1:16                     ` Dave Korn
  2007-10-29  1:37                       ` David Miller
  0 siblings, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-29  1:16 UTC (permalink / raw)
  To: 'David Miller', matz; +Cc: gcc

On 29 October 2007 01:01, David Miller wrote:

> From: Michael Matz <matz@suse.de>
> Date: Sun, 28 Oct 2007 18:08:23 +0100 (CET)
> 
>> <tongue-in-cheek>
>> You mean like POSIX doesn't count very much for the kernel behaviour?
>> </tongue-in-cheek>
> 
> Nice scarecrow.
> 
> Linux has and will break POSIX where POSIX asks unreasonable and
> stupid things.
> 
> And in particular we will not follow POSIX if doing so breaks
> pervasive practices in userspace that have worked under Linux for a
> long time.
> 
> We do not follow paper standards blindly.  Practical considerations
> always trump standards.  Standards are often wrong or their authors did
> not consider a particular case sufficiently.


  "My way is right and everyone else's is wrong".

  Better write your own compiler then.


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:10                 ` Michael Matz
  2007-10-29  1:01                   ` David Miller
@ 2007-10-29  1:05                   ` David Miller
  2007-10-29  1:16                     ` Dave Korn
  1 sibling, 1 reply; 208+ messages in thread
From: David Miller @ 2007-10-29  1:05 UTC (permalink / raw)
  To: matz; +Cc: gcc

From: Michael Matz <matz@suse.de>
Date: Sun, 28 Oct 2007 18:08:23 +0100 (CET)

> <tongue-in-cheek>
> You mean like POSIX doesn't count very much for the kernel behaviour?
> </tongue-in-cheek>

Nice scarecrow.

Linux has and will break POSIX where POSIX asks unreasonable and
stupid things.

And in particular we will not follow POSIX if doing so breaks
pervasive practices in userspace that have worked under Linux for a
long time.

We do not follow paper standards blindly.  Practical considerations
always trump standards.  Standards are often wrong or their authors did
not consider a particular case sufficiently.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:10                 ` Michael Matz
@ 2007-10-29  1:01                   ` David Miller
  2007-10-29  2:23                     ` Mark Mielke
  2007-10-29  1:05                   ` David Miller
  1 sibling, 1 reply; 208+ messages in thread
From: David Miller @ 2007-10-29  1:01 UTC (permalink / raw)
  To: matz; +Cc: gcc

From: Michael Matz <matz@suse.de>
Date: Sun, 28 Oct 2007 18:08:23 +0100 (CET)

> I mean who am I to demand that people write correct code, 
> I must be insane.

Correctness is defined by pervasive common usage as much as it
is by paper standards.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 18:06                     ` Tomash Brechko
@ 2007-10-28 18:43                       ` Tomash Brechko
  2007-10-29  1:29                       ` Dave Korn
  1 sibling, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-28 18:43 UTC (permalink / raw)
  To: gcc

On Sun, Oct 28, 2007 at 21:03:09 +0300, Tomash Brechko wrote:
> What I was trying to say, is that it would be nice to have
> -fno-thread-unsafe-optimization option.

Rather clear -fno-speculative-store, in the light of mprotect() and
non-writable memory.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-28 17:08                   ` Michael Matz
@ 2007-10-28 18:06                     ` Tomash Brechko
  2007-10-28 18:43                       ` Tomash Brechko
  2007-10-29  1:29                       ` Dave Korn
  0 siblings, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-28 18:06 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc

On Sun, Oct 28, 2007 at 17:51:57 +0100, Michael Matz wrote:
> I was merely showing that this transformation _does_ matter in some cases 
> to refute opposite claims which seemed to be expressed too airily in this 
> thread.

You got my intent all wrong.  Performance matters for both sides.  And
currently the only option for multithreaded programs is to use
volatile, which _greatly_ hurts performance.

What I was trying to say, is that it would be nice to have
-fno-thread-unsafe-optimization option.  And I was trying to say that
when you _enable_ this option, the performance won't be hurt much,
while the program will become thread-safe.  I never even said that
this option should be the default (though it would be reasonable for
-pthread or -fopenmp).  But there are obviously people who think
there's no need in such option whatsoever, because "threaded code is
broken by definition, and I don't write it anyway".

Even if multithreading is of no immediate concern for you, it will
become one tomorrow, when you decide to run your loop on all 1024 cores
your cell phone provides.  So you can't argue that this option
wouldn't be nice to have, no?


And as I understood this discussion, there will be such option in GCC
in the nearest future.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 22:57               ` David Miller
@ 2007-10-28 17:10                 ` Michael Matz
  2007-10-29  1:01                   ` David Miller
  2007-10-29  1:05                   ` David Miller
  0 siblings, 2 replies; 208+ messages in thread
From: Michael Matz @ 2007-10-28 17:10 UTC (permalink / raw)
  To: David Miller; +Cc: gcc

Hi,

On Fri, 26 Oct 2007, David Miller wrote:

> Also, it bears repeating that whatever performance argument you make for 
> or against this issue matters little if it breaks lots of existing and 
> working code.

It matters insofar as that existing and working code is broken in a strict 
sense.  As long as that holds there's a high chance of "breaking" it at 
random times again in the future, even when the one transformation is 
changed to not "break" it anymore.  So, we either can change the 
transformation and wait for the next uproar in a couple of months or 
somehow hope that code is fixed.  But that's all the same argumentation 
like in the signed integer overflow discussion, so my hopes for the latter 
are quite low.  I mean who am I to demand that people write correct code, 
I must be insane.

> It is also important to remind people that paper standards count less 
> than common sense and what affects users on a practical level, even when 
> those paper standards allow your favorite optimization or 
> transformation.

<tongue-in-cheek>
You mean like POSIX doesn't count very much for the kernel behaviour?
</tongue-in-cheek>

You ask us to somehow regard common sense (whatever that is) and 
practicality reasons (for which set of people?) higher than paper 
standards.  How come, then, that under Linux directories are still 
seekable?  Certainly when I sometimes try to convince our kernel people of 
some clever idea, they happily use the POSIX hammer quite fine.  I sigh 
and move on.  So what exactly brings you into a position to define common 
sense or which paper standards we should ignore?


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 17:55                 ` Tomash Brechko
@ 2007-10-28 17:08                   ` Michael Matz
  2007-10-28 18:06                     ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Michael Matz @ 2007-10-28 17:08 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

Hi,

On Fri, 26 Oct 2007, Tomash Brechko wrote:

> On Fri, Oct 26, 2007 at 21:45:03 +0400, Tomash Brechko wrote:
> > Note that it doesn't cancel cmoves, as those are loads, not stores.
> 
> I just checked with x86 instruction reference, CMOVcc is reg->reg or
> mem->reg, never reg->mem.  You know God's deed when you see it. :)

I wasn't precise in what actually is the important optimization.  The 
important thing about this loop is, that the data is basically random, so 
that branch prediction has no chance to do any good.  Consequently all 
branches in that loop have a pretty high cost.  So high in fact that it's 
better to replace it with conditional moves on the value to store and make 
the store unconditional.

So, yes, there are no conditional store instructions on x86, but the 
branches need to be removed anyway for performance, and for that we need 
to make the stores unconditional (even at the cost of perhaps introducing 
another load).

You are also right that for that example we can determine that an 
unconditional store already dominates (and postdominates) the conditional 
stores in question and hence would already be thread-unsafe, so the 
transformation would be okay even with thread-safeness in mind.

I was merely showing that this transformation _does_ matter in some cases 
to refute opposite claims which seemed to be expressed too airily in this 
thread.

Now there are multiple ways out of this dilemma, retaining the 
transformation and not breaking threaded code:
1) do the transformation only if there are already other stores in an 
   outer control region.  I see that already being worked on down-thread.
2) do the transformation but also conditionalize the address of the store:
   if (cond)
     *p = val;

   --->

   __typeof__ (*p) dummy;
   if (!cond)
     p = &dummy;  // dummy a stack slot, hence no trap, no thread 
                  // implications
   *p = val;
   
I plan to work on the latter anyway at some point, as it also allows me to do 
the transformation if unconditional non-trappingness can't be proven.
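
Option 2 can be written out by hand as ordinary C to see that both forms
have the same visible effect.  This is only a sketch with int instead of
__typeof__ (*p), and the function names are made up; whether the compiler
then selects the address without a branch is up to the code generator:

```c
/* Original form: the store happens only when cond is true. */
void store_branchy(int *p, int cond, int val)
{
    if (cond)
        *p = val;
}

/* Transformed form: the store is unconditional, but when cond is false
   it is redirected to a local dummy, so *p is never written on that
   path.  No trap and no write visible to another thread. */
void store_redirected(int *p, int cond, int val)
{
    int dummy;                  /* stack slot, private to this call */
    if (!cond)
        p = &dummy;
    *p = val;
}
```

Both functions leave *p untouched when cond is zero and set it to val
otherwise.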


Ciao,
Michael.

P.S.: I'm still somewhat disappointed about the way this discussion goes, 
it reminds me of the ugly one about signed integer overflow.  There it was 
an overly vocal set of people refusing to write ISO C which led to a very 
intrusive change in the compiler.  Now this seems to happen again (though 
no such intrusive changes would be required right now, but perhaps for the 
other memory model).  Then and now the presumed "deficiencies" did exist 
already since years, but for some unfathomable reason only resulted in 
tempest in a teacup recently.  I don't think it's a good strategy to 
change the compiler into a strictly speaking wrong direction whenever the 
loudness of whiners reaches a certain amount.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 21:35   ` Dave Korn
@ 2007-10-27 22:58     ` Darryl Miles
  0 siblings, 0 replies; 208+ messages in thread
From: Darryl Miles @ 2007-10-27 22:58 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'Tomash Brechko', gcc

Dave Korn wrote:
> On 27 October 2007 18:27, Darryl Miles wrote:
>> The "unintended write-access" optimization is a massive headache for
>> developers of multi-threaded code.
> 
>   But it's a boon for the autovectoriser, because it means it can transform
> code with a branch into straight-line code.

Then write to the stack or a register, but not to heap when the 
programmer didn't explicitly permit the compiler to do so because no 
memory targeted lvalue expression was to be executed.

This basic rule is what a threaded programmer expects of the C 
language, even if there is no written law in C99.


I don't want to stop people using this optimization technique; there will 
always be a useful case for it.  I just want to be able to turn that one 
off, but keep all other optimizations.


...SNIP...


>> So much control that I would also like to see a pair of
>> __attribute__((optimization_hint_keywords)) attached to the variable
>> declaration to provide fine grain control.  Such a solution to the
>> problem would keep everybody happy.
> 
>   How about attaching the 'volatile' keyword to the variable?

No we are _HAPPY_ to allow "may optimize read access mode" but not happy 
to allow "may optimize read and write access mode" (as per my previous 
description).  Volatile can not differentiate this.  Nor can volatile 
instruct the compiler which method to use to perform the load or store, 
for example a 64bit long long type on i386.

Volatile has its uses but it is pretty much a sledgehammer to the problem 
domain.


>> Hmm... on this point there can be a problem.  There are 2 major types of
>> access read from memory (load) and write to memory (store).  It is very
>> possible to end up performing an optimistic read; only to throw away the
>> value contained due to a compare/jump.  This is usually considered a
>> safe optimization.
> 
>   As embedded programmers who have to deal with registers containing
> auto-resetting status bits have known for many years, this is not a safe
> optimisation at all.  We use 'volatile' to suppress it.

It is safe for general programming usage which was the original case. 
See my comment (which you failed to cite) over the use of volatile for 
the situation you describe.  I've already covered this case for you.


>> NB Marking the variable 'volatile' does not mean anything useful in the
>> situation you are in.  The exact meaning of what 'volatile' is can be a
>> problem between compilers, but in the case of GCC it can stop the
>> re-ordering and the caching of value in register aspect of your entire
>> problem.  But it will never enforce the method used to perform the
> load/store, nor will it (at this time) stop the unintended write-access.
> 
>   Huh?  When I tried compiling the test case, it did exactly that.  Hang on,
> I'll check:

We differ slightly on our understanding of volatile.  It does not 
provide exactly what a threaded programmer wants, even though to you it 
addresses the problem when used with GCC in the cases you have tried.

The example you cite is coincidental; that's just how GCC generates code.


Darryl

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 18:15 ` Darryl Miles
@ 2007-10-27 21:35   ` Dave Korn
  2007-10-27 22:58     ` Darryl Miles
  0 siblings, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-27 21:35 UTC (permalink / raw)
  To: 'Darryl Miles', 'Tomash Brechko'; +Cc: gcc

On 27 October 2007 18:27, Darryl Miles wrote:

> The "unintended write-access" optimization is a massive headache for
> developers of multi-threaded code.

  But it's a boon for the autovectoriser, because it means it can transform
code with a branch into straight-line code.

> The problem here is the mandatory write access to a memory location for
> which the as-written code path does not indicate a write memory access
> should occur.

  That's the difference between as-if and actual.
 
> This is a tricky one, optimizations which have the effect of causing an
> "unintended write access to some resource" when the code path does not
> intend this to happen crosses a line IMHO.

  Well, volatile was invented to address this exact issue.  If you use it, you
get exactly what you're asking for.
 
> I think that GCC should understand where that line is and have a compile
> time parameter to configure if that line may be crossed.  It's a matter
> for debate as to what the default should be and/or -O6 should allow that
> line to be crossed, but having no mechanism to control it is the real
> bummer.

  Well, that's kind of like what -fvolatile used to do; but it's a bit
indiscriminate to apply it to the entire program when shared state of this
sort may be infrequent or even entirely absent.
 
> Even if the interpretation offered of the C language standards
> specification says the line may be crossed, from a practical point of
> view this is one aspect of optimization that a developer would want to
> have complete control over.

  That's what volatile gives you.
 
> So much control that I would also like to see a pair of
> __attribute__((optimization_hint_keywords)) attached to the variable
> declaration to provide fine grain control.  Such a solution to the
> problem would keep everybody happy.

  How about attaching the 'volatile' keyword to the variable?

>> Here are some pieces from C99:
>> 
> ...SNIP...
>> Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
>>                objects.

  I only have a draft copy, but isn't this language specific to the definition
of 'restrict' semantics?  I'm not sure it's relevant here in that case.
 
> Hmm... on this point there can be a problem.  There are 2 major types of
> access read from memory (load) and write to memory (store).  It is very
> possible to end up performing an optimistic read; only to throw away the
> value contained due to a compare/jump.  This is usually considered a
> safe optimization.

  As embedded programmers who have to deal with registers containing
auto-resetting status bits have known for many years, this is not a safe
optimisation at all.  We use 'volatile' to suppress it.

> But reading the statement above as-is and in the context of this problem
> might make some believe this  "optimistic read" optimization is breaking
> the rules.


> NB Marking the variable 'volatile' does not mean anything useful in the
> situation you are in.  The exact meaning of what 'volatile' is can be a
> problem between compilers, but in the case of GCC it can stop the
> re-ordering and the caching of value in register aspect of your entire
> problem.  But it will never enforce the method used to perform the
> load/store, nor will it (at this time) stop the unintended write-access.

  Huh?  When I tried compiling the test case, it did exactly that.  Hang on,
I'll check:

[dk@tuxtlas serial_booting]$ gcc -S -O1 -x c - -o a.s

     extern int v;

     void
     f(int set_v)
     {
       if (set_v)
         v = 1;
     }


[dk@tuxtlas serial_booting]$ cat a.s
        .file   ""
        .text
.globl f
        .type   f, @function
f:
.LFB2:
        testl   %edi, %edi
        movl    $1, %eax
        cmove   v(%rip), %eax
        movl    %eax, v(%rip)
        ret
.LFE2:
        .size   f, .-f
        .section        .eh_frame,"a",@progbits

        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.6 20060404 (Red Hat 3.4.6-8)"
[dk@tuxtlas serial_booting]$ gcc -S -O1 -x c - -o a.s

     extern volatile int v ;


     void
     f(int set_v)
     {
       if (set_v)
         v = 1;
     }

[dk@tuxtlas serial_booting]$ cat a.s
        .file   ""
        .text
.globl f
        .type   f, @function
f:
.LFB2:
        testl   %edi, %edi
        je      .L1
        movl    $1, v(%rip)
.L1:
        rep ; ret
.LFE2:
        .size   f, .-f
        .section        .eh_frame,"a",@progbits

        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.6 20060404 (Red Hat 3.4.6-8)"
[dk@tuxtlas serial_booting]$


  Looks good to me.  Isn't that what everyone wants the compiler to be doing
with this code?

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27 17:08                 ` Andi Kleen
@ 2007-10-27 18:24                   ` Ian Lance Taylor
  0 siblings, 0 replies; 208+ messages in thread
From: Ian Lance Taylor @ 2007-10-27 18:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc

Andi Kleen <andi@firstfloor.org> writes:

> Ian Lance Taylor <iant@google.com> writes:
> >
> > What do people think of this patch?  This seems to fix the problem
> > case without breaking Michael's case.  It basically avoids store
> > speculation: we don't write to a MEM unless the function
> > unconditionally writes to the MEM anyhow.
> 
> I'm not sure "function" is a good area to check here. It might well be that
> a function has parts where it is ok to change memory (because a lock is hold)
> and another part where this is not true. But your check would
> mix them both togeter.

The second version of my patch would not do that, because the lock
operation would be a memory barrier.

> Basic block (or rather super block without function calls or memory barriers) 
> would be better.

Basic block would be useless, since there are already two basic blocks
involved.

My second patch looks through dominated blocks and stops at a memory
barrier.  So pretty much what you suggest.

Ian

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-21 14:55 Tomash Brechko
  2007-10-21 15:26 ` Erik Trulsson
  2007-10-21 23:07 ` Dave Korn
@ 2007-10-27 18:15 ` Darryl Miles
  2007-10-27 21:35   ` Dave Korn
  2 siblings, 1 reply; 208+ messages in thread
From: Darryl Miles @ 2007-10-27 18:15 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc


Comments inline below vvvvv


Tomash Brechko wrote:
> Consider this piece of code:
> 
>     extern int v;
>   
>     void
>     f(int set_v)
>     {
>       if (set_v)
>         v = 1;
>     }
> 
>     f:
>             pushl   %ebp
>             movl    %esp, %ebp
>             cmpl    $0, 8(%ebp)
>             movl    $1, %eax
>             cmove   v, %eax        ; load (maybe)
>             movl    %eax, v        ; store (always)
>             popl    %ebp
>             ret
> 
> Note the last unconditional store to v.  Now, if some thread would
> modify v between our load and store (acquiring the mutex first), then
> we will overwrite the new value with the old one (and would do that in
> a thread-unsafe manner, not acquiring the mutex).
> 
> So, do the calls to f(0) require the mutex, or it's a GCC bug?

The "unintended write-access" optimization is a massive headache for 
developers of multi-threaded code.


The problem here is the mandatory write access to a memory location for 
which the as-written code path does not indicate a write memory access 
should occur.

This is a tricky one, optimizations which have the effect of causing an 
"unintended write access to some resource" when the code path does not 
intend this to happen crosses a line IMHO.

I think that GCC should understand where that line is and have a compile 
time parameter to configure if that line may be crossed.  It's a matter 
for debate as to what the default should be and/or -O6 should allow that 
line to be crossed, but having no mechanism to control it is the real 
bummer.

Even if the interpretation offered of the C language standards 
specification says the line may be crossed, from a practical point of 
view this is one aspect of optimization that a developer would want to 
have complete control over.

So much control that I would also like to see a pair of 
__attribute__((optimization_hint_keywords)) attached to the variable 
declaration to provide fine grain control.  Such a solution to the 
problem would keep everybody happy.


> Here are some pieces from C99:
> 
...SNIP...
> Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
>                objects.

Hmm... on this point there can be a problem.  There are 2 major types of 
access read from memory (load) and write to memory (store).  It is very 
possible to end up performing an optimistic read; only to throw away the 
value contained due to a compare/jump.  This is usually considered a 
safe optimization.

But reading the statement above as-is and in the context of this problem 
might make some believe this  "optimistic read" optimization is breaking 
the rules.


Maybe in GCC there should be C99 adherence levels :

strict mode: Where this C99 clause is adhered to, but this is much like 
compiling code without optimization, like when debugging.  Since during 
debugging you always want nice clear per line / per expression 
separation so you can walk through execution with a debugger.

may optimize read access mode: This is the normal case for optimization, 
where you might interleave a 'compare reg with immediate' and a 'load 
from memory', then perform a 'conditional branch' that ends up at code 
that never uses the value loaded from memory.  The only rare case where 
this is a problem is a read from special memory, but volatile in GCC 
exists for that, or you could move all accesses to that memory away from 
regular C language syntax and into a function call.

may optimize read and write access mode: This is the problem case you 
are seeing.  Same as the mode above but also permits the unintended 
write access, but only to write back the same value as before (based on 
the compiler's thread naive perception of execution at least!).
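
The third mode can be sketched without real threads.  In the code below
the cmove sequence is written out as C, and a callback stands in for
another thread that runs between the load and the store.  The callback
parameter and all function names are hypothetical devices to make the
interleaving reproducible; they are not part of the original test case:

```c
int v;   /* shared variable, as in the test case */

/* What the generated code does: load v, pick a value, store always. */
void f_transformed(int set_v, void (*between)(void))
{
    int tmp = v;            /* load */
    if (set_v)
        tmp = 1;
    if (between)
        between();          /* "other thread" runs here */
    v = tmp;                /* store (always) */
}

/* The as-written code: v is not accessed at all when set_v is zero. */
void f_as_written(int set_v, void (*between)(void))
{
    if (between)
        between();
    if (set_v)
        v = 1;
}

/* Stands in for a second thread that updates v under its own lock. */
void other_thread_writes_42(void)
{
    v = 42;
}
```

With set_v == 0, f_transformed() writes the stale value back and the
update to 42 is lost, while f_as_written() leaves it in place.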



> So, could someone explain me why this GCC optimization is valid, and,
> if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during
> particular execution path?  Or maybe the named bug report is valid
> after all?

As has been pointed out by others there is no specification on what 
happens between threads.


Your route out of this problem is to write your own implementation of:

atomic_int_set(int *ptr, int value);

Which always uses an atomic single instruction store.  Which is 
thread-safe with respect to ensuring that no other concurrent read or 
write to that location will ever see a corrupted value.  Where a 
corrupted value in this case would be some value other than "the 
previous value of 'v'" and "the value of '1'" you are setting, also that 
once a concurrent access with "the value of '1'" is first observed, it 
will not be possible to observe the previous value on a subsequent read 
(the value doesn't flap about once it changes, it changes for good).

if (set_v)
   atomic_int_set(&v, 1);



By doing the above you are programmatically dictating the method of 
thread-safety in 2 directions.

One direction in terms of something that is agreeable with a compiler 
and something it can't optimize/change.  By using a basic function call 
invocation to an external symbol the compiler has no room to be able to 
think about optimization.  Since the compiler does not know the side 
effects that calling this external symbol may have.  So it can't reorder 
this operation either, so it occurred exactly at the moment your code 
says it should.

The other direction is towards the target CPU's instructions and
architecture.  The implementation of your atomic_int_set() is fixed by
expressing the operation directly in assembler (to be sure it will
always use a single register-to-memory move instruction).


You should also have a method for reading the current value of 'v' in
an atomic way.

This may mean you also have to create a function:

extern int atomic_int_get(int *);

for the purpose of obtaining the current value.  Yes, you have to apply
the same care when reading the value as you do when setting it.
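
A minimal sketch of the two helpers described above.  It assumes a
compiler that provides the GCC/Clang __atomic builtins, which postdate
this thread; hand-written assembler, as suggested above, is the other
route.  The function names follow the text; the sequentially consistent
memory order is an assumption made here for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Store 'value' to '*ptr' as one indivisible operation.
   Assumes the GCC/Clang __atomic builtins; the memory order is
   this sketch's choice, not something the thread specifies.  */
void atomic_int_set(int *ptr, int value)
{
    assert(ptr != NULL);
    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
}

/* Read '*ptr' as one indivisible operation.  */
int atomic_int_get(int *ptr)
{
    assert(ptr != NULL);
    return __atomic_load_n(ptr, __ATOMIC_SEQ_CST);
}
```

With these in place the conditional store stays as written in the
source: "if (set_v) atomic_int_set(&v, 1);".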



NB Marking the variable 'volatile' does not mean anything useful in the
situation you are in.  The exact meaning of 'volatile' can differ
between compilers, but in the case of GCC it can stop the re-ordering
and the caching-of-the-value-in-a-register aspects of your problem.  But
it will never enforce the method used to perform the load/store, nor
will it (at this time) stop the unintended write access.  That said, in
the case of an aligned integer of natural bitwidth it is somewhat
difficult for the compiler to do the wrong thing on most architectures,
as the most efficient instruction is the atomic load/store between
register and memory.


Darryl


* Re: Optimization of conditional access to globals: thread-unsafe?
       [not found]               ` <m33avxfu2i.fsf@localhost.localdomain.suse.lists.egcs>
@ 2007-10-27 17:08                 ` Andi Kleen
  2007-10-27 18:24                   ` Ian Lance Taylor
  0 siblings, 1 reply; 208+ messages in thread
From: Andi Kleen @ 2007-10-27 17:08 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

Ian Lance Taylor <iant@google.com> writes:
>
> What do people think of this patch?  This seems to fix the problem
> case without breaking Michael's case.  It basically avoids store
> speculation: we don't write to a MEM unless the function
> unconditionally writes to the MEM anyhow.

I'm not sure "function" is the right scope to check here. It might well be that
a function has parts where it is ok to change memory (because a lock is held)
and another part where this is not true. But your check would
mix the two together.

Basic block (or rather super block without function calls or memory barriers) 
would be better.

-Andi


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27  0:17                 ` skaller
  2007-10-27  0:26                   ` David Daney
@ 2007-10-27 12:51                   ` Andrew Haley
  1 sibling, 0 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-27 12:51 UTC (permalink / raw)
  To: skaller; +Cc: Ian Lance Taylor, Michael Matz, gcc, gcc-patches

skaller writes:
 > 
 > On Fri, 2007-10-26 at 14:24 -0700, Ian Lance Taylor wrote:
 > 
 > > This is basically a public relations exercise.  I doubt this
 > > optimization is especially important, so I think it's OK to
 > > disable it to keep people happy.  Even though the optimization
 > > has been there since gcc 3.4 and nobody noticed.
 > 
 > Most people didn't have multi-core processors then..

It's partly that.  It's also that some people, particularly kernel
hackers, are super-paranoid about this sort of thing, and that's all
to the good.  The window of vulnerability introduced by this
"optimization" is extremely small -- but it is there.

As far as I can tell from reading the Linux kernel list, almost every
kernel programmer wants this transformation to be removed.  I suspect
that most pthreads programmers do too.  So yeah, it's mostly a public
relations exercise, but a worthwhile one.  I look forward to going to
the kernel list and telling them we've done what they wanted us to do.
They're an important part of our community.

Andrew.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 23:34                 ` skaller
@ 2007-10-27 10:54                   ` Tomash Brechko
  0 siblings, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-27 10:54 UTC (permalink / raw)
  To: skaller; +Cc: gcc

On Sat, Oct 27, 2007 at 09:25:09 +1000, skaller wrote:
> Yes, but with a class:
> 
> 	struct X {
> 		int x;
> 		void f() { if (c) x = 1; }
> 		void f2() { int reg = x; if (c) reg = 1; x = reg; }
> 	};

Hmm, indeed, and the example may end right here, you don't have to
allocate a global X.  The x member is "shared" among all X member
functions, so if both X::f() and X::f2() are called concurrently for the
same object without the lock, you are in trouble, even if you know only
one of them might modify x under current conditions.

Since both f() and f2() implicitly get the 'this' pointer, the situation
where "the address of some local var is taken" is more frequent than I
thought before, thanks for pointing this out.

Then perhaps all unconditional speculative stores should be avoided
(unless there's also an explicit unconditional store), without the need
to analyze whether it is safe or not.
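
For reference, the transformation under discussion can be sketched in
plain C.  This is only an illustration: 'v' and both function names are
made up here, and the second function is a hand-written rendering of
what the compiler effectively generates, not actual GCC output.

```c
/* Shared object; the name is illustrative.  */
int v;

/* As written in the source: stores to v only when cond is nonzero.  */
void set_as_written(int cond)
{
    if (cond)
        v = 1;
}

/* Hand-written equivalent of the speculative-store form:
   v is loaded and stored on every call, even when cond is zero.  */
void set_with_speculative_store(int cond)
{
    int reg = v;    /* load the current value */
    if (cond)
        reg = 1;    /* typically becomes a conditional move */
    v = reg;        /* unconditional store: may write the old value back */
}
```

In a single thread the two are indistinguishable.  With a second thread
updating v between the load and the store, the cond == 0 case of the
second form can silently overwrite that update.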


-- 
   Tomash Brechko


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27  0:26                   ` David Daney
  2007-10-27  0:36                     ` Robert Dewar
@ 2007-10-27  1:29                     ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-27  1:29 UTC (permalink / raw)
  To: David Daney; +Cc: gcc


On Fri, 2007-10-26 at 17:19 -0700, David Daney wrote:
> skaller wrote:
> > On Fri, 2007-10-26 at 14:24 -0700, Ian Lance Taylor wrote:
> >> Michael Matz <matz@suse.de> writes:
> > 
> >> This is basically a public relations exercise.  I doubt this
> >> optimization is especially important, so I think it's OK to disable it
> >> to keep people happy.  Even though the optimization has been there
> >> since gcc 3.4 and nobody noticed.
> > 
> > Most people didn't have multi-core processors then..
> > 
> 
> They did use pthreads though.  Code correctness in this case does not 
> depend on the number of processor cores.

Probably this is mostly correct, though it seems to depend
on whether loads and stores are atomic with respect
to parallel or merely pre-empted accesses.

There are of course cases where it matters. Eg critical
sections on a single core can be implemented by simply 
masking interrupts. That doesn't work on a dual core.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27  0:26                   ` David Daney
@ 2007-10-27  0:36                     ` Robert Dewar
  2007-10-27  1:29                     ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: Robert Dewar @ 2007-10-27  0:36 UTC (permalink / raw)
  To: David Daney; +Cc: skaller, gcc

David Daney wrote:

> They did use pthreads though.  Code correctness in this case does not 
> depend on the number of processor cores.

True, but in practice "real" multiprocessing shows up such bugs
more often ...
> 
> David Daney



* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-27  0:17                 ` skaller
@ 2007-10-27  0:26                   ` David Daney
  2007-10-27  0:36                     ` Robert Dewar
  2007-10-27  1:29                     ` skaller
  2007-10-27 12:51                   ` Andrew Haley
  1 sibling, 2 replies; 208+ messages in thread
From: David Daney @ 2007-10-27  0:26 UTC (permalink / raw)
  To: skaller; +Cc: gcc

skaller wrote:
> On Fri, 2007-10-26 at 14:24 -0700, Ian Lance Taylor wrote:
>> Michael Matz <matz@suse.de> writes:
> 
>> This is basically a public relations exercise.  I doubt this
>> optimization is especially important, so I think it's OK to disable it
>> to keep people happy.  Even though the optimization has been there
>> since gcc 3.4 and nobody noticed.
> 
> Most people didn't have multi-core processors then..
> 

They did use pthreads though.  Code correctness in this case does not 
depend on the number of processor cores.

David Daney


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 21:29               ` Ian Lance Taylor
                                   ` (2 preceding siblings ...)
  2007-10-26 22:20                 ` Jakub Jelinek
@ 2007-10-27  0:17                 ` skaller
  2007-10-27  0:26                   ` David Daney
  2007-10-27 12:51                   ` Andrew Haley
  3 siblings, 2 replies; 208+ messages in thread
From: skaller @ 2007-10-27  0:17 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Michael Matz, gcc, gcc-patches


On Fri, 2007-10-26 at 14:24 -0700, Ian Lance Taylor wrote:
> Michael Matz <matz@suse.de> writes:

> This is basically a public relations exercise.  I doubt this
> optimization is especially important, so I think it's OK to disable it
> to keep people happy.  Even though the optimization has been there
> since gcc 3.4 and nobody noticed.

Most people didn't have multi-core processors then..


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 19:11               ` Tomash Brechko
@ 2007-10-26 23:34                 ` skaller
  2007-10-27 10:54                   ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-26 23:34 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc


On Fri, 2007-10-26 at 23:03 +0400, Tomash Brechko wrote:
> On Sat, Oct 27, 2007 at 03:06:21 +1000, skaller wrote:

> > And what do you do if you do not KNOW what the storage class is,
> > which is the case 99.99% of the time in C++ member functions?
> 
> I'm not quite sure what you mean here.  If extern vs static---that's
> of no concern.  What matters is whether the object can possibly be
> accessed from another thread, and this has nothing specific to C++.

Yes, but with a class:

	struct X {
		int x;
		void f() { if (c) x = 1; }
		void f2() { int reg = x; if (c) reg = 1; x = reg; }
	};

	X global;
	void k() {
		X local;
		global.f(); global.f2();
		local.f(); local.f2();
	};

you would have to assume all member variables were accessible
to another thread when generating the member functions,
even if the variable is private, unless you did heavy analysis
to ensure the class didn't leak its address.

In effect this means method access is slower than global
functions in a threading context.**

[** in Felix I attempt to generate a global function instead
of a class for a Felix function, which is a C++ applicative
object .. but then Felix is a whole program analyser so
it can do this. The reason is .. that I guessed C++ compilers
such as gcc optimise global functions better than methods.]


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 17:06             ` Michael Matz
  2007-10-26 17:54               ` Tomash Brechko
  2007-10-26 21:29               ` Ian Lance Taylor
@ 2007-10-26 22:57               ` David Miller
  2007-10-28 17:10                 ` Michael Matz
  2 siblings, 1 reply; 208+ messages in thread
From: David Miller @ 2007-10-26 22:57 UTC (permalink / raw)
  To: matz; +Cc: tomash.brechko, gcc

From: Michael Matz <matz@suse.de>
Date: Fri, 26 Oct 2007 19:04:10 +0200 (CEST)

> Certainly some suggestions for another memory model look quite
> similar to considering all non-automatic objects as volatile, at
> which point the question should be allowed why not simply using
> 'volatile'.

This is very much not true.

You can speculatively load these global variables as much as you like,
you just can't unconditionally store to them.

Volatile is a much different beast.
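
To illustrate the load side of this distinction, here is a small sketch
(the variable and function names are invented for the example).  The
second function is a hand-written rendering of a hoisted load, not
compiler output.

```c
/* Illustrative global that another thread might write to.  */
int shared_value;

/* As written: loads shared_value only when cond is nonzero.  */
int read_as_written(int cond)
{
    int result = 0;
    if (cond)
        result = shared_value;
    return result;
}

/* With the load hoisted above the test: the extra load on the
   cond == 0 path fetches a value that is simply thrown away.  */
int read_with_speculative_load(int cond)
{
    int tmp = shared_value;
    return cond ? tmp : 0;
}
```

No other thread can observe the extra load, so nothing is disturbed.
An extra store, by contrast, can overwrite another thread's update.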

Also, it bears repeating that whatever performance argument you make
for or against this issue matters little if it breaks lots of existing
and working code.  It is also important to remind people that paper
standards count less than common sense and what affects users on a
practical level, even when those paper standards allow your favorite
optimization or transformation.

I think some people in this discussion too often use paper standards
as a crutch in their arguments.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 22:38                   ` Ian Lance Taylor
  2007-10-26 22:46                     ` Jonathan Wakely
@ 2007-10-26 22:56                     ` Diego Novillo
  2007-10-31 22:43                     ` Jason Merrill
  2 siblings, 0 replies; 208+ messages in thread
From: Diego Novillo @ 2007-10-26 22:56 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Michael Matz, gcc, gcc-patches

On 26 Oct 2007 15:20:01 -0700, Ian Lance Taylor <iant@google.com> wrote:

> It appears that the draft C++0x memory model prohibits speculative
> stores.

Well, sure, might as well.  Though the final form of the standard may
be different, I doubt that this case will change significantly.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 22:20                 ` Jakub Jelinek
@ 2007-10-26 22:55                   ` Ian Lance Taylor
  0 siblings, 0 replies; 208+ messages in thread
From: Ian Lance Taylor @ 2007-10-26 22:55 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Michael Matz, gcc, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:

> On Fri, Oct 26, 2007 at 02:24:21PM -0700, Ian Lance Taylor wrote:
> > What do people think of this patch?  This seems to fix the problem
> > case without breaking Michael's case.  It basically avoids store
> > speculation: we don't write to a MEM unless the function
> > unconditionally writes to the MEM anyhow.
> 
> This still isn't enough.  If you have a non-pure/non-const CALL_INSN
> before the unconditional store into it, you need to return false from
> noce_mem_unconditionally_set_p as that function could have a barrier
> in it.  Similarly for inline asm or __sync_* builtin generated insns
> (not sure ATM if just stopping on UNSPEC_VOLATILE/ASM_INPUT/ASM_OPERANDS
> or something else is needed).

Yeah, I thought of that later.  This is the patch I'm actually
testing.  Does this look OK to you?

Ian

Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 129661)
+++ ifcvt.c	(working copy)
@@ -2139,6 +2139,47 @@ noce_mem_write_may_trap_or_fault_p (cons
   return false;
 }
 
+/* Return whether we can use store speculation for MEM.  TOP_BB is the
+   basic block above the conditional block where we are considering
+   doing the speculative store.  We look for whether MEM is set
+   unconditionally later in the function.  */
+
+static bool
+noce_can_store_speculate_p (basic_block top_bb, const_rtx mem)
+{
+  basic_block dominator;
+
+  for (dominator = get_immediate_dominator (CDI_POST_DOMINATORS, top_bb);
+       dominator != NULL;
+       dominator = get_immediate_dominator (CDI_POST_DOMINATORS, dominator))
+    {
+      rtx insn;
+
+      FOR_BB_INSNS (dominator, insn)
+	{
+	  if (memory_modified_in_insn_p (mem, insn))
+	    return true;
+	  if (modified_in_p (XEXP (mem, 0), insn))
+	    return false;
+
+	  /* If we see something that might be a memory barrier, we
+	     have to stop looking.  Even if the MEM is set later in
+	     the function, we still don't want to set it
+	     unconditionally before the barrier.  Note that
+	     memory_modified_in_insn_p will return true for an asm which
+	     clobbers memory.  */
+	  if (INSN_P (insn)
+	      && (volatile_insn_p (PATTERN (insn))
+		  || (CALL_P (insn)
+		      && (!CONST_OR_PURE_CALL_P (insn)
+			  || pure_call_p (insn)))))
+	    return false;
+	}
+    }
+
+  return false;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -2292,17 +2333,31 @@ noce_process_if_block (struct noce_if_in
       goto success;
     }
 
-  /* Disallow the "if (...) x = a;" form (with an implicit "else x = x;")
-     for optimizations if writing to x may trap or fault, i.e. it's a memory
-     other than a static var or a stack slot, is misaligned on strict
-     aligned machines or is read-only.
-     If x is a read-only memory, then the program is valid only if we
-     avoid the store into it.  If there are stores on both the THEN and
-     ELSE arms, then we can go ahead with the conversion; either the
-     program is broken, or the condition is always false such that the
-     other memory is selected.  */
-  if (!set_b && MEM_P (orig_x) && noce_mem_write_may_trap_or_fault_p (orig_x))
-    return FALSE;
+  if (!set_b && MEM_P (orig_x))
+    {
+      /* Disallow the "if (...) x = a;" form (implicit "else x = x;")
+	 for optimizations if writing to x may trap or fault,
+	 i.e. it's a memory other than a static var or a stack slot,
+	 is misaligned on strict aligned machines or is read-only.  If
+	 x is a read-only memory, then the program is valid only if we
+	 avoid the store into it.  If there are stores on both the
+	 THEN and ELSE arms, then we can go ahead with the conversion;
+	 either the program is broken, or the condition is always
+	 false such that the other memory is selected.  */
+      if (noce_mem_write_may_trap_or_fault_p (orig_x))
+	return FALSE;
+
+      /* Avoid store speculation: given "if (...) x = a" where x is a
+	 MEM, we only want to do the store if x is always set
+	 somewhere in the function.  This avoids cases like
+	   if (pthread_mutex_trylock(mutex) == 0)
+	     ++global_variable;
+	 where we only want global_variable to be changed if the mutex
+	 is held.  FIXME: This should ideally be expressed directly in
+	 RTL somehow.  */
+      if (!noce_can_store_speculate_p (test_bb, orig_x))
+	return FALSE;
+    }
 
   if (noce_try_move (if_info))
     goto success;
@@ -3957,7 +4012,7 @@ dead_or_predicable (basic_block test_bb,
 /* Main entry point for all if-conversion.  */
 
 static void
-if_convert (bool recompute_dominance)
+if_convert (void)
 {
   basic_block bb;
   int pass;
@@ -3977,9 +4032,8 @@ if_convert (bool recompute_dominance)
   loop_optimizer_finalize ();
   free_dominance_info (CDI_DOMINATORS);
 
-  /* Compute postdominators if we think we'll use them.  */
-  if (HAVE_conditional_execution || recompute_dominance)
-    calculate_dominance_info (CDI_POST_DOMINATORS);
+  /* Compute postdominators.  */
+  calculate_dominance_info (CDI_POST_DOMINATORS);
 
   df_set_flags (DF_LR_RUN_DCE);
 
@@ -4068,7 +4122,7 @@ rest_of_handle_if_conversion (void)
       if (dump_file)
         dump_flow_info (dump_file, dump_flags);
       cleanup_cfg (CLEANUP_EXPENSIVE);
-      if_convert (false);
+      if_convert ();
     }
 
   cleanup_cfg (0);
@@ -4105,7 +4159,7 @@ gate_handle_if_after_combine (void)
 static unsigned int
 rest_of_handle_if_after_combine (void)
 {
-  if_convert (true);
+  if_convert ();
   return 0;
 }
 
@@ -4138,7 +4192,7 @@ gate_handle_if_after_reload (void)
 static unsigned int
 rest_of_handle_if_after_reload (void)
 {
-  if_convert (true);
+  if_convert ();
   return 0;
 }
 


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 22:38                   ` Ian Lance Taylor
@ 2007-10-26 22:46                     ` Jonathan Wakely
  2007-10-26 22:56                     ` Diego Novillo
  2007-10-31 22:43                     ` Jason Merrill
  2 siblings, 0 replies; 208+ messages in thread
From: Jonathan Wakely @ 2007-10-26 22:46 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Diego Novillo, Michael Matz, gcc, gcc-patches

On 26 Oct 2007 15:20:01 -0700, Ian Lance Taylor <iant@google.com> wrote:
>
> It appears that the draft C++0x memory model prohibits speculative
> stores.
>
> Therefore I now think we should aim toward prohibiting them
> unconditionally.  That memory model is just a draft.  But I think we
> should implement it unconditionally when it exists.

In case anyone who's interested hasn't seen it, the draft memory model
is accompanied by N2338, "Concurrency memory model compiler
consequences"
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html

Jon


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 21:39                 ` Diego Novillo
@ 2007-10-26 22:38                   ` Ian Lance Taylor
  2007-10-26 22:46                     ` Jonathan Wakely
                                       ` (2 more replies)
  0 siblings, 3 replies; 208+ messages in thread
From: Ian Lance Taylor @ 2007-10-26 22:38 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Michael Matz, gcc, gcc-patches

"Diego Novillo" <dnovillo@google.com> writes:

> On 26 Oct 2007 14:24:21 -0700, Ian Lance Taylor <iant@google.com> wrote:
> 
> > What do people think of this patch?  This seems to fix the problem
> > case without breaking Michael's case.  It basically avoids store
> > speculation: we don't write to a MEM unless the function
> > unconditionally writes to the MEM anyhow.
> 
> I think it couldn't hurt.  Providing it as a QOI feature might be
> good.  However, we should predicate these changes on a -fthread-safe
> flag.  More and more of these corner cases will start popping up.

It appears that the draft C++0x memory model prohibits speculative
stores.

Therefore I now think we should aim toward prohibiting them
unconditionally.  That memory model is just a draft.  But I think we
should implement it unconditionally when it exists.

Ian


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 21:29               ` Ian Lance Taylor
  2007-10-26 21:39                 ` Diego Novillo
  2007-10-26 21:53                 ` Daniel Jacobowitz
@ 2007-10-26 22:20                 ` Jakub Jelinek
  2007-10-26 22:55                   ` Ian Lance Taylor
  2007-10-27  0:17                 ` skaller
  3 siblings, 1 reply; 208+ messages in thread
From: Jakub Jelinek @ 2007-10-26 22:20 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Michael Matz, gcc, gcc-patches

On Fri, Oct 26, 2007 at 02:24:21PM -0700, Ian Lance Taylor wrote:
> What do people think of this patch?  This seems to fix the problem
> case without breaking Michael's case.  It basically avoids store
> speculation: we don't write to a MEM unless the function
> unconditionally writes to the MEM anyhow.

This still isn't enough.  If you have a non-pure/non-const CALL_INSN
before the unconditional store into it, you need to return false from
noce_mem_unconditionally_set_p as that function could have a barrier
in it.  Similarly for inline asm or __sync_* builtin generated insns
(not sure ATM if just stopping on UNSPEC_VOLATILE/ASM_INPUT/ASM_OPERANDS
or something else is needed).

	Jakub


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 21:29               ` Ian Lance Taylor
  2007-10-26 21:39                 ` Diego Novillo
@ 2007-10-26 21:53                 ` Daniel Jacobowitz
  2007-10-26 22:20                 ` Jakub Jelinek
  2007-10-27  0:17                 ` skaller
  3 siblings, 0 replies; 208+ messages in thread
From: Daniel Jacobowitz @ 2007-10-26 21:53 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Michael Matz, gcc, gcc-patches

On Fri, Oct 26, 2007 at 02:24:21PM -0700, Ian Lance Taylor wrote:
> What do people think of this patch?  This seems to fix the problem
> case without breaking Michael's case.  It basically avoids store
> speculation: we don't write to a MEM unless the function
> unconditionally writes to the MEM anyhow.
> 
> This is basically a public relations exercise.  I doubt this
> optimization is especially important, so I think it's OK to disable it
> to keep people happy.  Even though the optimization has been there
> since gcc 3.4 and nobody noticed.
> 
> Of course this kind of thing will break again until somebody takes the
> time to fully implement something like the C++0x memory model.

Right.  In fact it seems to me to be still broken; you just need a
bigger test case.

  if (trylock)
    { var++; unlock; }

  sleep

  lock
  var++;
  unlock

I'm sure someone can turn that into a sensible looking example, with a
little inlining.
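
One way the outline above might be fleshed out with pthreads, as a
rough sketch only.  The names var_lock, var and worker are placeholders,
and pause_seconds merely stands in for the "sleep" step.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t var_lock = PTHREAD_MUTEX_INITIALIZER;
static int var;

/* 'pause_seconds' stands in for the "sleep" step of the outline.  */
void worker(unsigned int pause_seconds)
{
    /* Conditional update: only touch var if the lock was obtained
       (pthread_mutex_trylock returns 0 on success).  */
    if (pthread_mutex_trylock(&var_lock) == 0) {
        var++;
        pthread_mutex_unlock(&var_lock);
    }

    sleep(pause_seconds);

    /* Unconditional update later in the same function.  */
    pthread_mutex_lock(&var_lock);
    var++;
    pthread_mutex_unlock(&var_lock);
}
```

The concern, as I read it, is that the later unconditional store to var
satisfies a "stored somewhere later in the function" check, so once the
calls are inlined a speculative store could still be placed before the
trylock result is known.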

-- 
Daniel Jacobowitz
CodeSourcery


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 21:29               ` Ian Lance Taylor
@ 2007-10-26 21:39                 ` Diego Novillo
  2007-10-26 22:38                   ` Ian Lance Taylor
  2007-10-26 21:53                 ` Daniel Jacobowitz
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 208+ messages in thread
From: Diego Novillo @ 2007-10-26 21:39 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Michael Matz, gcc, gcc-patches

On 26 Oct 2007 14:24:21 -0700, Ian Lance Taylor <iant@google.com> wrote:

> What do people think of this patch?  This seems to fix the problem
> case without breaking Michael's case.  It basically avoids store
> speculation: we don't write to a MEM unless the function
> unconditionally writes to the MEM anyhow.

I think it couldn't hurt.  Providing it as a QOI feature might be
good.  However, we should predicate these changes on a -fthread-safe
flag.  More and more of these corner cases will start popping up.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 17:06             ` Michael Matz
  2007-10-26 17:54               ` Tomash Brechko
@ 2007-10-26 21:29               ` Ian Lance Taylor
  2007-10-26 21:39                 ` Diego Novillo
                                   ` (3 more replies)
  2007-10-26 22:57               ` David Miller
  2 siblings, 4 replies; 208+ messages in thread
From: Ian Lance Taylor @ 2007-10-26 21:29 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc, gcc-patches

Michael Matz <matz@suse.de> writes:

> Both, the assessment of far-stretchedness and these numbers seem to be 
> invented ad hoc.  The latter is irrelevant (it's not interesting how many 
> cases there are, but how important those cases which occur are, for some 
> metric, let's say performance).  And the former isn't true, i.e. the 
> concern is not far-stretched.  For 456.hmmer for instance it is crucial 
> that this transformation happens, the basic situation looks like so:

What do people think of this patch?  This seems to fix the problem
case without breaking Michael's case.  It basically avoids store
speculation: we don't write to a MEM unless the function
unconditionally writes to the MEM anyhow.

This is basically a public relations exercise.  I doubt this
optimization is especially important, so I think it's OK to disable it
to keep people happy.  Even though the optimization has been there
since gcc 3.4 and nobody noticed.

Of course this kind of thing will break again until somebody takes the
time to fully implement something like the C++0x memory model.

I haven't tested this patch.

Ian

Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 128958)
+++ ifcvt.c	(working copy)
@@ -2139,6 +2139,32 @@ noce_mem_write_may_trap_or_fault_p (cons
   return false;
 }
 
+/* Return whether a MEM is unconditionally set in the function
+   following TOP_BB.  */
+
+static bool
+noce_mem_unconditionally_set_p (basic_block top_bb, const_rtx mem)
+{
+  basic_block dominator;
+
+  for (dominator = get_immediate_dominator (CDI_POST_DOMINATORS, top_bb);
+       dominator != NULL;
+       dominator = get_immediate_dominator (CDI_POST_DOMINATORS, dominator))
+    {
+      rtx insn;
+
+      FOR_BB_INSNS (dominator, insn)
+	{
+	  if (memory_modified_in_insn_p (mem, insn))
+	    return true;
+	  if (modified_in_p (XEXP (mem, 0), insn))
+	    return false;
+	}
+    }
+
+  return false;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
    it without using conditional execution.  Return TRUE if we were successful
    at converting the block.  */
@@ -2292,17 +2318,31 @@ noce_process_if_block (struct noce_if_in
       goto success;
     }
 
-  /* Disallow the "if (...) x = a;" form (with an implicit "else x = x;")
-     for optimizations if writing to x may trap or fault, i.e. it's a memory
-     other than a static var or a stack slot, is misaligned on strict
-     aligned machines or is read-only.
-     If x is a read-only memory, then the program is valid only if we
-     avoid the store into it.  If there are stores on both the THEN and
-     ELSE arms, then we can go ahead with the conversion; either the
-     program is broken, or the condition is always false such that the
-     other memory is selected.  */
-  if (!set_b && MEM_P (orig_x) && noce_mem_write_may_trap_or_fault_p (orig_x))
-    return FALSE;
+  if (!set_b && MEM_P (orig_x))
+    {
+      /* Disallow the "if (...) x = a;" form (implicit "else x = x;")
+	 for optimizations if writing to x may trap or fault,
+	 i.e. it's a memory other than a static var or a stack slot,
+	 is misaligned on strict aligned machines or is read-only.  If
+	 x is a read-only memory, then the program is valid only if we
+	 avoid the store into it.  If there are stores on both the
+	 THEN and ELSE arms, then we can go ahead with the conversion;
+	 either the program is broken, or the condition is always
+	 false such that the other memory is selected.  */
+      if (noce_mem_write_may_trap_or_fault_p (orig_x))
+	return FALSE;
+
+      /* Avoid store speculation: given "if (...) x = a" where x is a
+	 MEM, we only want to do the store if x is always set
+	 somewhere in the function.  This avoids cases like
+	   if (pthread_mutex_trylock(mutex) == 0)
+	     ++global_variable;
+	 where we only want global_variable to be changed if the mutex
+	 is held.  FIXME: This should ideally be expressed directly in
+	 RTL somehow.  */
+      if (!noce_mem_unconditionally_set_p (test_bb, orig_x))
+	return FALSE;
+    }
 
   if (noce_try_move (if_info))
     goto success;
@@ -3957,7 +3997,7 @@ dead_or_predicable (basic_block test_bb,
 /* Main entry point for all if-conversion.  */
 
 static void
-if_convert (bool recompute_dominance)
+if_convert (void)
 {
   basic_block bb;
   int pass;
@@ -3977,9 +4017,8 @@ if_convert (bool recompute_dominance)
   loop_optimizer_finalize ();
   free_dominance_info (CDI_DOMINATORS);
 
-  /* Compute postdominators if we think we'll use them.  */
-  if (HAVE_conditional_execution || recompute_dominance)
-    calculate_dominance_info (CDI_POST_DOMINATORS);
+  /* Compute postdominators.  */
+  calculate_dominance_info (CDI_POST_DOMINATORS);
 
   df_set_flags (DF_LR_RUN_DCE);
 
@@ -4068,7 +4107,7 @@ rest_of_handle_if_conversion (void)
       if (dump_file)
         dump_flow_info (dump_file, dump_flags);
       cleanup_cfg (CLEANUP_EXPENSIVE);
-      if_convert (false);
+      if_convert ();
     }
 
   cleanup_cfg (0);
@@ -4105,7 +4144,7 @@ gate_handle_if_after_combine (void)
 static unsigned int
 rest_of_handle_if_after_combine (void)
 {
-  if_convert (true);
+  if_convert ();
   return 0;
 }
 
@@ -4138,7 +4177,7 @@ gate_handle_if_after_reload (void)
 static unsigned int
 rest_of_handle_if_after_reload (void)
 {
-  if_convert (true);
+  if_convert ();
   return 0;
 }
 

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:19     ` Andrew Haley
  2007-10-22 10:50       ` Tomash Brechko
@ 2007-10-26 21:24       ` Florian Weimer
  1 sibling, 0 replies; 208+ messages in thread
From: Florian Weimer @ 2007-10-26 21:24 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Tomash Brechko, gcc

* Andrew Haley:

> The core problem here seems to be that the "C with threads" memory
> model isn't sufficiently well-defined to make a determination
> possible.  You're assuming that you have no responsibility to mark
> shared memory protected by a mutex as volatile, but I know of nothing
> in the C standard that makes such a guarantee.  A prudent programmer
> will make conservative assumptions.

Sprinkling volatile all over the place looks like the wrong answer.
It disables many optimizations, so you could probably use a simpler
compiler which doesn't perform the problematic optimizations in the
first place.

Not creating spurious stores seems to be a saner approach.  Hans Boehm's
concerns still apply, of course, but with knowledge of the architecture
and GCC's existing support of optimization barriers, programmers
probably have enough control to produce what they need.
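
For illustration, a minimal sketch of the kind of optimization barrier
GCC already supports: an empty asm statement with a "memory" clobber.
The names compiler_barrier, wait_flag and spin_until_set are made up
for this example.  The barrier only constrains the compiler, not the
CPU, and it does not make any access atomic.

```c
/* Hypothetical names; only the asm syntax itself is GCC's.  */

/* Empty asm with a "memory" clobber: the compiler must assume memory
   changed here, so it cannot keep memory values cached in registers
   across this point or move memory accesses over it.  No instruction
   is emitted, and this is not a hardware memory barrier.  */
#define compiler_barrier() __asm__ __volatile__ ("" : : : "memory")

int wait_flag;   /* set by someone else, e.g. another thread */

/* Poll wait_flag at most max_spins times and return the number of
   polls that found it clear.  Because of the barrier the compiler has
   to reload wait_flag on every iteration instead of hoisting the load
   out of the loop.  */
int spin_until_set (int max_spins)
{
  int spins = 0;
  while (!wait_flag && spins < max_spins)
    {
      compiler_barrier ();
      ++spins;
    }
  return spins;
}
```

Note that a barrier like this does not by itself stop a compiler from
turning a conditional store placed between two barriers into a
load/select/store sequence; it only pins accesses relative to the
barrier.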

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 17:10             ` skaller
@ 2007-10-26 19:11               ` Tomash Brechko
  2007-10-26 23:34                 ` skaller
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-26 19:11 UTC (permalink / raw)
  To: skaller; +Cc: gcc

On Sat, Oct 27, 2007 at 03:06:21 +1000, skaller wrote:
> err .. what about the heap??

Heap objects are objects whose addresses were taken, so they can
be shared.  But I haven't yet seen the optimization we discuss
being applied to an object accessed through a pointer (see my reply
to Michael Matz).  Maybe this is just a coincidence.

I was beaten already for repeating myself, but please let me do that
once more :).  First, I have a strong belief (though I didn't test
it) that

  if (C)
    val->mem;

runs faster than

  mem->reg;
  if (C)
    val->reg;
  reg->mem;

since a (short) jump will cost less than an unconditional load/store
when they are not needed (especially the store).
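
To make the two shapes above concrete, here is a rough C rendering of
both, plus a single-threaded replay of the interleaving that makes the
second one a problem.  All names (mem, store_if, store_always,
lost_update_demo) are invented for this sketch; it says nothing about
which shape is faster.

```c
/* Illustrative sketch only; names are not from GCC or the thread.  */

int mem;   /* stands for the shared object */

/* "if (C) val->mem": the store happens only when c is nonzero.  */
void store_if (int c, int val)
{
  if (c)
    mem = val;
}

/* "mem->reg; if (C) val->reg; reg->mem": load, select, and then an
   unconditional store.  When c is zero the old value is written back.  */
void store_always (int c, int val)
{
  int reg = mem;
  if (c)
    reg = val;
  mem = reg;
}

/* Replays, in one thread, the interleaving that loses an update when
   thread A runs the second shape with c == 0 and thread B stores to
   mem in between A's load and A's store.  */
int lost_update_demo (void)
{
  int reg;

  mem = 0;
  reg = mem;    /* thread A: load                              */
  mem = 42;     /* thread B: store, done while holding a mutex */
  mem = reg;    /* thread A: unconditional store of old value  */
  return mem;   /* B's 42 has been overwritten                 */
}
```

In a single thread both functions are equivalent, which is why the
transformation is valid for a sequential program.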

BTW, it would be interesting to measure whether short jumps are as bad
as long jumps, i.e. whether the CPU pipeline is flushed when the jump
target is already in it.


Second, in a situation like

  loop
    if (C)
      val->mem;

i.e. when there are lots of conditional stores, only the final store
matters.  The current optimization exploits this:

  mem->reg;
  loop
    if (C)
      val->reg;
  reg->mem;    // One final store.

But at the cost of an additional register this final store can be made
conditional (there are cases when even that register is not needed,
but that requires a thorough analysis of val's possible values, i.e. reg
could be initialized to some "invalid" value and then checked for it).
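
A rough C sketch of the two loop shapes, assuming the condition and
value for each iteration come from two arrays.  The names (mem,
loop_store_always, loop_store_flagged, dirty) are invented for the
example; "dirty" plays the role of the additional register.

```c
/* Illustrative sketch only; names are not from GCC or the thread.  */

int mem;   /* stands for the possibly shared object */

/* Current shape: load once, update a register in the loop, and do one
   unconditional final store, even if no condition was ever true.  */
void loop_store_always (const int *conds, const int *vals, int n)
{
  int reg = mem;
  int i;

  for (i = 0; i < n; i++)
    if (conds[i])
      reg = vals[i];
  mem = reg;
}

/* Flagged shape: remember whether any store was requested and write
   mem only in that case.  No load of mem is needed at all.  */
void loop_store_flagged (const int *conds, const int *vals, int n)
{
  int reg = 0;
  int dirty = 0;
  int i;

  for (i = 0; i < n; i++)
    if (conds[i])
      {
        reg = vals[i];
        dirty = 1;
      }
  if (dirty)
    mem = reg;
}
```

Both functions leave the same value in mem; the difference is only
whether mem is written on executions where no condition holds.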

Registers are a valuable resource, yes.  But so is the correct program
result.  Since GCC is correct wrt all standards, next comes its
usability in not-yet-standardized domains.


> And what do you do if you do not KNOW what the storage class is,
> which is the case 99.99% of the time in C++ member functions?

I'm not quite sure what you mean here.  If extern vs static---that's
of no concern.  What matters is whether the object can possibly be
accessed from another thread, and this has nothing specific to C++.



-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 17:54               ` Tomash Brechko
@ 2007-10-26 17:55                 ` Tomash Brechko
  2007-10-28 17:08                   ` Michael Matz
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-26 17:55 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc

On Fri, Oct 26, 2007 at 21:45:03 +0400, Tomash Brechko wrote:
> Note that it doesn't cancel cmoves, as those are loads, not stores.

I just checked with the x86 instruction reference: CMOVcc is reg->reg or
mem->reg, never reg->mem.  You know God's deed when you see it. :)


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 17:06             ` Michael Matz
@ 2007-10-26 17:54               ` Tomash Brechko
  2007-10-26 17:55                 ` Tomash Brechko
  2007-10-26 21:29               ` Ian Lance Taylor
  2007-10-26 22:57               ` David Miller
  2 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-26 17:54 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc

On Fri, Oct 26, 2007 at 19:04:10 +0200, Michael Matz wrote:
> int f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp,
>       int *tpdm, int xmb, int *bp, int *ms)
> {
>   int k, sc;
>   for (k = 1; k <= M; k++)
>     {
>       mc[k] = mpp[k-1]   + tpmm[k-1];
>       if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
>       if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
>       if ((sc = xmb  + bp[k])         > mc[k])  mc[k] = sc;
>       mc[k] += ms[k];
>     }
> }

Aha, but the store in this example is _never_ speculative where
concurrency is concerned: you _explicitly_ store to mc[k] anyway, so
you may as well add some stores here and there.  If mc[] is shared, it's
the programmer's responsibility to protect it with a lock.

When you remove the first and the last lines inside the loop, then all
stores will become conditional.  But only one value will get to mc[k],
so there's no point in making the only store unconditional.  Note that
it doesn't cancel cmoves, as those are loads, not stores.

But look at the whole matter another way: suppose GCC implements some
optimization, a really cool one, and users quickly find a lot of uses
for it.  But then it is discovered that this optimization is not
general enough, and in some cases wrong code is produced.  What would
you do?  Remove it?  But users will complain.  Ignore the matter?
Other users will complain.  But you may make it optional, like
-funsafe-math-optimizations or -funsafe-loop-optimizations, and
everyone is happy.

Our situation is a bit different, because 1) speculative store is not
a bug per se, 2) program classes where it can do harm
(multi-threaded), and where it cannot (single-threaded), are clearly
separable.  Alright, not entirely, because we don't know when and how
libraries are used.  But that is the case for the -funsafe- options above
too.  Want a safe library?  Compile with
-fno-thread-unsafe-optimizations, or specify that any user data
whose pointers are passed to the library should not be shared (at
least during the library call).


> >   void
> >   f(int set_v, int *v)
> >   {
> >     if (set_v)
> >       *v = 1;
> >   }
> > 
> > there's no load-maybe_update-store optimization, so there won't be
> > slowdown for such cases also (BTW, how this case is different from
> > when v is global?).
> 
> The difference is, that 'v' might be zero, hence *v could trap, hence it 
> can't be moved out of its control region.  If you somehow could determine 
> that *v can't trap (e.g. by having a dominating access to it already) then 
> the transformation will be done.

Good point.  But how do I tell the compiler that it is not NULL?
Neither of the following works:

  void
  f(int set_v, int v[1])
  {
    if (set_v)
      v[0] = 1;
  }


  void
  g(int set_v, int *v) __attribute__((nonnull));

  void
  g(int set_v, int *v)
  {
    if (set_v)
      *v = 1;
  }


Please note that I'm not trying to prove you wrong, just curious about
the reasons why there's no optimization.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 16:18           ` Tomash Brechko
  2007-10-26 17:06             ` Michael Matz
@ 2007-10-26 17:10             ` skaller
  2007-10-26 19:11               ` Tomash Brechko
  1 sibling, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-26 17:10 UTC (permalink / raw)
  To: Tomash Brechko
  Cc: Dave Korn, 'Ian Lance Taylor', 'Bart Van Assche', gcc


On Fri, 2007-10-26 at 20:17 +0400, Tomash Brechko wrote:

> cases.  Only globals, or locals which address was passed to some
> function, should be treated specially.  

err .. what about the heap??

And what do you do if you do not KNOW what the storage class is,
which is the case 99.99% of the time in C++ member functions?

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 16:18           ` Tomash Brechko
@ 2007-10-26 17:06             ` Michael Matz
  2007-10-26 17:54               ` Tomash Brechko
                                 ` (2 more replies)
  2007-10-26 17:10             ` skaller
  1 sibling, 3 replies; 208+ messages in thread
From: Michael Matz @ 2007-10-26 17:06 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

Hi,

On Fri, 26 Oct 2007, Tomash Brechko wrote:

> It was already said that instead of disallowing all optimization with
> volatile, the optimization itself may be made a bit differently.
> Besides, the concern that it will hurt performance at large is a bit
> far-stretched.  You still may speculatively store to automatic var for
> which address was never taken, and this alone covers 50%--80% of
> cases.

Both, the assessment of far-stretchedness and these numbers seem to be 
invented ad hoc.  The latter is irrelevant (it's not interesting how many 
cases there are, but how important those cases which occur are, for some 
metric, let's say performance).  And the former isn't true, i.e. the 
concern is not far-stretched.  For 456.hmmer for instance it is crucial 
that this transformation happens, the basic situation looks like so:

int f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp,
      int *tpdm, int xmb, int *bp, int *ms)
{
  int k, sc;
  for (k = 1; k <= M; k++)
    {
      mc[k] = mpp[k-1]   + tpmm[k-1];
      if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
      if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
      if ((sc = xmb  + bp[k])         > mc[k])  mc[k] = sc;
      mc[k] += ms[k];
    }
}

Here the conditional stores to mc[k] are better implemented as
conditional moves; otherwise you lose about 25% performance on some
platforms.  See PR27313, for which I implemented this transformation on
the tree level.  A similar transformation has been done for much
longer by the RTL if-cvt.  All of these are currently completely
valid transformations, so they could only be redefined as invalid by some
other memory model.  Such other memory model has to take into account the
performance implications, which do exist, contrary to what some
proponents of a different model claim.  Certainly some suggestions for
another memory model look quite similar to considering all non-automatic
objects as volatile, at which point one may ask why not simply use
'volatile'.

> Only globals, or locals which address was passed to some
> function, should be treated specially.  Also, for the case
> 
>   void
>   f(int set_v, int *v)
>   {
>     if (set_v)
>       *v = 1;
>   }
> 
> there's no load-maybe_update-store optimization, so there won't be
> slowdown for such cases also (BTW, how this case is different from
> when v is global?).

The difference is, that 'v' might be zero, hence *v could trap, hence it 
can't be moved out of its control region.  If you somehow could determine 
that *v can't trap (e.g. by having a dominating access to it already) then 
the transformation will be done.
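
As a sketch of what "a dominating access" means here, compare the two
functions below.  The names set_if and reset_then_set are made up, and
whether a particular GCC version actually if-converts the second one
is not something this example demonstrates.

```c
/* Illustrative sketch only; names are not from GCC or the thread.  */

/* v may be a null pointer when set_v is zero, so the store must stay
   under the condition: executing it unconditionally could trap.  */
void set_if (int set_v, int *v)
{
  if (set_v)
    *v = 1;
}

/* Here *v is written on every path before the condition is tested.
   That earlier store dominates the conditional one, so the compiler
   can conclude that a second store to *v cannot trap and is free to
   make it unconditional (e.g. via a conditional move).  */
void reset_then_set (int set_v, int *v)
{
  *v = 0;
  if (set_v)
    *v = 1;
}
```

Calling set_if (0, p) with a null p is fine as written; calling
reset_then_set with a null p is already wrong in the source.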


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 16:04         ` Dave Korn
@ 2007-10-26 16:18           ` Tomash Brechko
  2007-10-26 17:06             ` Michael Matz
  2007-10-26 17:10             ` skaller
  0 siblings, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-26 16:18 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'Ian Lance Taylor', 'Bart Van Assche', gcc

On Fri, Oct 26, 2007 at 17:00:28 +0100, Dave Korn wrote:
> >       * Disallow speculative stores on potentially shared objects.
> >       * Disallow reading and re-writing of unrelated objects. (For
> >         instance, if you have struct S{ char a,b; }; it is not OK to
> >         modify b by reading in the whole struct, bit-twiddling b, and
> >         writing the whole struct because that would interfere with
> >         another thread that is trying to write to a.)
> 
>   I don't see how that second one is possible in the most general case.  Some
> cpus don't have all widths of access mode;

From http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf:

  Fortunately, the original motivation for this lax specification
  seems to stem from machine architectures that did not support
  byte-wide stores.  To our knowledge, no such architectures are still
  in wide-spread multiprocessor use.


> and how could it possibly work for sub-word bitfields?  (Or are
> those just to be considered 'related'?)

How could mutex-protected, or even atomic, access to bit-fields
possibly work?  Yes, they are related, or rather they do not constitute
separate objects but belong to one common object.


>   Aren't we about to reinvent -fvolatile, with all the hideous performance
> losses that that implies?

It was already said that instead of disallowing all optimization with
volatile, the optimization itself may be made a bit differently.
Besides, the concern that it will hurt performance at large is a bit
far-stretched.  You still may speculatively store to automatic var for
which address was never taken, and this alone covers 50%--80% of
cases.  Only globals, or locals which address was passed to some
function, should be treated specially.  Also, for the case

  void
  f(int set_v, int *v)
  {
    if (set_v)
      *v = 1;
  }

there's no load-maybe_update-store optimization, so there won't be
slowdown for such cases also (BTW, how this case is different from
when v is global?).


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 15:51       ` Tomash Brechko
@ 2007-10-26 16:04         ` Dave Korn
  2007-10-26 16:18           ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-26 16:04 UTC (permalink / raw)
  To: 'Tomash Brechko', 'Ian Lance Taylor'
  Cc: 'Bart Van Assche', gcc

On 26 October 2007 16:51, Tomash Brechko wrote:

> On Fri, Oct 26, 2007 at 08:32:07 -0700, Ian Lance Taylor wrote:
>> The language standard does not forbid speculative stores to non-atomic
>> objects.

>       * Disallow speculative stores on potentially shared objects.
>       * Disallow reading and re-writing of unrelated objects. (For
>         instance, if you have struct S{ char a,b; }; it is not OK to
>         modify b by reading in the whole struct, bit-twiddling b, and
>         writing the whole struct because that would interfere with
>         another thread that is trying to write to a.)

  I don't see how that second one is possible in the most general case.  Some
cpus don't have all widths of access mode; and how could it possibly work for
sub-word bitfields?  (Or are those just to be considered 'related'?)

> So, will "potentially shared objects" be marked as such explicitly by
> the programmer, or is it a compiler job to identify them?

  Well, the compiler can certainly do some of that (cf. escape analysis), but
it's always going to have to be vastly more conservative than it could be if
the programmer directs it with annotations.  As far as I can see, we'd either
need some very thorough LTO, or we'd just have to treat /all/ globals this way
indiscriminately.

  Aren't we about to reinvent -fvolatile, with all the hideous performance
losses that that implies?

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 15:50     ` Ian Lance Taylor
@ 2007-10-26 15:51       ` Tomash Brechko
  2007-10-26 16:04         ` Dave Korn
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-26 15:51 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Bart Van Assche, gcc

On Fri, Oct 26, 2007 at 08:32:07 -0700, Ian Lance Taylor wrote:
> The language standard does not forbid speculative stores to non-atomic
> objects.

That's why there's a proposal to refine the language.  I meant
the following:

  http://www.artima.com/cppsource/threads_meeting.html:

  Hans Boehm and Herb Sutter both presented very detailed and
  well-thought out memory models. Their differences are subtle and
  important, but in broad strokes, both proposals paint a similar
  picture. In particular, both proposals:

      * Specify a set of atomic (aka, interlocked) primitive operations.
      * Explicitly specify the ordering constraints on atomic reads and writes.
      * Specify the visibility of atomic writes.
      * Disallow speculative stores on potentially shared objects.
      * Disallow reading and re-writing of unrelated objects. (For
        instance, if you have struct S{ char a,b; }; it is not OK to
        modify b by reading in the whole struct, bit-twiddling b, and
        writing the whole struct because that would interfere with
        another thread that is trying to write to a.)
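
The struct S case from the last item can be sketched in C as follows.
The function names are invented, and set_b_wide only imitates in
portable C what a target without byte-wide stores would do in
generated code; clobber_demo replays the bad interleaving in a single
thread so the outcome is deterministic.

```c
#include <string.h>

/* Illustrative sketch only; function names are made up.  */

struct S { char a, b; };

/* Byte-wide store: only b is written.  */
void set_b_narrow (struct S *s, char v)
{
  s->b = v;
}

/* Whole-struct read-modify-write: b is changed in a copy and the
   entire struct is written back, which also rewrites a.  */
void set_b_wide (struct S *s, char v)
{
  struct S tmp;

  memcpy (&tmp, s, sizeof tmp);
  tmp.b = v;
  memcpy (s, &tmp, sizeof tmp);
}

/* Thread 1 updates b with the wide sequence while thread 2 writes a
   in between thread 1's read and write.  Returns the final s.a.  */
char clobber_demo (void)
{
  struct S s = { 'x', 'y' };
  struct S tmp;

  tmp = s;        /* thread 1: read the whole struct      */
  s.a = 'A';      /* thread 2: write a                    */
  tmp.b = 'B';    /* thread 1: modify b in the copy       */
  s = tmp;        /* thread 1: write back, a is reverted  */
  return s.a;
}
```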


So, will "potentially shared objects" be marked as such explicitly by
the programmer, or is it a compiler job to identify them?


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 14:38   ` Tomash Brechko
@ 2007-10-26 15:50     ` Ian Lance Taylor
  2007-10-26 15:51       ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Ian Lance Taylor @ 2007-10-26 15:50 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: Bart Van Assche, gcc

Tomash Brechko <tomash.brechko@gmail.com> writes:

>   - the compiler should not introduce speculative stores to the shared
>     objects.  This is what my original question was about.  I haven't
>     read all the papers yet, so one thing is still unclear to me: it
>     seems like atomic variables will be annotated as such
>     (atomic<int>).  But I found no proposal for annotation of
>     non-atomic objects that are protected by the ordinary locks (like
>     mutexes).  Will the compiler be forbidden to do all speculative
>     stores, or how will it recognize shared objects as such?

In practice, gcc will provide a variable attribute to mark the
variable as atomic.

The language standard does not forbid speculative stores to non-atomic
objects.

Ian

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 13:45 ` Bart Van Assche
  2007-10-26 14:38   ` Tomash Brechko
@ 2007-10-26 15:24   ` Ian Lance Taylor
  1 sibling, 0 replies; 208+ messages in thread
From: Ian Lance Taylor @ 2007-10-26 15:24 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Tomash Brechko, gcc, Andrew Pinski

"Bart Van Assche" <bart.vanassche@gmail.com> writes:

>   * As known the compiler may reorder function calls and assignments
> to non-volatile variables if the compiler can prove that the called
> function won't modify that variable. This becomes problematic if the
> variable is modified by more than one thread and the called function
> is a synchronization function, e.g. pthread_mutex_lock(). This kind of
> reordering is highly undesirable. This is why any variable that is
> shared over threads has to be declared volatile, even when using
> explicit locking calls.

What happens in practice is that pthread_mutex_lock and friends are
magic functions.  In gcc, this magic is implemented using inline
assembler constructs.

Ian

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-26 13:45 ` Bart Van Assche
@ 2007-10-26 14:38   ` Tomash Brechko
  2007-10-26 15:50     ` Ian Lance Taylor
  2007-10-26 15:24   ` Ian Lance Taylor
  1 sibling, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-26 14:38 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: gcc

Hello Bart,

Thanks for the summary.  There are good pointers in this e-mail thread
regarding the current state of the process of defining memory model
for C++ (and eventually for C I guess).

From those pointers several conclusions may be drawn (which are in line
with what you said):

  - though neither Standard C nor POSIX requires the use of volatile,
    it seems like you have to use it until the memory model is clearly
    defined.

  - the compiler should not introduce speculative stores to the shared
    objects.  This is what my original question was about.  I haven't
    read all the papers yet, so one thing is still unclear to me: it
    seems like atomic variables will be annotated as such
    (atomic<int>).  But I found no proposal for annotation of
    non-atomic objects that are protected by the ordinary locks (like
    mutexes).  Will the compiler be forbidden to do all speculative
    stores, or how will it recognize shared objects as such?

  - the compiler should not cross object boundary when doing the store
    (i.e. when storing to 8-bit char it should not store to the whole
    32/64-bit word).  Here's the same question about shared object
    annotation.


Cheers,

-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
       [not found] <e2e108260710260541n61462585u99de9bc0617720f4@mail.gmail.com>
@ 2007-10-26 13:45 ` Bart Van Assche
  2007-10-26 14:38   ` Tomash Brechko
  2007-10-26 15:24   ` Ian Lance Taylor
  0 siblings, 2 replies; 208+ messages in thread
From: Bart Van Assche @ 2007-10-26 13:45 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc, Andrew Pinski

On 10/21/07, Tomash Brechko <tomash.brechko@gmail.com> wrote:

> Hello,
>
> I have a question regarding the thread-safeness of a particular GCC
> optimization.  I'm sorry if this was already discussed on the list, if
> so please provide me with the reference to the previous discussion.
>
> Consider this piece of code:
>
>     extern int v;
>
>     void
>     f(int set_v)
>     {
>       if (set_v)
>         v = 1;
>     }
>
> If f() is called concurrently from several threads, then call to f(1)
> should be protected by the mutex.  But do we have to acquire the mutex
> for f(0) calls?  I'd say no, why, there's no access to global v in
> that case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the
> following:
>
>     f:
>             pushl   %ebp
>             movl    %esp, %ebp
>             cmpl    $0, 8(%ebp)
>             movl    $1, %eax
>             cmove   v, %eax        ; load (maybe)
>             movl    %eax, v        ; store (always)
>             popl    %ebp
>             ret
>
> Note the last unconditional store to v.  Now, if some thread would
> modify v between our load and store (acquiring the mutex first), then
> we will overwrite the new value with the old one (and would do that in
> a thread-unsafe manner, not acquiring the mutex).
>
> So, do the calls to f(0) require the mutex, or it's a GCC bug?
...
> So, could someone explain me why this GCC optimization is valid, and,
> if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during
> particular execution path?  Or maybe the named bug report is valid
> after all?

Hello Tomash,

I'm not an expert in the C89/C99 standards, but I have written a Ph.D.
on the subject of memory models. What I learned during writing that
Ph.D. is the following:

- If you want to know which optimizations are valid and which ones are
not, you have to look at the semantics defined in the language
standard.

- Every language standard document defines what the result is of
executing a sequential program. The definition of the behavior of a
multithreaded program written in a certain programming language is
called the memory model of that programming language.

- The memory model of C and C++ is still under discussion as has
already been pointed out on this mailing list.

- Although the memory model for C and C++ is still under discussion,
there is a definition for the behavior of multithreaded C and C++
programs. The following is required by the ANSI/ISO C89 standard (from
paragraph 5.1.2.3, Program Execution):
  Accessing a volatile object, modifying an object, modifying a file,
  or calling a function that does any of those operations are all side
  effects, which are changes in the state of the execution environment.
  Evaluation of an expression may produce side effects. At certain
  specified points in the execution sequence called sequence points,
  all side effects of previous evaluations shall be complete and no
  side effects of subsequent evaluations shall have taken place. (A
  summary of the sequence points is given in annex C.)

In annex C it is explained that, among others, the call to a function
(after argument evaluation) is a sequence point.

See also http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf

- The above paragraph does not impose any limitation for the compiler
with regard to optimizations on non-volatile variables. Or: the
generated code shown in your mail is allowed by the above paragraph.

- The above paragraph has also the following implications for volatile
variables:
  * There exists a total order for all accesses to all volatile variables.
  * It is the responsibility of the compiler to ensure cache coherency
for volatile variables. If memory barrier instructions are needed to
ensure cache coherency on the architecture for which the compiler is
generating code, then it is the responsibility of the compiler to
generate these instructions for volatile variables. This fact is often
overlooked.
  * The compiler must generate code such that exactly one store
statement is executed for each assignment to a volatile variable.
Prefetching volatile variables is allowed as long as it does not
violate paragraph 5.1.2.3 from the language definition.
  * As known the compiler may reorder function calls and assignments
to non-volatile variables if the compiler can prove that the called
function won't modify that variable. This becomes problematic if the
variable is modified by more than one thread and the called function
is a synchronization function, e.g. pthread_mutex_lock(). This kind of
reordering is highly undesirable. This is why any variable that is
shared over threads has to be declared volatile, even when using
explicit locking calls.

I hope the above brings more clarity in this discussion.

Bart Van Assche.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:15       ` Robert Dewar
@ 2007-10-23 16:53         ` Paul Brook
  0 siblings, 0 replies; 208+ messages in thread
From: Paul Brook @ 2007-10-23 16:53 UTC (permalink / raw)
  To: gcc; +Cc: Robert Dewar, Tomash Brechko

On Monday 22 October 2007, Robert Dewar wrote:
> Erik Trulsson wrote:
> > It is also worth noting that just declaring a variable 'volatile' does
> > not help all that much in making it safer to use in a threaded environment
> > if you have multiple CPUs.  (There is nothing that says that a multi-CPU
> > system has to have any kind of automatic cache-coherence.)
>
> The first sentence here could be misleading, there are LOTS of systems
> where there is automatic cache-coherence, and of course the use of
> 'volatile' on such systems does indeed help. If you are working on
> a system without cache-coherence, you indeed have big problems, but
> that's rarely the case, most multi-processor computers in common use
> do guarantee cache coherence.

IMHO the statement is correct, but the justification is incorrect.

While most multiprocessor machines do provide cache coherence, many do not 
guarantee strict ordering of memory accesses.  In practice you need both for 
correct operation. i.e. some form of explicit synchronisation is required on 
most modern SMP systems.

Hardware cache coherence just makes this much quicker/easier to implement.
To a first approximation you need a pipeline flush rather than a cache flush.
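
As an illustration of "explicit synchronisation" on top of coherent
caches, here is a small sketch using C11 <stdatomic.h>.  That header
postdates this discussion, so this is a swapped-in, later technique
and not what anyone in the thread had available; the names payload,
ready, publish and try_consume are invented.

```c
#include <stdatomic.h>

/* Illustrative sketch only; names are made up.  */

int payload;        /* ordinary, non-atomic data           */
atomic_int ready;   /* publication flag, initially zero    */

/* Writer: fill in the data, then set the flag with release ordering
   so the payload store cannot be reordered after the flag store.  */
void publish (int value)
{
  payload = value;
  atomic_store_explicit (&ready, 1, memory_order_release);
}

/* Reader: check the flag with acquire ordering.  If it is set, the
   payload written before the matching release store is visible.
   Returns 1 and fills *out on success, 0 if nothing was published.  */
int try_consume (int *out)
{
  if (!atomic_load_explicit (&ready, memory_order_acquire))
    return 0;
  *out = payload;
  return 1;
}
```

Cache coherence alone would not be enough here on a machine that
reorders memory accesses; the release/acquire pair is what orders the
payload access against the flag.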

Paul

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 18:00                               ` Tomash Brechko
@ 2007-10-23  9:45                                 ` Andrew Haley
  0 siblings, 0 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-23  9:45 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: Michael Matz, Dave Korn, gcc

Tomash Brechko writes:
 > On Mon, Oct 22, 2007 at 18:48:02 +0100, Andrew Haley wrote:
 > > Err, not exactly.  :)
 > > 
 > > See http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/why_undef.html
 > 
 > Why, I'd say that page is about original races in the program, not
 > about what compiler should do with races that it introduces itself.
 > 
 > Still, "let's wait and see" is probably the best outcome that I can
 > expect from this discussion, so thanks anyway. ;)

It'll be interesting to see, when the draft recommendation is
published, whether your example would have been correct.

It will, to say the least, be nice to have a proper standard for the
memory model, so that we never have to have "is this pthreads program
defined or not?" arguments ever again.  :-)

Andrew.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 18:15                     ` skaller
@ 2007-10-22 18:26                       ` Andrew Pinski
  0 siblings, 0 replies; 208+ messages in thread
From: Andrew Pinski @ 2007-10-22 18:26 UTC (permalink / raw)
  To: skaller; +Cc: Tomash Brechko, Dave Korn, gcc

On 10/22/07, skaller <skaller@users.sourceforge.net> wrote:
> Registers are a limited resource.

Everything is limited, some processors are more limited than others  :).
Seriously, I think this should be discussed in a language standards
comittee area rather than inside GCC's development since right now GCC
is correct.  I don't want to limit GCC's output to "thread safe"
optimizations.

In fact any optimization that changes order of loads/stores is not
thread safe.  So you just disabled every high level optimization.

-- Pinski

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 14:32                   ` Tomash Brechko
  2007-10-22 16:15                     ` Michael Matz
@ 2007-10-22 18:15                     ` skaller
  2007-10-22 18:26                       ` Andrew Pinski
  1 sibling, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-22 18:15 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: Dave Korn, gcc


On Mon, 2007-10-22 at 18:32 +0400, Tomash Brechko wrote:

> But it could use additional register and be:
> 
>                                          0 -> flag_reg;
>                                          loop
>                                            if (condition)
>                                              val -> reg;
>                                              1 -> flag_reg;
>                                          if (flag_reg == 1)
>                                            reg -> mem;
> 

> So, why not use flag_reg and thus make GCC thread-aware for this case?

Registers are a limited resource.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
       [not found]                     ` <18204.57073.943880.741269@zebedee.pink.suse.lists.egcs>
@ 2007-10-22 18:11                       ` Andi Kleen
  0 siblings, 0 replies; 208+ messages in thread
From: Andi Kleen @ 2007-10-22 18:11 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc

Andrew Haley <aph-gcc@littlepinkcloud.COM> writes:

> Tomash Brechko writes:
>
>  > 
>  > Several people already suggested to use volatile for shared data.
>  > Yes, it will help because we know it will disable all access
>  > optimizations, including thread-unaware ones.  But I don't want to
>  > disable _all_ optimizations, I rather vote for thread-aware
>  > optimizations.
>
> But your plan would disable optimizations even when it isn't necessary
> to do so.  Only a small part of the data in a multi-threaded program
> are shared.

At least for current x86 it is doubtful that the cmov change on memory was
actually an improvement.

-Andi

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 17:48                             ` Andrew Haley
@ 2007-10-22 18:00                               ` Tomash Brechko
  2007-10-23  9:45                                 ` Andrew Haley
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 18:00 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Michael Matz, Dave Korn, gcc

On Mon, Oct 22, 2007 at 18:48:02 +0100, Andrew Haley wrote:
> Err, not exactly.  :)
> 
> See http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/why_undef.html

Why, I'd say that page is about original races in the program, not
about what the compiler should do with races that it introduces itself.

Still, "let's wait and see" is probably the best outcome that I can
expect from this discussion, so thanks anyway. ;)


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:09     ` Erik Trulsson
  2007-10-22 10:15       ` Robert Dewar
@ 2007-10-22 17:59       ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-22 17:59 UTC (permalink / raw)
  To: Erik Trulsson; +Cc: Tomash Brechko, gcc


On Mon, 2007-10-22 at 12:09 +0200, Erik Trulsson wrote:

> My own conclusion from this discussion (and others) is that shared memory is
> a lousy paradigm for communication between different threads of execution,
> precisely because it is so hard to specify exactly what should happen or not
> happen in various situations. 

In the abstract, this isn't really the case. Exactly the same problems
occur with message passing as with shared memory, for the trivial
reason that you can view memory reads as sending an address as
a message followed by a reply of the data (similarly for writes).

The theorists working on message passing have great fun with
algorithms to ensure proper ordering .. which is just the
same problem as cache synchronisation.

The real difference is scoping: with shared memory it is
easy to accidentally fail to synchronise, but synchronisation
is easy. With processes and message passing, simple jobs
are trivial and all the communication is explicit, but for
complex interactions it is a lot of work and also 
can be extremely inefficient.

The big advantage of processes and message passing is the
potential to scale to the whole universe, whereas shared
memory abstracted across networks is likely to be
extremely slow and hard to reason about even if someone
actually implemented it.

Just as an example, Erlang is dynamically typed, purely
functional, and uses processes and message passing with
no ordering guarantees .. however it allows you to read
messages out of order. What this means is if you want
to synchronise .. you have to write code to actually do
it, eg a double handshake... shared memory systems
do that kind of thing directly in hardware, so you can
sometimes work at a much higher level.



-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 17:33                         ` Andrew Haley
  2007-10-22 17:44                           ` Tomash Brechko
@ 2007-10-22 17:51                           ` Dave Korn
  1 sibling, 0 replies; 208+ messages in thread
From: Dave Korn @ 2007-10-22 17:51 UTC (permalink / raw)
  To: 'Andrew Haley', 'Tomash Brechko'
  Cc: 'Michael Matz', gcc

On 22 October 2007 18:34, Andrew Haley wrote:

>  > Again, we are not discussing some particular code sample, and how it
>  > might be fixed, but the problem in general.  Should GCC do
>  > thread-unsafe optimizations, or not?
> 
> We do understand what you're saying, and simply repeating the same
> thing doesn't help.

  Well, just to answer the question at face value, "Yes, of course it should,
because 99.9% of the time the fact of their thread-safety or otherwise is
irrelevant".

  The interesting question of course is how we can get the compiler to
recognize that 0.1% when thread-safety is not just relevant but vital.  That
may still end up requiring some kind of annotation (like 'volatile'), but
defining a clear memory model should allow the compiler to make inferences and
deductions for itself that save the programmer a lot of the work of specifying
what data needs to be thread-safe and when and where.


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 17:44                           ` Tomash Brechko
@ 2007-10-22 17:48                             ` Andrew Haley
  2007-10-22 18:00                               ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Andrew Haley @ 2007-10-22 17:48 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: Michael Matz, Dave Korn, gcc

Tomash Brechko writes:
 > On Mon, Oct 22, 2007 at 18:33:37 +0100, Andrew Haley wrote:
 > > We do understand what you're saying, and simply repeating the same
 > > thing doesn't help.
 > > 
 > > I think we should wait to see what the C++ working group comes up with
 > > and consider implementing that, rather than some ad-hoc gcc-specific
 > > proposal.
 > 
 > Aha, but repeating worked.  This is the first time someone agrees that
 > the problem lies not entirely in the programmer's code.  Thank you!
 > :))

Err, not exactly.  :)

See http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/why_undef.html

Andrew.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 17:33                         ` Andrew Haley
@ 2007-10-22 17:44                           ` Tomash Brechko
  2007-10-22 17:48                             ` Andrew Haley
  2007-10-22 17:51                           ` Dave Korn
  1 sibling, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 17:44 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Michael Matz, Dave Korn, gcc

On Mon, Oct 22, 2007 at 18:33:37 +0100, Andrew Haley wrote:
> We do understand what you're saying, and simply repeating the same
> thing doesn't help.
> 
> I think we should wait to see what the C++ working group comes up with
> and consider implementing that, rather than some ad-hoc gcc-specific
> proposal.

Aha, but repeating worked.  This is the first time someone agrees that
the problem lies not entirely in the programmer's code.  Thank you!
:))


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 17:18                       ` Tomash Brechko
@ 2007-10-22 17:33                         ` Andrew Haley
  2007-10-22 17:44                           ` Tomash Brechko
  2007-10-22 17:51                           ` Dave Korn
  0 siblings, 2 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-22 17:33 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: Michael Matz, Dave Korn, gcc

Tomash Brechko writes:

 > 
 > Several people already suggested to use volatile for shared data.
 > Yes, it will help because we know it will disable all access
 > optimizations, including thread-unaware ones.  But I don't want to
 > disable _all_ optimizations, I rather vote for thread-aware
 > optimizations.

But your plan would disable optimizations even when it isn't necessary
to do so.  Only a small part of the data in a multi-threaded program
are shared.

 > There is no requirement in POSIX to make all shared data volatile.
 > As the article referenced in the thread explains, there is no
 > agreement between POSIX and C/C++ wrt memory access.  But should it
 > be fixed in the compiler (as the article suggests), or should all
 > shared data in every threaded program be declared volatile, just in
 > case?  I have never seen the latter approach in any Open Source project
 > (though didn't look for it specifically), and many of them are
 > considered quite portable.
 > 
 > Again, we are not discussing some particular code sample, and how it
 > might be fixed, but the problem in general.  Should GCC do
 > thread-unsafe optimizations, or not?

We do understand what you're saying, and simply repeating the same
thing doesn't help.

I think we should wait to see what the C++ working group comes up with
and consider implementing that, rather than some ad-hoc gcc-specific
proposal.

There's some discussion here:

http://www.artima.com/cppsource/threads_meeting.html

and here:

http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/

Andrew.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 16:15                     ` Michael Matz
  2007-10-22 16:22                       ` Dave Korn
@ 2007-10-22 17:18                       ` Tomash Brechko
  2007-10-22 17:33                         ` Andrew Haley
  1 sibling, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 17:18 UTC (permalink / raw)
  To: Michael Matz; +Cc: Dave Korn, gcc

On Mon, Oct 22, 2007 at 18:15:35 +0200, Michael Matz wrote:
> > I'd rather wish the optimization would be done differently.  Currently
> > we have:
> > 
> >                                          mem -> reg;
> >    loop                                  loop
> >      if (condition)    => optimize =>      if (condition)
> >        val -> mem;                           val -> reg;
> >                                          reg -> mem;
> > 
> > 
> > But it could use additional register and be:
> > 
> >                                          0 -> flag_reg;
> >                                          loop
> >                                            if (condition)
> >                                              val -> reg;
> >                                              1 -> flag_reg;
> >                                          if (flag_reg == 1)
> >                                            reg -> mem;
> 
> > That could be done but would be beside the point.  You traded one
> > conditional store for another one, so you've gained nothing in that
> > transformation.

Rather, I traded possibly many conditional stores in a loop for one
conditional store outside the loop.  And this exactly coincides with
the point of discussion: you can't go further, because when you replace
a conditional store with an unconditional one, you introduce a race that
wasn't in the original code.

Several people already suggested to use volatile for shared data.
Yes, it will help because we know it will disable all access
optimizations, including thread-unaware ones.  But I don't want to
disable _all_ optimizations, I rather vote for thread-aware
optimizations.  There is no requirement in POSIX to make all shared
data volatile.  As the article referenced in the thread explains,
there is no agreement between POSIX and C/C++ wrt memory access.  But
should it be fixed in the compiler (as the article suggests), or should
all shared data in every threaded program be declared volatile, just
in case?  I have never seen the latter approach in any Open Source project
(though didn't look for it specifically), and many of them are
considered quite portable.

Again, we are not discussing some particular code sample, and how it
might be fixed, but the problem in general.  Should GCC do
thread-unsafe optimizations, or not?


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 16:15                     ` Michael Matz
@ 2007-10-22 16:22                       ` Dave Korn
  2007-10-22 17:18                       ` Tomash Brechko
  1 sibling, 0 replies; 208+ messages in thread
From: Dave Korn @ 2007-10-22 16:22 UTC (permalink / raw)
  To: 'Michael Matz', 'Tomash Brechko'; +Cc: gcc

On 22 October 2007 17:16, Michael Matz wrote:


>> I'd rather wish the optimization would be done differently.  Currently we
>> have: 
>> 
>>                                          mem -> reg;
>>    loop                                  loop
>>      if (condition)    => optimize =>      if (condition)
>>        val -> mem;                           val -> reg;
>>                                          reg -> mem;
>> 
>> 
>> But it could use additional register and be:
>> 
>>                                          0 -> flag_reg;
>>                                          loop
>>                                            if (condition)
>>                                              val -> reg;
>>                                              1 -> flag_reg;
>>                                          if (flag_reg == 1)
>>                                            reg -> mem;
> 
> That could be done but would be beside the point.  You traded one
> conditional store for another one, so you've gained nothing in that
> transformation.

  Not quite: he's hoisted it (lowered it? sunk it?) out of the bottom of the
loop, so the test/branch/store only occurs once, and inside the loop there's
no memory access at all (which should be faster even than a load-cmove-store
with hot caches and no branches...)


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 14:32                   ` Tomash Brechko
@ 2007-10-22 16:15                     ` Michael Matz
  2007-10-22 16:22                       ` Dave Korn
  2007-10-22 17:18                       ` Tomash Brechko
  2007-10-22 18:15                     ` skaller
  1 sibling, 2 replies; 208+ messages in thread
From: Michael Matz @ 2007-10-22 16:15 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: Dave Korn, gcc

Hi,

On Mon, 22 Oct 2007, Tomash Brechko wrote:

> On Mon, Oct 22, 2007 at 14:53:41 +0100, Dave Korn wrote:
> > The optimisation the compiler is making here is a big win in normal
> > code, you wouldn't want to disable it unless absolutely necessary;
> > to be precise, you wouldn't want to automatically disable it for
> > every loop and variable in a program that used -fopenmp just because
> > /some/ of the variables in that program couldn't be safely accessed
> > that way.
> 
> I'd rather wish the optimization would be done differently.  Currently
> we have:
> 
>                                          mem -> reg;
>    loop                                  loop
>      if (condition)    => optimize =>      if (condition)
>        val -> mem;                           val -> reg;
>                                          reg -> mem;
> 
> 
> But it could use additional register and be:
> 
>                                          0 -> flag_reg;
>                                          loop
>                                            if (condition)
>                                              val -> reg;
>                                              1 -> flag_reg;
>                                          if (flag_reg == 1)
>                                            reg -> mem;

That could be done but would be beside the point.  You traded one
conditional store for another one, so you've gained nothing in that
transformation.  The point of this transformation is precisely to get rid
of that conditional store (enabling for instance other transformations such
as easier store sinking).  That sometimes gains _much_ performance, so
it is something we want to do in all cases where it's possible.  You really
have to protect your data access itself, or make those data accesses
volatile; there's no way around this.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 13:53                 ` Dave Korn
@ 2007-10-22 14:32                   ` Tomash Brechko
  2007-10-22 16:15                     ` Michael Matz
  2007-10-22 18:15                     ` skaller
  0 siblings, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 14:32 UTC (permalink / raw)
  To: Dave Korn; +Cc: gcc

On Mon, Oct 22, 2007 at 14:53:41 +0100, Dave Korn wrote:
> The optimisation the compiler is making here is a big win in normal
> code, you wouldn't want to disable it unless absolutely necessary;
> to be precise, you wouldn't want to automatically disable it for
> every loop and variable in a program that used -fopenmp just because
> /some/ of the variables in that program couldn't be safely accessed
> that way.

I'd rather wish the optimization would be done differently.  Currently
we have:

                                         mem -> reg;
   loop                                  loop
     if (condition)    => optimize =>      if (condition)
       val -> mem;                           val -> reg;
                                         reg -> mem;


But it could use additional register and be:

                                         0 -> flag_reg;
                                         loop
                                           if (condition)
                                             val -> reg;
                                             1 -> flag_reg;
                                         if (flag_reg == 1)
                                           reg -> mem;


Note that by doing so we also eliminate all memory accesses when they
are not needed (when condition is never true), and memory bandwidth is
a major limiting factor nowadays.  Actually, for the very first code
piece of this thread I'd say that optimization


                                     mem -> reg;
   if (condition)   => optimize =>   if (condition)
     val -> mem;                       val -> reg;
                                     reg -> mem;

(there's no loop) is actually a counter-optimization even in the
single-threaded case: we replace a branch, which surely has its costs,
with an unconditional memory load and store, which cost much more.  Even
if branching flushed the CPU pipeline even when the jump destination is
already in the pipeline (is this the case?), the memory load has its own
quite big cost, plus the cost of flushing one line from the cache just
to perform a single operation on mem.

So, why not use flag_reg and thus make GCC thread-aware for this case?
I read the article suggested by Andrew Haley; its main point is that
the compiler should be made thread-aware.  Making all shared objects
volatile is overkill, and is more a trick than a solution.
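
For reference, here is a minimal C sketch of the three loop shapes in the
diagrams above. It is written by hand, not taken from GCC's output, and it
assumes the loop body "if (i > x) var = i;" from the bug report discussed
in this thread. The store_var helper and the store_count counter are
hypothetical additions whose only purpose is to make the number of writes
to var observable.

```c
int var;            /* the shared global ("mem" in the diagrams) */
int store_count;    /* hypothetical counter: number of writes to var */

/* Hypothetical helper: every write to var goes through here so that
   it can be counted. */
void store_var(int value)
{
    var = value;
    store_count++;
}

/* The loop as written: var is stored only when the condition holds. */
void foo_original(int x)
{
    for (int i = 0; i < 100; i++)
        if (i > x)
            store_var(i);               /* val -> mem */
}

/* Model of the transformation under discussion: one unconditional
   load before the loop and one unconditional store after it. */
void foo_hoisted(int x)
{
    int reg = var;                      /* mem -> reg */
    for (int i = 0; i < 100; i++)
        if (i > x)
            reg = i;                    /* val -> reg */
    store_var(reg);                     /* reg -> mem, even if unchanged */
}

/* The flag_reg proposal: no load at all, and the final store happens
   only if the loop body assigned at least once. */
void foo_flagged(int x)
{
    int reg = 0;
    int flag_reg = 0;                   /* 0 -> flag_reg */
    for (int i = 0; i < 100; i++)
        if (i > x) {
            reg = i;                    /* val -> reg */
            flag_reg = 1;               /* 1 -> flag_reg */
        }
    if (flag_reg == 1)
        store_var(reg);                 /* reg -> mem */
}
```

With x = 200 the condition is never true: foo_original and foo_flagged
perform no store, while foo_hoisted performs one. That single extra store
is the write that can overwrite another thread's update. With x = 50 all
three leave var == 99, and the two transformed versions need one store
instead of 49.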


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 11:26               ` Tomash Brechko
@ 2007-10-22 13:53                 ` Dave Korn
  2007-10-22 14:32                   ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-22 13:53 UTC (permalink / raw)
  To: 'Tomash Brechko'; +Cc: gcc

On 22 October 2007 12:27, Tomash Brechko wrote:

> On Mon, Oct 22, 2007 at 12:19:40 +0100, Dave Korn wrote:
>>   *What* mutex are you referring to?  There is no mutex in that code.
> 
> I was talking about the code in the comment#7.  For the code in the
> comment#1, the piece is simply incomplete.  For it, mutex should be
> used if x < 99, not clear if x >= 99.

  Gotcha.  Well, the rule still is: if you want an exact one-to-one
relationship between assignments in your program and externally-visible memory
accesses, use volatile.  C is not a glorified assembler, it is an idealised
virtual machine implemented on the hardware of a real underlying host, and you
can't make assumptions about internal implementation details of that virtual
machine or the relationship between it and the real machine which is hosting
the code.  The optimisation the compiler is making here is a big win in normal
code, you wouldn't want to disable it unless absolutely necessary; to be
precise, you wouldn't want to automatically disable it for every loop and
variable in a program that used -fopenmp just because /some/ of the variables
in that program couldn't be safely accessed that way.
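
A small sketch of what that rule means for the loop under discussion,
assuming the var/foo example from the bug report. Declaring var volatile
obliges the compiler to perform the accesses the source performs and no
others, so it should not add a load or store of var on the path where
i > x never holds. Note that volatile does not make any access atomic
and does not order it against other threads.

```c
/* Shared global, declared volatile so that each access in the source
   corresponds to one access in the generated code. */
volatile int var;

void foo(int x)
{
    for (int i = 0; i < 100; i++) {
        if (i > x)
            var = i;    /* the only place var is written */
    }
}
```

The cost is that every iteration with i > x now really stores to memory,
which is exactly the optimization being given up.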

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 11:19             ` Dave Korn
@ 2007-10-22 11:26               ` Tomash Brechko
  2007-10-22 13:53                 ` Dave Korn
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 11:26 UTC (permalink / raw)
  To: Dave Korn; +Cc: gcc

On Mon, Oct 22, 2007 at 12:19:40 +0100, Dave Korn wrote:
>   *What* mutex are you referring to?  There is no mutex in that code.

I was talking about the code in the comment#7.  For the code in the
comment#1, the piece is simply incomplete.  For it, mutex should be
used if x < 99, not clear if x >= 99.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 11:08         ` Andrew Haley
@ 2007-10-22 11:21           ` Tomash Brechko
  0 siblings, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 11:21 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc

On Mon, Oct 22, 2007 at 12:08:02 +0100, Andrew Haley wrote:
> Well, that's a big job: you'd have to decide on what a memory model
> really should be, and then implement that model.

Wouldn't the following rule of thumb work?: GCC is allowed to inject
additional store operations on some execution path only if there are
explicit store operations (i.e. issued by the user code if read
verbatim).

The whole problem will vanish if the last store that GCC adds is
made conditional, like

   if (there_were_explicit_stores_already)
     store;

When execution does not reach basic blocks that have stores, GCC
shouldn't add any.


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 11:17           ` Tomash Brechko
@ 2007-10-22 11:19             ` Dave Korn
  2007-10-22 11:26               ` Tomash Brechko
  0 siblings, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-22 11:19 UTC (permalink / raw)
  To: 'Tomash Brechko'; +Cc: gcc

On 22 October 2007 12:17, Tomash Brechko wrote:

> On Mon, Oct 22, 2007 at 12:07:20 +0100, Dave Korn wrote:
>> And even volatile wouldn't help if the code said
>> 
>>       if (i > x)
>>         var += i;
>> 
>> instead of a simple assignment.  The race in fact *does* exist in the
>> original program, but is hidden by the fact that you don't care which of
>> two operations that overwrite the previous value complete in which order,
>> but you're assuming the operation that modifies var is atomic, and there's
>> nothing to innately guarantee that in the original program.  The race
>> condition *is* already there.
> 
> Why?  For that example, if executed verbatim, either i > x is
> always false, or the mutex is properly acquired.  No one is assuming
> atomic update.

  *What* mutex are you referring to?  There is no mutex in that code.

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 11:07         ` Dave Korn
@ 2007-10-22 11:17           ` Tomash Brechko
  2007-10-22 11:19             ` Dave Korn
  0 siblings, 1 reply; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 11:17 UTC (permalink / raw)
  To: Dave Korn; +Cc: gcc

On Mon, Oct 22, 2007 at 12:07:20 +0100, Dave Korn wrote:
> And even volatile wouldn't help if the code said
> 
>       if (i > x)
>         var += i;
> 
> instead of a simple assignment.  The race in fact *does* exist in the original
> program, but is hidden by the fact that you don't care which of two operations
> that overwrite the previous value complete in which order, but you're assuming
> the operation that modifies var is atomic, and there's nothing to innately
> guarantee that in the original program.  The race condition *is* already
> there.

Why?  For that example, if executed verbatim, either i > x is
always false, or the mutex is properly acquired.  No one is assuming
atomic update.



-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:54         ` Dave Korn
@ 2007-10-22 11:10           ` Tomash Brechko
  0 siblings, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 11:10 UTC (permalink / raw)
  To: Dave Korn; +Cc: gcc

On Mon, Oct 22, 2007 at 11:54:47 +0100, Dave Korn wrote:
> http://www.google.com/search?q=Threads+cannot+be+implemented+as+a+library&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-GB:official


Thanks!


-- 
   Tomash Brechko

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:50       ` Tomash Brechko
                           ` (2 preceding siblings ...)
  2007-10-22 11:07         ` Dave Korn
@ 2007-10-22 11:08         ` Andrew Haley
  2007-10-22 11:21           ` Tomash Brechko
  3 siblings, 1 reply; 208+ messages in thread
From: Andrew Haley @ 2007-10-22 11:08 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

Tomash Brechko writes:
 > On Mon, Oct 22, 2007 at 11:19:31 +0100, Andrew Haley wrote:
 > > Please have a read of [1].  Let us know if anything you have observed
 > > isn't covered in that paper.
 > > 
 > > [1] Hans-Juergen Boehm. Threads cannot be implemented as a library. In
 > > Proc. of the ACM SIGPLAN 2005 Conf. on Programming Language
 > Design and Implementation (PLDI), pages 261-268, Chicago, IL, June
 > > 2005.
 > 
 > Unfortunately I'm not lucky enough to have ACM access.  But from the
 > Abstract:

www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf 

 >   We provide specific arguments that a pure library approach, in which
 >   the compiler is designed independently of threading issues, cannot
 >   guarantee correctness of the resulting code.
 > 
 > Can't agree less!  That's why for _practical_ reasons I'd say GCC
 > should be thread-aware, even if _theoretically_ it doesn't have to.

Well, that's a big job: you'd have to decide on what a memory model
really should be, and then implement that model.  The right approach
is surely to do this within the standardization bodies, which seems to
be the approach Hans Boehm is suggesting.  In the meantime, a prudent
programmer will make conservative assumptions and use volatile,
especially if they hope to write portable programs.

Andrew.

^ permalink raw reply	[flat|nested] 208+ messages in thread

* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:50       ` Tomash Brechko
  2007-10-22 10:54         ` Dave Korn
  2007-10-22 11:00         ` Tomash Brechko
@ 2007-10-22 11:07         ` Dave Korn
  2007-10-22 11:17           ` Tomash Brechko
  2007-10-22 11:08         ` Andrew Haley
  3 siblings, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-22 11:07 UTC (permalink / raw)
  To: 'Tomash Brechko', gcc

On 22 October 2007 11:51, Tomash Brechko wrote:

> Can't agree less!  That's why for _practical_ reasons I'd say GCC
> should be thread-aware, even if _theoretically_ it doesn't have to.
> And AFAIU it already _is_, for the most part of it.  That's why I want
> to see Bug#31862 be confirmed, accepted, and fixed.


  Re that particular bug, there are grounds to say that if gcc is going to
implement a flag -fopenmp, it should try and generate threading-compatible
code, I agree, but your point:

"And the essence of this bug report is that gcc chooses to unconditionally
write to variables that are simply lexically mentioned but otherwise aren't
accessed during execution."

is simply something that you have no right to expect of the compiler in the C
language unless you use volatile.  Here's Jakub's original example:

"int var;
void
foo (int x)
{
  int i;
  for (i = 0; i < 100; i++)
    {
      if (i > x)
        var = i;
    }
}

When some other thread modifies var at the same time while foo (200) is
executed, the compiler inserted a race which doesn't really exist in the
original program, as it will do reg = var; ... var = reg; even when var was
never modified."

  If var is volatile, the compiler won't do that, and it is I'm afraid the
right answer to the problem in this case: 'var' is inappropriately declared if
it is to be used from multiple threads in this way.  And even volatile
wouldn't help if the code said

      if (i > x)
        var += i;

instead of a simple assignment.  The race in fact *does* exist in the original
program, but is hidden by the fact that you don't care which of two operations
that overwrite the previous value complete in which order, but you're assuming
the operation that modifies var is atomic, and there's nothing to innately
guarantee that in the original program.  The race condition *is* already
there.

  There should really be a lock/unlock mutex sequence around the assignment to
var, but within the scope of the if condition.  And at that point you'd find
that gcc didn't hoist anything past the subroutine calls to the mutex
lock/unlock and so only the code path through the then-part of the if would
ever touch the variable at all.
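
A sketch of that suggestion, assuming POSIX threads; the mutex name
var_lock is made up for illustration. The lock and unlock calls sit
inside the if, so a call such as foo(200), where the condition is never
true, touches neither var nor the mutex.

```c
#include <pthread.h>

int var;                                    /* shared global */
pthread_mutex_t var_lock = PTHREAD_MUTEX_INITIALIZER;   /* hypothetical name */

void foo(int x)
{
    for (int i = 0; i < 100; i++) {
        if (i > x) {
            /* Lock only on the path that actually writes var. */
            pthread_mutex_lock(&var_lock);
            var = i;
            pthread_mutex_unlock(&var_lock);
        }
    }
}
```

Any other thread that reads or writes var would have to take the same
mutex. Whether the compiler may still move accesses to var across these
calls is the memory-model question this thread is arguing about; the
sketch relies on the behaviour described above, namely that GCC does not
hoist the access past the lock and unlock calls.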


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 208+ messages in thread

* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:50       ` Tomash Brechko
  2007-10-22 10:54         ` Dave Korn
@ 2007-10-22 11:00         ` Tomash Brechko
  2007-10-22 11:07         ` Dave Korn
  2007-10-22 11:08         ` Andrew Haley
  3 siblings, 0 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 11:00 UTC (permalink / raw)
  To: gcc

On Mon, Oct 22, 2007 at 14:50:44 +0400, Tomash Brechko wrote:
> Can't agree less!

Can't agree more! That's what it was supposed to say; I think you've
got it right ;).


-- 
   Tomash Brechko


* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:50       ` Tomash Brechko
@ 2007-10-22 10:54         ` Dave Korn
  2007-10-22 11:10           ` Tomash Brechko
  2007-10-22 11:00         ` Tomash Brechko
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 208+ messages in thread
From: Dave Korn @ 2007-10-22 10:54 UTC (permalink / raw)
  To: 'Tomash Brechko', gcc

On 22 October 2007 11:51, Tomash Brechko wrote:

> On Mon, Oct 22, 2007 at 11:19:31 +0100, Andrew Haley wrote:
>> Please have a read of [1].  Let us know if anything you have observed
>> isn't covered in that paper. 
>> 
>> [1] Hans-Juergen Boehm. Threads cannot be implemented as a library. In
>> Proc. of the ACM SIGPLAN 2005 Conf. on Programming Language
>> Design and Implementation (PLDI), pages 261-268, Chicago, IL, June
>> 2005.
> 
> Unfortunately I'm not lucky enough to have ACM access.


http://www.google.com/search?q=Threads+cannot+be+implemented+as+a+library&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-GB:official


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:19     ` Andrew Haley
@ 2007-10-22 10:50       ` Tomash Brechko
  2007-10-22 10:54         ` Dave Korn
                           ` (3 more replies)
  2007-10-26 21:24       ` Florian Weimer
  1 sibling, 4 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22 10:50 UTC (permalink / raw)
  To: gcc

On Mon, Oct 22, 2007 at 11:19:31 +0100, Andrew Haley wrote:
> Please have a read of [1].  Let us know if anything you have observed
> isn't covered in that paper.
> 
> [1] Hans-Juergen Boehm. Threads cannot be implemented as a library. In
> Proc. of the ACM SIGPLAN 2005 Conf. on Programming Language
> Design and Implementation (PLDI), pages 261-268, Chicago, IL, June
> 2005.

Unfortunately I'm not lucky enough to have ACM access.  But from the
Abstract:

  We provide specific arguments that a pure library approach, in which
  the compiler is designed independently of threading issues, cannot
  guarantee correctness of the resulting code.


Can't agree less!  That's why for _practical_ reasons I'd say GCC
should be thread-aware, even if _theoretically_ it doesn't have to.
And AFAIU it already _is_, for the most part.  That's why I want
to see Bug#31862 confirmed, accepted, and fixed.


-- 
   Tomash Brechko


* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22  1:25   ` skaller
@ 2007-10-22 10:32     ` Dave Korn
  0 siblings, 0 replies; 208+ messages in thread
From: Dave Korn @ 2007-10-22 10:32 UTC (permalink / raw)
  To: 'skaller'; +Cc: 'Tomash Brechko', gcc

On 22 October 2007 02:20, skaller wrote:

> On Mon, 2007-10-22 at 00:07 +0100, Dave Korn wrote:
> 
>>   If you really want all externally-visible accesses to v to be made
>> exactly as the code directs, rather than allowing gcc to optimise them in
>> any way that (from the program's POV) is just the same 'as-if' they had
>> been done exactly, make v volatile.
> 
> That is not enough. Apart from the lack of ISO semantics for volatile,
> typically a compiler will take volatile as a hint to not hold
> values of the variable in a register.
> 
> On a multi-processor, this is not enough, because each CPU
> may still hold modified values in separate caches.

  Yes.  volatile's job is to make the compiler issue real memory load and
store operations when and where you say in the code.  Beyond that it's all up
to you, just like fflush doesn't guarantee the kernel/filesystem write-back
cache is emptied, only the C runtime library buffer.
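
  A small sketch of what that means in practice (the names ready_flag,
sample_twice and set_flag are invented for illustration): each read of a
volatile object in the source is expected to become its own load, and each
assignment its own store.

```c
/* Hypothetical shared flag; the name is made up for this sketch. */
volatile int ready_flag;

int
sample_twice (void)
{
  int first = ready_flag;   /* one real load */
  int second = ready_flag;  /* a second real load, not merged with the first */
  return first + second;
}

void
set_flag (int value)
{
  ready_flag = value;       /* one real store, not elided or duplicated */
}
```

Note that this says nothing about atomicity or about when another CPU sees
the store; as said above, beyond issuing the accesses it's all up to you.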

> But I don't actually know what gcc does, although I guess
> it does nothing.

  Yep.

>  The OS has to do the right thing here
> when a mutex is locked etc, but the code for that is
> probably in the kernel which is better able to manage
> things like cache synchronisation than a compiler.

  The OS and the system libc together, yes.


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22  9:36   ` Tomash Brechko
  2007-10-22 10:09     ` Erik Trulsson
@ 2007-10-22 10:19     ` Andrew Haley
  2007-10-22 10:50       ` Tomash Brechko
  2007-10-26 21:24       ` Florian Weimer
  1 sibling, 2 replies; 208+ messages in thread
From: Andrew Haley @ 2007-10-22 10:19 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

Tomash Brechko writes:
 > On Mon, Oct 22, 2007 at 00:07:50 +0100, Dave Korn wrote:
 > >   Because of the 'as-if' rule.  Since the standard is neutral
 > > with regard to threads, gcc does not have to take them into
 > > account when it decides whether an optimisation would satisfy the
 > > 'as-if' rule.
 > 
 > If this were true, then the compiler is free to inject the
 > sequence
 > 
 >   mov mem -> reg
 >   mov reg -> mem
 > 
 > just _anywhere_.

That's right.  This isn't a standards conformance issue, rather one of
quality of implementation.

The core problem here seems to be that the "C with threads" memory
model isn't sufficiently well-defined to make a determination
possible.  You're assuming that you have no responsibility to mark
shared memory protected by a mutex as volatile, but I know of nothing
in the C standard that makes such a guarantee.  A prudent programmer
will make conservative assumptions.

Please have a read of [1].  Let us know if anything you have observed
isn't covered in that paper.

Andrew.

[1] Hans-Juergen Boehm. Threads cannot be implemented as a library. In
Proc. of the ACM SIGPLAN 2005 Conf. on Programming Language
Design and Implementation (PLDI), pages 261-268, Chicago, IL, June
2005.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22 10:09     ` Erik Trulsson
@ 2007-10-22 10:15       ` Robert Dewar
  2007-10-23 16:53         ` Paul Brook
  2007-10-22 17:59       ` skaller
  1 sibling, 1 reply; 208+ messages in thread
From: Robert Dewar @ 2007-10-22 10:15 UTC (permalink / raw)
  To: Tomash Brechko, gcc

Erik Trulsson wrote:

> It is also worth noting that just declaring a variable 'volatile' does not
> help all that much in making it safer to use in a threaded environment if you
> have multiple CPUs.  (There is nothing that says that a multi-CPU system has
> to have any kind of automatic cache-coherence.)

The first sentence here could be misleading.  There are LOTS of systems
where there is automatic cache-coherence, and of course the use of
'volatile' on such systems does indeed help.  If you are working on
a system without cache-coherence, you indeed have big problems, but
that's rarely the case: most multi-processor computers in common use
do guarantee cache coherence.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-22  9:36   ` Tomash Brechko
@ 2007-10-22 10:09     ` Erik Trulsson
  2007-10-22 10:15       ` Robert Dewar
  2007-10-22 17:59       ` skaller
  2007-10-22 10:19     ` Andrew Haley
  1 sibling, 2 replies; 208+ messages in thread
From: Erik Trulsson @ 2007-10-22 10:09 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

On Mon, Oct 22, 2007 at 01:36:17PM +0400, Tomash Brechko wrote:
> On Mon, Oct 22, 2007 at 00:07:50 +0100, Dave Korn wrote:
> >   Because of the 'as-if' rule.  Since the standard is neutral with regard to
> > threads, gcc does not have to take them into account when it decides whether
> > an optimisation would satisfy the 'as-if' rule.
> 
> If this were true, then the compiler is free to inject the
> sequence
> 
>   mov mem -> reg
>   mov reg -> mem
> 
> just _anywhere_.

As far as the C standard is concerned, yes, the compiler is most certainly
free to insert such a sequence almost anywhere.

If a variable has been declared as 'volatile' however, then all accesses to
it must be according to the abstract machine defined by the C standard, i.e.
the compiler is not allowed to optimize away any access to the variable, nor
is it allowed to insert spurious accesses to the variable.

It is worth noting that exactly what constitutes an access is
implementation-defined.

It is also worth noting that just declaring a variable 'volatile' does not
help all that much in making it safer to use in a threaded environment if you
have multiple CPUs.  (There is nothing that says that a multi-CPU system has
to have any kind of automatic cache-coherence.)


>  How can the programmer predict where and when to
> lock the mutex to protect mem?  The only thing we could rely on then
> is that the compiler is sound, it wouldn't inject such a sequence
> unless it really feels so.  But still, how to determine when the
> compiler really feels so?

You will have to read the documentation for the compiler and the
threading library carefully, and hope that they have something useful
to say on this matter.  All too often they won't, in which case
you will have to do what most programmers do in practice in this situation:
write something that "should" work, and hope for the best.  Most of the time
it will actually work.


My own conclusion from this discussion (and others) is that shared memory is
a lousy paradigm for communication between different threads of execution,
precisely because it is so hard to specify exactly what should happen or not
happen in various situations. (Most of the time the relevant standards
do not actually specify this in sufficient detail.)  I also conclude that
POSIX threads should be avoided if you are really concerned about
correctness.   (Which of course hasn't stopped lots of people from using
them - with varying results.)
Message passing has an advantage here since then only the people writing
the actual message-passing routines need to know about the underlying
details.



> 
> Here's another piece of code, more real and sound this time:
> 
> 
>   #include <pthread.h>
> 
>   static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
>   static int acquires_count = 0;
> 
>   int
>   trylock()
>   {
>     int res;
> 
>     res = pthread_mutex_trylock(&mutex);
>     if (res == 0)
>       ++acquires_count;
> 
>     return res;
>   }
> 
> 
> Is it thread safe?  Or rather, should the compiler preserve its
> thread-safeness, as seen from the programmer's POV?  Otherwise I don't
> get how pthread_mutex_trylock() could possibly ever be used, because
> it's exactly the case when you _have_ to do the access based on the
> condition, "assume the worst" won't work here.  GCC 4.3 with -O1
> generates:
> 
>   trylock:
>           pushl   %ebp
>           movl    %esp, %ebp
>           subl    $8, %esp
>           movl    $mutex, (%esp)
>           call    pthread_mutex_trylock
>           cmpl    $1, %eax                ; test res
>           movl    acquires_count, %edx    ; load
>           adcl    $0, %edx                ; maybe add 1
>           movl    %edx, acquires_count    ; store
>           leave
>           ret
> 

What happens if you declare the variables as 'volatile' ?
(There is no guarantee that this will make things better, but it
is very likely.)



-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-21 23:07 ` Dave Korn
  2007-10-22  1:25   ` skaller
@ 2007-10-22  9:36   ` Tomash Brechko
  2007-10-22 10:09     ` Erik Trulsson
  2007-10-22 10:19     ` Andrew Haley
  1 sibling, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-22  9:36 UTC (permalink / raw)
  To: gcc

On Mon, Oct 22, 2007 at 00:07:50 +0100, Dave Korn wrote:
>   Because of the 'as-if' rule.  Since the standard is neutral with regard to
> threads, gcc does not have to take them into account when it decides whether
> an optimisation would satisfy the 'as-if' rule.

If this were true, then the compiler is free to inject the
sequence

  mov mem -> reg
  mov reg -> mem

just _anywhere_.  How can the programmer predict where and when to
lock the mutex to protect mem?  The only thing we could rely on then
is that the compiler is sound, it wouldn't inject such a sequence
unless it really feels so.  But still, how to determine when the
compiler really feels so?

Here's another piece of code, more real and sound this time:


  #include <pthread.h>

  static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
  static int acquires_count = 0;

  int
  trylock()
  {
    int res;

    res = pthread_mutex_trylock(&mutex);
    if (res == 0)
      ++acquires_count;

    return res;
  }


Is it thread safe?  Or rather, should the compiler preserve its
thread-safeness, as seen from the programmer's POV?  Otherwise I don't
get how pthread_mutex_trylock() could possibly ever be used, because
it's exactly the case when you _have_ to do the access based on the
condition, "assume the worst" won't work here.  GCC 4.3 with -O1
generates:

  trylock:
          pushl   %ebp
          movl    %esp, %ebp
          subl    $8, %esp
          movl    $mutex, (%esp)
          call    pthread_mutex_trylock
          cmpl    $1, %eax                ; test res
          movl    acquires_count, %edx    ; load
          adcl    $0, %edx                ; maybe add 1
          movl    %edx, acquires_count    ; store
          leave
          ret
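
To make the problem easier to see, here is a rough C rendering of what the
listing above does after the call returns.  It is only a sketch: the
function name count_as_compiled is invented, and res is passed in as a
parameter instead of coming from pthread_mutex_trylock().

```c
int acquires_count = 0;

/* Hypothetical C equivalent of the cmpl/movl/adcl/movl sequence. */
void
count_as_compiled (int res)
{
  int tmp = acquires_count;   /* load, done on every path */
  tmp += (res == 0);          /* adcl: add 1 only if the lock was taken */
  acquires_count = tmp;       /* store, done on every path */
}
```

When res is nonzero the stored value equals the loaded one, but the store
still happens without the mutex held, so an update made by the lock owner
between the load and the store can be lost.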


-- 
   Tomash Brechko


* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-21 23:07 ` Dave Korn
@ 2007-10-22  1:25   ` skaller
  2007-10-22 10:32     ` Dave Korn
  2007-10-22  9:36   ` Tomash Brechko
  1 sibling, 1 reply; 208+ messages in thread
From: skaller @ 2007-10-22  1:25 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'Tomash Brechko', gcc


On Mon, 2007-10-22 at 00:07 +0100, Dave Korn wrote:

>   If you really want all externally-visible accesses to v to be made exactly
> as the code directs, rather than allowing gcc to optimise them in any way that
> (from the program's POV) is just the same 'as-if' they had been done
> exactly, make v volatile.

That is not enough. Apart from the lack of ISO semantics for volatile,
typically a compiler will take volatile as a hint to not hold
values of the variable in a register.

On a multi-processor, this is not enough, because each CPU
may still hold modified values in separate caches.

Perhaps gcc actually puts a RW barrier to force 
cache synchronisation on every volatile access..
this seems rather expensive and very hard to do since
it is very dependent on the actual box (not just the
processor). Some processor caches might require external
electrical signals to synchronise, for example. This is
quite possible if you have multiple CPU boards in a box.

But I don't actually know what gcc does, although I guess
it does nothing. The OS has to do the right thing here
when a mutex is locked etc, but the code for that is
probably in the kernel which is better able to manage
things like cache synchronisation than a compiler.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-21 16:16   ` Tomash Brechko
  2007-10-21 18:51     ` Richard Guenther
@ 2007-10-22  1:16     ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: skaller @ 2007-10-22  1:16 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc


On Sun, 2007-10-21 at 20:16 +0400, Tomash Brechko wrote:

> But if C99 is thread-neutral, then it's compiler's responsibility to
> ensure the same result as some abstract machine 

No. The compiler is responsible for ensuring that things work
if, and only if:

1. your program conforms in required ways 

2. you only call functions in the standard library
   or functions you define in your program

Therefore, when making calls to ANY external library,
all bets are off UNLESS that library also meets the above
conditions.

Interfacing the operating system in any way OTHER than
the C standard library functions immediately relieves
the compiler of all responsibility -- and that includes
calls to Posix functions which are not implemented
entirely in Standard C.

Of course, a C compiler may make additional guarantees,
for example, Posix compliance, but then you must check
the Posix standard to see what additional things it 
promises.

AFAIK, access to a shared variable is sound if serialised
by a mutex. However, that isn't a complete description:
there are situations where you can safely read a shared
variable without a lock being held, for example if you know
another thread has not modified it since the last synchronisation.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* RE: Optimization of conditional access to globals: thread-unsafe?
  2007-10-21 14:55 Tomash Brechko
  2007-10-21 15:26 ` Erik Trulsson
@ 2007-10-21 23:07 ` Dave Korn
  2007-10-22  1:25   ` skaller
  2007-10-22  9:36   ` Tomash Brechko
  2007-10-27 18:15 ` Darryl Miles
  2 siblings, 2 replies; 208+ messages in thread
From: Dave Korn @ 2007-10-21 23:07 UTC (permalink / raw)
  To: 'Tomash Brechko', gcc

On 21 October 2007 15:55, Tomash Brechko wrote:

> Consider this piece of code:
> 
>     extern int v;
> 
>     void
>     f(int set_v)
>     {
>       if (set_v)
>         v = 1;
>     }

>     f:
>             pushl   %ebp
>             movl    %esp, %ebp
>             cmpl    $0, 8(%ebp)
>             movl    $1, %eax
>             cmove   v, %eax        ; load (maybe)
>             movl    %eax, v        ; store (always)
>             popl    %ebp
>             ret
> 
> Note the last unconditional store to v.  
> So, could someone explain me why this GCC optimization is valid, 

  Because of the 'as-if' rule.  Since the standard is neutral with regard to
threads, gcc does not have to take them into account when it decides whether
an optimisation would satisfy the 'as-if' rule.

  If you really want all externally-visible accesses to v to be made exactly
as the code directs, rather than allowing gcc to optimise them in any way that
(from the program's POV) is just the same 'as-if' they had been done
exactly, make v volatile.

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-21 16:16   ` Tomash Brechko
@ 2007-10-21 18:51     ` Richard Guenther
  2007-10-22  1:16     ` skaller
  1 sibling, 0 replies; 208+ messages in thread
From: Richard Guenther @ 2007-10-21 18:51 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

On 10/21/07, Tomash Brechko <tomash.brechko@gmail.com> wrote:
> On Sun, Oct 21, 2007 at 17:26:02 +0200, Erik Trulsson wrote:
> > Note that C99 is firmly based on a single-threaded execution model and says
> > nothing whatsoever about what should happen or not happen in a threaded
> > environment.  According to C99 a C compiler is allowed to generate such code
> > as gcc does.
>
> Yes, I understand that C99 doesn't concern threads per se, but I
> wouldn't call it pro-single-threaded, rather thread-neutral.  I.e. the
> standard isn't made explicitly incompatible with threads, it
> simply "says nothing about threads".
>
>
> > If you are using some threaded environment then you will have to read the
> > relevant standard for that to find out if it imposes any additional
> > restrictions on a C compiler beyond what the C standard does.
>
> All we have is POSIX, and it imposes very little on compiler I guess.
>
>
> > I suspect that most of them will not say one way or the other about what
> > should happen in this case, which means that you will have to assume the
> > worst case and protect all calls to f() regardless of the value of the
> > argument.
>
> Well, assuming the worst case won't always work, that's why I asked
> about reasonable boundary.  Consider the following (putting
> style/efficiency matters aside):
>
>   #include <pthread.h>
>
>   #define N 100
>
>   /* mutex[i] corresponds to byte[i].  */
>   pthread_mutex_t mutex[N];
>   char byte[N];
>
>   void
>   f(int i)
>   {
>     pthread_mutex_lock(&mutex[i]);
>     byte[i] = 1;
>     pthread_mutex_unlock(&mutex[i]);
>   }
>
>
> Is this code thread-safe?  Because from some POV C99 doesn't forbid
> loading and storing the whole word when a single byte[i] is accessed (given
> that C99 is pro-single-threaded).
>
> But if C99 is thread-neutral, then it's the compiler's responsibility to
> ensure the same result as some abstract machine (which may be
> sequential).  In this case the compiler should access the single byte,
> no more.

On some architectures this is not even possible.  For performance
reasons you should store the individual bytes to separate cache-lines
anyway.

Richard.


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-21 15:26 ` Erik Trulsson
@ 2007-10-21 16:16   ` Tomash Brechko
  2007-10-21 18:51     ` Richard Guenther
  2007-10-22  1:16     ` skaller
  0 siblings, 2 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-21 16:16 UTC (permalink / raw)
  To: gcc

On Sun, Oct 21, 2007 at 17:26:02 +0200, Erik Trulsson wrote:
> Note that C99 is firmly based on a single-threaded execution model and says
> nothing whatsoever about what should happen or not happen in a threaded
> environment.  According to C99 a C compiler is allowed to generate such code
> as gcc does.

Yes, I understand that C99 doesn't concern threads per se, but I
wouldn't call it pro-single-threaded, rather thread-neutral.  I.e. the
standard isn't made explicitly incompatible with threads, it
simply "says nothing about threads".


> If you are using some threaded environment then you will have to read the
> relevant standard for that to find out if it imposes any additional
> restrictions on a C compiler beyond what the C standard does.

All we have is POSIX, and it imposes very little on compiler I guess.


> I suspect that most of them will not say one way or the other about what
> should happen in this case, which means that you will have to assume the
> worst case and protect all calls to f() regardless of the value of the
> argument.

Well, assuming the worst case won't always work, that's why I asked
about reasonable boundary.  Consider the following (putting
style/efficiency matters aside):

  #include <pthread.h>

  #define N 100

  /* mutex[i] corresponds to byte[i].  */
  pthread_mutex_t mutex[N];
  char byte[N];

  void
  f(int i)
  {
    pthread_mutex_lock(&mutex[i]);
    byte[i] = 1;
    pthread_mutex_unlock(&mutex[i]);
  }


Is this code thread-safe?  Because from some POV C99 doesn't forbid
loading and storing the whole word when a single byte[i] is accessed (given
that C99 is pro-single-threaded).

But if C99 is thread-neutral, then it's the compiler's responsibility to
ensure the same result as some abstract machine (which may be
sequential).  In this case the compiler should access the single byte,
no more.
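
To illustrate the concern, here is a sketch of what a word-wide
read-modify-write of byte[i] would amount to if written out by hand.  This
is an assumption about what a compiler might emit on a target without byte
stores, not output from GCC; the function name store_byte_via_word and the
4-byte word size are invented for the example.

```c
#include <stdint.h>
#include <string.h>

#define N 100

char byte[N];

/* Hypothetical: update byte[i] by rewriting the whole 4-byte word around it. */
void
store_byte_via_word (int i)
{
  size_t base = (size_t) i & ~(size_t) 3;   /* start of the containing word */
  uint32_t word;
  unsigned char *p = (unsigned char *) &word;

  memcpy (&word, &byte[base], sizeof word); /* load 4 bytes, 3 of them foreign */
  p[i - base] = 1;                          /* modify only the byte we own */
  memcpy (&byte[base], &word, sizeof word); /* store all 4 bytes back */
}
```

Holding mutex[i] would not help here: the three neighbouring bytes are
guarded by different mutexes, yet they are rewritten with values that may
already be stale.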


OK, I've got your point, but I'm not satisfied :).


-- 
   Tomash Brechko


* Re: Optimization of conditional access to globals: thread-unsafe?
  2007-10-21 14:55 Tomash Brechko
@ 2007-10-21 15:26 ` Erik Trulsson
  2007-10-21 16:16   ` Tomash Brechko
  2007-10-21 23:07 ` Dave Korn
  2007-10-27 18:15 ` Darryl Miles
  2 siblings, 1 reply; 208+ messages in thread
From: Erik Trulsson @ 2007-10-21 15:26 UTC (permalink / raw)
  To: Tomash Brechko; +Cc: gcc

On Sun, Oct 21, 2007 at 06:55:13PM +0400, Tomash Brechko wrote:
> Hello,
> 
> I have a question regarding the thread-safeness of a particular GCC
> optimization.  I'm sorry if this was already discussed on the list, if
> so please provide me with the reference to the previous discussion.
> 
> Consider this piece of code:
> 
>     extern int v;
>   
>     void
>     f(int set_v)
>     {
>       if (set_v)
>         v = 1;
>     }
> 
> If f() is called concurrently from several threads, then calls to f(1)
> should be protected by the mutex.  But do we have to acquire the mutex
> for f(0) calls?  I'd say no: there's no access to global v in
> that case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the
> following:
> 
>     f:
>             pushl   %ebp
>             movl    %esp, %ebp
>             cmpl    $0, 8(%ebp)
>             movl    $1, %eax
>             cmove   v, %eax        ; load (maybe)
>             movl    %eax, v        ; store (always)
>             popl    %ebp
>             ret
> 
> Note the last unconditional store to v.  Now, if some thread would
> modify v between our load and store (acquiring the mutex first), then
> we will overwrite the new value with the old one (and would do that in
> a thread-unsafe manner, not acquiring the mutex).
> 
> So, do the calls to f(0) require the mutex, or it's a GCC bug?

Note that C99 is firmly based on a single-threaded execution model and says
nothing whatsoever about what should happen or not happen in a threaded
environment.  According to C99 a C compiler is allowed to generate such code
as gcc does.


If you are using some threaded environment then you will have to read the
relevant standard for that to find out if it imposes any additional
restrictions on a C compiler beyond what the C standard does.

I suspect that most of them will not say one way or the other about what
should happen in this case, which means that you will have to assume the
worst case and protect all calls to f() regardless of the value of the
argument.


Personally I would say that it is the accesses to 'v' that should be
protected by a mutex (or similar), not the calls to f().
It is almost always better programming practice to protect access to data
objects rather than to code.  
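
One way to read that advice, as a minimal sketch only: keep the mutex next
to the datum and go through small accessors.  The struct guarded_int and the
functions guarded_set and guarded_get are names made up for this example.

```c
#include <pthread.h>

/* Hypothetical wrapper: the mutex lives beside the value it guards. */
struct guarded_int
{
  pthread_mutex_t lock;
  int value;
};

static struct guarded_int v = { PTHREAD_MUTEX_INITIALIZER, 0 };

void
guarded_set (struct guarded_int *g, int new_value)
{
  pthread_mutex_lock (&g->lock);
  g->value = new_value;
  pthread_mutex_unlock (&g->lock);
}

int
guarded_get (struct guarded_int *g)
{
  int result;
  pthread_mutex_lock (&g->lock);
  result = g->value;
  pthread_mutex_unlock (&g->lock);
  return result;
}

/* f() no longer needs external locking by its callers. */
void
f (int set_v)
{
  if (set_v)
    guarded_set (&v, 1);
}
```

Here the store sits between two calls the compiler cannot see through, so
callers of f() do not have to decide whether f(0) needs the mutex.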



> 
> This very bug was actually already reported for a slightly different case,
> "Loop IM and other optimizations harmful for -fopenmp"
> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862 ; please ignore my
> last comment there, as I am no longer sure myself).  But the report was
> closed with an "UNCONFIRMED" mark, and the reasons for that are not quite
> clear to me.  I tried to dig into the C99 standard and David
> Butenhof's "Programming with POSIX Threads", and didn't find any
> indication that the call f(0) should also be protected by the mutex.
> 
> Here are some pieces from C99:
> 
> Sec 3.1 par 3: NOTE 2 "Modify" includes the case where the new value
>                being stored is the same as the previous value.
> 
> Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
>                objects.
> 
> Sec 5.1.2.3 par 3: In the abstract machine, all expressions are
>                    evaluated as specified by the semantics.
> 
> Sec 5.1.2.3 par 5 basically says that the result of the program
> execution wrt volatile objects, external files and terminal output
> should be the same for all conforming implementations.
> 
> Sec 5.1.2.3 par 8: EXAMPLE 1 An implementation might define a
>                    one-to-one correspondence between abstract and
>                    actual semantics: ...
> 
> Sec 5.1.2.3 par 9: Alternatively, an implementation might perform
>                    various optimizations within each translation unit,
>                    such that the actual semantics would agree with the
>                    abstract semantics only when making function calls
>                    across translation unit boundaries. ...
> 
> I think that the above says that even when the compiler chooses to do some
> optimizations, the result of the _whole execution_ should be the same
> as if the actual semantics were equal to the abstract semantics.  Sec 5.1.2.3
> par 9, cited last, is not a permission to do optimizations that may change
> the end result.  In our case, when threads are involved, the result may
> change, because there's no access to v in the abstract semantics, and
> thus no mutex is required from the abstract POV.
> 
> 
> So, could someone explain to me why this GCC optimization is valid, and,
> if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during
> a particular execution path?  Or maybe the named bug report is valid
> after all?
> 

-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se


* Optimization of conditional access to globals: thread-unsafe?
@ 2007-10-21 14:55 Tomash Brechko
  2007-10-21 15:26 ` Erik Trulsson
                   ` (2 more replies)
  0 siblings, 3 replies; 208+ messages in thread
From: Tomash Brechko @ 2007-10-21 14:55 UTC (permalink / raw)
  To: gcc

Hello,

I have a question regarding the thread-safeness of a particular GCC
optimization.  I'm sorry if this was already discussed on the list, if
so please provide me with the reference to the previous discussion.

Consider this piece of code:

    extern int v;
  
    void
    f(int set_v)
    {
      if (set_v)
        v = 1;
    }

If f() is called concurrently from several threads, then calls to f(1)
should be protected by the mutex.  But do we have to acquire the mutex
for f(0) calls?  I'd say no: there's no access to global v in
that case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the
following:

    f:
            pushl   %ebp
            movl    %esp, %ebp
            cmpl    $0, 8(%ebp)
            movl    $1, %eax
            cmove   v, %eax        ; load (maybe)
            movl    %eax, v        ; store (always)
            popl    %ebp
            ret

Note the last unconditional store to v.  Now, if some thread would
modify v between our load and store (acquiring the mutex first), then
we will overwrite the new value with the old one (and would do that in
a thread-unsafe manner, not acquiring the mutex).
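
Written back as C, the generated code behaves roughly like the sketch below.
The name f_as_compiled and the temporary tmp are invented for illustration;
v is defined here rather than declared extern so that the fragment stands
alone.

```c
int v;

/* Hypothetical C equivalent of the cmove/movl sequence above. */
void
f_as_compiled (int set_v)
{
  int tmp = 1;       /* movl $1, %eax */
  if (set_v == 0)
    tmp = v;         /* cmove v, %eax: load (maybe) */
  v = tmp;           /* movl %eax, v: store (always) */
}
```

For f_as_compiled(0) the value written is the one just read, so a
single-threaded program cannot tell the difference, but the store itself is
still performed.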

So, do the calls to f(0) require the mutex, or it's a GCC bug?

This very bug was actually already reported for a slightly different case,
"Loop IM and other optimizations harmful for -fopenmp"
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862 ; please ignore my
last comment there, as I am no longer sure myself).  But the report was
closed with an "UNCONFIRMED" mark, and the reasons for that are not quite
clear to me.  I tried to dig into the C99 standard and David
Butenhof's "Programming with POSIX Threads", and didn't find any
indication that the call f(0) should also be protected by the mutex.

Here are some pieces from C99:

Sec 3.1 par 3: NOTE 2 "Modify" includes the case where the new value
               being stored is the same as the previous value.

Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
               objects.

Sec 5.1.2.3 par 3: In the abstract machine, all expressions are
                   evaluated as specified by the semantics.

Sec 5.1.2.3 par 5 basically says that the result of the program
execution wrt volatile objects, external files and terminal output
should be the same for all conforming implementations.

Sec 5.1.2.3 par 8: EXAMPLE 1 An implementation might define a
                   one-to-one correspondence between abstract and
                   actual semantics: ...

Sec 5.1.2.3 par 9: Alternatively, an implementation might perform
                   various optimizations within each translation unit,
                   such that the actual semantics would agree with the
                   abstract semantics only when making function calls
                   across translation unit boundaries. ...

I think that the above says that even when the compiler chooses to do some
optimizations, the result of the _whole execution_ should be the same
as if the actual semantics were equal to the abstract semantics.  Sec
5.1.2.3 par 9, cited last, is not a permission to do optimizations that
may change the end result.  In our case, when threads are involved, the
result may change, because there's no access to v in the abstract
semantics, and thus no mutex is required from the abstract POV.


So, could someone explain to me why this GCC optimization is valid, and,
if so, where the boundary lies below which I may safely assume GCC
won't try to store to objects that aren't stored to explicitly during
a particular execution path?  Or maybe the named bug report is valid
after all?
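
One conservative boundary that does follow from Sec 5.1.2.3 par 5 can
be sketched as below, assuming v may be declared volatile (an
assumption; the declaration of v is not shown here).  Accesses to
volatile objects must match the abstract machine, so a conforming
compiler should not invent a store to v on the f(0) path.  Note that
volatile gives neither atomicity nor ordering between threads, so the
f(1) calls would still need the mutex.  read_v() is a hypothetical
helper added only for illustration.

```c
/* Assumption: v is declared volatile, unlike in the case discussed above. */
static volatile int v;

void f(int set_v)
{
    if (set_v)
        v = 1;      /* the only store to v; none may be added for f(0) */
}

/* Hypothetical accessor so callers can observe v. */
int read_v(void)
{
    return v;
}
```

This trades away optimizations on every access to v, which is why it
is a workaround rather than an answer to the question above.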


Thanks in advance,

-- 
   Tomash Brechko
