New pthreads library

public inbox for libc-hacker@sourceware.org

* New pthreads library
@ 1999-08-25 15:47 Mark Kettenis
  1999-08-27 23:51 ` Ulrich Drepper
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Kettenis @ 1999-08-25 15:47 UTC (permalink / raw)
  To: libc-hacker

Hi,

Over the last few months I have been working on a pthreads library for
the Hurd.  Unlike the LinuxThreads library, I have tried to separate
the system-independent bits as much from the system-dependent bits as
possible, in order to make porting to other systems as easy as
possible.  Several people have expressed the opinion that it would be
a good idea that ultimately all glibc ports would use the same
pthreads library.  To make it possible to use my pthreads library on
Linux too, I'm making a snapshot of my work available and ask
everybody who's interested to take a look at it and discuss it on the
list.

The snapshot is available at:

ftp://alpha.gnu.org/gnu/hurd/contrib/kettenis/pthread-19990825.tar.gz

Not all necessary functions have been implemented yet, but it is my
intention to implement at least those things that LinuxThreads has.
The mutex implementation is a generic one that uses spinlocks.  Of
course this will be replaced with an optimized version for specific
machines like the one in LinuxThreads.

Nothing is fixed yet.  Please feel free to criticise everything.  I'm
prepared to design huge parts of the library from the ground up if
that increases the chances of making this the unified glibc pthreads
library.

Mark


* Re: New pthreads library
  1999-08-25 15:47 New pthreads library Mark Kettenis
@ 1999-08-27 23:51 ` Ulrich Drepper
  1999-08-29 15:24   ` Mark Kettenis
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Drepper @ 1999-08-27 23:51 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: libc-hacker

Hi,

I've taken a brief look at the code.  The model would currently not be
usable for Linux due to missing functionality in the kernel.  We need
this strange manager thread.  But this hopefully will change.

The code itself looks quite reasonable though some things are really
problematic: too many locks.  E.g., you use a lock to increment global
variables.  This should not be necessary.  Most processors provide an
instruction, or a short sequence of instructions, for this.  Some time
ago I added the header atomicity.h to glibc, and the functions it
contains should be used.  There are certainly some missing (e.g.,
atomic_increment and atomic_decrement, which are less generic than
atomic_add).
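
A minimal sketch of the lock-free counter update suggested above.  It
assumes a C11 compiler with <stdatomic.h>, not the glibc atomicity.h
interface under discussion; the counter and function names are
hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical global counter; zero-initialized at program start.  */
static atomic_int total_threads;

/* Increment without taking a lock; returns the new value.  */
static int
count_thread_created (void)
{
  return atomic_fetch_add (&total_threads, 1) + 1;
}

/* Decrement without taking a lock; returns the new value.  */
static int
count_thread_exited (void)
{
  return atomic_fetch_sub (&total_threads, 1) - 1;
}
```

Each call is a single atomic read-modify-write, so no separate lock
object is needed for the counter.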

In this context I should complain a bit about the lack of comments.
For code as complicated as the thread library I would like to see lots
of comments.  I know that it seems early to do this but because you
are not going to develop this code alone by yourself it is necessary
to communicate the decisions.  E.g., from the brief look I was not
able to see why you need the PTHREAD_COUNTED flag.

The next problem is the handling of the thread descriptor.  We used to
use the pointer to the data structures as the pthread_t element but
stopped this.  I think the main reason was that it caused problems
recognizing when a thread descriptor was illegal.  I personally don't
think this is much of a problem.

What is a problem is that you dereference the pointer directly.  Maybe
you should have taken a look at the LinuxThreads code.  There, some
time last year, I added changes which move the access into a macro.
E.g., instead of

	thread->exited

I would write

	THREAD_GETMEM (thread, exited)

The benefit of this is that on modern architectures with dedicated
thread pointer registers (see SPARC) one could define THREAD_GETMEM as
this:

	register pthread_t *__thread_self __asm__ ("%g6");

	#define THREAD_GETMEM(descr, member) __thread_self->member

It should be obvious that this is much better.  Even for x86 it is
possible to use this trick and the only reason I have not activated
this so far in LinuxThreads is that there is a kernel problem.  But
for the Hurd you should be able to make it work.
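
A minimal sketch of what the generic (no thread register) side of such
a macro could look like.  The descriptor layout is hypothetical apart
from the `exited' member, and THREAD_SETMEM is added here only as the
natural counterpart for writes.

```c
/* Hypothetical descriptor; only `exited' comes from the discussion.  */
struct thread_descr
{
  int exited;
  int detached;
};

/* Generic definitions: go through the descriptor that was passed in.
   A port with a dedicated thread register could redefine these to
   ignore DESCR and address the member relative to that register.  */
#define THREAD_GETMEM(descr, member) ((descr)->member)
#define THREAD_SETMEM(descr, member, value) ((descr)->member = (value))

/* Callers never write `self->exited' directly, so the access method
   can change per architecture without touching this code.  */
static int
thread_has_exited (struct thread_descr *self)
{
  return THREAD_GETMEM (self, exited);
}
```

The point of the indirection is that only the macro definitions differ
between ports; every use site stays the same.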

A last point is not really a criticism of your code.  I planned to
revamp the entire thread library anyway.  One of the things I want to
implement is the separation of kernel and user threads.  User level
thread implementations have big advantages in some situations and I
would like to have a combination.  I do not yet have any concrete
plans, but I would have waited with coming up with a new design until
I had an idea how to tackle this problem.  At least the
implementation should be general enough to allow this modified behaviour.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------


* Re: New pthreads library
  1999-08-27 23:51 ` Ulrich Drepper
@ 1999-08-29 15:24   ` Mark Kettenis
  1999-08-29 16:05     ` Ulrich Drepper
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Kettenis @ 1999-08-29 15:24 UTC (permalink / raw)
  To: drepper; +Cc: libc-hacker

   From: Ulrich Drepper <drepper@cygnus.com>
   Date: 27 Aug 1999 23:49:08 -0700

   Hi,

   I've taken a brief look at the code.  The model would currently not be
   usable for Linux due to missing functionality in the kernel.  We need
   this strange manager thread.  But this hopefully will change.

Thanks for looking at the code.  When you say that `The model would
currently not be usable for Linux', do you mean that it would involve
a lot of work to merge the manager thread into my current approach,
which would be a lot of wasted effort since the missing functionality
is going to be added to the Linux kernel anyway?  Or do you mean that
there really is a fundamental reason why it would be impossible to
implement the system dependent bits of code necessary in my
implementation for the current Linux kernels?

   The code itself looks quite reasonable though some things are really
   problematic: too many locks.  E.g., you use a lock to increment global
   variables.  This should not be necessary.  Most processors provide an
   instruction, or a short sequence of instructions, for this.  Some time
   ago I added the header atomicity.h to glibc, and the functions it
   contains should be used.  There are certainly some missing (e.g.,
   atomic_increment and atomic_decrement, which are less generic than
   atomic_add).

I agree that such optimizations should be made when they're possible.
However, I'm a little puzzled over the current implementation.  If
we're compiling the library for a processor that does not support all
atomic operations (e.g. the i386) the file
`sysdeps/generic/atomicity.h' is used.  The operations in this file
aren't really atomic.  Right now this is not a very big problem since
the only use of <atomicity.h> is in the profiling code
(elf/dl-profile.c and gmon/mcount.c), but in the threads library this
is unacceptable.  So in certain cases I'll have to fall back on using
locks.  Implementing atomic_increment on the i386 is not difficult at
all, so I'll definitely do that.

   In this context I should complain a bit about the lack of comments.
   For code as complicated as the thread library I would like to see lots
   of comments.  I know that it seems early to do this but because you
   are not going to develop this code alone by yourself it is necessary
   to communicate the decisions.  E.g., from the brief look I was not
   able to see why you need the PTHREAD_COUNTED flag.

OK.  Adding comments is now top priority.  By the way, PTHREAD_COUNTED
is no longer there since it would prevent using atomic_increment as
you suggested.  There is also a comment now explaining the possible
race condition between pthread_create() and pthread_exit() that it was
supposed to solve.

   The next problem is the handling of the thread descriptor.  We used to
   use the pointer to the data structures as the pthread_t element but
   stopped this.  I think the main reason was that it caused problems
   recognizing when a thread descriptor was illegal.  I personally don't
   think this is much of a problem.

Yes, pthread_join, pthread_detach and pthread_kill are supposed to
fail and return ESRCH if no thread with the specified thread ID could
be found.  For some other functions (for example pthread_getpriority)
this is a ``may fail'' condition, and the behaviour of pthread_equal
is undefined if the thread ID is invalid.  Back when I started I
discussed this with Roland McGrath and Thomas Bushnell.  They said
that making the pointer to the thread data structure the thread ID was
definitely the right thing (and I agree), and said that it might
very well be that the intention of the standard was to make ESRCH a
``may fail'' condition in all cases.  I decided to keep things as they
are now until someone complains about it.

   What is a problem is that you dereference the pointer directly.  Maybe
   you should have taken a look at the LinuxThreads code.  There, some
   time last year, I added changes which move the access into a macro.
   E.g., instead of

	   thread->exited

   I would write

	   THREAD_GETMEM (thread, exited)

   The benefit of this is that on modern architectures with dedicated
   thread pointer registers (see SPARC) one could define THREAD_GETMEM as
   this:

	   register pthread_t *__thread_self __asm__ ("%g6");

	   #define THREAD_GETMEM(descr, member) __thread_self->member

   It should be obvious that this is much better.  Even for x86 it is
   possible to use this trick and the only reason I have not activated
   this so far in LinuxThreads is that there is a kernel problem.  But
   for the Hurd you should be able to make it work.

Thanks for explaining what THREAD_GETMEM is for.  I didn't quite get
why this `uglification' of the code was necessary.  Of course I never
looked at the SPARC code.  I will certainly convert my code to do
this, but I may postpone this until the code has stabilized a bit.

By `using this trick for the x86' you probably mean the approach laid
out in `linuxthreads/sysdeps/i386/useldt.h'.  Implementing this
approach for the Hurd should indeed be possible since Mach has the
necessary support for manipulating the LDT.  If you want to support
arbitrary stacks (_PTHREAD_STACKADDR and _PTHREAD_STACKSIZE support)
in an efficient way, such a mechanism is almost essential.

   A last point is not really a criticism of your code.  I planned to
   revamp the entire thread library anyway.  One of the things I want to
   implement is the separation of kernel and user threads.  User level
   thread implementations have big advantages in some situations and I
   would like to have a combination.  I do not yet have any concrete
   plans, but I would have waited with coming up with a new design until
   I had an idea how to tackle this problem.  At least the
   implementation should be general enough to allow this modified behaviour.

A while back, when you were hinting at a rewrite of the threads
library, you talked about a paper describing the new Mach 3.0 cthreads
library.  I believe the paper you were talking about is `Randall
W. Dean, Using Continuations to Build a User-Level Threads Library'.
I've read the paper and it seems like a good approach to me.  Anyway,
I'll design the public interfaces in such a way that moving from the
`one-on-one' model to the `many-on-many' model is possible.  I believe
that this only means that I must avoid exposing details about the
underlying kernel threads, which is a good goal anyway, and is not
hard to achieve.  Making sure that we can implement the
`many-to-many' model later on without changing too much of the
internals is a bit more difficult, but I'll do my best.  I'll avoid
assuming that every thread is always tied to a kernel thread.  If you
have any other suggestions for things that I have to keep in mind,
please don't hesitate to mention them to me.

Mark


* Re: New pthreads library
  1999-08-29 15:24   ` Mark Kettenis
@ 1999-08-29 16:05     ` Ulrich Drepper
  1999-08-29 22:33       ` Roland McGrath
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Drepper @ 1999-08-29 16:05 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: libc-hacker

Mark Kettenis <kettenis@wins.uva.nl> writes:

> Thanks for looking at the code.  When you say that `The model would
> currently not be usable for Linux', do you mean that it would involve
> a lot of work to merge the manager thread into my current approach,
> which would be a lot of wasted effort since the missing functionality
> is going to be added to the Linux kernel anyway?

Yes, and hopefully yes.  There are some patches for the kernel
floating around which will make the manager unnecessary.  But I have
not yet heard a definitive word from Linus.

> I agree that such optimizations should be made when they're possible.
> However, I'm a little puzzled over the current implementation.  If
> we're compiling the library for a processor that does not support all
> atomic operations (e.g. the i386) the file
> `sysdeps/generic/atomicity.h' is used.  The operations in this file
> aren't really atomic.

Right.  I don't say that the current atomicity.h files are all correct
or that all needed ones and all functionality are available.

E.g., we should have macros like

  #define ATOMIC_VARIABLE(class, type, name)

which could expand for machines without appropriate instructions to

  #define ATOMIC_VARIABLE(class, type, name) \
    class __spin_lock_t name##_lock; \
    class type name

and then ATOMIC_INCREMENT could be defined like

  #define ATOMIC_INCREMENT(name) \
    do { \
      __spin_lock (&name##_lock); \
      ++name; \
      __spin_unlock (&name##_lock); \
    } while (0)

But I really don't care about i386 anymore (and not for old SPARCs
etc) which is why generic/atomicity.h is currently used.  Besides,
there rarely were SMP systems using these processors so the atomicity
primitives need not be SMP safe.
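
A self-contained sketch of the spin-lock fallback outlined above.  It
assumes C11 and substitutes a stand-in lock built on atomic_flag for
glibc's __spin_lock_t; the `nthreads' variable is hypothetical.

```c
#include <stdatomic.h>

/* Stand-in spin lock; not glibc's __spin_lock_t.  */
typedef atomic_flag spin_lock_t;

static void
spin_lock (spin_lock_t *l)
{
  while (atomic_flag_test_and_set_explicit (l, memory_order_acquire))
    ;                           /* busy-wait until the holder unlocks */
}

static void
spin_unlock (spin_lock_t *l)
{
  atomic_flag_clear_explicit (l, memory_order_release);
}

/* Declare a variable together with the lock that guards it.  */
#define ATOMIC_VARIABLE(class, type, name) \
  class spin_lock_t name##_lock = ATOMIC_FLAG_INIT; \
  class type name

/* Wrapped in do/while so it behaves as one statement.  */
#define ATOMIC_INCREMENT(name) \
  do { \
    spin_lock (&name##_lock); \
    ++name; \
    spin_unlock (&name##_lock); \
  } while (0)

ATOMIC_VARIABLE (static, int, nthreads);
```

On a machine with a real atomic increment instruction, the same two
macro names could expand to a plain variable and a single instruction,
so callers would not need to change.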

> [Roland and Thomas] said that making the pointer to the thread data
> structure the thread ID was definitely the right thing (and I
> agree), and said that it might very well be that the intention of
> the standard was to make ESRCH a ``may fail'' condition in all
> cases.  I decided to keep things as they are now until someone
> complains about it.

Using the pointer is only possible if there is a fixed number of
descriptors available and they are all in a preallocated array.  Just
like the old implementations of stdio.  Since I assume that for the
Hurd there will be no limits you cannot use pointers for the reasons
you cited: some function must be able to recognize invalid thread
handles.

I really would suggest going with an index based thread handle and
using the indirection.  Maybe even with some kind of generation number.
I.e., you use a 32/64 bit value where the lower bits are an index and
some of the upper bits are a generation number.  Then a debugging
version could recognize a thread handle whose index is valid again but
which was left over from a different thread.  But this is something for a
debugging version.
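
A minimal sketch of such a handle, assuming a 32 bit value with 20
index bits and 12 generation bits.  The bit split, the fixed table
size, and all names are illustrative choices, not part of either
library; this version checks the generation on every lookup rather
than only in a debugging build.

```c
#include <stddef.h>
#include <stdint.h>

#define INDEX_BITS 20u
#define INDEX_MASK ((1u << INDEX_BITS) - 1u)
#define GEN_MASK   ((1u << (32u - INDEX_BITS)) - 1u)
#define MAX_THREADS 64u         /* small fixed table, for the sketch only */

struct descr
{
  int in_use;
  uint32_t generation;          /* bumped each time the slot is reused */
};

static struct descr table[MAX_THREADS];

/* Returns a new handle, or 0 if the table is full.  */
static uint32_t
handle_alloc (void)
{
  for (uint32_t i = 0; i < MAX_THREADS; ++i)
    if (!table[i].in_use)
      {
        uint32_t gen = (table[i].generation + 1u) & GEN_MASK;
        if (gen == 0)
          gen = 1;              /* keep handle value 0 permanently invalid */
        table[i].generation = gen;
        table[i].in_use = 1;
        return (gen << INDEX_BITS) | i;
      }
  return 0;
}

/* Returns the descriptor for H, or NULL if H is stale or out of range.  */
static struct descr *
handle_lookup (uint32_t h)
{
  uint32_t i = h & INDEX_MASK;
  if (i >= MAX_THREADS || !table[i].in_use
      || table[i].generation != (h >> INDEX_BITS))
    return NULL;
  return &table[i];
}

/* Marks the slot free; a stale or invalid handle is ignored.  */
static void
handle_release (uint32_t h)
{
  struct descr *d = handle_lookup (h);
  if (d != NULL)
    d->in_use = 0;
}
```

A handle left over from a finished thread fails the generation
comparison even after its slot has been handed to a new thread.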

> Thanks for explaining what THREAD_GETMEM is for.  I didn't quite get
> why this `uglification' of the code was necessary.  Of course I never
> looked at the SPARC code.  I will certainly convert my code to do
> this, but I may postpone this until the code has stabilized a bit.

Well, by waiting you only make the number of needed changes bigger.

> By `using this trick for the x86' you probably mean the approach laid
> out in `linuxthreads/sysdeps/i386/useldt.h'.

Yes.

> If you want to support arbitrary stacks (_PTHREAD_STACKADDR and
> _PTHREAD_STACKSIZE support) in an efficient way, such a mechanism is
> almost essential.

It's also essential if you have programs which set up their own
stacks.  E.g., one of the Smalltalk implementations does this.

> A while back, when you were hinting at a rewrite of the threads
> library, you talked about a paper describing the new Mach 3.0 cthreads
> library.  I believe the paper you were talking about is `Randall
> W. Dean, Using Continuations to Build a User-Level Threads Library'.
> I've read the paper and it seems like a good approach to me.

This is one approach but it's not the only way to do this.

> Anyway, I'll design the public interfaces in such a way that moving
> from the `one-on-one' model to the `many-on-many' model is possible.  I
> believe that this only means that I must avoid exposing details
> about the underlying kernel threads, which is a good goal anyway,
> and is not hard to achieve.

What this does mean is that there must be two kinds of thread
descriptors: kernel and user.  The API only exposes the user threads
but the library must know about both.  And ideally the only machine
dependent thing is the kernel thread part.

Therefore I'd prefer to see this model being taken into account
immediately.  The whole user-thread creation would be generic code.
The pt-start.c files you currently use would not be in sysdeps.  And
it should be possible to implement the normal 1:1 model even with the
structure being that of the fully developed n:m model, without a
performance decrease.

> If you have any other suggestions for things that I have to keep in
> mind, please don't hesitate to mention them to me.

I'd like to see already at this point a kernel_thread_t separate from
the user pthread_t.  The pthread_t objects are created directly by the
pthread_create calls; the kernel_thread_t objects are created as a
reaction to a request to run a pthread_t.  I.e., for now a kernel
thread is always created.  Then this can be changed later very easily.

Therefore I'd like to see a user-level scheduler instance (a library
subfunction for now) which gets a pthread_t object reference and then
makes it run.  Either by calling clone() (or whatever Mach uses) or
later by folding the new thread in the set of running user threads.
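
A data-structure sketch of this split, under loud assumptions: the
type and function names are hypothetical, nothing is actually started,
and kernel_thread_create is only a placeholder for clone() or the Mach
equivalent.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for whatever the OS needs to describe a kernel thread.  */
struct kernel_thread
{
  int id;
};

/* User-level thread object, as created by a pthread_create call.  */
struct user_thread
{
  void *(*start) (void *);
  void *arg;
  struct kernel_thread *kthread;  /* NULL until the scheduler runs it */
};

static int next_kernel_id = 1;

/* Placeholder for the system-dependent part; only allocates a record.  */
static struct kernel_thread *
kernel_thread_create (void)
{
  struct kernel_thread *k = malloc (sizeof *k);
  if (k != NULL)
    k->id = next_kernel_id++;
  return k;
}

/* Scheduler entry point.  For now (1:1) it always binds a fresh kernel
   thread; an n:m version could instead queue T on an existing one.
   Returns 0 on success, -1 on failure.  */
static int
sched_run (struct user_thread *t)
{
  if (t->kthread != NULL)
    return 0;                     /* already bound */
  t->kthread = kernel_thread_create ();
  return t->kthread != NULL ? 0 : -1;
}
```

Only sched_run would need to change when moving away from the 1:1
model; the code that creates user_thread objects is untouched.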

I know this is all a bit vague and I appreciate that you have done
that much work already.  But I'd really like to see some more planning
and discussion before too much code is written.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------


* Re: New pthreads library
  1999-08-29 16:05     ` Ulrich Drepper
@ 1999-08-29 22:33       ` Roland McGrath
  1999-08-29 23:10         ` Ulrich Drepper
  0 siblings, 1 reply; 14+ messages in thread
From: Roland McGrath @ 1999-08-29 22:33 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Thomas Bushnell

> Using the pointer [for pthread_t] is only possible if there is a fixed
> number of descriptors available and they are all in a preallocated array.

Do you say this based on the presumption that one must be able to easily
tell whether a given pointer is a valid thread ID or not?  

> Just like the old implementations of stdio.  Since I assume that for the
> Hurd there will be no limits you cannot use pointers for the reasons you
> cited: some function must be able to recognize invalid thread handles.

1003.1-1996 is not entirely consistent on the requirements for this--or at
least the specification is quite subtle.  It is subtle enough that Thomas
and I previously thought parts of it might have been unintentional, and
certainly subtle enough that I think it deserves full exposition of the
precise constraints before concluding on the implementation ramifications.

pthread_kill, pthread_detach, and pthread_join are required to return ESRCH
if the condition occurs that the thread ID is invalid.

The other calls where ESRCH is mentioned (and relates to a thread ID rather
than a process ID) are specified to return ESRCH if the condition is detected.
I believe this weaker specification allows those calls to crash when given
an invalid thread ID.

The threads introduction (section 16.1 p 333) says:

	A conforming implementation is free to reuse a thread ID after the
	thread terminates if it was created with the `detachstate'
	attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or
	pthread_join() has been called for that thread.  If a thread is
	detached, its thread ID is invalid for use as an argument in a call
	to pthread_detach() or pthread_join().

A relevant note in the specification of pthread_kill (3.3.10.2 p 92):

	As in kill(), if SIG is zero, error checking is performed but no
	signal is actually sent.

I take this to indicate the intent that pthread_kill can be used as a
thread liveness check (as kill can be used as a PID liveness check).
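
A short sketch of that usage, assuming a POSIX threads implementation.
pthread_kill and ESRCH are standard; the wrapper name is hypothetical.
Whether a stale ID is actually detected is up to the implementation,
which is exactly the question under discussion.

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>

/* Returns 1 if the implementation can find thread T, 0 if it reports
   ESRCH, and -1 on any other error.  */
static int
thread_is_known (pthread_t t)
{
  int err = pthread_kill (t, 0);  /* signal 0: error checking only */
  if (err == 0)
    return 1;
  return err == ESRCH ? 0 : -1;
}
```

Calling thread_is_known (pthread_self ()) is the one case that is
certain to report a known thread.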

I believe those are all the directly relevant citations from the standard.

I think it is safe to presume from general principles that picking a
pthread_t value from thin air (i.e. not the result of any prior
pthread_create call in the same process) has undefined behavior, or at
least unspecified behavior.  (Section 16.1 explicitly specifies using a
pthread_t gotten from another process to be unspecified.)  As a practical
matter, an implementation probably needs to detect and reject zero (which
any application's uninitialized pthread_t variables are likely to be set
to--though such an application is buggy by the spec).  So aside from that,
the issue is thread IDs of terminated threads.  Let's examine the cases.

The thread ID of a terminated, undetached, unjoined thread is still valid.
Its data structure sits around until it is detached or joined.

The thread ID of a terminated, detached thread is invalid but subject to
reuse.  Likewise the thread ID of a terminated, joined thread.
pthread_kill, pthread_detach, and pthread_join are required to return ESRCH
if the condition occurs that "No thread could be found corresponding to
that specified by the given thread ID."  So these are allowed either to
succeed by operating on a new thread that has reused the terminated
thread's ID, or to fail with ESRCH.

Personally, I think pthread_kill is the only one of these where there is a
good argument for requiring robust detection of thread IDs for terminated,
detached threads.  But those three are what's specified, and anyway just
one call is enough to require arranging things to make robust detection
possible.

I see only one way to meet the spec while using a data structure pointer
for pthread_t.  That is never to free the data structures, only reuse them
after the thread has terminated and been joined or detached.  This has the
obvious cost of consuming (presumably mostly untouched) virtual memory
proportional to the maximum number of threads the process has ever had, no
matter how few threads remain live.
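
A minimal sketch of that approach: descriptors go on a free list when
retired and are never returned to the allocator, so a pointer that was
once a valid thread ID can always be dereferenced safely.  All names
are hypothetical and no locking is shown.

```c
#include <stdlib.h>

struct descr
{
  int live;                     /* nonzero while the ID is valid */
  struct descr *next_free;
};

static struct descr *free_list;

/* Reuse a retired descriptor if there is one, else allocate a new one.
   Memory obtained here is never freed.  */
static struct descr *
descr_alloc (void)
{
  struct descr *d = free_list;
  if (d != NULL)
    free_list = d->next_free;
  else
    d = malloc (sizeof *d);
  if (d != NULL)
    {
      d->live = 1;
      d->next_free = NULL;
    }
  return d;
}

/* Called once the thread has terminated and been joined or detached.  */
static void
descr_retire (struct descr *d)
{
  d->live = 0;
  d->next_free = free_list;
  free_list = d;
}

/* Safe for any pointer that was ever returned by descr_alloc.  */
static int
descr_valid (const struct descr *d)
{
  return d != NULL && d->live;
}
```

After reuse the old pointer value names the new thread, which matches
the reading above that such calls may either succeed on the new thread
or fail with ESRCH.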

> I really would suggest going with an index based thread handle and
> using the indirection.  Maybe even with some kind of generation number.

That seems reasonable enough to me.  The main downside to this is the need
to resize the indirection table as the number of threads grows.

> > Thanks for explaining what THREAD_GETMEM is for.  I didn't quite get
> > why this `uglification' of the code was necessary.  Of course I never
> > looked at the SPARC code.  I will certainly convert my code to do
> > this, but I may postpone this until the code has stabilized a bit.

I don't understand why the uglifying macro is necessary at all.  With a
global register variable in scope and an inline `thread_self' function that
just returns its value, the compiler does the right thing for:

	struct thread_internal *self = internal_thread_self ();
	blah = self->foobar;

i.e., it just optimizes out the SELF variable completely and replaces it
with the global register. 

For the x86 segment register trick, you can do a very similar thing just
with another special declaration.  Either:

	extern struct thread_internal *_self asm("%gs:0");
	inline struct thread_internal *thread_self() { return _self; }

or:

	extern struct thread_internal _self asm("%gs:0");
	inline struct thread_internal *thread_self() { return &_self; }

depending on the exact flavor of segment register trick you're using.

> > A while back, when you were hinting at a rewrite of the threads
> > library, you talked about a paper describing the new Mach 3.0 cthreads
> > library.  I believe the paper you were talking about is `Randall
> > W. Dean, Using Continuations to Build a User-Level Threads Library'.
> > I've read the paper and it seems like a good approach to me.
> 
> This is one approach but it's not the only way to do this.

There are many designs that have been researched.  Most of the things we
have discussed here in the past grow out of the "scheduler activations"
concept.  The Utah OSDI'99 paper (see
http://www.cs.utah.edu/projects/flux/papers/index.html ) discusses issues
somewhat related to this, and its bibliography contains some citations
about scheduler activations and related work (including the Mach
implementation of scheduler activations, I believe).

> What this does mean is that there must be two kinds of thread
> descriptors: kernel and user.  The API only exposes the user threads
> but the library must know about both.  

All of this is internal implementation structure that will be completely
hidden from the user API, and almost completely hidden from the user ABI.
There are only a few things in the user API that may be important to inline
for speed such that the user ABI might depend on any thread data structure
at all; these are thread-specific data and pthread_self, and probably no
others.  Only a small data structure (that can be a prefix of the full
internal thread descriptor) containing the information these inlined calls
need becomes a part of the user ABI.  The full data structure used to
describe threads can change and evolve for implementation convenience,
without affecting the ABI at all.
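
A sketch of the prefix idea, with hypothetical structure and member
names.  Only the first structure would be visible to inlined user
code; everything after it in the full descriptor can change.

```c
#include <stddef.h>

/* Public prefix: the only layout inlined user code may depend on.  */
struct thread_public
{
  void *self;                   /* what an inlined pthread_self returns */
  void **specific;              /* thread-specific data slots */
};

/* Full internal descriptor.  Members after `pub' are private and may
   be added, removed or reordered freely.  */
struct thread_internal
{
  struct thread_public pub;     /* must stay the first member */
  int exited;
  int detached;
};

/* An inlined accessor needs to know nothing beyond the prefix.  */
static inline void *
public_self (const struct thread_public *p)
{
  return p->self;
}
```

Because `pub' is the first member, a pointer to the full descriptor
can also be used as a pointer to the public prefix.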

Since this structure layout is a purely internal source-level decision, the
"kernel thread part" can be either a separate structure or just a portion
of the full thread data structure, according to the convenience of the
implementation.  For a particular implementation/configuration that is
always 1:1, there is no reason not to include everything in a single data
structure allocated in one chunk.

> And ideally the only machine dependent thing is the kernel thread part.

Hmm.  On the contrary, I would think that the OS-specific kernel interface
part would be mostly machine-independent (since the OS takes care of
abstracting the state it maintains).  Conversely, some implementation
models do preemptive thread context switching purely in user mode (e.g. the
scheduler activations model); so the "user part" is then responsible for
the highly machine-dependent details of saving a preempted user thread's
state and restoring previously saved context.

What I would like to see is the machine-dependent but OS-independent parts
well isolated so we can reuse (at least parts of) the machine-dependent
thread switching code in different OS implementations.  Something akin to
the Solaris getcontext/setcontext interface might be nice.

> I'd like to see already at this point a kernel_thread_t separate from
> the user pthread_t.  The pthread_t objects are created directly by the
> pthread_create calls; the kernel_thread_t objects are created as a
> reaction to a request to run a pthread_t.  I.e., for now a kernel
> thread is always created.  Then this can be changed later very easily.

Again, I think this is something that should not be fixed for all versions
of the implementation.  The division of the abstractions into conceptually
distinct data structures is certainly good.  But it can easily be a
decision made at source level via sysdeps typedefs/macros whether to
include a "kernel thread" data structure directly in the pthread data
structure or to use a pointer to a separately allocated structure.
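
A sketch of how such a source-level switch might look.  The macro
KTHREAD_SEPARATE, the accessor KTHREAD and the `port' member are all
hypothetical; a sysdeps header would pick one configuration.

```c
/* Stand-in for the OS-specific kernel thread state.  */
struct kernel_thread
{
  int port;
};

#ifdef KTHREAD_SEPARATE
/* Separately allocated: the descriptor holds only a pointer, which the
   creation code must set before KTHREAD is used.  */
# define KTHREAD_MEMBER struct kernel_thread *kthread
# define KTHREAD(t)     ((t)->kthread)
#else
/* Always 1:1: embed the kernel part, one allocation per thread.  */
# define KTHREAD_MEMBER struct kernel_thread kthread
# define KTHREAD(t)     (&(t)->kthread)
#endif

struct thread
{
  int exited;
  KTHREAD_MEMBER;
};

/* Generic code uses the accessor only, so it compiles unchanged under
   either configuration.  */
static void
thread_set_port (struct thread *t, int port)
{
  KTHREAD (t)->port = port;
}

static int
thread_get_port (struct thread *t)
{
  return KTHREAD (t)->port;
}
```

With the default (embedded) configuration shown here no extra
allocation or pointer indirection is involved.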


* Re: New pthreads library
  1999-08-29 22:33       ` Roland McGrath
@ 1999-08-29 23:10         ` Ulrich Drepper
  1999-08-30  0:16           ` Roland McGrath
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Drepper @ 1999-08-29 23:10 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Mark Kettenis, libc-hacker, Thomas Bushnell

Roland McGrath <roland@frob.com> writes:

> Do you say this based on the presumption that one must be able to easily
> tell whether a given pointer is a valid thread ID or not?

This is what the specification says for some of the functions
(pthread_join).  We can of course choose to ignore it but it's a bit
more problematic.  Just assume this code:

	pthread_create (&t, ...);
	pthread_join (t, ...);

There is no guarantee that the newly created thread already terminated
when the join operation starts.  It might even be that there is a new
thread with this thread handle and so the join is for a different
thread.

> I believe this weaker specification allows those calls to crash when given
> an invalid thread ID.

At least the definition is unspecified.

> That seems reasonable enough to me.  The main downside to this is the need
> to resize the indirection table as the number of threads grows.

Well, there are techniques available to reduce the costs of this.

> For the x86 segment register trick, you can do a very similar thing just
> with another special declaration.  Either:
> 
> 	extern struct thread_internal *_self asm("%gs:0");
> 	inline struct thread_internal *thread_self() { return _self; }
> 
> or:
> 
> 	extern struct thread_internal _self asm("%gs:0");
> 	inline struct thread_internal *thread_self() { return &_self; }

The second case does not work (try it, you get a wrong use of %gs) and
overall this is not the same.  In your case the compiler has to waste
a register for the thread data pointer.  If you look at the code in
linuxthreads you'll see that I've written the code so that I directly
access the thread data through the %gs register with a non-zero offset.

This makes a big difference for this stupid architecture.  If you can
find a way to modify your code appropriately it's certainly better.  But
I haven't found a possibility.

> Hmm.  On the contrary, I would think that the OS-specific kernel interface
> part would be mostly machine-independent (since the OS takes care of
> abstracting the state it maintains).  Conversely, some implementation
> models do preemptive thread context switching purely in user mode (e.g. the
> scheduler activations model); so the "user part" is then responsible for
> the highly machine-dependent details of saving a preempted user thread's
> state and restoring previously saved context.

Well, yes, of course this part is also dependent.  But there is
already functionality to do this in the libc API (swapcontext etc).
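
A small sketch of user-mode switching with that interface, assuming a
platform that ships <ucontext.h> (glibc does).  It shows cooperative
switching only, not preemption; the function names and the stack size
are arbitrary choices for the example.

```c
#include <ucontext.h>

static ucontext_t main_ctx, thread_ctx;
static char thread_stack[64 * 1024];
static int steps;

/* Body of the user-level thread: runs in two slices.  */
static void
user_thread (void)
{
  steps = 1;
  swapcontext (&thread_ctx, &main_ctx);   /* yield to the scheduler */
  steps = 2;
  /* Returning resumes uc_link, which points at main_ctx.  */
}

/* Runs user_thread to completion in two slices and records the value
   of `steps' after each one.  Returns 0 on success, -1 on error.  */
static int
run_two_slices (int out[2])
{
  if (getcontext (&thread_ctx) != 0)
    return -1;
  thread_ctx.uc_stack.ss_sp = thread_stack;
  thread_ctx.uc_stack.ss_size = sizeof thread_stack;
  thread_ctx.uc_link = &main_ctx;
  makecontext (&thread_ctx, user_thread, 0);

  if (swapcontext (&main_ctx, &thread_ctx) != 0)   /* first slice */
    return -1;
  out[0] = steps;
  if (swapcontext (&main_ctx, &thread_ctx) != 0)   /* second slice */
    return -1;
  out[1] = steps;
  return 0;
}
```

All machine-dependent register saving is hidden inside swapcontext, so
the scheduling logic itself stays portable.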

> But it can easily be a decision made at source level via sysdeps
> typedefs/macros whether to include a "kernel thread" data structure
> directly in the pthread data structure or to use a pointer to a
> separately allocated structure.

Makes sense.  But we should foresee this right from the beginning.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------


* Re: New pthreads library
  1999-08-29 23:10         ` Ulrich Drepper
@ 1999-08-30  0:16           ` Roland McGrath
  1999-08-30  8:29             ` Ulrich Drepper
  1999-08-30 14:11             ` Richard Henderson
  0 siblings, 2 replies; 14+ messages in thread
From: Roland McGrath @ 1999-08-30  0:16 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Mark Kettenis, libc-hacker, Thomas Bushnell

> This is what the specification says for some of the functions
> (pthread_join).  We can of course choose to ignore it but it's a bit
> more problematic.  Just assume this code:
> 
> 	pthread_create (&t, ...);
> 	pthread_join (t, ...);
> 
> There is no guarantee that the newly created thread already terminated
> when the join operation starts.  It might even be that there is a new
> thread with this thread handle and so the join is for a different
> thread.

That is an entirely different case (which I already mentioned).  An
undetached thread ID clearly remains valid after it terminates, so that
pthread_join can work.  That is why I went to all the trouble of clearly
stating the complete set of cases.  I wish you would address the points I
raised if you have something to say.
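
A minimal sketch of the undetached case, assuming a POSIX system.  The
names `worker` and `join_after_exit` are made up for illustration.  The
new thread may or may not have terminated by the time pthread_join is
called; because it was never detached, its ID stays valid either way and
the join should return the thread's exit value.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worker: terminates immediately, handing back its argument. */
static void *worker(void *arg)
{
  return arg;
}

/* Create a joinable (undetached) thread and join it.
   Returns the thread's exit value, or -1 if a pthread call fails.  */
static int join_after_exit(void)
{
  pthread_t t;
  void *result = NULL;

  if (pthread_create(&t, NULL, worker, (void *) (intptr_t) 42) != 0)
    return -1;

  /* Whether or not the worker has already finished here, `t' still
     names it until the join below completes.  */
  if (pthread_join(t, &result) != 0)
    return -1;

  return (int) (intptr_t) result;
}
```

Calling join_after_exit() from main and checking for 42 exercises the
case in question.  A detached thread is the different case: nothing may
be assumed about its ID once it has terminated.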

> > For the x86 segment register trick, you can do a very similar thing just
> > with another special declaration.  Either:
> > 
> > 	extern struct thread_internal *_self asm("%gs:0");
> > 	inline struct thread_internal *thread_self() { return _self; }
> > 
> > or:
> > 
> > 	extern struct thread_internal _self asm("%gs:0");
> > 	inline struct thread_internal *thread_self() { return &_self; }
> 
> The second case does not work (try it, you get a wrong use of %gs) and
> overall this is not the same.  In your case the compiler has to waste
> a register for the thread data pointer.  

Have you tried it?  I have, and I have used this very technique in another
implementation in the past.  Here is an example:

	struct foo {
	  int x, y;
	};
	extern struct foo seg __asm__("%gs:0");
	static inline struct foo *
	the_foo(void) { return &seg; }

	int foobar(int x)
	{
	  struct foo *me = the_foo();
	  return x + me->y;
	}

This compiles to:

	foobar:
		pushl %ebp
		movl %esp,%ebp
		movl %gs:0+4,%eax
		addl 8(%ebp),%eax
		leave
		ret

(That's from egcs 1.1b, but I have used this technique in the past with gcc
2.7.2.1 as well.)  It is only a convenient happenstance of the assembly
syntax that this kludge turns out to work, but, hell, it's the x86.

> This makes a big difference for this stupid architecture.  If you can
> find a way to modify your code appropriately, it's certainly better.  But
> I haven't found a way to do it.

I have only the working example of exactly the syntax I used in my first
message to refer you to.  What else can I say?

> Well, yes, of course this part is also dependent.  But there is
> already functionality to do this in the libc API (swapcontext etc).

Yes, but we need to implement it!

> > But it can easily be a decision made at source level via sysdeps
> > typedefs/macros whether to include a "kernel thread" data structure
> > directly in the pthread data structure or to use a pointer to a
> > separately allocated structure.
> 
> Makes sense.  But we should foresee this right from the beginning.

Of course (and here we are foreseeing it right now in the beginning!).
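
One way the source-level choice could look, as a sketch only.  Every
name here (`struct kernel_thread`, `KTHREAD_EMBEDDED`, `KTHREAD`,
`KTHREAD_ALLOC`, `KTHREAD_FREE`, `struct pthread_sketch`) is hypothetical
and not taken from the snapshot under discussion.  A port's sysdeps
header would pick one branch; generic code only goes through the macros.

```c
#include <stdlib.h>

/* Hypothetical kernel-thread handle; a real port would define its own. */
struct kernel_thread {
  int id;
};

/* A port defines KTHREAD_EMBEDDED to place the structure inline in the
   pthread structure; otherwise it is allocated separately and reached
   through a pointer.  */
#ifdef KTHREAD_EMBEDDED
typedef struct kernel_thread kthread_slot_t;
# define KTHREAD(pt)       (&(pt)->kthread)
# define KTHREAD_ALLOC(pt) (0)
# define KTHREAD_FREE(pt)  ((void) 0)
#else
typedef struct kernel_thread *kthread_slot_t;
# define KTHREAD(pt)       ((pt)->kthread)
# define KTHREAD_ALLOC(pt) \
  (((pt)->kthread = malloc(sizeof(struct kernel_thread))) ? 0 : -1)
# define KTHREAD_FREE(pt)  free((pt)->kthread)
#endif

/* Stand-in for the library's per-thread structure. */
struct pthread_sketch {
  kthread_slot_t kthread;
  void *stack_base;
};

/* Generic code: identical source for both layouts. */
static int sketch_roundtrip(int id)
{
  struct pthread_sketch pt = { 0 };
  int seen;

  if (KTHREAD_ALLOC(&pt) != 0)
    return -1;
  KTHREAD(&pt)->id = id;
  seen = KTHREAD(&pt)->id;
  KTHREAD_FREE(&pt);
  return seen;
}
```

Compiling with or without -DKTHREAD_EMBEDDED switches the layout without
touching sketch_roundtrip, which is the property being asked for.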

* Re: New pthreads library
  1999-08-30  0:16           ` Roland McGrath
@ 1999-08-30  8:29             ` Ulrich Drepper
  1999-08-30 12:12               ` Roland McGrath
  1999-08-30 14:11             ` Richard Henderson
  1 sibling, 1 reply; 14+ messages in thread
From: Ulrich Drepper @ 1999-08-30  8:29 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Mark Kettenis, libc-hacker, Thomas Bushnell

Roland McGrath <roland@frob.com> writes:

> Have you tried it?

Yes, I did.  And for my test case I got something like

	movl $%gs:0, %eax
	movl 4(%eax), %eax

or so.  I see that your example is working but I cannot explain why
the one I had does not work.  (Of course I overwrote my test case
with yours.)  If I got the result I saw because of an error on my
side, this method is certainly preferable.  But it might also be that
gcc does some strange things and it is not reliable.  I'll talk
with our gcc people.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------

* Re: New pthreads library
  1999-08-30  8:29             ` Ulrich Drepper
@ 1999-08-30 12:12               ` Roland McGrath
  1999-08-30 12:34                 ` Ulrich Drepper
  0 siblings, 1 reply; 14+ messages in thread
From: Roland McGrath @ 1999-08-30 12:12 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: libc-hacker

> Roland McGrath <roland@frob.com> writes:
> 
> > Have you tried it?
> 
> Yes, I did.  And for my test case I got something like
> 
> 	movl $%gs:0, %eax
> 	movl 4(%eax), %eax

Ah, yes.  That is what you will get for:

	extern struct foo seg asm("%gs:0");
	... = &seg;

* Re: New pthreads library
  1999-08-30 12:12               ` Roland McGrath
@ 1999-08-30 12:34                 ` Ulrich Drepper
  1999-08-30 13:03                   ` Roland McGrath
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Drepper @ 1999-08-30 12:34 UTC (permalink / raw)
  To: Roland McGrath; +Cc: libc-hacker

Roland McGrath <frob@MIT.EDU> writes:

> Ah, yes.  That is what you will get for:
> 
> 	extern struct foo seg asm("%gs:0");
> 	... = &seg;

We'll see whether we can work around this.  Maybe it's never necessary.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com   `------------------------

* Re: New pthreads library
  1999-08-30 12:34                 ` Ulrich Drepper
@ 1999-08-30 13:03                   ` Roland McGrath
  0 siblings, 0 replies; 14+ messages in thread
From: Roland McGrath @ 1999-08-30 13:03 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: libc-hacker

> Roland McGrath <frob@MIT.EDU> writes:
> 
> > Ah, yes.  That is what you will get for:
> > 
> > 	extern struct foo seg asm("%gs:0");
> > 	... = &seg;
> 
> We'll see whether we can work around this.  Maybe it's never necessary.

That's what I was thinking.  If the only macros that use this are of the
"access member of current thread's descriptor" kind, then it should never
come up at all.
But you probably do need to completely avoid assigning any variables to
`&seg', because without optimization the compiler will emit the bogus
instructions attempting to take the address of the segment register.
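
A sketch of what such accessor macros could look like.  The names
`struct thread_descr`, `self_descr`, `SELF_GET`, `SELF_SET` and
`bump_tid` are invented for illustration.  So that the sketch runs on
any target, an ordinary static object stands in for the `%gs:0' symbol;
on i386 the declaration would instead carry the asm label shown earlier
in the thread.

```c
/* Hypothetical per-thread descriptor. */
struct thread_descr {
  int tid;
  int errno_value;
};

/* On i386 this would be declared as in the thread:
     extern struct thread_descr self_descr __asm__("%gs:0");
   Here a plain static object (zero-initialized) stands in for it.  */
static struct thread_descr self_descr;

/* The only sanctioned ways to touch the descriptor.  Each expands to a
   direct member access, so `&self_descr' is never stored in a variable
   and the compiler is never asked to materialize that address.  */
#define SELF_GET(member)        (self_descr.member)
#define SELF_SET(member, value) ((void) (self_descr.member = (value)))

/* Example user: increments the current thread's tid field. */
static int bump_tid(void)
{
  SELF_SET(tid, SELF_GET(tid) + 1);
  return SELF_GET(tid);
}
```

Anything that needs a real pointer to the descriptor would have to get
it some other way, for example from a field inside the descriptor that
holds its own linear address; that part is left out here.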

* Re: New pthreads library
  1999-08-30  0:16           ` Roland McGrath
  1999-08-30  8:29             ` Ulrich Drepper
@ 1999-08-30 14:11             ` Richard Henderson
  1999-08-30 14:23               ` Roland McGrath
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Henderson @ 1999-08-30 14:11 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Ulrich Drepper, Mark Kettenis, libc-hacker, Thomas Bushnell

On Mon, Aug 30, 1999 at 03:15:52AM -0400, Roland McGrath wrote:
> 	extern struct foo seg __asm__("%gs:0");
> 	static inline struct foo *
> 	the_foo(void) { return &seg; }
> 
> 	int foobar(int x)
> 	{
> 	  struct foo *me = the_foo();
> 	  return x + me->y;
> 	}
> 
> This compiles to:
> 
> 	foobar:
> 		pushl %ebp
> 		movl %esp,%ebp
> 		movl %gs:0+4,%eax

This is pretty cool, but it won't work with -fpic.  There 
we'll try something like

	movl	%gs:0@GOT(%ebx),%eax
	movl	4(%eax), %eax


r~

* Re: New pthreads library
  1999-08-30 14:11             ` Richard Henderson
@ 1999-08-30 14:23               ` Roland McGrath
  1999-08-30 14:25                 ` Richard Henderson
  0 siblings, 1 reply; 14+ messages in thread
From: Roland McGrath @ 1999-08-30 14:23 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Ulrich Drepper, Mark Kettenis, libc-hacker, Thomas Bushnell

> This is pretty cool, but it won't work with -fpic.  There 
> we'll try something like
> 
> 	movl	%gs:0@GOT(%ebx),%eax
> 	movl	4(%eax), %eax

Nag dab it!  I knew there was something.  Oh well.
I don't suppose it would be worth adding __attribute__((segment ("%gs")))
on extern decls.  

* Re: New pthreads library
  1999-08-30 14:23               ` Roland McGrath
@ 1999-08-30 14:25                 ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 1999-08-30 14:25 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Ulrich Drepper, Mark Kettenis, libc-hacker, Thomas Bushnell

On Mon, Aug 30, 1999 at 05:15:49PM -0400, Roland McGrath wrote:
> I don't suppose it would be worth adding __attribute__((segment ("%gs")))
> on extern decls.  

Without additional work, that wouldn't be able to accomplish
what you want.  What you _really_ want is something like

  int * __attribute__((seggs)) ptr;

so that %gs is used whenever ptr is dereferenced.  Then you'd do

  struct stuff {
  	int x, y, z;
  };

  #define thread_ptr	((struct stuff * __attribute__((seggs))) 0)

  thread_ptr->y;

The DSP folks would love for us to be able to represent something
like this, since differing address spaces for them have different
scheduling characteristics.  (E.g.  you may schedule two loads in
the same cycle, but only if one is from X memory and the other from
Y memory.)

r~

end of thread, other threads:[~1999-08-30 14:25 UTC | newest]

Thread overview: 14+ messages
1999-08-25 15:47 New pthreads library Mark Kettenis
1999-08-27 23:51 ` Ulrich Drepper
1999-08-29 15:24   ` Mark Kettenis
1999-08-29 16:05     ` Ulrich Drepper
1999-08-29 22:33       ` Roland McGrath
1999-08-29 23:10         ` Ulrich Drepper
1999-08-30  0:16           ` Roland McGrath
1999-08-30  8:29             ` Ulrich Drepper
1999-08-30 12:12               ` Roland McGrath
1999-08-30 12:34                 ` Ulrich Drepper
1999-08-30 13:03                   ` Roland McGrath
1999-08-30 14:11             ` Richard Henderson
1999-08-30 14:23               ` Roland McGrath
1999-08-30 14:25                 ` Richard Henderson
