public inbox for libc-hacker@sourceware.org
* Dealing with multiple page sizes in NPTL
@ 2005-09-21 14:06 Steve Munroe
  2005-10-16 12:54 ` Roland McGrath
  0 siblings, 1 reply; 6+ messages in thread
From: Steve Munroe @ 2005-09-21 14:06 UTC (permalink / raw)
  To: libc-hacker


The recently announced POWER5+ hardware now supports 4KB, 64KB, and 16MB
page sizes. For large systems used in High Performance Computing or
large-scale database applications, a larger page size makes the TLB more
effective and boosts performance.

At this year's OLS there was discussion of adding a kernel option to
support 64KB (vs 4KB) as the base page size in these environments. I
suspect that a larger base page is an issue for IA64 as well. This raises
the possibility that the page size may change depending on which kernel was
booted for that machine. This page size will be reported via AT_PAGESZ, but
the question is how well glibc responds to that value not being a constant
4096.

Amazingly well. Most of glibc depends on GLRO(dl_pagesize), __getpagesize(),
or __sysconf(_SC_PAGESIZE), which are derived from AT_PAGESZ either directly
or indirectly. Even LinuxThreads behaves correctly because it uses the
following definition:

   /* The page size we can get from the system.  This should likely not be
      changed by the machine file but, you never know.  */
   #ifndef PAGE_SIZE
   #define PAGE_SIZE  (sysconf (_SC_PAGE_SIZE))
   #endif
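
As an illustration of the run-time lookup described above, here is a
minimal sketch that reads the page size two ways from an application.
The function names are made up for this example, and getauxval() is an
assumption about the environment: it was added in glibc 2.16, well after
this thread, so it is not something the code under discussion could use.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_PAGESZ; glibc 2.16 and later */
#include <unistd.h>     /* sysconf, _SC_PAGESIZE */

/* Page size through the portable POSIX interface.  */
size_t
runtime_page_size (void)
{
  long ps = sysconf (_SC_PAGESIZE);
  /* Page sizes are powers of two; anything else means a broken setup.  */
  assert (ps > 0 && (ps & (ps - 1)) == 0);
  return (size_t) ps;
}

/* Page size as the kernel passed it in the auxiliary vector, or 0 if
   the AT_PAGESZ entry is missing.  */
size_t
auxv_page_size (void)
{
  return (size_t) getauxval (AT_PAGESZ);
}
```

Neither function bakes 4096 into the binary, so the same executable
follows whatever base page size the booted kernel reports.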

NPTL, however, has a problem: I found that pthread_create was returning
EINVAL due to the following code in allocate_stack() in
glibc/nptl/allocatestack.c.

      guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
      if (__builtin_expect (size < (guardsize + __static_tls_size
                                    + MINIMAL_REST_STACK + pagesize_m1 + 1),
                            0))
        /* The stack is too small (or the guard too large).  */
        return EINVAL;

I found that the minimum stack size that allowed a thread to be created on
a 64K-page kernel is 135296, which equals 65536 + 128 + 4096 + 65535 + 1.
The guardsize and pagesize_m1 are computed from __getpagesize() but the
default value of size is not.
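
The arithmetic above can be replayed outside glibc. This is only a
sketch: the function names are invented, and the values 128 and 4096
passed for __static_tls_size and MINIMAL_REST_STACK are read off the sum
quoted above rather than taken from the sources.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest SIZE that passes the quoted check.  STATIC_TLS and
   REST_STACK stand in for __static_tls_size and MINIMAL_REST_STACK.  */
size_t
min_stack_original (size_t guard_attr, size_t pagesize,
                    size_t static_tls, size_t rest_stack)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
  size_t pagesize_m1 = pagesize - 1;
  /* The guard is rounded up to whole pages...  */
  size_t guardsize = (guard_attr + pagesize_m1) & ~pagesize_m1;
  /* ...but one more full page is added without any rounding.  */
  return guardsize + static_tls + rest_stack + pagesize_m1 + 1;
}

/* Nonzero if the quoted check would return EINVAL for SIZE.  */
int
rejected_by_original_check (size_t size, size_t guard_attr,
                            size_t pagesize, size_t static_tls,
                            size_t rest_stack)
{
  return size < min_stack_original (guard_attr, pagesize,
                                    static_tls, rest_stack);
}
```

With a one-page guard, min_stack_original (65536, 65536, 128, 4096)
gives 135296, so a 16384-byte default is rejected on 64KB pages, while
the same call with 4096-byte pages gives 12416 and 16384 is accepted.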

If the pthread_attr does not provide the stacksize attribute, the
__default_stacksize value is used:

     /* Get the stack size from the attribute if it is set.  Otherwise we
        use the default we determined at start time.  */
     size = attr->stacksize ?: __default_stacksize;

The problem is the initialization of __default_stacksize which occurs in
nptl/init.c and nptl/vars.c. It looks like vars.c handles initialization
for the static case:

   /* Default stack size.  */
   size_t __default_stacksize attribute_hidden
   #ifdef SHARED
   ;
   #else
     = PTHREAD_STACK_MIN;
   #endif

And init.c (__pthread_initialize_minimal_internal) handles initialization
for the dynamic case:

     if (getrlimit (RLIMIT_STACK, &limit) != 0
         || limit.rlim_cur == RLIM_INFINITY)
       /* The system limit is not usable.  Use an architecture-specific
          default.  */
       __default_stacksize = ARCH_STACK_DEFAULT_SIZE;
     else if (limit.rlim_cur < PTHREAD_STACK_MIN)
       /* The system limit is unusably small.
          Use the minimal size acceptable.  */
       __default_stacksize = PTHREAD_STACK_MIN;
     else
      ....

The default value of PTHREAD_STACK_MIN is 16384, which is too small for a
64KB page. The minimum needs to be at least 2 pages (128KB): one for the
guard page and one or more pages to hold the minimum stack, thread struct,
and static TLS storage.

The seemingly simple solution is to use something like:

   #define  PTHREAD_STACK_MIN  (2 * __getpagesize())

but this causes other problems. The conditional:

   #if PTHREAD_STACK_MIN == 16384
   weak_alias (__pthread_attr_setstacksize, pthread_attr_setstacksize)
   #else
   versioned_symbol (libpthread, __pthread_attr_setstacksize,
                     pthread_attr_setstacksize, GLIBC_2_3_3);
   ...
   #endif

is used in several places to determine if versioning is required. These and
the static assignment in nptl/vars.c will not compile unless
PTHREAD_STACK_MIN is constant. So there is a structural problem of how to
make PTHREAD_STACK_MIN a variable, or at least how to set
__default_stacksize correctly when AT_PAGESZ > __default_stacksize. The
dynamic case can be addressed with:

     if (__default_stacksize < (2 * __getpagesize()))
       /* The default_stacksize must be at least 2 pages.  */
       __default_stacksize = (2 * __getpagesize());
      ....

in __pthread_initialize_minimal_internal. It is not clear how best to
address the static case. It also seems that the formula in allocate_stack()
needs to change to something like:

      guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
      if (__builtin_expect (size < ((guardsize + __static_tls_size
                                     + MINIMAL_REST_STACK
                                     + pagesize_m1) & ~pagesize_m1),
                            0))
        /* The stack is too small (or the guard too large).  */
        return EINVAL;

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dealing with multiple page sizes in NPTL
  2005-09-21 14:06 Dealing with multiple page sizes in NPTL Steve Munroe
@ 2005-10-16 12:54 ` Roland McGrath
  2005-10-17 22:09   ` Steve Munroe
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Roland McGrath @ 2005-10-16 12:54 UTC (permalink / raw)
  To: Steve Munroe; +Cc: libc-hacker

There are a few issues here.  

Certainly it should choose a default stack size that it will accept.  I've
committed a change to nptl/init.c so that it applies the same minimum (based
on the page size) that the EINVAL check in allocatestack.c will demand.
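
To make the idea concrete, here is a rough sketch of choosing a default
that the EINVAL check will accept. It is not the committed change: the
function names and parameters are invented, the constants are passed in
by the caller, and the branch that the quoted init.c code elides is
filled with a plain assumption (use the limit as is).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>   /* getrlimit, RLIMIT_STACK, RLIM_INFINITY */
#include <unistd.h>         /* sysconf */

/* Pure selection logic.  ARCH_DEFAULT, STACK_MIN and MIN_USABLE are
   caller-supplied stand-ins for ARCH_STACK_DEFAULT_SIZE,
   PTHREAD_STACK_MIN and __static_tls_size + MINIMAL_REST_STACK.  */
size_t
pick_default_stacksize (rlim_t soft_limit, size_t arch_default,
                        size_t stack_min, size_t pagesize,
                        size_t min_usable)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

  size_t size;
  if (soft_limit == RLIM_INFINITY)
    size = arch_default;
  else if (soft_limit < stack_min)
    size = stack_min;
  else
    /* Assumption: take the limit as is.  The thread elides what the
       real code does in this branch.  */
    size = (size_t) soft_limit;

  /* Smallest size the quoted EINVAL check accepts with a one-page
     guard: guard page + usable minimum + one more page.  */
  size_t minimum = pagesize + min_usable + pagesize;
  return size < minimum ? minimum : size;
}

/* Convenience wrapper that reads the live limit and page size.  An
   unreadable limit is treated like RLIM_INFINITY, as in the quoted
   init.c code.  */
size_t
default_stacksize_now (size_t arch_default, size_t stack_min,
                       size_t min_usable)
{
  struct rlimit limit;
  rlim_t soft = RLIM_INFINITY;
  if (getrlimit (RLIMIT_STACK, &limit) == 0)
    soft = limit.rlim_cur;
  return pick_default_stacksize (soft, arch_default, stack_min,
                                 (size_t) sysconf (_SC_PAGESIZE),
                                 min_usable);
}
```

Keeping the clamp in one place means that if the check in
allocatestack.c changes, only the computation of `minimum` has to follow.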

It's not clear to me why the allocatestack.c check is as it is:

      if (__builtin_expect (size < (guardsize + __static_tls_size
				    + MINIMAL_REST_STACK + pagesize_m1 + 1),

Requiring the guard size plus the minimum usable size plus another page
seems odd.  Offhand, it makes more sense to require the guard size plus
just the minimum usable size, or that rounded up to a page.  But I won't
presume to change this calculation before Ulrich comments on it.  Whatever
the calculation is, the defaulting code in init.c should be updated to match.

A different question is PTHREAD_STACK_MIN.  When an application allocates
its own thread stack, then 16384 may be perfectly adequate regardless of
the page size.  One can make the case that an application writer might want
to minimize memory consumption at the expense of forgoing guard pages.  A
clever application writer might even do his own allocation with guard pages
below but use the rest of the page above the stack for other purposes, when
the page size is much larger than the actual stack requirements.
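
For readers who have not done this, the following is a hedged sketch of
an application supplying its own stack with its own guard page below it.
The function name is invented, the layout assumes a downward-growing
stack, and the sketch does not attempt the "use the rest of the page for
other purposes" trick; it only shows the allocation pattern.

```c
#define _DEFAULT_SOURCE     /* MAP_ANONYMOUS */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static void *
worker (void *arg)
{
  *(int *) arg = 1;         /* Record that the thread really ran.  */
  return NULL;
}

/* Run one thread on a privately mapped stack of at least USABLE bytes
   with one inaccessible page below it.  Returns 0 on success.  */
int
run_on_private_stack (size_t usable)
{
  assert (usable > 0);
  size_t pagesize = (size_t) sysconf (_SC_PAGESIZE);
  usable = (usable + pagesize - 1) & ~(pagesize - 1);
  size_t total = pagesize + usable;

  char *base = mmap (NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return -1;
  /* Lowest page becomes the guard: an overflow faults instead of
     silently corrupting a neighbouring mapping.  */
  if (mprotect (base, pagesize, PROT_NONE) != 0)
    {
      munmap (base, total);
      return -1;
    }

  pthread_attr_t attr;
  pthread_t thread;
  int ran = 0;
  int rc = pthread_attr_init (&attr);
  if (rc == 0)
    {
      /* The library is told only about the usable part.  */
      rc = pthread_attr_setstack (&attr, base + pagesize, usable);
      if (rc == 0)
        rc = pthread_create (&thread, &attr, worker, &ran);
      if (rc == 0)
        rc = pthread_join (thread, NULL);
      pthread_attr_destroy (&attr);
    }
  munmap (base, total);     /* Safe: the thread has been joined.  */
  return (rc == 0 && ran) ? 0 : -1;
}
```

A call such as run_on_private_stack (256 * 1024) uses a deliberately
generous size so the demonstration worker is safe; an application that
knows its real stack needs could pass something much smaller.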

However, the standard would seem to indicate that an application can use
pthread_attr_setstacksize without pthread_attr_setstackaddr (and rather
than pthread_attr_setstack), passing PTHREAD_STACK_MIN, and expect to
create a thread with a guard page.  For that to work, PTHREAD_STACK_MIN has
to be at least a page over the minimum usable stack (for the guard page).
The standard explicitly mentions that the effective guardsize may be
page-rounded.  I don't think the standard clearly says whether the
guardsize is included in the stacksize, but that's what we do.  That being
the case, PTHREAD_STACK_MIN not being more than a page just makes no sense.
There is an argument to be made that when pthread_attr_setstacksize
succeeds without complaint (because the PTHREAD_STACK_MIN minimum was met),
then pthread_create should not then fail because that size is too small.
The only way to oblige that is to round up the requested size when
allocating, to big enough for the guard page and the minimum usable stack.
But then that runs afoul of the specification that it's "the size of the
thread stack", meaning that the application might expect it to be exact
(with reliable overflow behavior).  

It may well be desirable to change PTHREAD_STACK_MIN.  Some other
platforms have larger sizes (ia64 has 192k, enough for 64k pages to work).
Or, POSIX allows us to omit the compile-time definition entirely, and
oblige applications to use sysconf--then no particular value would be
compiled into applications.  But, doing either of those is an ABI change.
Programs that used PTHREAD_STACK_MIN according to the proper API were
compiled into existing binaries using the 16384 value; those have a right
to keep working.  That's why we have that compatibility code in
pthread_attr_setstacksize and pthread_attr_setstack, for the platforms that
got a new ABI in the GLIBC_2.3.3 version set changing the PTHREAD_STACK_MIN
value exposed to applications.

If you want to change PTHREAD_STACK_MIN for powerpc, then you have to add
similar compatibility code for the old ABI that will have to be obsoleted
by a GLIBC_2.4 version of pthread_attr_setstack{,size}.  I would also be
open to removing PTHREAD_STACK_MIN as a compile-time invariant.  That both
requires some more futzing in libc, and has potential fallout in terms of
source compatibility with not-quite-compliant applications that assume
there is a macro giving a constant value.
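
For reference, this is roughly what a compliant application would have
to do if the macro were not a usable compile-time constant. It is a
sketch with invented function names; _SC_THREAD_STACK_MIN is the POSIX
sysconf name for this limit, and the fallback to the macro is there only
for systems where the run-time query is unavailable.

```c
#include <assert.h>
#include <limits.h>    /* PTHREAD_STACK_MIN, if the platform defines it */
#include <stddef.h>
#include <unistd.h>    /* sysconf, _SC_THREAD_STACK_MIN */

/* Minimum thread stack size as reported at run time, falling back to
   the compile-time macro.  Returns -1 if neither is available.  */
long
runtime_stack_min (void)
{
#ifdef _SC_THREAD_STACK_MIN
  long value = sysconf (_SC_THREAD_STACK_MIN);
  if (value > 0)
    return value;
#endif
#ifdef PTHREAD_STACK_MIN
  return (long) PTHREAD_STACK_MIN;
#else
  return -1;
#endif
}

/* Raise WANTED to the run-time minimum if it is below it.  */
size_t
clamp_stack_request (size_t wanted)
{
  long min = runtime_stack_min ();
  assert (min > 0);
  return wanted < (size_t) min ? (size_t) min : wanted;
}
```

Code written this way never embeds 16384 in the binary, which is the
property that would let the library change the value later.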


Thanks,
Roland

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dealing with multiple page sizes in NPTL
  2005-10-16 12:54 ` Roland McGrath
@ 2005-10-17 22:09   ` Steve Munroe
  2005-11-03 15:33     ` Mark Brown
  2005-10-27 18:29   ` Steve Munroe
  2005-11-17 14:54   ` Steve Munroe
  2 siblings, 1 reply; 6+ messages in thread
From: Steve Munroe @ 2005-10-17 22:09 UTC (permalink / raw)
  To: Roland McGrath; +Cc: libc-hacker

Roland McGrath <roland@redhat.com> wrote on 10/16/2005 07:54:15 AM:

> There are a few issues here.
>
> Certainly it should choose a default stack size that it will accept.  I've
> committed a change to nptl/init.c so that it applies the same minimum (based
> on the page size) that the EINVAL check in allocatestack.c will demand.
>
> It's not clear to me why the allocatestack.c check is as it is:
>
>       if (__builtin_expect (size < (guardsize + __static_tls_size
>                 + MINIMAL_REST_STACK + pagesize_m1 + 1),
>
> Requiring the guard size plus the minimum usable size plus another page
> seems odd.  Offhand, it makes more sense to require the guard size plus
> just the minimum usable size, or that rounded up to a page.  But I won't
> presume to change this calculation before Ulrich comments on it.  Whatever
> the calculation is, the defaulting code in init.c should be updated to match.
> 
Looks like your patch still requires a minimum of 3 pages (192KB) as
written. I had hoped we could change the test in allocatestack.c to allow
a minimum 2-page stack as long as the constraint pagesize >
(__static_tls_size + MINIMAL_REST_STACK) was met.

> A different question is PTHREAD_STACK_MIN.  When an application allocates
> its own thread stack, then 16384 may be perfectly adequate regardless of
> the page size.  One can make the case that an application writer might want
> to minimize memory consumption at the expense of forgoing guard pages.  A
> clever application writer might even do his own allocation with guard pages
> below but use the rest of the page above the stack for other purposes, when
> the page size is much larger than the actual stack requirements.
>
> However, the standard would seem to indicate that an application can use
> pthread_attr_setstacksize without pthread_attr_setstackaddr (and rather
> than pthread_attr_setstack), passing PTHREAD_STACK_MIN, and expect to
> create a thread with a guard page.  For that to work, PTHREAD_STACK_MIN has
> to be at least a page over the minimum usable stack (for the guard page).
> The standard explicitly mentions that the effective guardsize may be
> page-rounded.  I don't think the standard clearly says whether the
> guardsize is included in the stacksize, but that's what we do.  That being
> the case, PTHREAD_STACK_MIN not being more than a page just makes no sense.
> There is an argument to be made that when pthread_attr_setstacksize
> succeeds without complaint (because the PTHREAD_STACK_MIN minimum was met),
> then pthread_create should not then fail because that size is too small.
> The only way to oblige that is to round up the requested size when
> allocating, to big enough for the guard page and the minimum usable stack.
> But then that runs afoul of the specification that it's "the size of the
> thread stack", meaning that the application might expect it to be exact
> (with reliable overflow behavior).

It seems that stack size, allocation size, and reliable overflow behavior
are separable issues. The stack may start PTHREAD_STACK_MIN bytes into a
larger page. As long as the guard page is an appropriate distance from the
initial stack pointer, exact overflow behavior is maintained. This may
waste part of a larger page, but the user delegated the details of
storage allocation to the run-time, and the run-time is doing the best it
can for the conditions it is given.
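
One way to picture this is to compute where things would land. The
sketch below is only an illustration of the idea, not a proposal for
allocate_stack(): the struct and function names are invented, it assumes
a downward-growing stack with the guard at the low end of the mapping,
and it ignores the thread descriptor and static TLS.

```c
#include <assert.h>
#include <stddef.h>

struct stack_layout
{
  size_t map_size;     /* Bytes actually mapped (whole pages).  */
  size_t guard_size;   /* Page-rounded guard at the low end.  */
  size_t sp_offset;    /* Initial stack pointer, from mapping base.  */
  size_t slack;        /* Mapped but unused bytes above the stack.  */
};

/* Place a stack of exactly STACKSIZE bytes directly above the guard,
   so that using one byte more than STACKSIZE hits the guard.  */
struct stack_layout
layout_for (size_t stacksize, size_t guard_attr, size_t pagesize)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
  size_t m1 = pagesize - 1;
  struct stack_layout l;

  l.guard_size = (guard_attr + m1) & ~m1;
  size_t usable_pages = (stacksize + m1) & ~m1;
  l.map_size = l.guard_size + usable_pages;
  /* The stack grows down from here toward the guard.  */
  l.sp_offset = l.guard_size + stacksize;
  l.slack = usable_pages - stacksize;
  return l;
}
```

For a 16384-byte request with a one-page guard on 64KB pages this maps
131072 bytes, starts the stack 81920 bytes above the base, and leaves
49152 bytes of the second page unused.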

> 
> It may well be desirable to change PTHREAD_STACK_MIN.  Some other
> platforms have larger sizes (ia64 has 192k, enough for 64k pages to work).
> Or, POSIX allows us to omit the compile-time definition entirely, and
> oblige applications to use sysconf--then no particular value would be
> compiled into applications.  But, doing either of those is an ABI change.
> Programs that used PTHREAD_STACK_MIN according to the proper API were
> compiled into existing binaries using the 16384 value; those have a right
> to keep working.  That's why we have that compatibility code in
> pthread_attr_setstacksize and pthread_attr_setstack, for the platforms that
> got a new ABI in the GLIBC_2.3.3 version set changing the PTHREAD_STACK_MIN
> value exposed to applications.
> 
I am reluctant to change PTHREAD_STACK_MIN for powerpc32 because, as you
say, clever applications may want the smaller stack even with the 64K page.
This is the feedback I get from the IBM Java developers and performance
teams. The concern is that forcing the minimum to 128KB (or 192KB) reduces
the max threads, max heap, or both for the 32-bit JVM.

Changing PTHREAD_STACK_MIN for powerpc64 is more viable, but as you say we
would have to version and find some way to make old applications work with
64K pages. I would prefer that the new minimum be 128K (not 192K).
This requires changing the test in allocatestack.c.

> If you want to change PTHREAD_STACK_MIN for powerpc, then you have to add
> similar compatibility code for the old ABI that will have to be obsoleted
> by a GLIBC_2.4 version of pthread_attr_setstack{,size}.  I would also be
> open to removing PTHREAD_STACK_MIN as a compile-time invariant.  That both
> requires some more futzing in libc, and has potential fallout in terms of
> source compatibility with not-quite-compliant applications that assume
> there is a macro giving a constant value.
> 
I would like to avoid that (making PTHREAD_STACK_MIN non-constant) if we
can. I fear that some of these not-quite-compliant applications might be
DB2 or Oracle...

Thanks

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center


* Re: Dealing with multiple page sizes in NPTL
  2005-10-16 12:54 ` Roland McGrath
  2005-10-17 22:09   ` Steve Munroe
@ 2005-10-27 18:29   ` Steve Munroe
  2005-11-17 14:54   ` Steve Munroe
  2 siblings, 0 replies; 6+ messages in thread
From: Steve Munroe @ 2005-10-27 18:29 UTC (permalink / raw)
  To: Ulrich Drepper, libc-hacker; +Cc: Roland McGrath

Uli, would you take a look at the minimum stack size test in
nptl/allocatestack.c (allocate_stack):

      /* Make sure the size of the stack is enough for the guard and
         eventually the thread descriptor.  */
      guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
      if (__builtin_expect (size < (guardsize + __static_tls_size
                                    + MINIMAL_REST_STACK + pagesize_m1 + 1),
                            0))
        /* The stack is too small (or the guard too large).  */
        return EINVAL;

I don't think this test is correct, and it causes pthread_create failures
with the alternate 64KB page support in powerpc. It seems odd that this
code rounds the guard page but simply adds a full page to the thread stack
without rounding. As it stands, this code forces a 3-page minimum even when
the page size is larger than PTHREAD_STACK_MIN. I suggest something like:

      if (__builtin_expect (size < ((guardsize + __static_tls_size
                                     + MINIMAL_REST_STACK + pagesize_m1)
                                    & ~pagesize_m1),
                            0))

This should continue to work for existing systems and still enforce the
2-page minimum for larger page sizes (pagesize > PTHREAD_STACK_MIN).
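
The two forms of the test can be compared side by side. This is a
sketch with invented function names; GUARDSIZE is assumed to be already
page-rounded, and the 128 and 4096 used in the note below for
__static_tls_size and MINIMAL_REST_STACK are taken from the sum quoted
at the start of this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum accepted size under the existing test: a whole extra page
   is added on top of guard, TLS and minimal stack.  */
size_t
min_stack_current (size_t guardsize, size_t pagesize,
                   size_t static_tls, size_t rest_stack)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
  return guardsize + static_tls + rest_stack + (pagesize - 1) + 1;
}

/* Minimum accepted size under the suggested test, as written above:
   add pagesize - 1, then mask down to a page boundary.  */
size_t
min_stack_rounded (size_t guardsize, size_t pagesize,
                   size_t static_tls, size_t rest_stack)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
  size_t pagesize_m1 = pagesize - 1;
  return (guardsize + static_tls + rest_stack + pagesize_m1)
         & ~pagesize_m1;
}
```

With 64KB pages the minimum drops from 135296 to 131072 (two pages);
with 4KB pages it goes from 12416 to 12288, both still below 16384. The
rounded form can never demand more than the current one, so any size
accepted today stays accepted.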

Thanks.


Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center


* Re: Dealing with multiple page sizes in NPTL
  2005-10-17 22:09   ` Steve Munroe
@ 2005-11-03 15:33     ` Mark Brown
  0 siblings, 0 replies; 6+ messages in thread
From: Mark Brown @ 2005-11-03 15:33 UTC (permalink / raw)
  To: Steve Munroe; +Cc: libc-hacker, Roland McGrath

Steve, Roland,


Steve Munroe wrote:
> Roland McGrath <roland@redhat.com> wrote on 10/16/2005 07:54:15 AM:
> 
> > A different question is PTHREAD_STACK_MIN.  When an application allocates
> > its own thread stack, then 16384 may be perfectly adequate regardless of
> > the page size.  One can make the case that an application writer might want
> > to minimize memory consumption at the expense of forgoing guard pages.  A
> > clever application writer might even do his own allocation with guard pages
> > below but use the rest of the page above the stack for other purposes, when
> > the page size is much larger than the actual stack requirements.
> >
> > However, the standard would seem to indicate that an application can use
> > pthread_attr_setstacksize without pthread_attr_setstackaddr (and rather
> > than pthread_attr_setstack), passing PTHREAD_STACK_MIN, and expect to
> > create a thread with a guard page.  For that to work, PTHREAD_STACK_MIN has
> > to be at least a page over the minimum usable stack (for the guard page).
> > The standard explicitly mentions that the effective guardsize may be
> > page-rounded.  I don't think the standard clearly says whether the
> > guardsize is included in the stacksize, but that's what we do.  That being
> > the case, PTHREAD_STACK_MIN not being more than a page just makes no sense.
> > There is an argument to be made that when pthread_attr_setstacksize
> > succeeds without complaint (because the PTHREAD_STACK_MIN minimum was met),
> > then pthread_create should not then fail because that size is too small.
> > The only way to oblige that is to round up the requested size when
> > allocating, to big enough for the guard page and the minimum usable stack.
> > But then that runs afoul of the specification that it's "the size of the
> > thread stack", meaning that the application might expect it to be exact
> > (with reliable overflow behavior).

The Standard is unclear on whether PTHREAD_STACK_MIN includes the
guard page, or if the guard page is considered to be allocated in addition
to the stack size requested by the application. The Joint Working Group is
happy with this being unspecified (meaning it can be implemented either
way and be conforming); I brought the question up, Ulrich replied with how
glibc implements it, and there are no objections at present even from
those who implement it the other way (as an addition to requested size).
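
Since the relationship is unspecified, about all a portable application
can do is read the two attributes separately. The following sketch
(function name invented) does just that; it makes no claim about whether
a given implementation carves the guard out of the stack size or adds it
on top.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Fetch the stack size and guard size of a default-initialized
   attribute object.  Returns 0 on success, else an errno value.  */
int
default_stack_and_guard (size_t *stacksize, size_t *guardsize)
{
  assert (stacksize != NULL && guardsize != NULL);

  pthread_attr_t attr;
  int rc = pthread_attr_init (&attr);
  if (rc != 0)
    return rc;

  rc = pthread_attr_getstacksize (&attr, stacksize);
  if (rc == 0)
    rc = pthread_attr_getguardsize (&attr, guardsize);

  pthread_attr_destroy (&attr);
  return rc;
}
```

An application that needs a known amount of usable stack regardless of
interpretation can request its requirement plus the reported guard size.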


-------------------

bmark@us.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dealing with multiple page sizes in NPTL
  2005-10-16 12:54 ` Roland McGrath
  2005-10-17 22:09   ` Steve Munroe
  2005-10-27 18:29   ` Steve Munroe
@ 2005-11-17 14:54   ` Steve Munroe
  2 siblings, 0 replies; 6+ messages in thread
From: Steve Munroe @ 2005-11-17 14:54 UTC (permalink / raw)
  To: Roland McGrath; +Cc: libc-hacker, Mark Brown

Mark Brown tells me we have agreement on this topic, but I have not seen
any changes upstream.

Are you waiting for a patch from me on this?

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center
