* Dealing with multiple page sizes in NPTL
From: Steve Munroe @ 2005-09-21 14:06 UTC
To: libc-hacker

The recently announced POWER5+ hardware now supports 4KB, 64KB, and 16KB page sizes. For large systems used in High Performance Computing or large-scale database applications, a larger page size makes the TLB more effective and boosts performance. At this year's OLS there was discussion of adding a kernel option to support 64KB (vs 4KB) as the base page size in these environments. I suspect that a larger base page is an issue for IA64 as well.

This raises the possibility that the page size may change depending on which kernel was booted for that machine. This page size will be reported via AT_PAGESZ, but the question is how well glibc responds to that value not being a constant 4096.

Amazingly well. Most of glibc depends on GLRO(dl_pagesize), __getpagesize(), or _sysconf(_SC_PAGESIZE), which are derived from AT_PAGESZ either directly or indirectly. Even Linuxthreads behaves correctly because it uses the following definition:

    /* The page size we can get from the system.  This should likely not be
       changed by the machine file but, you never know.  */
    #ifndef PAGE_SIZE
    #define PAGE_SIZE  (sysconf (_SC_PAGE_SIZE))
    #endif

NPTL, however, has a problem: I found pthread_create was returning EINVAL due to the following code in allocate_stack() in glibc/nptl/allocatestack.c:

    guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
    if (__builtin_expect (size < (guardsize + __static_tls_size
                                  + MINIMAL_REST_STACK + pagesize_m1 + 1),
                          0))
      /* The stack is too small (or the guard too large).  */
      return EINVAL;

I found that the minimum stack size that allowed a thread to be created on a 64K-page kernel is 135296, which equals 65536 + 128 + 4096 + 65535 + 1. The guardsize and pagesize_m1 are computed from __getpagesize() but the default value of size is not.
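The arithmetic above can be checked with a small stand-alone model of the test. This is only a sketch, not the glibc source: stack_size_check is a hypothetical name, and the constants 128 and 4096 are assumed stand-ins for __static_tls_size and MINIMAL_REST_STACK, read off positionally from the sum 65536 + 128 + 4096 + 65535 + 1. A one-page guard is also assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed stand-ins for glibc internals, taken from the sum quoted above.  */
#define ASSUMED_STATIC_TLS_SIZE     ((size_t) 128)
#define ASSUMED_MINIMAL_REST_STACK  ((size_t) 4096)

/* Hypothetical model of the size test in allocate_stack().
   Returns 0 if SIZE would be accepted, EINVAL otherwise.  */
static int
stack_size_check (size_t size, size_t attr_guardsize, size_t pagesize)
{
  /* The rounding mask below only works for power-of-two page sizes.  */
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

  size_t pagesize_m1 = pagesize - 1;
  size_t guardsize = (attr_guardsize + pagesize_m1) & ~pagesize_m1;

  if (size < (guardsize + ASSUMED_STATIC_TLS_SIZE
              + ASSUMED_MINIMAL_REST_STACK + pagesize_m1 + 1))
    return EINVAL;   /* The stack is too small (or the guard too large).  */
  return 0;
}
```

Under these assumptions, stack_size_check (16384, 65536, 65536) returns EINVAL and 135296 is the smallest size that returns 0. With 4KB pages the threshold is 4096 + 128 + 4096 + 4096 = 12416, so a 16384-byte stack passes.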
If the pthread_attr does not provide the stacksize attribute, the __default_stacksize value is used:

    /* Get the stack size from the attribute if it is set.  Otherwise we
       use the default we determined at start time.  */
    size = attr->stacksize ?: __default_stacksize;

The problem is the initialization of __default_stacksize, which occurs in nptl/init.c and nptl/vars.c. It looks like vars.c handles initialization for the static case:

    /* Default stack size.  */
    size_t __default_stacksize attribute_hidden
    #ifdef SHARED
    ;
    #else
      = PTHREAD_STACK_MIN;
    #endif

And init.c (__pthread_initialize_minimal_internal) handles initialization for the dynamic case:

    if (getrlimit (RLIMIT_STACK, &limit) != 0
        || limit.rlim_cur == RLIM_INFINITY)
      /* The system limit is not usable.  Use an architecture-specific
         default.  */
      __default_stacksize = ARCH_STACK_DEFAULT_SIZE;
    else if (limit.rlim_cur < PTHREAD_STACK_MIN)
      /* The system limit is unusably small.
         Use the minimal size acceptable.  */
      __default_stacksize = PTHREAD_STACK_MIN;
    else
      ....

The default value of PTHREAD_STACK_MIN is 16384, which is too small for a 64KB page. The minimum needs to be at least 2 pages (128KB): one for the guard page and one or more pages to hold the minimum stack, thread struct, and static TLS storage.

The seemingly simple solution is to use something like:

    #define PTHREAD_STACK_MIN (2 * __getpagesize())

but this causes other problems. The conditional:

    #if PTHREAD_STACK_MIN == 16384
    weak_alias (__pthread_attr_setstacksize, pthread_attr_setstacksize)
    #else
    versioned_symbol (libpthread, __pthread_attr_setstacksize,
                      pthread_attr_setstacksize, GLIBC_2_3_3);
    ...
    #endif

is used in several places to determine if versioning is required. These and the static assignment in nptl/vars.c will not compile unless PTHREAD_STACK_MIN is constant.

So there is a structural problem of how to make PTHREAD_STACK_MIN variable, or at least how to set __default_stacksize correctly when AT_PAGESZ > __default_stacksize.
The dynamic case can be addressed with:

    if (__default_stacksize < (2 * __getpagesize()))
      /* The default_stacksize must be at least 2 pages.  */
      __default_stacksize = (2 * __getpagesize());
    ....

in __pthread_initialize_minimal_internal. It is not clear how best to address the static case.

It also seems that the formula in allocate_stack() needs to change to something like:

    guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
    if (__builtin_expect (size < ((guardsize + __static_tls_size
                                   + MINIMAL_REST_STACK + pagesize_m1)
                                  & ~pagesize_m1),
                          0))
      /* The stack is too small (or the guard too large).  */
      return EINVAL;

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center
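The two-page floor proposed for the dynamic case can be sketched as a pure function, which makes its effect on different page sizes easy to see. The helper name clamp_default_stacksize is hypothetical, and the page size is passed in as a parameter instead of being read from __getpagesize(); this is an illustration of the idea, not the code that went into nptl/init.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: raise a candidate default stack size to at least
   two pages (one guard page plus one usable page).  */
static size_t
clamp_default_stacksize (size_t candidate, size_t pagesize)
{
  assert (pagesize != 0);

  size_t two_pages = 2 * pagesize;
  return candidate < two_pages ? two_pages : candidate;
}
```

With a candidate of 16384, the result stays 16384 on 4KB pages (two pages are only 8192) and becomes 131072 on 64KB pages. A candidate that is already large, such as an 8MB RLIMIT_STACK value, is returned unchanged.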
* Re: Dealing with multiple page sizes in NPTL
From: Roland McGrath @ 2005-10-16 12:54 UTC
To: Steve Munroe; +Cc: libc-hacker

There are a few issues here.

Certainly it should choose a default stack size that it will accept. I've committed a change to nptl/init.c so that it applies the same minimum (based on the page size) that the EINVAL check in allocatestack.c will demand.

It's not clear to me why the allocatestack.c check is as it is:

    if (__builtin_expect (size < (guardsize + __static_tls_size
                                  + MINIMAL_REST_STACK + pagesize_m1 + 1),

Requiring the guard size plus the minimum usable size plus another page seems odd. Off hand, it makes more sense to require the guard size plus just the minimum usable size, or that rounded up to a page. But I won't presume to change this calculation before Ulrich comments on it. Whatever the calculation is, the defaulting code in init.c should be updated to match.

A different question is PTHREAD_STACK_MIN. When an application allocates its own thread stack, then 16384 may be perfectly adequate regardless of the page size. One can make the case that an application writer might want to minimize memory consumption at the expense of foregoing guard pages. A clever application writer might even do his own allocation with guard pages below but use the rest of the page above the stack for other purposes, when the page size is much larger than the actual stack requirements.

However, the standard would seem to indicate that an application can use pthread_attr_setstacksize without pthread_attr_setstackaddr (and rather than pthread_attr_setstack), passing PTHREAD_STACK_MIN, and expect to create a thread with a guard page.
For that to work, PTHREAD_STACK_MIN has to be at least a page over the minimum usable stack (for the guard page). The standard explicitly mentions that the effective guardsize may be page-rounded. I don't think the standard clearly says whether the guardsize is included in the stacksize, but that's what we do. That being the case, PTHREAD_STACK_MIN not being more than a page just makes no sense. There is an argument to be made that when pthread_attr_setstacksize succeeds without complaint (because the PTHREAD_STACK_MIN minimum was met), then pthread_create should not then fail because that size is too small. The only way to oblige that is to round up the requested size when allocating, to big enough for the guard page and the minimum usable stack. But then that runs afoul of the specification that it's "the size of the thread stack", meaning that the application might expect it to be exact (with reliable overflow behavior).

It may well be desirable to change PTHREAD_STACK_MIN. Some other platforms have larger sizes (ia64 has 192k, enough for 64k pages to work). Or, POSIX allows us to omit the compile-time definition entirely, and oblige applications to use sysconf--then no particular value would be compiled into applications. But, doing either of those is an ABI change. Programs that used PTHREAD_STACK_MIN according to the proper API were compiled into existing binaries using the 16384 value; those have a right to keep working. That's why we have that compatibility code in pthread_attr_setstacksize and pthread_attr_setstack, for the platforms that got a new ABI in the GLIBC_2.3.3 version set changing the PTHREAD_STACK_MIN value exposed to applications.

If you want to change PTHREAD_STACK_MIN for powerpc, then you have to add similar compatibility code for the old ABI that will have to be obsoleted by a GLIBC_2.4 version of pthread_attr_setstack{,size}. I would also be open to removing PTHREAD_STACK_MIN as a compile-time invariant.
That both requires some more futzing in libc, and has potential fallout in terms of source compatibility with not-quite-compliant applications that assume there is a macro giving a constant value.

Thanks,
Roland
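For the option of obliging applications to use sysconf, the application side might look roughly like the following. sysconf() and _SC_THREAD_STACK_MIN are standard POSIX; the helper names runtime_stack_min and pick_stack_size are hypothetical and exist only for this sketch. Nothing here says what value a given libc reports.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: the minimum thread stack size as reported at run
   time, so that no particular value is compiled into the application.  */
static size_t
runtime_stack_min (void)
{
  long value = sysconf (_SC_THREAD_STACK_MIN);
  if (value > 0)
    return (size_t) value;
#ifdef PTHREAD_STACK_MIN
  /* Fall back to the compile-time value where the header provides one.  */
  return (size_t) PTHREAD_STACK_MIN;
#else
  return 0;   /* No minimum advertised.  */
#endif
}

/* Hypothetical helper: raise a requested size to the run-time minimum.  */
static size_t
pick_stack_size (size_t wanted)
{
  assert (wanted > 0);   /* A zero request is treated as a caller bug.  */

  size_t minimum = runtime_stack_min ();
  return wanted < minimum ? minimum : wanted;
}
```

The result of pick_stack_size would then be passed to pthread_attr_setstacksize before pthread_create. An application written this way keeps working if the library later reports a larger minimum, which is the property the compile-time macro cannot offer.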
* Re: Dealing with multiple page sizes in NPTL
From: Steve Munroe @ 2005-10-17 22:09 UTC
To: Roland McGrath; +Cc: libc-hacker

Roland McGrath <roland@redhat.com> wrote on 10/16/2005 07:54:15 AM:

> There are a few issues here.
>
> Certainly it should choose a default stack size that it will accept. I've
> committed a change to nptl/init.c so that it applies the same minimum (based
> on the page size) that the EINVAL check in allocatestack.c will demand.
>
> It's not clear to me why the allocatestack.c check is as it is:
>
>   if (__builtin_expect (size < (guardsize + __static_tls_size
>                                 + MINIMAL_REST_STACK + pagesize_m1 + 1),
>
> Requiring the guard size plus the minimum usable size plus another page
> seems odd. Off hand, it makes more sense to require the guard size plus
> just the minimum usable size, or that rounded up to a page. But I won't
> presume to change this calculation before Ulrich comments on it. Whatever
> the calculation is, the defaulting code in init.c should be updated to match.
>

Looks like your patch still requires a minimum of 3 pages (192KB) as written. I had hoped we could change the test in allocatestack.c to allow a minimum 2-page stack as long as the constraint pagesize > (__static_tls_size + MINIMAL_REST_STACK) was met.

> A different question is PTHREAD_STACK_MIN. When an application allocates
> its own thread stack, then 16384 may be perfectly adequate regardless of
> the page size. One can make the case that an application writer might want
> to minimize memory consumption at the expense of foregoing guard pages. A
> clever application writer might even do his own allocation with guard pages
> below but use the rest of the page above the stack for other purposes, when
> the page size is much larger than the actual stack requirements.
>
> However, the standard would seem to indicate that an application can use
> pthread_attr_setstacksize without pthread_attr_setstackaddr (and rather
> than pthread_attr_setstack), passing PTHREAD_STACK_MIN, and expect to
> create a thread with a guard page. For that to work, PTHREAD_STACK_MIN has
> to be at least a page over the minimum usable stack (for the guard page).
> The standard explicitly mentions that the effective guardsize may be
> page-rounded. I don't think the standard clearly says whether the
> guardsize is included in the stacksize, but that's what we do. That being
> the case, PTHREAD_STACK_MIN not being more than a page just makes no sense.
> There is an argument to be made that when pthread_attr_setstacksize
> succeeds without complaint (because the PTHREAD_STACK_MIN minimum was met),
> then pthread_create should not then fail because that size is too small.
> The only way to oblige that is to round up the requested size when
> allocating, to big enough for the guard page and the minimum usable stack.
> But then that runs afoul of the specification that it's "the size of the
> thread stack", meaning that the application might expect it to be exact
> (with reliable overflow behavior).

It seems that stack size, allocation size, and reliable overflow behavior are separable issues. The stack may start PTHREAD_STACK_MIN bytes into a larger page. As long as the guard page is an appropriate distance from the initial stack pointer, exact overflow behavior is maintained. This may waste part of a larger page, but the user delegated the details of storage allocation to the run-time, and the run-time is doing the best it can for the conditions it is given.

>
> It may well be desirable to change PTHREAD_STACK_MIN.
> Some other platforms have larger sizes (ia64 has 192k, enough for 64k pages
> to work).
> Or, POSIX allows us to omit the compile-time definition entirely, and
> oblige applications to use sysconf--then no particular value would be
> compiled into applications. But, doing either of those is an ABI change.
> Programs that used PTHREAD_STACK_MIN according to the proper API were
> compiled into existing binaries using the 16384 value; those have a right
> to keep working. That's why we have that compatibility code in
> pthread_attr_setstacksize and pthread_attr_setstack, for the platforms that
> got a new ABI in the GLIBC_2.3.3 version set changing the PTHREAD_STACK_MIN
> value exposed to applications.
>

I am reluctant to change PTHREAD_STACK_MIN for powerpc32 because, as you say, clever applications may want the smaller stack even with the 64K page. This is the feedback I get from the IBM Java developers and performance teams. The concern is that forcing the minimum to 128KB (or 192KB) reduces the max threads, max heap, or both for the 32-bit JVM.

Changing PTHREAD_STACK_MIN for powerpc64 is more viable, but as you say we would have to version and find some way to make old applications work with 64K pages. I would prefer that the new minimum be 128K (not 192K). This requires changing the test in allocatestack.c.

> If you want to change PTHREAD_STACK_MIN for powerpc, then you have to add
> similar compatibility code for the old ABI that will have to be obsoleted
> by a GLIBC_2.4 version of pthread_attr_setstack{,size}. I would also be
> open to removing PTHREAD_STACK_MIN as a compile-time invariant. That both
> requires some more futzing in libc, and has potential fallout in terms of
> source compatibility with not-quite-compliant applications that assume
> there is a macro giving a constant value.
>

I would like to avoid making PTHREAD_STACK_MIN non-constant if we can. I fear that some of these not-quite-compliant applications might be DB2 or Oracle...

Thanks

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center
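The point in this message about the stack starting PTHREAD_STACK_MIN bytes into a larger page can be pictured with a little offset arithmetic. The sketch below is one possible reading, not how NPTL lays out its stacks: it assumes a downward-growing stack, treats the guard as separate from the requested stack size, and ignores the thread descriptor and static TLS that share the mapping in practice. struct stack_plan and plan_stack are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one thread-stack mapping, as byte offsets
   from the lowest mapped address.  Assumes a downward-growing stack.  */
struct stack_plan
{
  size_t map_size;     /* Total bytes mapped, a whole number of pages.  */
  size_t guard_end;    /* The guard occupies [0, guard_end).  */
  size_t initial_sp;   /* Offset of the initial stack pointer.  */
  size_t slack;        /* Bytes above initial_sp not used by the stack.  */
};

static size_t
round_up_to_page (size_t n, size_t pagesize)
{
  return (n + pagesize - 1) & ~(pagesize - 1);
}

static struct stack_plan
plan_stack (size_t stacksize, size_t guardsize, size_t pagesize)
{
  /* The rounding mask only works for power-of-two page sizes.  */
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

  struct stack_plan plan;
  plan.guard_end = round_up_to_page (guardsize, pagesize);
  plan.map_size = plan.guard_end + round_up_to_page (stacksize, pagesize);
  /* Start the stack exactly STACKSIZE bytes above the guard, so the first
     access past STACKSIZE bytes of growth lands in the guard.  */
  plan.initial_sp = plan.guard_end + stacksize;
  plan.slack = plan.map_size - plan.initial_sp;
  return plan;
}
```

For a 16384-byte request with a 64KB guard on 64KB pages, this maps 131072 bytes, puts the initial stack pointer at offset 81920, and leaves 49152 bytes of the upper page unused. On 4KB pages the same request leaves no slack.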
* Re: Dealing with multiple page sizes in NPTL
From: Mark Brown @ 2005-11-03 15:33 UTC
To: Steve Munroe; +Cc: libc-hacker, Roland McGrath

Steve, Roland,

Steve Munroe wrote:
> Roland McGrath <roland@redhat.com> wrote on 10/16/2005 07:54:15 AM:
>
> > A different question is PTHREAD_STACK_MIN. When an application allocates
> > its own thread stack, then 16384 may be perfectly adequate regardless of
> > the page size. One can make the case that an application writer might want
> > to minimize memory consumption at the expense of foregoing guard pages. A
> > clever application writer might even do his own allocation with guard pages
> > below but use the rest of the page above the stack for other purposes, when
> > the page size is much larger than the actual stack requirements.
> >
> > However, the standard would seem to indicate that an application can use
> > pthread_attr_setstacksize without pthread_attr_setstackaddr (and rather
> > than pthread_attr_setstack), passing PTHREAD_STACK_MIN, and expect to
> > create a thread with a guard page. For that to work, PTHREAD_STACK_MIN has
> > to be at least a page over the minimum usable stack (for the guard page).
> > The standard explicitly mentions that the effective guardsize may be
> > page-rounded. I don't think the standard clearly says whether the
> > guardsize is included in the stacksize, but that's what we do. That being
> > the case, PTHREAD_STACK_MIN not being more than a page just makes no sense.
> > There is an argument to be made that when pthread_attr_setstacksize
> > succeeds without complaint (because the PTHREAD_STACK_MIN minimum was met),
> > then pthread_create should not then fail because that size is too small.
> > The only way to oblige that is to round up the requested size when
> > allocating, to big enough for the guard page and the minimum usable stack.
> > But then that runs afoul of the specification that it's "the size of the
> > thread stack", meaning that the application might expect it to be exact
> > (with reliable overflow behavior).

The Standard is unclear on whether PTHREAD_STACK_MIN includes the guard page, or if the guard page is considered to be allocated in addition to the stack size requested by the application. The Joint Working Group is happy with this being unspecified (meaning it can be implemented either way and be conforming); I brought the question up, Ulrich replied with how glibc implements it, and there are no objections at present even from those who implement it the other way (as an addition to requested size).

-------------------
bmark@us.ibm.com
* Re: Dealing with multiple page sizes in NPTL
From: Steve Munroe @ 2005-10-27 18:29 UTC
To: Ulrich Drepper, libc-hacker; +Cc: Roland McGrath

Uli, would you take a look at the minimum stack size test in nptl/allocatestack.c (allocate_stack):

    /* Make sure the size of the stack is enough for the guard and
       eventually the thread descriptor.  */
    guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
    if (__builtin_expect (size < (guardsize + __static_tls_size
                                  + MINIMAL_REST_STACK + pagesize_m1 + 1),
                          0))
      /* The stack is too small (or the guard too large).  */
      return EINVAL;

I don't think this test is correct, and it causes pthread_create failures with the alternate 64KB page support in powerpc. It seems odd that this code rounds the guard page but simply adds a full page to the thread stack without rounding. As is, this code forces a 3-page minimum even when the page size is larger than PTHREAD_STACK_MIN.

I suggest something like:

    if (__builtin_expect (size < ((guardsize + __static_tls_size
                                   + MINIMAL_REST_STACK + pagesize_m1)
                                  & ~pagesize_m1),
                          0))

which should continue to work for existing systems and still enforce the 2-page minimum for larger page sizes (pagesize > PTHREAD_STACK_MIN).

Thanks.

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center
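To compare the existing test with the suggested one, both thresholds can be computed side by side for a given page size. This is a sketch under stated assumptions: the guard is taken to be exactly one page, and 128 and 4096 are assumed stand-ins for __static_tls_size and MINIMAL_REST_STACK, taken from the sum in the first message of this thread. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-ins for glibc internals (see the lead-in).  */
#define ASSUMED_STATIC_TLS_SIZE     ((size_t) 128)
#define ASSUMED_MINIMAL_REST_STACK  ((size_t) 4096)

/* Smallest size accepted by the existing test, for a one-page guard:
   guard + TLS + rest stack + one further full page.  */
static size_t
min_size_existing (size_t pagesize)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

  size_t pagesize_m1 = pagesize - 1;
  size_t guardsize = pagesize;
  return guardsize + ASSUMED_STATIC_TLS_SIZE
         + ASSUMED_MINIMAL_REST_STACK + pagesize_m1 + 1;
}

/* Smallest size accepted by the suggested test, for a one-page guard:
   guard + TLS + rest stack, rounded up to a whole page.  */
static size_t
min_size_suggested (size_t pagesize)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

  size_t pagesize_m1 = pagesize - 1;
  size_t guardsize = pagesize;
  return (guardsize + ASSUMED_STATIC_TLS_SIZE
          + ASSUMED_MINIMAL_REST_STACK + pagesize_m1) & ~pagesize_m1;
}
```

With 4KB pages the thresholds are 12416 and 12288, both below the 16384 default, so existing systems are unaffected. With 64KB pages they are 135296 and 131072, so the suggested test accepts exactly two pages where the existing one does not.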
* Re: Dealing with multiple page sizes in NPTL
From: Steve Munroe @ 2005-11-17 14:54 UTC
To: Roland McGrath; +Cc: libc-hacker, Mark Brown

Mark Brown tells me we have agreement on this topic, but I have not seen any changes upstream. Are you waiting for a patch from me on this?

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center