public inbox for libc-hacker@sourceware.org
From: Steve Munroe <sjmunroe@us.ibm.com>
To: libc-hacker@sources.redhat.com
Subject: Dealing with multiple page sizes in NPTL
Date: Wed, 21 Sep 2005 14:06:00 -0000
Message-ID: <OFA7EBC397.0036E69E-ON86257082.0059A092-86257083.0004EB19@us.ibm.com>


The recently announced POWER5+ hardware now supports 4KB, 64KB, and 16KB
page sizes. For large systems used in High Performance Computing or
large-scale database applications, a larger page size makes the TLB more
effective and boosts performance.

At this year's OLS there was discussion of adding a kernel option to
support 64KB (vs 4KB) as the base page size in these environments. I
suspect that a larger base page is an issue for IA64 as well. This raises
the possibility that the page size may change depending on which kernel was
booted for that machine. This page size will be reported via AT_PAGESZ but
the question is how well glibc responds when that value is not a constant
4096.

Amazingly well. Most of glibc depends on GLRO(dl_pagesize), __getpagesize(),
or __sysconf(_SC_PAGESIZE), which are derived from AT_PAGESZ either directly
or indirectly. Even Linuxthreads behaves correctly because it uses the
following definition:

   /* The page size we can get from the system.  This should likely not be
      changed by the machine file but, you never know.  */
   #ifndef PAGE_SIZE
   #define PAGE_SIZE  (sysconf (_SC_PAGE_SIZE))
   #endif
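
As an illustration of "derived from AT_PAGESZ", the page size can be read
both ways from ordinary user code. This is only a sketch: it assumes a Linux
system whose libc provides getauxval() in <sys/auxv.h> (glibc 2.16 or later),
and the two helper names are hypothetical, not glibc functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_PAGESZ (assumed: glibc 2.16+) */
#include <unistd.h>     /* sysconf, _SC_PAGESIZE */

/* Hypothetical helper: page size as reported by sysconf().  */
static size_t
page_size_from_sysconf (void)
{
  long value = sysconf (_SC_PAGESIZE);
  assert (value > 0);
  return (size_t) value;
}

/* Hypothetical helper: page size read directly from the auxiliary
   vector entry AT_PAGESZ that the kernel passes to the process.  */
static size_t
page_size_from_auxv (void)
{
  return (size_t) getauxval (AT_PAGESZ);
}
```

On a kernel booted with a 64KB base page both helpers would be expected to
return 65536 rather than 4096, which is the situation the rest of this
message is about.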

NPTL, however, has a problem: I found pthread_create returning EINVAL
because of the following code in allocate_stack() in
glibc/nptl/allocatestack.c:

      guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
      if (__builtin_expect (size < (guardsize + __static_tls_size
                                    + MINIMAL_REST_STACK + pagesize_m1 + 1),
                            0))
        /* The stack is too small (or the guard too large).  */
        return EINVAL;

I found that the minimum stack size that allowed a thread to be created on
a 64K-page kernel is 135296, which equals 65536 + 128 + 4096 + 65535 + 1.
The guardsize and pagesize_m1 are computed from __getpagesize() but the
default value of size is not.
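
The arithmetic can be reproduced with a small stand-alone sketch of the
check's right-hand side. The function names are hypothetical, and the
mapping of the numbers to the terms (one-page guard, 128 bytes of static
TLS, 4096 for MINIMAL_REST_STACK) is an assumption read off the order of
the sum above, not something taken from the glibc sources.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'value' up to a multiple of 'pagesize' (a power of two),
   the same way allocate_stack() rounds the guard size.  */
static size_t
round_up_to_page (size_t value, size_t pagesize)
{
  size_t pagesize_m1 = pagesize - 1;
  assert (pagesize != 0 && (pagesize & pagesize_m1) == 0);
  return (value + pagesize_m1) & ~pagesize_m1;
}

/* Smallest 'size' that the quoted check does not reject.  */
static size_t
smallest_accepted_stack (size_t guard_attr, size_t static_tls,
                         size_t rest_stack, size_t pagesize)
{
  size_t guardsize = round_up_to_page (guard_attr, pagesize);
  return guardsize + static_tls + rest_stack + (pagesize - 1) + 1;
}
```

With these assumed inputs, smallest_accepted_stack (65536, 128, 4096, 65536)
gives 135296, while the same call with 4096 for the guard and page size
gives 12416, which is why a 16384-byte default passes on a 4KB-page kernel.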

If the pthread_attr does not provide the stacksize attribute, the
__default_stacksize value is used:

     /* Get the stack size from the attribute if it is set.  Otherwise we
        use the default we determined at start time.  */
     size = attr->stacksize ?: __default_stacksize;

The problem is the initialization of __default_stacksize which occurs in
nptl/init.c and nptl/vars.c. It looks like vars.c handles initialization
for the static case:

   /* Default stack size.  */
   size_t __default_stacksize attribute_hidden
   #ifdef SHARED
   ;
   #else
     = PTHREAD_STACK_MIN;
   #endif

And init.c (__pthread_initialize_minimal_internal) handles initialization
for the dynamic case:

     if (getrlimit (RLIMIT_STACK, &limit) != 0
         || limit.rlim_cur == RLIM_INFINITY)
       /* The system limit is not usable.  Use an architecture-specific
          default.  */
       __default_stacksize = ARCH_STACK_DEFAULT_SIZE;
     else if (limit.rlim_cur < PTHREAD_STACK_MIN)
       /* The system limit is unusably small.
          Use the minimal size acceptable.  */
       __default_stacksize = PTHREAD_STACK_MIN;
     else
      ....

The default value of PTHREAD_STACK_MIN is 16384, which is too small for a
64KB page. The minimum needs to be at least 2 pages (128KB): one for the
guard page and one or more pages to hold the minimum stack, thread struct,
and static TLS storage.
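
A minimal sketch of that comparison follows. QUOTED_STACK_MIN is a
hypothetical name for the 16384 value quoted above (deliberately not the
PTHREAD_STACK_MIN macro from <limits.h>), and both helpers are illustrative
only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The PTHREAD_STACK_MIN value quoted in this message (assumption:
   hard-coded here for illustration).  */
#define QUOTED_STACK_MIN ((size_t) 16384)

/* Two pages: one guard page plus at least one usable page.  */
static size_t
two_page_minimum (size_t pagesize)
{
  assert (pagesize != 0);
  return 2 * pagesize;
}

/* True when the quoted fixed minimum covers two pages of 'pagesize'.  */
static bool
quoted_minimum_suffices (size_t pagesize)
{
  return QUOTED_STACK_MIN >= two_page_minimum (pagesize);
}
```

For a 4096-byte page the fixed minimum suffices (two pages are 8192 bytes);
for a 65536-byte page it does not (two pages are 131072 bytes).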

The seemingly simple solution is to use something like:

   #define  PTHREAD_STACK_MIN  (2 * __getpagesize())

but this causes other problems. The conditional:

   #if PTHREAD_STACK_MIN == 16384
   weak_alias (__pthread_attr_setstacksize, pthread_attr_setstacksize)
   #else
   versioned_symbol (libpthread, __pthread_attr_setstacksize,
                     pthread_attr_setstacksize, GLIBC_2_3_3);
   ...
   #endif

is used in several places to determine if versioning is required. These and
the static assignment in nptl/vars.c will not compile unless
PTHREAD_STACK_MIN is constant. So there is a structural problem of how to
make PTHREAD_STACK_MIN a variable, or at least how to set
__default_stacksize correctly when AT_PAGESZ > __default_stacksize. The
dynamic case can be addressed with:

     if (__default_stacksize < (2 * __getpagesize()))
       /* The default_stacksize must be at least 2 pages.  */
       __default_stacksize = (2 * __getpagesize());
      ....

in __pthread_initialize_minimal_internal. It is not clear how best to
address the static case. It also seems that the formula in allocate_stack()
needs to change to something like:

      guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
      if (__builtin_expect (size < ((guardsize + __static_tls_size
                                     + MINIMAL_REST_STACK
                                     + pagesize_m1) & ~pagesize_m1),
                            0))
        /* The stack is too small (or the guard too large).  */
        return EINVAL;
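
To see how the two proposed changes would fit together, here is a
stand-alone sketch, not glibc code. The function names are hypothetical,
and the sample inputs used in the note below (128 bytes of static TLS and
4096 for MINIMAL_REST_STACK) are assumptions carried over from the 135296
figure earlier in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Clamp modeled on the change proposed for
   __pthread_initialize_minimal_internal: at least two pages.  */
static size_t
clamp_default_stacksize (size_t default_stacksize, size_t pagesize)
{
  size_t two_pages = 2 * pagesize;
  return default_stacksize < two_pages ? two_pages : default_stacksize;
}

/* Revised test: true when 'size' would be rejected with EINVAL.
   The whole requirement is rounded up to a page multiple instead of
   adding a full extra page (pagesize_m1 + 1).  */
static bool
revised_check_rejects (size_t size, size_t guard_attr, size_t static_tls,
                       size_t rest_stack, size_t pagesize)
{
  size_t pagesize_m1 = pagesize - 1;
  assert (pagesize != 0 && (pagesize & pagesize_m1) == 0);
  size_t guardsize = (guard_attr + pagesize_m1) & ~pagesize_m1;
  size_t needed = (guardsize + static_tls + rest_stack + pagesize_m1)
                  & ~pagesize_m1;
  return size < needed;
}
```

Under those assumptions the revised bound for a 64KB page is 131072, so a
default clamped to two pages is accepted, whereas the current formula would
still demand 135296. If static TLS plus the minimal rest stack grew past
one page, the rounded requirement would grow with it.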

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center

Thread overview: 6+ messages
2005-09-21 14:06 Steve Munroe [this message]
2005-10-16 12:54 ` Roland McGrath
2005-10-17 22:09   ` Steve Munroe
2005-11-03 15:33     ` Mark Brown
2005-10-27 18:29   ` Steve Munroe
2005-11-17 14:54   ` Steve Munroe
