public inbox for libc-help@sourceware.org
* Question about the highly optimized aspects of libc implementation
@ 2017-11-30  6:48 Will Hawkins
  2017-11-30  6:54 ` Siddhesh Poyarekar
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Will Hawkins @ 2017-11-30  6:48 UTC (permalink / raw)
  To: libc-help

Hello everyone!

Please let me apologize at the start if this is the wrong venue to ask
this question.

I've been digging through the glibc implementation and looking for
examples of where compiler directives or hand-written assembly have
been used to improve performance at the "expense" of standards or
conventions.

I know that question seems incredibly vague but I'm struggling to put
it into better terms. I am not necessarily looking for places where
certain functions have been highly optimized (e.g., the *cpy functions
for x86) -- I can find those relatively easily in the sysdeps.

Maybe an example will help?

Are there cases where function _a_ calls function _b_ without adhering
to the normal calling convention or the platform ABI because the
implementer knows that only function _a_ will call function _b_ and
knows which registers are dead/preserved/etc?

If any such code fragments exist that you think I would find
interesting, I'd love to get some pointers. Don't worry about
explaining what is going on -- I know everyone is incredibly busy! I
can dig into the code myself. I'm just hoping for the community's
expertise on finding /where/ to look!

Again, I know that everyone is incredibly busy and I hope that this
question did not waste anyone's valuable time.

Thanks in advance for any information that you can provide!

Will Hawkins


* Re: Question about the highly optimized aspects of libc implementation
  2017-11-30  6:48 Question about the highly optimized aspects of libc implementation Will Hawkins
@ 2017-11-30  6:54 ` Siddhesh Poyarekar
  2017-11-30  8:22   ` Florian Weimer
  2017-11-30  6:56 ` Carlos O'Donell
  2017-11-30  8:36 ` Florian Weimer
  2 siblings, 1 reply; 9+ messages in thread
From: Siddhesh Poyarekar @ 2017-11-30  6:54 UTC (permalink / raw)
  To: Will Hawkins; +Cc: libc-help

On 30 November 2017 at 12:17, Will Hawkins <whh8b@virginia.edu> wrote:
> Please let me apologize at the start if this is the wrong venue to ask
> this question.

This is the perfect venue for these questions.

> I've been digging through the glibc implementation and looking for
> examples of where compiler directives or hand-written assembly have
> been used to improve performance at the "expense" of standards or
> conventions.
>
> I know that question seems incredibly vague but I'm struggling to put
> it into better terms. I am not necessarily looking for places where
> certain functions have been highly optimized (ie, the *cpy functions
> for x86) -- I can find those relatively easily in the sysdeps.
>
> Maybe an example will help?
>
> Are there cases where function _a_ calls function _b_ without adhering
> to the normal calling convention or the platform ABI because the
> implementer knows that only function _a_ will call function _b_ and
> knows which registers are dead/preserved/etc?

Here's a couple that I know off the top of my head:

There are some assembly implementations that overwrite callee-saved
registers without saving them, because the author *knows* that the
caller doesn't use them.  Some pthread functions on x86/x86_64 were
examples of this, but a number of them were dropped in the recent
past, so I don't know if that is still the case.

A lot of the internal function calls in i686 use registers to pass
arguments instead of the standard ABI that mandates using the stack.

Siddhesh
-- 
http://siddhesh.in


* Re: Question about the highly optimized aspects of libc implementation
  2017-11-30  6:48 Question about the highly optimized aspects of libc implementation Will Hawkins
  2017-11-30  6:54 ` Siddhesh Poyarekar
@ 2017-11-30  6:56 ` Carlos O'Donell
  2017-11-30  7:00   ` Will Hawkins
  2017-11-30  8:36 ` Florian Weimer
  2 siblings, 1 reply; 9+ messages in thread
From: Carlos O'Donell @ 2017-11-30  6:56 UTC (permalink / raw)
  To: Will Hawkins, libc-help

On 11/29/2017 10:47 PM, Will Hawkins wrote:
> I've been digging through the glibc implementation and looking for
> examples of where compiler directives or hand-written assembly have
> been used to improve performance at the "expense" of standards or
> conventions.

Let me address the "standards" side of this...

No such thing exists behind a public interface that conforming programs
can call. Such things may exist internal to glibc, but then we have
control over the API/ABI and can do what we want.

We have some functions which have standards for them, but for which
we don't follow the standard because Linux did it one way and
that's the only supportable way e.g. group scheduling etc.

Let me address the "conventions" side of this...

The entire math library has "cheats" which would be assembly implementations
that have lower accuracy bounds. We have generally tended to remove these
since we want a certain uniform level of accuracy for all the math functions
where possible. One example was the removal of the x86 hardware sincos
support in favor of the generic sin/cos implementation, because the x86
hardware version is not accurate enough (poor range reduction).

Does that answer your question?

-- 
Cheers,
Carlos.


* Re: Question about the highly optimized aspects of libc implementation
  2017-11-30  6:56 ` Carlos O'Donell
@ 2017-11-30  7:00   ` Will Hawkins
  0 siblings, 0 replies; 9+ messages in thread
From: Will Hawkins @ 2017-11-30  7:00 UTC (permalink / raw)
  To: Carlos O'Donell, siddhesh.poyarekar; +Cc: libc-help

Mr. O'Donell and Mr. Poyarekar,

Forgive the top-posting, but I just wanted to quickly thank both of
you for your answers! This is exactly the type of information that I
was hoping to gather. Thank you again!

Will



* Re: Question about the highly optimized aspects of libc implementation
  2017-11-30  6:54 ` Siddhesh Poyarekar
@ 2017-11-30  8:22   ` Florian Weimer
  0 siblings, 0 replies; 9+ messages in thread
From: Florian Weimer @ 2017-11-30  8:22 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Will Hawkins; +Cc: libc-help

On 11/30/2017 07:54 AM, Siddhesh Poyarekar wrote:
> A lot of the internal function calls in i686 use registers to pass
> arguments instead of the standard ABI that mandates using the stack.

We removed internal_function, so this no longer happens across
translation units.  (GCC will still perform the same optimization within
a translation unit, but that is permitted under the as-if rule.)
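
For readers who have not seen internal_function: on i386 it expanded, as far as I recall, to a GCC attribute that passes the first arguments in registers rather than on the stack. The sketch below illustrates that idea under that assumption and is not glibc's code; `my_internal_function` and `add3` are made-up names, and on anything other than 32-bit x86 the macro expands to nothing.

```c
/* Hypothetical stand-in for the old i386 internal_function macro.
   regparm and stdcall are only meaningful on 32-bit x86, so the macro
   is empty on every other target.  */
#if defined (__i386__)
# define my_internal_function __attribute__ ((regparm (3), stdcall))
#else
# define my_internal_function
#endif

/* With the attribute in effect on i386, a, b and c arrive in EAX, EDX
   and ECX.  Every caller must see this declaration, otherwise caller
   and callee disagree about where the arguments live.  */
int my_internal_function add3 (int a, int b, int c);

int my_internal_function
add3 (int a, int b, int c)
{
  return a + b + c;
}
```

Because the convention is part of the function's type, it only works for symbols that never escape the library; a public symbol declared this way would break every caller compiled against the standard ABI.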

Thanks,
Florian


* Re: Question about the highly optimized aspects of libc implementation
  2017-11-30  6:48 Question about the highly optimized aspects of libc implementation Will Hawkins
  2017-11-30  6:54 ` Siddhesh Poyarekar
  2017-11-30  6:56 ` Carlos O'Donell
@ 2017-11-30  8:36 ` Florian Weimer
  2017-12-04  1:15   ` Will Hawkins
  2 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2017-11-30  8:36 UTC (permalink / raw)
  To: Will Hawkins, libc-help

On 11/30/2017 07:47 AM, Will Hawkins wrote:
> I've been digging through the glibc implementation and looking for
> examples of where compiler directives or hand-written assembly have
> been used to improve performance at the "expense" of standards or
> conventions.

The fcntl implementation calls va_arg on a variadic argument which might 
not actually exist.  The syscall function does something similar (but it 
is actually implemented in machine code, so it's less of a problem).

The NSS internals in general and getaddrinfo in particular call 
functions through a mis-matching function pointer (with an additional 
argument added, or with a void * argument where the function is defined 
with a concrete function pointer).

Calling functions such as getpwuid_r with a pointer which has not been 
allocated on the heap (or reusing an existing allocation for a second 
call) is probably not quite valid C due to aliasing violations.

A lot of the code which manipulates struct 
sockaddr/sockaddr_in/sockaddr_in6 objects does not make the additional 
copies which are needed to avoid aliasing violations.
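
A minimal sketch of the "additional copy" being described, assuming a POSIX system: the address bytes are copied into a correctly typed local object with memcpy instead of being read through a cast pointer. `port_of` is a hypothetical helper, not a glibc function.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns the port in host byte order, or -1 if the buffer is too
   short or does not hold an AF_INET address.  All reads go through
   the local copy, so no object is accessed via a mismatched type.  */
int
port_of (const void *addr, socklen_t len)
{
  struct sockaddr_in sin;

  if (len < sizeof sin)
    return -1;
  memcpy (&sin, addr, sizeof sin);
  if (sin.sin_family != AF_INET)
    return -1;
  return ntohs (sin.sin_port);
}
```

Compilers generally turn a fixed-size memcpy like this into plain loads, so the copy usually costs little; whether that holds for a particular target is worth checking in the generated code.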

Do you need more?  I can probably go on for quite some time.

Thanks,
Florian


* Re: Question about the highly optimized aspects of libc implementation
  2017-11-30  8:36 ` Florian Weimer
@ 2017-12-04  1:15   ` Will Hawkins
  2017-12-04  8:12     ` Siddhesh Poyarekar
  2017-12-04 12:15     ` Florian Weimer
  0 siblings, 2 replies; 9+ messages in thread
From: Will Hawkins @ 2017-12-04  1:15 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-help

Mr. Weimer,

Thank you for your response and forgive me for not responding more
quickly. I just had a few follow-ups that I've sprinkled below!

On Thu, Nov 30, 2017 at 3:36 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/30/2017 07:47 AM, Will Hawkins wrote:
>>
>> I've been digging through the glibc implementation and looking for
>> examples of where compiler directives or hand-written assembly have
>> been used to improve performance at the "expense" of standards or
>> conventions.
>
>
> The fcntl implementation calls va_arg on a variadic argument which might not
> actually exist.  The syscall function does something similar (but it is
> actually implemented in machine code, so it's less of a problem).

Are you referring to

...
int
__libc_fcntl (int fd, int cmd, ...)
{
  va_list ap;
  void *arg;

  va_start (ap, cmd);
  arg = va_arg (ap, void *);
  va_end (ap);
...

from sysdeps/unix/sysv/linux/fcntl.c?


>
> The NSS internals in general and getaddrinfo in particular call functions
> through a mis-matching function pointer (with an additional argument added,
> or with a void * argument where the function is defined with a concrete
> function pointer).

Are you referring to, for example,

      if (fct != NULL)
        {
          if (req->ai_family == AF_INET6
              || req->ai_family == AF_UNSPEC)
            {
              gethosts (AF_INET6, struct in6_addr);
              no_inet6_data = no_data;
              inet6_status = status;
            }

from sysdeps/posix/getaddrinfo.c where gethosts uses DL_CALL_FCT to
invoke fct as if it were a function that returned void* and with an
additional void* parameter as a result of the call through
_dl_mcount_wrapper_check?

>
> Calling functions such as getpwuid_r with a pointer which has not been
> allocated on the heap (or reusing an existing allocation for a second call)
> is probably not quite valid C due to aliasing violations.
>
> A lot of the code which manipulates struct sockaddr/sockaddr_in/sockaddr_in6
> objects does not make the additional copies which are needed to avoid
> aliasing violations.
>
> Do you need more?  I can probably go on for quite some time.

What you've given me is great! However, if there are other interesting
ones, I'd love to hear them! I love seeing the /expert/ uses of the C
language for learning. You are all amazing craftspersons -- it's great
to watch you work.

I know this sounds crazy, but are there any places where functions are
invoked with push/jump (or straight jumps) because, for instance, they
are tail calls or the return address is somehow known statically?

Again, thank you so much for taking the time to respond! I really appreciate it!

Will

>
> Thanks,
> Florian


* Re: Question about the highly optimized aspects of libc implementation
  2017-12-04  1:15   ` Will Hawkins
@ 2017-12-04  8:12     ` Siddhesh Poyarekar
  2017-12-04 12:15     ` Florian Weimer
  1 sibling, 0 replies; 9+ messages in thread
From: Siddhesh Poyarekar @ 2017-12-04  8:12 UTC (permalink / raw)
  To: Will Hawkins; +Cc: Florian Weimer, libc-help

On 4 December 2017 at 06:45, Will Hawkins <whh8b@virginia.edu> wrote:
> Are there any places where, I know this sounds crazy, but functions
> are invoked with push/jump (or straight jumps) because, for instance
> they are tail calls or somehow the return address is known statically?

That's not uncommon -  ancillary memcpy functions (such as mempcpy,
__memcpy_chk) get implemented that way and I'm sure there are other
such routines in there.

Siddhesh
-- 
http://siddhesh.in


* Re: Question about the highly optimized aspects of libc implementation
  2017-12-04  1:15   ` Will Hawkins
  2017-12-04  8:12     ` Siddhesh Poyarekar
@ 2017-12-04 12:15     ` Florian Weimer
  1 sibling, 0 replies; 9+ messages in thread
From: Florian Weimer @ 2017-12-04 12:15 UTC (permalink / raw)
  To: Will Hawkins; +Cc: libc-help

On 12/04/2017 02:15 AM, Will Hawkins wrote:

>> The fcntl implementation calls va_arg on a variadic argument which might not
>> actually exist.  The syscall function does something similar (but it is
>> actually implemented in machine code, so it's less of a problem).
> 
> Are you referring to
> 
> ...
> int
> __libc_fcntl (int fd, int cmd, ...)
> {
>    va_list ap;
>    void *arg;
> 
>    va_start (ap, cmd);
>    arg = va_arg (ap, void *);
>    va_end (ap);
> ...
> 
> from sysdeps/unix/sysv/linux/fcntl.c?

Right, this is what I had in mind.
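
For contrast, here is a sketch of the strictly portable pattern, in which va_arg is evaluated only for commands that are documented to carry an argument. The command set and the name `my_ctl` are invented for illustration; this is not how glibc's fcntl is written.

```c
#include <stdarg.h>

/* Hypothetical command set: only CMD_SET carries a third argument.  */
enum { CMD_GET = 0, CMD_SET = 1 };

static int stored_value;

/* The optional argument is read only in the branch that needs it, so
   va_arg is never evaluated when the caller passed nothing.  */
int
my_ctl (int cmd, ...)
{
  switch (cmd)
    {
    case CMD_GET:
      return stored_value;

    case CMD_SET:
      {
        va_list ap;

        va_start (ap, cmd);
        stored_value = va_arg (ap, int);
        va_end (ap);
        return 0;
      }

    default:
      return -1;
    }
}
```

The price is that the function must know every command in advance, whereas reading the argument unconditionally lets unknown commands be forwarded to the kernel unchanged.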

>> The NSS internals in general and getaddrinfo in particular call functions
>> through a mis-matching function pointer (with an additional argument added,
>> or with a void * argument where the function is defined with a concrete
>> function pointer).
> 
> Are you referring to, for example,
> 
>       if (fct != NULL)
>         {
>           if (req->ai_family == AF_INET6
>               || req->ai_family == AF_UNSPEC)
>             {
>               gethosts (AF_INET6, struct in6_addr);
>               no_inet6_data = no_data;
>               inet6_status = status;
>             }

Yes, per the comment above:

	  if (fct == NULL)
	    /* We are cheating here.  The gethostbyname2_r
	       function does not have the same interface as
	       gethostbyname3_r but the extra arguments the
	       latter takes are added at the end.  So the
	       gethostbyname2_r code will just ignore them.  */
	    fct = __nss_lookup_function (nip, "gethostbyname2_r");

> from sysdeps/posix/getaddrinfo.c where gethosts uses DL_CALL_FCT to
> invoke fct as if it were a function that returned void* and with an
> additional void* parameter as a result of the call through
> _dl_mcount_wrapper_check?

I haven't considered this.  I meant the pointer assignment I quoted above.
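
A sketch of the conforming alternative to that "cheat", assuming simplified signatures: rather than storing the narrow function in a pointer of the wide type, a small adapter with the wide signature forwards to it. All names here (`lookup3_fn`, `backend_lookup2`, `backend_lookup3_adapter`, `do_lookup`) are hypothetical and only loosely modelled on the gethostbyname2_r / gethostbyname3_r relationship.

```c
#include <stddef.h>

/* Hypothetical wide lookup signature with one extra trailing result.  */
typedef int (*lookup3_fn) (const char *name, int af, int *ttlp);

/* An "old" backend that only knows the narrow signature.  */
static int
backend_lookup2 (const char *name, int af)
{
  return (name != NULL && af == 2) ? 1 : 0;
}

/* Adapter with the wide signature: forwards what the old backend
   understands and reports the extra result as "not available".  */
static int
backend_lookup3_adapter (const char *name, int af, int *ttlp)
{
  if (ttlp != NULL)
    *ttlp = 0;
  return backend_lookup2 (name, af);
}

/* The caller only ever holds a pointer of the wide type, and the
   function behind it really has that type.  */
int
do_lookup (const char *name, int af, int *ttlp)
{
  lookup3_fn fct = backend_lookup3_adapter;

  return fct (name, af, ttlp);
}
```

The adapter costs one extra call per lookup, which is the kind of overhead the historical code avoided by relying on the calling convention ignoring surplus trailing arguments.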

> What you've given me is great! However, if there are other interesting
> ones, I'd love to hear them! I love seeing the /expert/ uses of the C
> language for learning. You are all amazing craftpersons -- it's great
> to watch you work.

Oh, I expected you were doing research on deliberate use of non-standard 
constructs.  We get such queries from time to time.

The examples I quoted can hardly be considered “expert uses”.  They are 
just the historically chosen approach.  I doubt we would add such code 
today, maybe with the exception of the fcntl case.

In any case, these are truly bad examples, and we only get away with this
because the C library is essentially part of the C implementation.  For 
general-purpose programming, these practices are harmful, and we are 
gradually removing problematic code from glibc, too.

> Are there any places where, I know this sounds crazy, but functions
> are invoked with push/jump (or straight jumps) because, for instance
> they are tail calls or somehow the return address is known statically?

In general, the compiler does this automatically if feasible.  If we 
need this optimization (say for clone/vfork), I think we implement the 
functions involved in assembler.
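
As an illustration of the compiler doing this on its own, here is a sketch of a checked copy whose last action is a call whose result is returned unchanged. `my_memcpy_chk` is a hypothetical analogue and not glibc's __memcpy_chk; whether a given compiler actually emits a jump depends on the target and optimization level.

```c
#include <stdlib.h>
#include <string.h>

/* Validate, then hand off.  Because the memcpy call is in tail
   position, an optimizing compiler is free to emit a jump to memcpy
   instead of a call followed by a return.  */
void *
my_memcpy_chk (void *dst, const void *src, size_t n, size_t dstlen)
{
  if (n > dstlen)
    abort ();
  return memcpy (dst, src, n);
}
```

Comparing the output of `gcc -O2 -S` with that of `gcc -O0 -S` for this function is a quick way to see whether the call became a jump on a given platform.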

Thanks,
Florian

