public inbox for libc-alpha@sourceware.org
* RFC: The CPU run-time library for C
@ 2018-12-03 17:46 H.J. Lu
  2018-12-03 18:20 ` Szabolcs Nagy
  2018-12-04 18:12 ` Siddhesh Poyarekar
  0 siblings, 2 replies; 8+ messages in thread
From: H.J. Lu @ 2018-12-03 17:46 UTC (permalink / raw)
  To: GNU C Library

Here is the updated proposal for the CPU run-time library for C.

Any comments?

H.J.
---
Memory and string functions in the current glibc are highly optimized for
the processors currently on the market.  But it takes years for a glibc
release to reach end users' machines:

1. In 2018, people are still using glibc 2.17, which was released in
February 2013, on Intel Skylake servers, even though the current release,
glibc 2.28, has new memory and string functions optimized for Skylake
servers.
2. The same thing will happen five years from now.

One way to address this is to make glibc modular by putting memory and
string functions into a separate library, which can be updated
independently of the other parts of glibc.  However, memory and string
functions are integral parts of glibc.  Making them a separate module may
not be easy to achieve.

I am proposing a less aggressive approach by adding a --enable-cpu-rt
configure option to enable the CPU run-time library for C, libcpu-rt-c:

1. It contains a subset of glibc.  There are no new implementations of
any libc functions.  All functions in libcpu-rt-c come from inside of
glibc and are tested with the same test framework.
2. Start with memory and string functions.  Any other additions must be
carefully screened.
3. It should support glibc tunables.
4. It should be binary compatible with all existing glibc binaries so
that "LD_PRELOAD=libcpu-rt-c.so" can be used to override functions in
libc.so.
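
As a rough illustration of item 4, and not part of the proposal itself,
the LD_PRELOAD mechanism can be sketched with a single-function library.
The file name my-memcpy.c, the library name libmy-memcpy.so, and the
byte-at-a-time loop are assumptions made for this sketch; a real
libcpu-rt-c would export glibc's own optimized implementations instead.

```c
/* my-memcpy.c: hypothetical stand-in for a preloadable routine.
 *
 * Build: gcc -O2 -fno-builtin -fPIC -shared -o libmy-memcpy.so my-memcpy.c
 * Use:   LD_PRELOAD=./libmy-memcpy.so ./some-program
 */
#include <stddef.h>

/* Same name and signature as the libc function, so the dynamic linker
 * can bind calls to this definition when the library is preloaded. */
void *memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    /* volatile keeps the compiler from turning this loop back into a
     * call to memcpy, which would recurse forever. */
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

The sketch only shows the interposition itself; it says nothing about
tunables or symbol versions, which items 3 and 4 would also have to cover.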

End users can obtain libcpu-rt-c in one of three ways:

1. Install libcpu-rt-c binary from their OS vendors if available.
2. Build libcpu-rt-c from source.
3. Download libcpu-rt-c binary from a central location.


* Re: RFC: The CPU run-time library for C
  2018-12-03 17:46 RFC: The CPU run-time library for C H.J. Lu
@ 2018-12-03 18:20 ` Szabolcs Nagy
  2018-12-03 18:33   ` H.J. Lu
  2018-12-04 18:12 ` Siddhesh Poyarekar
  1 sibling, 1 reply; 8+ messages in thread
From: Szabolcs Nagy @ 2018-12-03 18:20 UTC (permalink / raw)
  To: H.J. Lu, GNU C Library; +Cc: nd

On 03/12/18 17:46, H.J. Lu wrote:
> 4. It should be binary compatible with all existing glibc binaries so
> that "LD_PRELOAD=libcpu-rt-c.so" can be used to override functions in
> libc.so.

is that possible in the presence of multiple
symbol versions for the same symbol?

(e.g. if i want to do this for the math library,
now that we have separate svid vs non-svid compat
symbols it's not clear to me if you could get the
right symbol version with preloading)


* Re: RFC: The CPU run-time library for C
  2018-12-03 18:20 ` Szabolcs Nagy
@ 2018-12-03 18:33   ` H.J. Lu
  2018-12-04 16:50     ` H.J. Lu
  0 siblings, 1 reply; 8+ messages in thread
From: H.J. Lu @ 2018-12-03 18:33 UTC (permalink / raw)
  To: szabolcs.nagy; +Cc: GNU C Library, nd

On Mon, Dec 3, 2018 at 10:20 AM Szabolcs Nagy <Szabolcs.Nagy@arm.com> wrote:
>
> On 03/12/18 17:46, H.J. Lu wrote:
> > 4. It should be binary compatible with all existing glibc binaries so
> > that "LD_PRELOAD=libcpu-rt-c.so" can be used to override functions in
> > libc.so.
>
> is that possible in the presence of multiple
> symbol versions for the same symbol?
>
> (e.g. if i want to do this for the math library,
> now that we have separate svid vs non-svid compat
> symbols it's not clear to me if you could get the
> right symbol version with preloading)

In this case, the CPU run-time library needs to provide
a compat symbol so that reference to the compat symbol
works correctly with LD_PRELOAD.  I believe it is doable.
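
A minimal sketch of what providing a compat symbol could look like with
GNU toolchain symbol versioning.  The version node names VERS_1 and
VERS_2, the file names, the function names, and the BUILD_VERSIONED macro
are all hypothetical and chosen only for illustration.

```c
/* preload.c: hypothetical preload library exporting two versions of bar().
 *
 * Version script (preload.map), also hypothetical:
 *     VERS_1 { global: bar; };
 *     VERS_2 { global: bar; } VERS_1;
 *
 * Build: gcc -fPIC -shared -DBUILD_VERSIONED -o preload.so preload.c
 *        -Wl,--version-script=preload.map      (all on one command line)
 */

/* Implementation behind the old interface. */
const char *old_bar(void) { return "preload.so: old_bar"; }

/* Implementation behind the current interface. */
const char *new_bar(void) { return "preload.so: new_bar"; }

#ifdef BUILD_VERSIONED
/* bar@VERS_1 is the compat symbol: it satisfies binaries that were linked
 * against the old interface.  bar@@VERS_2 (double @) is the default
 * version that newly linked binaries bind to. */
__asm__(".symver old_bar, bar@VERS_1");
__asm__(".symver new_bar, bar@@VERS_2");
#endif
```

The .symver directives are guarded because they only make sense when the
file is linked into a shared object together with the version script.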

-- 
H.J.


* Re: RFC: The CPU run-time library for C
  2018-12-03 18:33   ` H.J. Lu
@ 2018-12-04 16:50     ` H.J. Lu
  0 siblings, 0 replies; 8+ messages in thread
From: H.J. Lu @ 2018-12-04 16:50 UTC (permalink / raw)
  To: szabolcs.nagy; +Cc: GNU C Library, nd


On Mon, Dec 3, 2018 at 10:32 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Dec 3, 2018 at 10:20 AM Szabolcs Nagy <Szabolcs.Nagy@arm.com> wrote:
> >
> > On 03/12/18 17:46, H.J. Lu wrote:
> > > 4. It should be binary compatible with all existing glibc binaries so
> > > that "LD_PRELOAD=libcpu-rt-c.so" can be used to override functions in
> > > libc.so.
> >
> > is that possible in the presence of multiple
> > symbol versions for the same symbol?
> >
> > (e.g. if i want to do this for the math library,
> > now that we have separate svid vs non-svid compat
> > symbols it's not clear to me if you could get the
> > right symbol version with preloading)
>
> In this case, the CPU run-time library needs to provide
> a compat symbol so that reference to the compat symbol
> works correctly with LD_PRELOAD.  I believe it is doable.

Here is a test:

[hjl@gnu-cfl-1 preload-2]$ make
gcc -fno-builtin -g   -c -o foo.o foo.c
gcc -fno-builtin -g -fPIC   -c -o bar-old.o bar-old.c
gcc -shared -o libbar-old.so bar-old.o -Wl,-soname,libbar.so,--version-script=libbar-old.map
gcc -o foo-old foo.o libbar-old.so -Wl,-R,.
gcc -fno-builtin -g -fPIC   -c -o bar-new.o bar-new.c
gcc -shared -o libbar-new.so bar-new.o -Wl,-soname,libbar.so,--version-script=libbar.map
gcc -o foo-new foo.o libbar-new.so -Wl,-R,.
gcc -fno-builtin -g -fPIC   -c -o preload.o preload.c
gcc -shared -o preload.so preload.o -Wl,--version-script=libbar.map
ln -sf libbar-old.so libbar.so
./foo-old
libbar.so: bar
LD_PRELOAD=./preload.so ./foo-old
preload.so: old_bar
ln -sf libbar-new.so libbar.so
./foo-new
libbar.so: new_bar
LD_PRELOAD=./preload.so ./foo-new
preload.so: new_bar
./foo-old
libbar.so: old_bar
LD_PRELOAD=./preload.so ./foo-old
preload.so: old_bar
[hjl@gnu-cfl-1 preload-2]$

--
H.J.

[-- Attachment #2: test.tar.xz --]
[-- Type: application/x-xz, Size: 784 bytes --]


* Re: RFC: The CPU run-time library for C
  2018-12-03 17:46 RFC: The CPU run-time library for C H.J. Lu
  2018-12-03 18:20 ` Szabolcs Nagy
@ 2018-12-04 18:12 ` Siddhesh Poyarekar
  2018-12-04 20:34   ` Carlos O'Donell
  1 sibling, 1 reply; 8+ messages in thread
From: Siddhesh Poyarekar @ 2018-12-04 18:12 UTC (permalink / raw)
  To: H.J. Lu, GNU C Library

On 03/12/18 11:16 PM, H.J. Lu wrote:
> 1. Install libcpu-rt-c binary from their OS vendors if available.

I'm curious to know what OS vendors think of this.  AFAICT, it's not too 
different from shipping an alternate glibc and in some ways, the latter 
might just be easier than munging scripts to build a separate library.

Also, if the same ABI guarantees are expected of this new library, then 
again would OS vendors prefer to ship a whole new library or would they 
be better off just backporting these new routines?

Basically, this doesn't make sense if OS vendors aren't going to ship 
it.  Building in this complexity just to make a downloadable binary in 
some arbitrary place sounds like an ugly hack that will come to bite us 
later.

Siddhesh


* Re: RFC: The CPU run-time library for C
  2018-12-04 18:12 ` Siddhesh Poyarekar
@ 2018-12-04 20:34   ` Carlos O'Donell
  2018-12-05  3:54     ` Siddhesh Poyarekar
  2018-12-07 20:30     ` Patrick McGehearty
  0 siblings, 2 replies; 8+ messages in thread
From: Carlos O'Donell @ 2018-12-04 20:34 UTC (permalink / raw)
  To: Siddhesh Poyarekar, H.J. Lu, GNU C Library

On 12/4/18 1:12 PM, Siddhesh Poyarekar wrote:
> On 03/12/18 11:16 PM, H.J. Lu wrote:
>> 1. Install libcpu-rt-c binary from their OS vendors if available.
> 
> I'm curious to know what OS vendors think of this.  AFAICT, it's not
> too different from shipping an alternate glibc and in some ways, the
> latter might just be easier than munging scripts to build a separate
> library.
> 
> Also, if the same ABI guarantees are expected of this new library,
> then again would OS vendors prefer to ship a whole new library or
> would they be better off just backporting these new routines?
> 
> Basically, this doesn't make sense if OS vendors aren't going to ship
> it.  Building in this complexity just to make a downloadable binary
> in some arbitrary place sounds like an ugly hack that will come to
> bite us later.

H.J. posted an early RFC in June:
https://www.sourceware.org/ml/libc-alpha/2018-06/msg00259.html

My summary of consensus in June was:

- Suggest implementing in a distinct project: Adhemerval, Florian, Carlos.

- Request simpler design: Florian, Siddhesh.

(1) Why not an external preloadable library?

This RFC appears unchanged from the original proposal and the outstanding
comments do not appear to have been discussed in any further detail.
Particularly the cost/benefit ratio to the project to accept such patches
versus a simpler mechanism. Likewise why "most" of user needs cannot be met
by something like the ARM's cortex-strings, which doesn't need deep 
integration with glibc-specific features.

(2) Current libcpu-rt-c proposal does not meet OS vendor needs.

The present libcpu-rt-c proposal as-is is not usable by OS vendors;
replacing the core string routines is equivalent to a library rebase
and requires revalidation efforts by the distribution and by QE. This
makes it *almost* as difficult to rebase and update libcpu-rt-c as it is
to rebase and update glibc (not to mention it requires using DTS in RHEL
to get a new-enough compiler/binutils). The other consequence is that a
newer compiler/binutils may need a newer gdb to even be able to debug
the code in question, and the problem is compounded. No distro that
I'm aware of has ever delivered something like this.

OS vendors already have a process to backport IFUNC and other
improvements to stable branches, and we do this in RHEL for Intel, 
IBM, and ARM (just look at our public glibc.spec %changelog) e.g.
- Improve libm performance AArch64 (#1302086)
- Improve memcpy performance for POWER9 DD2.1 (#1498925)
- Add Intel AVX-512 optimized routines (#1298526).
- Improve performance on Intel Purley (#1335286).
- Add support for new IBM z14 (s390x) instructions (#1375235)

If you need key routines backported, please work with your
distribution contact to have key support backported. RHEL
point releases happen frequently.

Therefore this proposal only adds work to upstream glibc, and
doesn't provide customers with a supported libcpu-rt-c. At most
it gives customers a way to improve performance by using 
libraries provided by a 3rd party. That 3rd party could equally
deploy a custom glibc and tell the customer to use that.

(3) Solution is too costly in terms of maintenance.

The solution lacks the simplicity of plans like --enable-math-private.

In this patch set from Florian:
https://sourceware.org/ml/libc-alpha/2018-09/msg00368.html

We see a proposal that is much simpler for the math routines.
In particular, building libm.so such that it is distinct from
glibc and can be preloaded. This is easier for libm functions
because they are so distinct from libc, but it's just an example
of the kind of well isolated solutions which are desirable
from upstream.

My opinion is that unless the solution becomes drastically
simpler, it has too high a cost in terms of maintenance
for the problem it solves.

---

In summary:

(1) Could solve "most" of the problem with an external
    pre-loadable library, without all the bells-and-whistles
    glibc has (tunables, etc) e.g. ARM's cortex-strings.

(2) Difficult to support from an OS vendor point of view.
    Easier to just ship a new glibc.

(3) Costly in terms of maintenance for the value it provides.
    Cost is ongoing maintenance and support of lots of
    conditionals to enable 3rd parties providing parts of
    new glibc's functionality to users.

-- 
Cheers,
Carlos.


* Re: RFC: The CPU run-time library for C
  2018-12-04 20:34   ` Carlos O'Donell
@ 2018-12-05  3:54     ` Siddhesh Poyarekar
  2018-12-07 20:30     ` Patrick McGehearty
  1 sibling, 0 replies; 8+ messages in thread
From: Siddhesh Poyarekar @ 2018-12-05  3:54 UTC (permalink / raw)
  To: Carlos O'Donell, H.J. Lu, GNU C Library

On 05/12/18 2:04 AM, Carlos O'Donell wrote:
> H.J. posted an early RFC in June:
> https://www.sourceware.org/ml/libc-alpha/2018-06/msg00259.html
> 
> My summary of consensus in June was:
> 
> - Suggest implementing in a distinct project: Adhemerval, Florian, Carlos.
> 
> - Request simpler design: Florian, Siddhesh.

Well my opinion was really more about glibc's build system and not this 
library; I couldn't see a viable way to have it ship even then and I 
think a lot of y'all had already made that point.

> (1) Why not an external preloadable library?
> 
> This RFC appears unchanged from the original proposal and the outstanding
> comments do not appear to have been discussed in any further detail.
> Particularly the cost/benefit ratio to the project to accept such patches
> versus a simpler mechanism. Likewise why "most" of user needs cannot be met
> by something like the ARM's cortex-strings, which doesn't need deep
> integration with glibc-specific features.

Yeah, that is a much more flexible approach.  Maybe in the medium/long 
term we could consider the idea of making this new project into a 
submodule of glibc to reduce or even avoid duplication of code.

My only concern here is fragmentation; architecture maintainers will 
need to make sure that they're syncing routines regularly.  It happens 
for arm/aarch64 currently because we're still in a state where glibc 
dictates the development to a great extent.  Once this library gets 
traction, that incentive may get lost.

> (2) Current libcpu-rt-c proposal does not meet OS vendor needs.
> 
> The present libcpu-rt-c proposal as-is is not usable by OS vendors;
> replacing the core string routines is equivalent to a library rebase
> and requires revalidation efforts by the distribution and by QE. This
> makes it *almost* as difficult to rebase and update libcpu-rt-c as it is
> to rebase and update glibc (not to mention it requires using DTS in RHEL
> to get a new-enough compiler/binutils). The other consequence is that a
> newer compiler/binutils may need a newer gdb to even be able to debug
> the code in question, and the problem is compounded. No distro that
> I'm aware of has ever delivered something like this.
> 
> OS vendors already have a process to backport IFUNC and other
> improvements to stable branches, and we do this in RHEL for Intel,
> IBM, and ARM (just look at our public glibc.spec %changelog) e.g.
> - Improve libm performance AArch64 (#1302086)
> - Improve memcpy performance for POWER9 DD2.1 (#1498925)
> - Add Intel AVX-512 optimized routines (#1298526).
> - Improve performance on Intel Purley (#1335286).
> - Add support for new IBM z14 (s390x) instructions (#1375235)
> 
> If you need key routines backported, please work with your
> distribution contact to have key support backported. RHEL
> point releases happen frequently.

Agreed, this is pretty much what I said at the Plumbers last month with 
my ex-Red Hatter Fedora on.

> Therefore this proposal only adds work to upstream glibc, and
> doesn't provide customers with a supported libcpu-rt-c. At most
> it gives customers a way to improve performance by using
> libraries provided by a 3rd party. That 3rd party could equally
> deploy a custom glibc and tell the customer to use that.

Right.

> (3) Solution is too costly in terms of maintenance.
> 
> The solution lacks the simplicity of plans like --enable-math-private.
> 
> In this patch set from Florian:
> https://sourceware.org/ml/libc-alpha/2018-09/msg00368.html
> 
> We see a proposal that is much simpler for the math routines.
> In particular, building libm.so such that it is distinct from
> glibc and can be preloaded. This is easier for libm functions
> because they are so distinct from libc, but it's just an example
> of the kind of well isolated solutions which are desirable
> from upstream.

This is fine for math, but maybe not for strings because they might need 
some initialization state to work correctly (e.g. tunables) and also 
because they may get used very early.  It's solvable, but not as easily 
as math.
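
To make the early-use concern concrete: glibc selects its optimized string
routines through IFUNC resolvers, which run while the dynamic linker is
still processing relocations.  The sketch below uses the GCC ifunc
attribute on a made-up function; count_char, its two variants, and the
resolver name are hypothetical and only stand in for a real string
routine.  It assumes GCC or Clang on an ELF/GNU target.

```c
#include <stddef.h>

typedef size_t count_fn(const char *s, char c);

/* Portable fallback. */
static size_t count_generic(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        n += (*s == c);
    return n;
}

/* Stand-in for a CPU-specific variant; a real one would use AVX2 code. */
static size_t count_avx2(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        n += (*s == c);
    return n;
}

/* Resolver: runs at relocation time, before constructors, so it should
 * not rely on state that is only set up later in program startup. */
static count_fn *resolve_count_char(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();               /* needed before cpu_supports here */
    if (__builtin_cpu_supports("avx2"))
        return count_avx2;
#endif
    return count_generic;
}

/* Callers see one symbol; the resolver's return value decides which
 * implementation it is bound to. */
size_t count_char(const char *s, char c)
    __attribute__((ifunc("resolve_count_char")));
```

A preloaded library that wants tunables to influence this choice has to
have them parsed before its resolvers run, which is the part that is
harder for strings than for math.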

Siddhesh

PS: <bait>We should some day talk about going the opposite way and 
merging libpthread.so into libc.so</bait>


* Re: RFC: The CPU run-time library for C
  2018-12-04 20:34   ` Carlos O'Donell
  2018-12-05  3:54     ` Siddhesh Poyarekar
@ 2018-12-07 20:30     ` Patrick McGehearty
  1 sibling, 0 replies; 8+ messages in thread
From: Patrick McGehearty @ 2018-12-07 20:30 UTC (permalink / raw)
  To: libc-alpha

Disclaimer: While I work for Oracle, I am not authorized to comment on
Oracle product plans.

My work focus has been primarily on performance issues on various systems
and HW platforms over the years. I am sympathetic to the desire for
performance improvements to get into actual end users' hands as quickly as
is consistent with security, reliability, etc.  I don't believe the add-on
glibc_(mem/string) approach will achieve this goal for most vendors and
most vendor-dependent users.

Ideally, any supported release of a product will require testing,
documentation, and a QA phase. Most vendors support multiple releases at
any given time. Something this tightly tied to glibc would require QA work
for each glibc_(mem/string) in combination with each glibc_(base). If a
vendor has only 3 of each at any given time, that would still mean 9 units
of QA work instead of 3. The potential market benefit would be small
compared to the additional overhead of just the QA work. Once you add in
the increased cost of applying fixes to yet more source trees [six instead
of three in the above example], it hardly seems an attractive path for SW
maintenance.

Today, a vendor can select the upstream performance-related patches they
perceive as useful to their customers and apply them to the next update
of their newest glibc release. Older releases are likely to be left
unchanged, as customers on older releases implicitly prefer stability. If
they wanted the latest stuff, they'd switch to the newest vendor release.

To get improvements to customers faster, we need vendors to have pressure
from customers to make those improvements available. That means
customers need to be aware that improvements are happening.
Even simple synthetic open source benchmarks with a reasonable range of
input values can be useful in this regard. Then one can say:
"On the glibc strcpy benchmark, for platform y, the new strcpy code runs
x% faster."
Simple, quantitative, easy to grasp, and easy to validate
by anyone with access to the source, the test, and platform y.
Then a vendor could pick up a set of improvements and tell customers
that "our newest version of glibc runs x% to y% faster on a range of
commonly used functions (see appendix for details) than glibc version zzz."
Customers who care would gravitate to vendors who release improvements
more quickly, giving vendors a reason to port the upstream improvements
more quickly.
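
As a sketch of the kind of simple synthetic benchmark meant here, the
following times strcpy over a range of string lengths.  The function
names, the chosen lengths, and the iteration count are assumptions made
for illustration; this is not glibc's own benchmark code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Average nanoseconds per strcpy() call for a string of `len` bytes,
 * or -1.0 if iters is zero or memory could not be allocated. */
double ns_per_strcpy(size_t len, size_t iters)
{
    if (iters == 0)
        return -1.0;

    char *src = malloc(len + 1);
    char *dst = malloc(len + 1);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 'x', len);
    src[len] = '\0';

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) {
        strcpy(dst, src);
        /* Compiler barrier so the copies are not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    free(src);
    free(dst);
    return ns / (double)iters;
}

/* Print one line per length so two glibc builds can be compared. */
void report_strcpy(void)
{
    static const size_t lens[] = { 8, 64, 512, 4096, 65536 };

    for (size_t i = 0; i < sizeof lens / sizeof lens[0]; i++)
        printf("len=%6zu  %8.1f ns/call\n",
               lens[i], ns_per_strcpy(lens[i], 100000));
}
```

Calling report_strcpy() from main on platform y, once with the old
library and once with the new one, gives the two columns from which an
"x% faster" figure can be computed.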

- patrick


On 12/4/2018 2:34 PM, Carlos O'Donell wrote:
> On 12/4/18 1:12 PM, Siddhesh Poyarekar wrote:
>> On 03/12/18 11:16 PM, H.J. Lu wrote:
>>> 1. Install libcpu-rt-c binary from their OS vendors if available.
>> I'm curious to know what OS vendors think of this.  AFAICT, it's not
>> too different from shipping an alternate glibc and in some ways, the
>> latter might just be easier than munging scripts to build a separate
>> library.
>>
>> Also, if the same ABI guarantees are expected of this new library,
>> then again would OS vendors prefer to ship a whole new library or
>> would they be better off just backporting these new routines?
>>
>> Basically, this doesn't make sense if OS vendors aren't going to ship
>> it.  Building in this complexity just to make a downloadable binary
>> in some arbitrary place sounds like an ugly hack that will come to
>> bite us later.
> H.J. posted an early RFC in June:
> https://www.sourceware.org/ml/libc-alpha/2018-06/msg00259.html
>
> My summary of consensus in June was:
>
> - Suggest implementing in a distinct project: Adhemerval, Florian, Carlos.
>
> - Request simpler design: Florian, Siddhesh.
>
> (1) Why not an external preloadable library?
>
> This RFC appears unchanged from the original proposal and the outstanding
> comments do not appear to have been discussed in any further detail.
> Particularly the cost/benefit ratio to the project to accept such patches
> versus a simpler mechanism. Likewise why "most" of user needs cannot be met
> by something like the ARM's cortex-strings, which doesn't need deep
> integration with glibc-specific features.
>
> (2) Current libcpu-rt-c proposal does not meet OS vendor needs.
>
> The present libcpu-rt-c proposal as-is is not usable by OS vendors;
> replacing the core string routines is equivalent to a library rebase
> and requires revalidation efforts by the distribution and by QE. This
> makes it *almost* as difficult to rebase and update libcpu-rt-c as it is
> to rebase and update glibc (not to mention it requires using DTS in RHEL
> to get a new-enough compiler/binutils). The other consequence is that a
> newer compiler/binutils may need a newer gdb to even be able to debug
> the code in question, and the problem is compounded. No distro that
> I'm aware of has ever delivered something like this.
>
> OS vendors already have a process to backport IFUNC and other
> improvements to stable branches, and we do this in RHEL for Intel,
> IBM, and ARM (just look at our public glibc.spec %changelog) e.g.
> - Improve libm performance AArch64 (#1302086)
> - Improve memcpy performance for POWER9 DD2.1 (#1498925)
> - Add Intel AVX-512 optimized routines (#1298526).
> - Improve performance on Intel Purley (#1335286).
> - Add support for new IBM z14 (s390x) instructions (#1375235)
>
> If you need key routines backported, please work with your
> distribution contact to have key support backported. RHEL
> point releases happen frequently.
>
> Therefore this proposal only adds work to upstream glibc, and
> doesn't provide customers with a supported libcpu-rt-c. At most
> it gives customers a way to improve performance by using
> libraries provided by a 3rd party. That 3rd party could equally
> deploy a custom glibc and tell the customer to use that.
>
> (3) Solution is too costly in terms of maintenance.
>
> The solution lacks the simplicity of plans like --enable-math-private.
>
> In this patch set from Florian:
> https://sourceware.org/ml/libc-alpha/2018-09/msg00368.html
>
> We see a proposal that is much simpler for the math routines.
> In particular, building libm.so such that it is distinct from
> glibc and can be preloaded. This is easier for libm functions
> because they are so distinct from libc, but it's just an example
> of the kind of well isolated solutions which are desirable
> from upstream.
>
> My opinion is that unless the solution becomes drastically
> simpler, it has too high a cost in terms of maintenance
> for the problem it solves.
>
> ---
>
> In summary:
>
> (1) Could solve "most" of the problem with an external
>      pre-loadable library, without all the bells-and-whistles
>      glibc has (tunables, etc) e.g. ARM's cortex-strings.
>
> (2) Difficult to support from an OS vendor point of view.
>      Easier to just ship a new glibc.
>
> (3) Costly in terms of maintenance for the value it provides.
>      Cost is ongoing maintenance and support of lots of
>      conditionals to enable 3rd parties providing parts of
>      new glibc's functionality to users.
>
