public inbox for libc-alpha@sourceware.org
* A per-user or per-application ld.so.cache?
@ 2016-02-08 18:40 Carlos O'Donell
  2016-02-08 19:10 ` Florian Weimer
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Carlos O'Donell @ 2016-02-08 18:40 UTC (permalink / raw)
  To: GNU C Library

Under what conditions might it make sense to implement
a per-user ld.so.cache?

At Red Hat we have some customers, particularly in HPC,
which deploy quite large applications across systems that
they don't themselves maintain. In this case the given
application could have thousands of DSOs. When you load
up such an application the normal search paths apply
and that's not very optimal.

Might it make sense to have a per-user ld.so.cache for
this case? Might we even entertain a per-application
ld.so.cache?

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 18:40 A per-user or per-application ld.so.cache? Carlos O'Donell
@ 2016-02-08 19:10 ` Florian Weimer
  2016-02-08 20:19   ` Carlos O'Donell
  2016-02-08 19:12 ` Siddhesh Poyarekar
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2016-02-08 19:10 UTC (permalink / raw)
  To: carlos; +Cc: libc-alpha

On 02/08/2016 07:40 PM, Carlos O'Donell wrote:
> Under what conditions might it make sense to implement
> a per-user ld.so.cache?
> 
> At Red Hat we have some customers, particularly in HPC,
> which deploy quite large applications across systems that
> they don't themselves maintain. In this case the given
> application could have thousands of DSOs. When you load
> up such an application the normal search paths apply
> and that's not very optimal.

Are these processes short-lived?

Is symbol lookup performance an issue as well?

What's the total size of all relevant DSOs, combined?  What does the
directory structure look like?

Which ELF dynamic linking features are used?

Is the bulk of those DSOs pulled in with dlopen, after the initial
dynamic link?  If yes, does this happen directly (many DSOs dlopen'ed
individually) or indirectly (few of them pull in a huge cascade of
dependencies)?


If the processes are not short-lived and most of the DSOs are loaded
after user code has started executing, I doubt an on-disk cache is the
right solution.

Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 18:40 A per-user or per-application ld.so.cache? Carlos O'Donell
  2016-02-08 19:10 ` Florian Weimer
@ 2016-02-08 19:12 ` Siddhesh Poyarekar
  2016-02-08 20:14   ` Carlos O'Donell
  2016-02-08 19:16 ` Zack Weinberg
  2016-02-09  4:35 ` Mike Frysinger
  3 siblings, 1 reply; 26+ messages in thread
From: Siddhesh Poyarekar @ 2016-02-08 19:12 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

On Mon, Feb 08, 2016 at 01:40:05PM -0500, Carlos O'Donell wrote:
> Under what conditions might it make sense to implement
> a per-user ld.so.cache?
> 
> At Red Hat we have some customers, particularly in HPC,
> which deploy quite large applications across systems that
> they don't themselves maintain. In this case the given
> application could have thousands of DSOs. When you load
> up such an application the normal search paths apply
> and that's not very optimal.
> 
> Might it make sense to have a per-user ld.so.cache for
> this case? Might we even entertain a per-application
> ld.so.cache?

It won't be a bad idea as long as it is ignored when running setuid
binaries.

The other alternative could be extending ld.so.cache to have profiles
for specific binaries in the system path.  That is, enhance the format
of files in /etc/ld.so.conf.d/ to have a regular expression (or maybe
even simple wildcards) in the beginning that specifies binaries that
will use these paths first.  IIRC we do something similar for hwcaps.

This should also improve performance marginally for applications that
don't need all of those library paths that the files in ld.so.conf.d
add.  It will also help maintain a clean namespace in cases where two
installations may have DSOs with the same names but in different
paths.

Siddhesh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 18:40 A per-user or per-application ld.so.cache? Carlos O'Donell
  2016-02-08 19:10 ` Florian Weimer
  2016-02-08 19:12 ` Siddhesh Poyarekar
@ 2016-02-08 19:16 ` Zack Weinberg
  2016-02-08 20:10   ` Carlos O'Donell
  2016-02-09  4:35 ` Mike Frysinger
  3 siblings, 1 reply; 26+ messages in thread
From: Zack Weinberg @ 2016-02-08 19:16 UTC (permalink / raw)
  To: Carlos O'Donell, GNU C Library

On Mon, Feb 8, 2016 at 1:40 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> Under what conditions might it make sense to implement
> a per-user ld.so.cache?

My immediate reaction is that ld.so is already too complicated and I
would hate to see it get even more complicated.  Also, I'd like to
understand why these HPC apps require _thousands_ of DSOs, and what
their lifecycles are; I'm not convinced we even know what the
_problem_ is here, let alone the appropriate solution.

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 19:16 ` Zack Weinberg
@ 2016-02-08 20:10   ` Carlos O'Donell
  0 siblings, 0 replies; 26+ messages in thread
From: Carlos O'Donell @ 2016-02-08 20:10 UTC (permalink / raw)
  To: Zack Weinberg, GNU C Library; +Cc: Ben Woodard

On 02/08/2016 02:16 PM, Zack Weinberg wrote:
> On Mon, Feb 8, 2016 at 1:40 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>> Under what conditions might it make sense to implement
>> a per-user ld.so.cache?
> 
> My immediate reaction is that ld.so is already too complicated and I
> would hate to see it get even more complicated.  Also, I'd like to
> understand why these HPC apps require _thousands_ of DSOs, and what
> their lifecycles are; I'm not convinced we even know what the
> _problem_ is here, let alone the appropriate solution.

Do you really feel ld.so is complicated? My feeling is that it's
poorly documented and poorly tested, but not overly complicated
(though I've been working on dlmopen and other bits so I'm feeling
far more familiar with it than before).

Any changes in this area would need thorough testing, including
building up enough to test ldconfig/ld.so.cache changes from within
a chroot-based testsuite (or something similar).

You can start looking at a problem LLNL has here:
http://computation.llnl.gov/projects/spindle/spindle-paper.pdf

We reference SPINDLE from here also:
https://sourceware.org/glibc/wiki/Tools%20Interface%20NG

The HPC apps don't require _thousands_ of DSOs, it's just the way
they are being developed; Firefox already has 203 DSOs for what is
probably a less complicated application.

We're in touch with various customers and I can relay questions
if you have any, as can Ben Woodard (Red Hat).

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 19:12 ` Siddhesh Poyarekar
@ 2016-02-08 20:14   ` Carlos O'Donell
  2016-02-09  3:29     ` Siddhesh Poyarekar
  0 siblings, 1 reply; 26+ messages in thread
From: Carlos O'Donell @ 2016-02-08 20:14 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: GNU C Library

On 02/08/2016 02:11 PM, Siddhesh Poyarekar wrote:
> On Mon, Feb 08, 2016 at 01:40:05PM -0500, Carlos O'Donell wrote:
>> Under what conditions might it make sense to implement
>> a per-user ld.so.cache?
>>
>> At Red Hat we have some customers, particularly in HPC,
>> which deploy quite large applications across systems that
>> they don't themselves maintain. In this case the given
>> application could have thousands of DSOs. When you load
>> up such an application the normal search paths apply
>> and that's not very optimal.
>>
>> Might it make sense to have a per-user ld.so.cache for
>> this case? Might we even entertain a per-application
>> ld.so.cache?
> 
> It won't be a bad idea as long as it is ignored when running setuid
> binaries.
> 
> The other alternative could be extending ld.so.cache to have profiles
> for specific binaries in the system path.  That is, enhance the format
> of files in /etc/ld.so.conf.d/ to have a regular expression (or maybe
> even simple wildcards) in the beginning that specifies binaries that
> will use these paths first.  IIRC we do something similar for hwcaps.
> 
> This should also improve performance marginally for applications that
> don't need all of those library paths that the files in ld.so.conf.d
> add.  It will also help maintain a clean namespace in cases where two
> installations may have DSOs with the same names but in different
> paths.

The downside is that the user has no control of this cache and
would need administrative intervention for help accelerating their
application. Consider that you bought time on a cluster of machines,
and now to run your app you're making the user interact with the sysadmin
to install new filters and run ldconfig on every node? It won't scale
(from a human perspective).

With a per-user/per-process cache, say in ~/.ld.so.cache, the user could
prime the cache themselves after setting up their application with
bundled libraries and have it work as expected, with accelerated lookups
and without lots of stat/getdents in $HOME.
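
As a rough sketch of the priming step (the paths are invented; note that
ldconfig's -C, -f and -X options exist today, but the loader-side lookup
of a per-user cache is the part that would be new), it could be something
along the lines of:

  $ cat ~/.ld.so.conf
  /home/user/myapp/lib
  /home/user/myapp/plugins
  $ ldconfig -X -C ~/.ld.so.cache -f ~/.ld.so.conf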

Does that counter-argument make sense for why the cache could be
under user control? It means the data needs to be inspected carefully
though.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 19:10 ` Florian Weimer
@ 2016-02-08 20:19   ` Carlos O'Donell
  2016-02-08 20:25     ` Florian Weimer
  2016-02-08 22:29     ` Ben Woodard
  0 siblings, 2 replies; 26+ messages in thread
From: Carlos O'Donell @ 2016-02-08 20:19 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, Ben Woodard

On 02/08/2016 02:10 PM, Florian Weimer wrote:
> On 02/08/2016 07:40 PM, Carlos O'Donell wrote:
>> Under what conditions might it make sense to implement
>> a per-user ld.so.cache?
>>
>> At Red Hat we have some customers, particularly in HPC,
>> which deploy quite large applications across systems that
>> they don't themselves maintain. In this case the given
>> application could have thousands of DSOs. When you load
>> up such an application the normal search paths apply
>> and that's not very optimal.
> 
> Are these processes short-lived?

No.

See [1]. 

> Is symbol lookup performance an issue as well?

Yes. So are the various O(n^2) algorithms we need to fix
inside the loader, particularly the DSO sorts we use.

> What's the total size of all relevant DSOs, combined?  What does the
> directory structure look like?

I don't know. We should ask Ben Woodard to get us that data.

Ben?

> Which ELF dynamic linking features are used?

I don't know.

> Is the bulk of those DSOs pulled in with dlopen, after the initial
> dynamic link?  If yes, does this happen directly (many DSOs dlopen'ed
> individually) or indirectly (few of them pull in a huge cascade of
> dependencies)?

I do not believe the bulk of the DSOs are pulled in with dlopen.

Though for python code I know that might be the reverse with each
python module being a DSO that is loaded by the interpreter.

Which means we probably have two cases:
* Long chains of DSOs (non-python applications)
* Short single DSO chains, but lots of them (python modules).
 
> If the processes are not short-lived and most of the DSOs are loaded
> after user code has started executing, I doubt an on-disk cache is the
> right solution.

Why would a long-lived process that uses dlopen fail to benefit from an
on-disk cache? The on-disk cache, as it is today, is used for a similar
situation already, why not extend it? The biggest difference is that
we trust the cache we have today and mmap into memory. We would have to
harden the code that processes that cache, but it should not be that
hard.

Would you mind expanding on your concern that the solution would not work?

Cheers,
Carlos.
 

[1] http://computation.llnl.gov/projects/spindle/spindle-paper.pdf

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 20:19   ` Carlos O'Donell
@ 2016-02-08 20:25     ` Florian Weimer
  2016-02-08 20:36       ` Carlos O'Donell
  2016-02-08 22:51       ` Ben Woodard
  2016-02-08 22:29     ` Ben Woodard
  1 sibling, 2 replies; 26+ messages in thread
From: Florian Weimer @ 2016-02-08 20:25 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha, Ben Woodard

On 02/08/2016 09:19 PM, Carlos O'Donell wrote:

> Why would a long-lived process that uses dlopen fail to benefit from an
> on-disk cache?

It's not worth the complexity.  (On top of the SUID issue already
mentioned, there is also the question of cache invalidation.)  With
long-living processes, you could just read in a designated list of
directories at startup and use that to seed an ephemeral cache.  Hence
my question about the directory layout.
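
As a rough user-space illustration of what I mean (the struct and helper
names here are invented, and this is not how ld.so itself would implement
it), seeding and consulting such an ephemeral cache could look like this:

/* Sketch: build an ephemeral, in-memory cache of directory contents at
   startup, then consult it before doing per-library path searches.
   Illustrative only.  */

#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct dir_cache
{
  char **paths;     /* Full "directory/entry" strings.  */
  size_t count;
  size_t capacity;
};

/* Read PATH once and remember every entry in it.  */
static void
seed_directory (struct dir_cache *cache, const char *path)
{
  DIR *dir = opendir (path);
  if (dir == NULL)
    return;   /* Missing directory: fall back to normal searching later.  */
  struct dirent *e;
  while ((e = readdir (dir)) != NULL)
    {
      if (strcmp (e->d_name, ".") == 0 || strcmp (e->d_name, "..") == 0)
        continue;
      if (cache->count == cache->capacity)
        {
          size_t new_capacity = cache->capacity ? 2 * cache->capacity : 64;
          char **p = realloc (cache->paths, new_capacity * sizeof (char *));
          if (p == NULL)
            abort ();
          cache->paths = p;
          cache->capacity = new_capacity;
        }
      char *full;
      if (asprintf (&full, "%s/%s", path, e->d_name) < 0)
        abort ();
      cache->paths[cache->count++] = full;
    }
  closedir (dir);
}

/* Return the cached location of SONAME, or NULL to fall through to the
   usual directory-by-directory search.  */
static const char *
cache_lookup (const struct dir_cache *cache, const char *soname)
{
  for (size_t i = 0; i < cache->count; i++)
    {
      const char *base = strrchr (cache->paths[i], '/') + 1;
      if (strcmp (base, soname) == 0)
        return cache->paths[i];
    }
  return NULL;
}

int
main (int argc, char **argv)
{
  /* Usage (illustrative): ./a.out libfoo.so.1 dir1 dir2 ...  */
  if (argc < 3)
    return 1;
  struct dir_cache cache = { 0 };
  for (int i = 2; i < argc; i++)
    seed_directory (&cache, argv[i]);
  const char *hit = cache_lookup (&cache, argv[1]);
  printf ("%s -> %s\n", argv[1], hit != NULL ? hit : "(not cached)");
  return 0;
}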

> Would you mind expanding on your concern that the solution would not work?

It would work, it's just more difficult to use.

Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 20:25     ` Florian Weimer
@ 2016-02-08 20:36       ` Carlos O'Donell
  2016-02-08 23:00         ` Ben Woodard
  2016-02-09  6:57         ` Florian Weimer
  2016-02-08 22:51       ` Ben Woodard
  1 sibling, 2 replies; 26+ messages in thread
From: Carlos O'Donell @ 2016-02-08 20:36 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, Ben Woodard

On 02/08/2016 03:25 PM, Florian Weimer wrote:
> On 02/08/2016 09:19 PM, Carlos O'Donell wrote:
> 
>> Why would a long-lived process that uses dlopen fail to benefit from an
>> on-disk cache?
> 
> It's not worth the complexity.  (On top of the SUID issue already
> mentioned, there is also the question of cache invalidation.)  With
> long-living processes, you could just read in a designated list of
> directories at startup and use that to seed an ephemeral cache.  Hence
> my question about the directory layout.

There is no added complexity? It's exactly the same code used by ldconfig
to generate /etc/ld.so.cache, but we would extend it to untrusted directories,
and so we must filter it based on the user. I admit it is a *little* bit of
added complexity.

Are you familiar with what goes into /etc/ld.so.cache? It is only a cache
of lookups, not the DSOs themselves, so we would be caching the results of
a search of the user directories and recording the DSOs found there, nothing
more. The user is already mostly accustomed to using ldconfig as root to
update the global cache; this is just an extension to allow ldconfig to
be run by the user.

Right now it's just an idea, and I'm open to other solutions to the problem
of accelerating the DSO search path lookup and minimizing the number of
stats and directory traversals on potentially distributed filesystems.

>> Would you mind expanding on your concern that the solution would not work?
> 
> It would work, it's just more difficult to use.

Would you mind expanding on what you would find difficult? Words like better
or worse, in a technical context, need explicit descriptions of what is
better and what is worse.

The user would have to run 'ldconfig', and perhaps by default we update the
user cache and skip updating the global cache if the user lacks the permissions
to do so. Not that different from what we do today with Fedora/RHEL spec files
when libraries are installed.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 20:19   ` Carlos O'Donell
  2016-02-08 20:25     ` Florian Weimer
@ 2016-02-08 22:29     ` Ben Woodard
  2016-02-09  7:18       ` Florian Weimer
  1 sibling, 1 reply; 26+ messages in thread
From: Ben Woodard @ 2016-02-08 22:29 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Florian Weimer, libc-alpha


> On Feb 8, 2016, at 12:19 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> 
> On 02/08/2016 02:10 PM, Florian Weimer wrote:
>> On 02/08/2016 07:40 PM, Carlos O'Donell wrote:
>>> Under what conditions might it make sense to implement
>>> a per-user ld.so.cache?
>>> 
>>> At Red Hat we have some customers, particularly in HPC,
>>> which deploy quite large applications across systems that
>>> they don't themselves maintain. In this case the given
>>> application could have thousands of DSOs. When you load
>>> up such an application the normal search paths apply
>>> and that's not very optimal.
>> 
>> Are these processes short-lived?
> 
> No.
> 
> See [1]. 
> 
>> Is symbol lookup performance an issue as well?
> 
> Yes. So are the various O(n^2) algorithms we need to fix
> inside the loader, particularly the DSO sorts we use.
> 
>> What's the total size of all relevant DSOs, combined?  What does the
>> directory structure look like?
> 
> I don't know. We should ask Ben Woodard to get us that data.
> 
> Ben?

I just talked to one of the developers to get a good sense of the current problem. 
The sum of the on-disk ELF files including debuginfo for one app that we looked at is around 3GB but when we just look at the text in all the ELF files it is 100-200MB depending on architecture spread across about 1400 DSOs.

Not including the directories already found in the system runtime linker cache there were around 15 directories being pointed to.

> 
>> Which ELF dynamic linking features are used?
> 
> I don't know.

Currently, to improve performance, they use a lot of rpath and environment-specific link paths set up by something quite a lot like modules.

As you pointed out, they also have an application called spindle which assists in loading all of these libraries for a large MPI job. It makes use of the audit interface.

They wrote a benchmark which demonstrates some of the challenges that they face: https://codesign.llnl.gov/pynamic.php
 
> 
>> Is the bulk of those DSOs pulled in with dlopen, after the initial
>> dynamic link?  If yes, does this happen directly (many DSOs dlopen'ed
>> individually) or indirectly (few of them pull in a huge cascade of
>> dependencies)?
> 
> I do not believe the bulk of the DSOs are pulled in with dlopen.
> 
> Though for python code I know that might be the reverse with each
> python module being a DSO that is loaded by the interpreter.

Unfortunately, that is in fact the case. Many of the applications are glued together with python while the bulk of the computation occurs in C++. So much of the code is in fact loaded by a python interpreter.
> 
> Which means we probably have two cases:
> * Long chains of DSOs (non-python applications)
> * Short single DSO chains, but lots of them (python modules).

I brought up this scenario with the developer and there is a 3rd scenario, in which python loads a computational library that then has quite a few dependencies. In particular, basically every high-level physics library needs to use MPI to communicate between adjacent cells in the mesh.
> 
>> If the processes are not short-lived and most of the DSOs are loaded
>> after user code has started executing, I doubt an on-disk cache is the
>> right solution.

Except for the fact that the process is starting on literally thousands of nodes simultaneously and its libraries are scattered around about 15 non-system project directories. This leads to a phenomenal number of NFS operations as the compute nodes search through 20 or so directories for all their components. That brings even very powerful NFS servers to their knees. 

> 
> Why would a long-lived process that uses dlopen fail to benefit from an
> on-disk cache? The on-disk cache, as it is today, is used for a similar
> situation already, why not extend it? The biggest difference is that
> we trust the cache we have today and mmap into memory. We would have to
> harden the code that processes that cache, but it should not be that
> hard.
> 
> Would you mind expanding on your concern that the solution would not work?
> 

-ben

> Cheers,
> Carlos.
> 
> 
> [1] http://computation.llnl.gov/projects/spindle/spindle-paper.pdf

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 20:25     ` Florian Weimer
  2016-02-08 20:36       ` Carlos O'Donell
@ 2016-02-08 22:51       ` Ben Woodard
  1 sibling, 0 replies; 26+ messages in thread
From: Ben Woodard @ 2016-02-08 22:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Carlos O'Donell, libc-alpha

I tried something like that years ago. I wrote them a script which:
1) created a subdir, 
2) did a ldd on the specified app, 
3) made direct symlinks to all the libraries that it found in the directory that it created.
4) moved the application to the subdir
5) replaced the application with a shell script that pre-pended . (dot) to the library path and then executed the target binary

IIRC the problems with that approach were that it didn’t handle the dlopened binaries and that it didn’t fully enumerate the library dependencies.

It was the failure of this approach that led to the development of spindle.

Cutting down on the directory searching by every compute node would certainly help quite a lot. It still wouldn’t solve the problem of getting the data to the thousands of compute nodes but it would tackle the problem with the geometric explosion of directory searching. As I understand it, removing directory searching cuts the time down from several hours to dozens of minutes and spindle brings it down to just a few minutes.

-ben

 
> On Feb 8, 2016, at 12:25 PM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> On 02/08/2016 09:19 PM, Carlos O'Donell wrote:
> 
>> Why would a long-lived process that uses dlopen fail to benefit from an
>> on-disk cache?
> 
> It's not worth the complexity.  (On top of the SUID issue already
> mentioned, there is also the question of cache invalidation.)  With
> long-living processes, you could just read in a designated list of
> directories at startup and use that to seed an ephemeral cache.  Hence
> my question about the directory layout.
> 
>> Would you mind expanding on your concern that the solution would not work?
> 
> It would work, it's just more difficult to use.
> 
> Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 20:36       ` Carlos O'Donell
@ 2016-02-08 23:00         ` Ben Woodard
  2016-02-09  6:57         ` Florian Weimer
  1 sibling, 0 replies; 26+ messages in thread
From: Ben Woodard @ 2016-02-08 23:00 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Florian Weimer, libc-alpha


> On Feb 8, 2016, at 12:36 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> 
> Are you familiar with what goes into /etc/ld.so.cache? It is only a cache
> of lookups, not the DSOs themselves, so we would be caching the results of
> a search of the user directories and recording the DSOs found there, nothing
> more. The user is already mostly accustomed to using ldconfig as root to
> update the global cache,

That is probably not true in the cases that I’m familiar with but I will agree that it could relatively easily be taught. Having them specify the directories that they want searched in a .ld.so.conf or maybe in a .ld.so.conf.appname in the directory for the application could speed things up considerably. 

> this is just an extension to allow ldconfig to
> be run by the user.

-ben

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 20:14   ` Carlos O'Donell
@ 2016-02-09  3:29     ` Siddhesh Poyarekar
  2016-02-09  3:35       ` Carlos O'Donell
  0 siblings, 1 reply; 26+ messages in thread
From: Siddhesh Poyarekar @ 2016-02-09  3:29 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

On Mon, Feb 08, 2016 at 03:14:08PM -0500, Carlos O'Donell wrote:
> The downside is that the user has no control of this cache and
> would need administrative intervention for help accelerating their
> application. Consider that you bought time on a cluster of machines,
> and now to run your app you're making the user interact with the sysadmin
> to install new filters and run ldconfig on every node? It won't scale
> (from a human perspective).
> 
> With a per-user/per-process cache, say in ~/.ld.so.cache, the user could
> prime the cache themselves after setting up their application with
> bundled libraries and have it work as expected, with accelerated lookups
> and without lots of stat/getdents in $HOME.
> 
> Does that counter-argument make sense for why the cache could be
> under user control? It means the data needs to be inspected carefully
> though.

Sure, but a similar effect could also be achieved using
LD_LIBRARY_PATH in ~/.bashrc.

Siddhesh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-09  3:29     ` Siddhesh Poyarekar
@ 2016-02-09  3:35       ` Carlos O'Donell
  2016-02-09  4:19         ` Siddhesh Poyarekar
  0 siblings, 1 reply; 26+ messages in thread
From: Carlos O'Donell @ 2016-02-09  3:35 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: GNU C Library

On 02/08/2016 10:29 PM, Siddhesh Poyarekar wrote:
> On Mon, Feb 08, 2016 at 03:14:08PM -0500, Carlos O'Donell wrote:
>> The downside is that the user has no control of this cache and
>> would need administrative intervention for help accelerating their
>> application. Consider that you bought time on a cluster of machines,
>> and now to run your app you're making the user interact with the sysadmin
>> to install new filters and run ldconfig on every node? It won't scale
>> (from a human perspective).
>>
>> With a per-user/per-process cache, say in ~/.ld.so.cache, the user could
>> prime the cache themselves after setting up their application with
>> bundled libraries and have it work as expected, with accelerated lookups
>> and without lots of stat/getdents in $HOME.
>>
>> Does that counter-argument make sense for why the cache could be
>> under user control? It means the data needs to be inspected carefully
>> though.
> 
> Sure, but a similar effect could also be achieved using
> LD_LIBRARY_PATH in ~/.bashrc.

Not similar enough from a performance perspective.

If you have 15 paths in LD_LIBRARY_PATH, they each need to be searched
in order to find the DSOs required in the last path entry. If you had
a per-user cache it's a single cache lookup and an mmap. There is no
traversal required of any filesystem if you get a hit in the cache.
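
To put rough numbers on it: with the ~1400 DSOs and ~15 extra search
directories Ben mentions elsewhere in this thread, a worst-case path
search is on the order of 1400 x 15 = 21,000 open/stat attempts per
process, most of them failing, versus roughly 1400 direct hits against
a primed cache.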

Isn't that much better?

All of the cache machinery is there, we just don't have a per-user
cache.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-09  3:35       ` Carlos O'Donell
@ 2016-02-09  4:19         ` Siddhesh Poyarekar
  0 siblings, 0 replies; 26+ messages in thread
From: Siddhesh Poyarekar @ 2016-02-09  4:19 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

On Mon, Feb 08, 2016 at 10:35:17PM -0500, Carlos O'Donell wrote:
> Not similar enough from a performance perspective.
> 
> If you have 15 paths in LD_LIBRARY_PATH, they each need to be searched
> in order to find the DSOs required in the last path entry. If you had
> a per-user cache it's a single cache lookup and an mmap. There is no
> traversal required of any filesystem if you get a hit in the cache.
> 
> Isn't that much better?
> 
> All of the cache machinery is there, we just don't have a per-user
> cache.

Agreed, it will be better.

Siddhesh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 18:40 A per-user or per-application ld.so.cache? Carlos O'Donell
                   ` (2 preceding siblings ...)
  2016-02-08 19:16 ` Zack Weinberg
@ 2016-02-09  4:35 ` Mike Frysinger
  2016-02-09  6:04   ` Carlos O'Donell
  2016-02-09  9:00   ` Andreas Schwab
  3 siblings, 2 replies; 26+ messages in thread
From: Mike Frysinger @ 2016-02-09  4:35 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

On 08 Feb 2016 13:40, Carlos O'Donell wrote:
> Might it make sense to have a per-user ld.so.cache for
> this case? Might we even entertain a per-application
> ld.so.cache?

i wouldn't mind adding a new env knob like LD_LIBRARY_CACHE with the
same characteristics as LD_LIBRARY_PATH such as:
 - it is searched before the system cache
 - it is ignored in set*id environments

i suspect looking for something like ~/.ld.so/cache wouldn't be as
flexible, nor does it match the existing LD_xxx vars.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-09  4:35 ` Mike Frysinger
@ 2016-02-09  6:04   ` Carlos O'Donell
  2016-02-09  9:00   ` Andreas Schwab
  1 sibling, 0 replies; 26+ messages in thread
From: Carlos O'Donell @ 2016-02-09  6:04 UTC (permalink / raw)
  To: GNU C Library

On 02/08/2016 11:35 PM, Mike Frysinger wrote:
> On 08 Feb 2016 13:40, Carlos O'Donell wrote:
>> Might it make sense to have a per-user ld.so.cache for
>> this case? Might we even entertain a per-application
>> ld.so.cache?
> 
> i wouldn't mind adding a new env knob like LD_LIBRARY_CACHE with the
> same characteristics as LD_LIBRARY_PATH such as:
>  - it is searched before the system cache
>  - it is ignored in set*id environments

I like it. We teach ldconfig to generate cache files, and then
we can even get some cache testing into the testsuite easily
without having to do chroot based testing right away.

c.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 20:36       ` Carlos O'Donell
  2016-02-08 23:00         ` Ben Woodard
@ 2016-02-09  6:57         ` Florian Weimer
  2016-02-09  7:44           ` Carlos O'Donell
  1 sibling, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2016-02-09  6:57 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha, Ben Woodard

On 02/08/2016 09:36 PM, Carlos O'Donell wrote:

> Would you mind expanding on what you would find difficult? Words like better
> or worse, in a technical context, need explicit descriptions of what is
> better and what is worse.

I assume you want to keep a single cache file, right?

If I understand the current situation correctly, the system cache is not
just an optimization, it is also used to effectively extend the search
path because otherwise, ld.so would not load libraries from
/usr/lib64/atlas, for example.  (I have a file
/etc/ld.so.conf.d/atlas-x86_64.conf which lists the directory
/usr/lib64/atlas.)

I think this means that if you do not update cache, but install new
system DSO versions, you might no longer be able to find all DSOs.
Users would need some way to know when to update their caches.

Or we'd have to do that as part of ld.so, but that doesn't seem to be
particularly attractive because of the limited facilities at that point
of process life.  This is why I asked if the loading is triggered only
after user code has run.

> The user would have to run 'ldconfig', and perhaps by default we update the
> user cache and skip updating the global cache if the user lacks the permissions
> to do so. Not that different from what we do today with Fedora/RHEL spec files
> when libraries are installed.

Yes, and I'm worried that keeping the cache in sync could be too confusing.

Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-08 22:29     ` Ben Woodard
@ 2016-02-09  7:18       ` Florian Weimer
  2016-02-09 23:27         ` Ben Woodard
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2016-02-09  7:18 UTC (permalink / raw)
  To: Ben Woodard, Carlos O'Donell; +Cc: libc-alpha

On 02/08/2016 11:29 PM, Ben Woodard wrote:

> I just talked to one of the developers to get a good sense of the current problem. 
> The sum of the on-disk ELF files including debuginfo for one app that we looked at is around 3GB but when we just look at the text in all the ELF files it is 100-200MB depending on architecture spread across about 1400 DSOs.

This means that copying the text files together into a single file would
be feasible.

> Except for the fact that the process is starting on literally thousands of nodes simultaneously and its libraries are scattered around about 15 non-system project directories. This leads to a phenomenal number of NFS operations as the compute nodes search through 20 or so directories for all their components. That brings even very powerful NFS servers to their knees. 

Okay, this is the critical bit which was missing so far.  I think Linux
has pretty good caching for lookup failures, so the whole performance
issue was a bit puzzling.  If the whole thing runs on many nodes against
storage which lacks such caching, then I can see that this could turn
into a problem.

The main question is: Will the storage be able to cope with millions of
file opens if they magically pick the right file name (avoiding ENOENT)?
 If not, the only viable optimization seems to be the single file approach.

How will the storage react to parallel read operations on those 15
directories from many nodes?

I'm worried a bit that this turns into a request to tune ld.so to very
peculiar storage stack behavior.


Depending on what they do with Python, the Python module importer will
still cause a phenomenal amount of ENOENT traffic, and there is nothing
we can do about that because it's not related to dlopen.

Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-09  6:57         ` Florian Weimer
@ 2016-02-09  7:44           ` Carlos O'Donell
  2016-02-15 18:30             ` Ben Woodard
  0 siblings, 1 reply; 26+ messages in thread
From: Carlos O'Donell @ 2016-02-09  7:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, Ben Woodard

On 02/09/2016 01:57 AM, Florian Weimer wrote:
> On 02/08/2016 09:36 PM, Carlos O'Donell wrote:
> 
>> Would you mind expanding on what you would find difficult? Words like better
>> or worse, in a technical context, need explicit descriptions of what is
>> better and what is worse.
> 
> I assume you want to keep a single cache file, right?

I had not considered otherwise, but Mike's suggestion of a LD_LIBRARY_CACHE
which lists multiple files has its own appeal.

> If I understand the current situation correctly, the system cache is not
> just an optimization, it is also used to effectively extend the search
> path because otherwise, ld.so would not load libraries from
> /usr/lib64/atlas, for example.  (I have a file
> /etc/ld.so.conf.d/atlas-x86_64.conf which lists the directory
> /usr/lib64/atlas.)

Yes.

> I think this means that if you do not update cache, but install new
> system DSO versions, you might no longer be able to find all DSOs.
> Users would need some way to know when to update their caches.

System DSOs are part of /etc/ld.so.cache, and while users might use
their own personal cache to load system DSOs from system directories,
it is not recommended because the user doesn't know when those files
get updated. It's possible, but not recommended, and one should let
/etc/ld.so.cache handle that, and the sysadmin will update that cache 
(or package installs will).

With that out of the way, the user is responsible for caching anything
they have access to change.

> Or we'd have to do that as part of ld.so, but that doesn't seem to be
> particularly attractive because of the limited facilities at that point
> of process life.  This is why I asked if the loading is triggered only
> after user code has run.

Right, it happens very early.

>> The user would have to run 'ldconfig', and perhaps by default we update the
>> user cache and skip updating the global cache if the user lacks the permissions
>> to do so. Not that different from what we do today with Fedora/RHEL spec files
>> when libraries are installed.
> 
> Yes, and I'm worried that keeping the cache in sync could be too confusing.

Then don't update the cache? Instead make the cache always work.

For example if you had a user/application cache that was relative to $HOME
or $ORIGIN (dynamic string token), then it needs no updates and is relocatable?

If you want to accelerate your application you would use ldconfig to create
a path relative cache file, and then set LD_LIBRARY_CACHE to that cache
file, and when you start your ld.so it loads that cache.

Application developers could ship the cache file with the application and
use a wrapper script to set the env var (like any other required env var
for the application).
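
As a sketch of that flow (LD_LIBRARY_CACHE is the variable Mike proposed
earlier in this thread, not something ld.so implements today, and $APPDIR
is just a stand-in for wherever the application is unpacked):

  $ ldconfig -X -C $APPDIR/ld.so.cache -f $APPDIR/ld.so.conf
  $ LD_LIBRARY_CACHE=$APPDIR/ld.so.cache $APPDIR/bin/app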

This has the added benefit of being able to accelerate RPATH lookups using
the same strategy.

The whole plan certainly needs some more thought.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-09  4:35 ` Mike Frysinger
  2016-02-09  6:04   ` Carlos O'Donell
@ 2016-02-09  9:00   ` Andreas Schwab
  1 sibling, 0 replies; 26+ messages in thread
From: Andreas Schwab @ 2016-02-09  9:00 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

Mike Frysinger <vapier@gentoo.org> writes:

> i suspect looking for something like ~/.ld.so/cache wouldn't be as
> flexible, nor does it match the existing LD_xxx vars.

Also glibc should not be referring to a file in HOME.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-09  7:18       ` Florian Weimer
@ 2016-02-09 23:27         ` Ben Woodard
  2016-03-08 10:11           ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Woodard @ 2016-02-09 23:27 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Carlos O'Donell, GNU C Library


> On Feb 8, 2016, at 11:18 PM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> On 02/08/2016 11:29 PM, Ben Woodard wrote:
> 
>> I just talked to one of the developers to get a good sense of the current problem. 
>> The sum of the on-disk ELF files including debuginfo for one app that we looked at is around 3GB but when we just look at the text in all the ELF files it is 100-200MB depending on architecture spread across about 1400 DSOs.
> 
> This means that copying the text files together into a single file would
> be feasible.
> 

Am I understanding you correctly? You’re suggesting linking?
We have hundreds of applications and hundreds of libraries all with their own development teams and release schedules. What you seem to be suggesting sounds like combinatoric insanity. 

Some people like the weather service may have one or a couple of apps that they run over and over but we are a national lab and we have literally thousands of users doing all sorts of things. We have around 10,000 active users. What we have is more on the scale of building a distribution like Fedora including Gnome, firefox, and openoffice with the huge tangled web of dependencies. 

>> Except for the fact that the process is starting on literally thousands of nodes simultaneously and its libraries are scattered around about 15 non-system project directories. This leads to a phenomenal number of NFS operations as the compute nodes search through 20 or so directories for all their components. That brings even very powerful NFS servers to their knees. 
> 
> Okay, this is the critical bit which was missing so far.  I think Linux
> has pretty good caching for lookup failures, so the whole performance
> issue was a bit puzzling.  If the whole thing runs on many nodes against
> storage which lacks such caching, then I can see that this could turn
> into a problem.

It isn’t just caching; it comes down to the number of iops that the thundering herd of all the compute nodes participating in an MPI job generate. In the base OS there isn’t anything like a distributed cache where, once one node figures out that a particular library is in a particular place, the 3000 nodes participating in the job all know not to bother looking in all the places where the library wasn’t found. That kind of distributed per-application cache is what Carlos is suggesting.

In essence spindle is a tool which solves this problem as well as providing an efficient mechanism for distributing the ELF files to the compute nodes without hammering the NFS servers.

> 
> The main question is: Will the storage be able to cope with millions of
> file opens if they magically pick the right file name (avoiding ENOENT)?
> If not, the only viable optimization seems to be the single file approach.

That is a storage system design constraint. Honestly, that is not really hard because every single compute node is asking for the same thing and so the server has the blocks in cache and just spits them out to all the nodes through its high speed network interfaces. Yes it would be better to have them flood fill out to all the nodes but that is a different problem.
> 
> How will the storage react to parallel read operations on those 15
> directories from many nodes?
> 

Once again that is a different problem that is tangentially related and solved at the center wide storage system design level.

> I'm worried a bit that this turns into a request to tune ld.so to very
> peculiar storage stack behavior.

I don’t see that. I think that this is part of the larger change in the division of labor as computing became cheaper in relation to manpower as well as the commodification of the OS and distribution. 

As computing has become cheaper it is being used for a broader range of applications. Instead of shipping a huge array of every conceivable library and piece of software, we OS distributors have trimmed the system libraries down to a small supportable subset. This means that an increasing percentage of the libraries used to accomplish some task are not part of the OS distribution.

Then we moved from a system administrator maintaining a small number of systems, each carefully and deliberately configured including the software and libraries for the applications running on those servers, to a more devops model where a system administrator oversees the provisioning of hundreds or thousands of machines and the developers, who may not have root access, must install and maintain their own software and libraries above and beyond the OS instance. In that world the notion of a universal system-wide /etc/ld.so.conf feeding into a cache becomes less and less practical.

This isn’t tuning for a peculiar storage stack behavior. This is adapting to the reality of the way things exist now, where the OS vendor and the system administrator do not have the time, inclination, or sometimes ability to configure a system for the app that is going to run on it. That is why we need to push the capability down to allow non-root users to make use of the benefits of ldconfig and the ld.so.cache to optimize the load time for their work environment or application.
 
> Depending on what they do with Python, the Python module importer will
> still cause a phenomenal amount of ENOENT traffic, and there is nothing
> we can do about that because it's not related to dlopen.
> 
> Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-09  7:44           ` Carlos O'Donell
@ 2016-02-15 18:30             ` Ben Woodard
  2016-03-08 10:37               ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Woodard @ 2016-02-15 18:30 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Florian Weimer, libc-alpha

I’ve been talking to the HPC tools and system guys and to my surprise they favor Florian’s approach which is to change glibc ld.so to cache the full contents of the directories visited in the process of finding a library. Subsequent lookups would first look in this cache before looking in subsequent directories in the library search paths.

I presented to them both approaches and what I saw to be the advantages and disadvantages of both:

	The first approach is to add code to ld.so which reads the entire directory for each of the library search paths
	the first time that it visits them and stores it in a cache in memory. Then instead of revisiting the
	directories when it searches for the next library, it first searches in the caches of the contents
	of the directories. If it finds the file, then it tries to open it there first; if that doesn’t work it drops
	that cache entry and falls through to the normal library loading behavior.

	The advantages of this approach are:
	1) it wouldn’t require any user retraining <- This turned out to be very important to them
	2) because they are built in memory each and every time and not stored on disk there would not be a
	problem with the cache files being out of date.

	The disadvantages are:
	1) It would consume memory for these caches. The developer advocating this said that he would be 
	willing to add an interface to drop these caches later when the app has completed its loading of objects.
	2) Every single compute node would still need to read every directory in the library search paths 
	once. <- This is one of the biggest downsides for HPC applications. It was this fact that led me to believe
	that they would prefer the second approach.
	3) If there are a lot of files in the directories other than the libraries being used the amount of memory
	being used for this cache could be notable. Notable still being measured in KB vs. MB though.
	4) It creates a second caching system parallel to the one in ld.so.cache
	5) Users would explicitly make code changes to drop in-memory caches

	-------------

	The second approach is to make ldconfig a command that a normal non-root user could run. It would then 
	build a ld.so.cache file either for the user or for a specific application. Then all the nodes load this cache 
	file and know exactly where to find their libraries. They wouldn’t even have to read the directories unless 
	the cache file is out of date. 

	Advantages:
	1) could be run before the job once for all the nodes. 
	2) the same cache could be loaded by all compute nodes
	3) no directory reading operations needed at all unless the cache file is out of date
	4) the cache file could persist between runs

	Disadvantages
	1) some user training required
	2) the cache file could be out of date or not match the OS version or architecture. This would basically 
	not happen in our environment. Especially if users put ldconfig in their job launch script. <- This was a big 
	issue in their mind. I argued that rebuilding the cache for a particular application would only take a few
	seconds and it could easily be added to a startup script. My impression is that their notion of this 
	approach may have been biased by viewing the cache as something semi-permanent as opposed to 
	something more ephemeral that could be quickly recreated.
	3) the code that loads the cache file would need to be substantially hardened to make sure it couldn’t 
	be abused.
	4) They also brought up that there are cases where the paths seen on the compute nodes are different
	from the paths seen on the login nodes, and in this case pre-computing an ldcache would be difficult. I do
	not see this as unresolvable as long as the user ldconfig also honors LD_LIBRARY_PATH when
	generating an ldcache for a particular application.

One mistake that I did make in this presentation is that I unintentionally presented it as an either-or choice “which one of these would you prefer?” rather than even considering the possibility of implementing both approaches.

-ben


> On Feb 8, 2016, at 11:44 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> 
> On 02/09/2016 01:57 AM, Florian Weimer wrote:
>> On 02/08/2016 09:36 PM, Carlos O'Donell wrote:
>> 
>>> Would you mind expanding on what you would find difficult? Words like better
>>> or worse, in a technical context, need explicit descriptions of what is
>>> better and what is worse.
>> 
>> I assume you want to keep a single cache file, right?
> 
> I had not considered otherwise, but Mike's suggestion of a LD_LIBRARY_CACHE
> which lists multiple files has its own appeal.
> 
>> If I understand the current situation correctly, the system cache is not
>> just an optimization, it is also used to effectively extend the search
>> path because otherwise, ld.so would not load libraries from
>> /usr/lib64/atlas, for example.  (I have a file
>> /etc/ld.so.conf.d/atlas-x86_64.conf which lists the directory
>> /usr/lib64/atlas.)
> 
> Yes.
> 
>> I think this means that if you do not update cache, but install new
>> system DSO versions, you might no longer be able to find all DSOs.
>> Users would need some way to know when to update their caches.
> 
> System DSOs are part of /etc/ld.so.cache, and while users might use
> their own personal cache to load system DSOs from system directories,
> it is not recommended because the user doesn't know when those files
> get updated. It's possible, but not recommended, and one should let
> /etc/ld.so.cache handle that, and the sysadmin will update that cache 
> (or package installs will).
> 
> With that out of the way, the user is responsible for caching anything
> they have access to change.
> 
>> Or we'd have to do that as part of ld.so, but that doesn't seem to be
>> particularly attractive because of the limited facilities at that point
>> of process life.  This is why I asked if the loading is triggered only
>> after user code has run.
> 
> Right, it happens very early.
> 
>>> The user would have to run 'ldconfig', and perhaps by default we update the
>>> user cache and skip updating the global cache if the user lacks the permissions
>>> to do so. Not that different from what we do today with Fedora/RHEL spec files
>>> when libraries are installed.
>> 
>> Yes, and I'm worried that keeping the cache in sync could be too confusing.
> 
> Then don't update the cache? Instead make the cache always work.
> 
> For example if you had a user/application cache that was relative to $HOME
> or $ORIGIN (dynamic string token), then it needs no updates and is relocatable?
> 
> If you want to accelerate your application you would use ldconfig to create
> a path relative cache file, and then set LD_LIBRARY_CACHE to that cache
> file, and when you start your ld.so it loads that cache.
> 
> Application developers could ship the cache file with the application and
> use a wrapper script to set the env var (like any other required env var
> for the application).
> 
> This has the added benefit of being able to accelerate RPATH lookups using
> the same strategy.
> 
> The whole plan certainly needs some more thought.
> 
> Cheers,
> Carlos.
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-09 23:27         ` Ben Woodard
@ 2016-03-08 10:11           ` Florian Weimer
  0 siblings, 0 replies; 26+ messages in thread
From: Florian Weimer @ 2016-03-08 10:11 UTC (permalink / raw)
  To: Ben Woodard; +Cc: Carlos O'Donell, GNU C Library

On 02/10/2016 12:27 AM, Ben Woodard wrote:
> 
>> On Feb 8, 2016, at 11:18 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>
>> On 02/08/2016 11:29 PM, Ben Woodard wrote:
>>
>>> I just talked to one of the developers to get a good sense of the current problem. 
>>> The sum of the on-disk ELF files including debuginfo for one app that we looked at is around 3GB but when we just look at the text in all the ELF files it is 100-200MB depending on architecture spread across about 1400 DSOs.
>>
>> This means that copying the text files together into a single file would
>> be feasible.
>>
> 
> Am I understanding you correctly? You’re suggesting linking?

Yes, building a single file which contains multiple DSOs, prior to
shipping to the cluster.  This would be just a cache.  Unlike static
linking, the presence of an additional library will not cause an
observable difference (beyond use of disk space).

It may be the only way to eliminate the bottleneck.

Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-02-15 18:30             ` Ben Woodard
@ 2016-03-08 10:37               ` Florian Weimer
  2017-04-06 13:01                 ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2016-03-08 10:37 UTC (permalink / raw)
  To: Ben Woodard, Carlos O'Donell; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]

On 02/15/2016 07:30 PM, Ben Woodard wrote:
> I’ve been talking to the HPC tools and system guys and to my surprise they favor Florian’s approach which is to change glibc ld.so to cache the full contents of the directories visited in the process of finding a library. Subsequent lookups would first look in this cache before looking in subsequent directories in the library search paths.

Thanks.

Before we start working on this, I would like to double-check that their
storage copes reasonably well with parallel readdir load.

Could you ask them to run the attached benchmark program on their
cluster, in a massively parallel fashion?  All the directories on a
typical library search path have to be listed as command line arguments
(separately, i.e. not joined as one argument and separated with colons).
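
For example, built and invoked roughly like this (the directories below
are only placeholders for a real library search path; each one is passed
as a separate argument):

  $ gcc -O2 -o readdir-bench readdir-bench.c
  $ ./readdir-bench /usr/lib64 /usr/lib64/atlas /p/project1/lib /p/project2/lib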

The results will show if the directory listing overhead is acceptable.
It is unlikely that an ld.so implementation could do much better than
this simple program.  Median and maximum job execution time should be
sufficient, but the benchmark program produces
additional diagnostic output to identify specific bottlenecks.  For
example, if the file system reports a large block size, opendir may
allocate an equally large amount of memory.

Thanks,
Florian


[-- Attachment #2: readdir-bench.c --]
[-- Type: text/plain, Size: 2378 bytes --]

/* Test program for measuring readdir speed.  */

#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/statvfs.h>
#include <sys/time.h>
#include <time.h>

static struct timeval
current (void)
{
  struct timeval tv;
  if (gettimeofday (&tv, NULL) != 0)
    {
      perror ("gettimeofday");
      abort ();
    }
  return tv;
}

static double
diff (const struct timeval a, const struct timeval b)
{
  double a_sec = a.tv_sec;
  double b_sec = b.tv_sec;
  return (a_sec - b_sec) + (a.tv_usec - b.tv_usec) * 1e-6;
}

static void
print_time (const struct timeval tv)
{
  struct tm tm;
  if (gmtime_r (&tv.tv_sec, &tm) == NULL)
    {
      perror ("gmtime_r");
      abort ();
    }
  printf ("%04d-%02d-%02dT%02d:%02d:%02d.%06d ",
          1900 + tm.tm_year,
          1 + tm.tm_mon,
          tm.tm_mday,
          tm.tm_hour,
          tm.tm_min,
          tm.tm_sec,
          (int) tv.tv_usec);
}

static void
list_directory (const char *path)
{
  struct timeval before = current ();
  print_time (before);
  printf ("%s: listing directory\n", path);

  DIR *dir = opendir (path);
  if (dir == NULL)
    {
      fprintf (stderr, "opendir (\"%s\"): %m\n", path);
      return;
    }
  {
    struct statvfs st;
    if (fstatvfs (dirfd (dir), &st) != 0)
      fprintf (stderr, "fstatvfs (\"%s\"): %m\n", path);
    else
      {
        print_time (current ());
        printf ("%s: file system block size: %lu\n", path, st.f_bsize);
      }
  }
  unsigned long long count = 0;
  while (true)
    {
      errno = 0;
      struct dirent64 *e = readdir64 (dir);
      if (e == NULL)
        {
          if (errno != 0)
            {
              perror ("readdir");
              closedir (dir);
              return;
            }
          else
            break;
        }
      ++count;
    }
  closedir (dir);
  struct timeval after = current ();
  print_time (after);
  printf ("%s: read %llu directory entries in %g seconds\n",
          path, count, diff (after, before));
}

int
main (int argc, char **argv)
{
  struct timeval before = current ();
  print_time (before);
  printf (" starting\n");
  ++argv;
  while (*argv)
    {
      list_directory (*argv);
      ++argv;
    }
  struct timeval after = current ();
  print_time (after);
  printf ("total: %g seconds\n", diff (after, before));
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: A per-user or per-application ld.so.cache?
  2016-03-08 10:37               ` Florian Weimer
@ 2017-04-06 13:01                 ` Florian Weimer
  0 siblings, 0 replies; 26+ messages in thread
From: Florian Weimer @ 2017-04-06 13:01 UTC (permalink / raw)
  To: Ben Woodard, Carlos O'Donell; +Cc: libc-alpha

On 03/08/2016 11:37 AM, Florian Weimer wrote:
> On 02/15/2016 07:30 PM, Ben Woodard wrote:
>> I’ve been talking to the HPC tools and system guys and to my surprise they favor Florian’s approach which is to change glibc ld.so to cache the full contents of the directories visited in the process of finding a library. Subsequent lookups would first look in this cache before looking in subsequent directories in the library search paths.
>
> Thanks.
>
> Before we start working on this, I would like to double-check that their
> storage copes reasonably well with parallel readdir load.
>
> Could you ask them to run the attached benchmark program on their
> cluster, in a massively parallel fashion?  All the directories on a
> typical library search path have to be listed as command line arguments
> (separately, i.e. not joined as one argument and separated with colons).
>
> The results will show if the directory listing overhead is acceptable.
> It is unlikely that an ld.so implementation could do much better than
> this simple program.  Median and maximum job execution time should be
> sufficient, but the benchmark program produces
> additional diagnostic output to identify specific bottlenecks.  For
> example, if the file system reports a large block size, opendir may
> allocate an equally large amount of memory.

Hi Ben,

have you been able to run the benchmark?  Did the storage hold up well 
under the severe readdir load?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-04-06 13:01 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-08 18:40 A per-user or per-application ld.so.cache? Carlos O'Donell
2016-02-08 19:10 ` Florian Weimer
2016-02-08 20:19   ` Carlos O'Donell
2016-02-08 20:25     ` Florian Weimer
2016-02-08 20:36       ` Carlos O'Donell
2016-02-08 23:00         ` Ben Woodard
2016-02-09  6:57         ` Florian Weimer
2016-02-09  7:44           ` Carlos O'Donell
2016-02-15 18:30             ` Ben Woodard
2016-03-08 10:37               ` Florian Weimer
2017-04-06 13:01                 ` Florian Weimer
2016-02-08 22:51       ` Ben Woodard
2016-02-08 22:29     ` Ben Woodard
2016-02-09  7:18       ` Florian Weimer
2016-02-09 23:27         ` Ben Woodard
2016-03-08 10:11           ` Florian Weimer
2016-02-08 19:12 ` Siddhesh Poyarekar
2016-02-08 20:14   ` Carlos O'Donell
2016-02-09  3:29     ` Siddhesh Poyarekar
2016-02-09  3:35       ` Carlos O'Donell
2016-02-09  4:19         ` Siddhesh Poyarekar
2016-02-08 19:16 ` Zack Weinberg
2016-02-08 20:10   ` Carlos O'Donell
2016-02-09  4:35 ` Mike Frysinger
2016-02-09  6:04   ` Carlos O'Donell
2016-02-09  9:00   ` Andreas Schwab
