public inbox for libc-help@sourceware.org
* inode-based dlopen caching
@ 2021-06-05 13:59 Soni L.
  2021-06-07 21:53 ` Adhemerval Zanella
  2021-06-08 16:56 ` Florian Weimer
  0 siblings, 2 replies; 12+ messages in thread
From: Soni L. @ 2021-06-05 13:59 UTC (permalink / raw)
  To: libc-help

Currently dlopen caching is based on filenames, it'd be nice if it was
based on inodes to support better "re"loading (aka loading a new module
with the same name because unloading modules with threads is never a
good idea). This is good for stuff that deals with plugins.


* Re: inode-based dlopen caching
  2021-06-05 13:59 inode-based dlopen caching Soni L.
@ 2021-06-07 21:53 ` Adhemerval Zanella
  2021-06-07 22:50   ` Soni L.
  2021-06-08 16:56 ` Florian Weimer
  1 sibling, 1 reply; 12+ messages in thread
From: Adhemerval Zanella @ 2021-06-07 21:53 UTC (permalink / raw)
  To: Soni L., Libc-help



On 05/06/2021 10:59, Soni L. via Libc-help wrote:
> Currently dlopen caching is based on filenames, it'd be nice if it was
> based on inodes to support better "re"loading (aka loading a new module
> with the same name because unloading modules with threads is never a
> good idea). This is good for stuff that deals with plugins.
> 

What do you mean by 'caching' in this scenario? glibc does not maintain
a cache of loaded libraries; unlike other implementations, it
does try to unload the library on dlclose (there are cases where this
is not readily possible due to dependency chains).

And I am not seeing how inode-based dlopen really helps here: if the inode
changes, it means the file was potentially changed (so you will need to
properly dlclose it).  I think using filenames is in fact the proper way
here, since Linux does the heavy lifting of the inode cache and provides
fast file access and mmap support for shared libraries.


* Re: inode-based dlopen caching
  2021-06-07 21:53 ` Adhemerval Zanella
@ 2021-06-07 22:50   ` Soni L.
  2021-06-08 13:14     ` Adhemerval Zanella
  0 siblings, 1 reply; 12+ messages in thread
From: Soni L. @ 2021-06-07 22:50 UTC (permalink / raw)
  To: Adhemerval Zanella, Libc-help



On 2021-06-07 6:53 p.m., Adhemerval Zanella wrote:
> 
> 
> On 05/06/2021 10:59, Soni L. via Libc-help wrote:
>> Currently dlopen caching is based on filenames, it'd be nice if it was
>> based on inodes to support better "re"loading (aka loading a new module
>> with the same name because unloading modules with threads is never a
>> good idea). This is good for stuff that deals with plugins.
>> 
> 
> What do you mean by 'caching' in this scenario? glibc does not maintain
> a cache of loaded libraries; unlike other implementations, it
> does try to unload the library on dlclose (there are cases where this
> is not readily possible due to dependency chains).
>
> And I am not seeing how inode-based dlopen really helps here: if the inode
> changes, it means the file was potentially changed (so you will need to
> properly dlclose it).  I think using filenames is in fact the proper way
> here, since Linux does the heavy lifting of the inode cache and provides
> fast file access and mmap support for shared libraries.
> 
You can't unload a dlopen that uses threads, at least not safely. So for
all intents and purposes you can't unload it. Instead you need to tell
it you're done with it, but not unload it, and load the new one. But
that's the problem - the filename-based stuff means you can't load the
new one.
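
As a rough illustration of the behaviour under discussion, the sketch
below calls dlopen twice with the same name and compares the handles.
The helper name same_handle_for_same_name is made up for this example,
and libm.so.6 is only an assumed, conveniently available library on a
glibc system. With glibc, both calls are expected to return the same
handle, because the second call finds the already-loaded object by name
and only raises its reference count.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if two dlopen calls for the same name
   yield the same handle, 0 if they differ, and -1 if loading fails. */
int same_handle_for_same_name(const char *name)
{
    void *first = dlopen(name, RTLD_NOW | RTLD_LOCAL);
    if (first == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* Second request for the same name while the first is still loaded. */
    void *second = dlopen(name, RTLD_NOW | RTLD_LOCAL);
    if (second == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        dlclose(first);
        return -1;
    }

    int same = (first == second);

    /* Each successful dlopen needs a matching dlclose. */
    dlclose(second);
    dlclose(first);
    return same;
}
```

Calling same_handle_for_same_name("libm.so.6") on a glibc system should
return 1. Older glibc releases may need -ldl at link time.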


* Re: inode-based dlopen caching
  2021-06-07 22:50   ` Soni L.
@ 2021-06-08 13:14     ` Adhemerval Zanella
  2021-06-08 16:26       ` Soni L.
  0 siblings, 1 reply; 12+ messages in thread
From: Adhemerval Zanella @ 2021-06-08 13:14 UTC (permalink / raw)
  To: Soni L., Libc-help



On 07/06/2021 19:50, Soni L. wrote:
> 
> 
> On 2021-06-07 6:53 p.m., Adhemerval Zanella wrote:
>>
>>
>> On 05/06/2021 10:59, Soni L. via Libc-help wrote:
>>> Currently dlopen caching is based on filenames, it'd be nice if it was
>>> based on inodes to support better "re"loading (aka loading a new module
>>> with the same name because unloading modules with threads is never a
>>> good idea). This is good for stuff that deals with plugins.
>>>
>>
>> What do you mean by 'caching' in this scenario? glibc does not maintain
>> a cache of loaded libraries; unlike other implementations, it
>> does try to unload the library on dlclose (there are cases where this
>> is not readily possible due to dependency chains).
>>
>> And I am not seeing how inode-based dlopen really helps here: if the inode
>> changes, it means the file was potentially changed (so you will need to
>> properly dlclose it).  I think using filenames is in fact the proper way
>> here, since Linux does the heavy lifting of the inode cache and provides
>> fast file access and mmap support for shared libraries.
>>
> You can't unload a dlopen that uses threads, at least not safely. So for
> all intents and purposes you can't unload it. Instead you need to tell
> it you're done with it, but not unload it, and load the new one. But
> that's the problem - the filename-based stuff means you can't load the
> new one.
> 

Sure you can unload a dlopened library; the API makes the program responsible
for synchronizing access (since dlsym/dlvsym return a function pointer).

If I understood correctly, what you are suggesting is making dlclose a no-op,
so that a later dlopen will also be a no-op if it refers to essentially the
same shared object (what happens if the shared library is updated and the
inode stays the same?).
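
To make the parenthetical question concrete, here is a small sketch of
how the two usual ways of updating a file affect its inode number. It
only exercises the filesystem, not the dynamic loader, and the function
and file names (compare_update_styles, plugin.so) are invented for the
example. Overwriting a file in place keeps its inode, while writing a
new file next to it and renaming it over the old one yields a different
inode, so an inode-keyed lookup could presumably only notice the second
kind of update.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `text` to `path`, truncating an existing file in place. */
static int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    if (close(fd) != 0 || n != (ssize_t)len)
        return -1;
    return 0;
}

/* Store the inode number of `path` in *out. */
static int inode_of(const char *path, ino_t *out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *out = st.st_ino;
    return 0;
}

/* Hypothetical demo: records the inode of a file when first created,
   after an in-place overwrite, and after replacement via rename.
   Returns 0 on success, -1 on any failure. */
int compare_update_styles(ino_t *original, ino_t *after_overwrite,
                          ino_t *after_rename)
{
    char dir[] = "/tmp/inode-demo-XXXXXX";
    if (mkdtemp(dir) == NULL)
        return -1;

    char path[128], tmp[128];
    snprintf(path, sizeof path, "%s/plugin.so", dir);
    snprintf(tmp, sizeof tmp, "%s/plugin.so.new", dir);

    int rc = -1;
    if (write_file(path, "v1") == 0 && inode_of(path, original) == 0
        /* Update 1: same path, contents overwritten in place. */
        && write_file(path, "v2") == 0
        && inode_of(path, after_overwrite) == 0
        /* Update 2: new file created alongside, then renamed over it. */
        && write_file(tmp, "v3") == 0 && rename(tmp, path) == 0
        && inode_of(path, after_rename) == 0)
        rc = 0;

    /* Best-effort cleanup; tmp no longer exists after a successful rename. */
    unlink(tmp);
    unlink(path);
    rmdir(dir);
    return rc;
}
```

On success, *original and *after_overwrite compare equal, while
*after_rename differs from both.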

Actually unloading the shared object on dlclose is a design choice, and
changing it because it might cause concurrency issues in programs that
do not synchronize their accesses is a really bad motivation.  There are
multiple better ways to handle it, either by wrapping dlopen in a more
user-friendly API or by using a higher-level language.

If the motivation is to avoid the potential synchronization issues that libc
itself needs to handle (such as TLS and other shared resources), that would
be a better motivation.  But even then it is a trade-off: resources stay
allocated even after the caller states it does not need them anymore.  As
far as I know, this is the design musl libc has chosen.

We could do it, but we have been fixing and improving the dynamic loader
over time, which makes this approach complex as well and without large
benefits.  Also, we still need to do some filename caching to handle things
such as RUNPATH/RPATH and SONAME, so an implementation that also takes the
inode into consideration might add even more complexity and have more corner
cases.
 


* Re: inode-based dlopen caching
  2021-06-08 13:14     ` Adhemerval Zanella
@ 2021-06-08 16:26       ` Soni L.
  2021-06-08 16:51         ` Adhemerval Zanella
  0 siblings, 1 reply; 12+ messages in thread
From: Soni L. @ 2021-06-08 16:26 UTC (permalink / raw)
  To: Adhemerval Zanella, Libc-help



On 2021-06-08 10:14 a.m., Adhemerval Zanella wrote:
> 
> 
> On 07/06/2021 19:50, Soni L. wrote:
>> 
>> 
>> On 2021-06-07 6:53 p.m., Adhemerval Zanella wrote:
>>>
>>>
>>> On 05/06/2021 10:59, Soni L. via Libc-help wrote:
>>>> Currently dlopen caching is based on filenames, it'd be nice if it was
>>>> based on inodes to support better "re"loading (aka loading a new module
>>>> with the same name because unloading modules with threads is never a
>>>> good idea). This is good for stuff that deals with plugins.
>>>>
>>>
>>> What do you mean by 'caching' in this scenario? glibc does not maintain
>>> a cache of loaded libraries; unlike other implementations, it
>>> does try to unload the library on dlclose (there are cases where this
>>> is not readily possible due to dependency chains).
>>>
>>> And I am not seeing how inode-based dlopen really helps here: if the inode
>>> changes, it means the file was potentially changed (so you will need to
>>> properly dlclose it).  I think using filenames is in fact the proper way
>>> here, since Linux does the heavy lifting of the inode cache and provides
>>> fast file access and mmap support for shared libraries.
>>>
>> You can't unload a dlopen that uses threads, at least not safely. So for
>> all intents and purposes you can't unload it. Instead you need to tell
>> it you're done with it, but not unload it, and load the new one. But
>> that's the problem - the filename-based stuff means you can't load the
>> new one.
>> 
> 
> Sure you can unload a dlopened library; the API makes the program responsible
> for synchronizing access (since dlsym/dlvsym return a function pointer).
> [snip]
If the shared library creates its own threads, those won't be killed
when the shared library is closed. If the shared library is then
unloaded, those threads will be running code from the void. That's a
problem.

Being able to update the library, without getting rid of the original
(i.e. by unlinking the original file first), allows the program to
gracefully update loaded plugins without a full restart.


* Re: inode-based dlopen caching
  2021-06-08 16:26       ` Soni L.
@ 2021-06-08 16:51         ` Adhemerval Zanella
  0 siblings, 0 replies; 12+ messages in thread
From: Adhemerval Zanella @ 2021-06-08 16:51 UTC (permalink / raw)
  To: Soni L., Libc-help



On 08/06/2021 13:26, Soni L. wrote:
> 
> 
> On 2021-06-08 10:14 a.m., Adhemerval Zanella wrote:
>>
>>
>> On 07/06/2021 19:50, Soni L. wrote:
>>>
>>>
>>> On 2021-06-07 6:53 p.m., Adhemerval Zanella wrote:
>>>>
>>>>
>>>> On 05/06/2021 10:59, Soni L. via Libc-help wrote:
>>>>> Currently dlopen caching is based on filenames, it'd be nice if it was
>>>>> based on inodes to support better "re"loading (aka loading a new module
>>>>> with the same name because unloading modules with threads is never a
>>>>> good idea). This is good for stuff that deals with plugins.
>>>>>
>>>>
>>>> What do you mean by 'caching' in this scenario? glibc does not maintain
>>>> a cache of loaded libraries; unlike other implementations, it
>>>> does try to unload the library on dlclose (there are cases where this
>>>> is not readily possible due to dependency chains).
>>>>
>>>> And I am not seeing how inode-based dlopen really helps here: if the inode
>>>> changes, it means the file was potentially changed (so you will need to
>>>> properly dlclose it).  I think using filenames is in fact the proper way
>>>> here, since Linux does the heavy lifting of the inode cache and provides
>>>> fast file access and mmap support for shared libraries.
>>>>
>>> You can't unload a dlopen that uses threads, at least not safely. So for
>>> all intents and purposes you can't unload it. Instead you need to tell
>>> it you're done with it, but not unload it, and load the new one. But
>>> that's the problem - the filename-based stuff means you can't load the
>>> new one.
>>>
>>
>> Sure you can unload a dlopened library; the API makes the program responsible
>> for synchronizing access (since dlsym/dlvsym return a function pointer).
>> [snip]
> If the shared library creates its own threads, those won't be killed
> when the shared library is closed. If the shared library is then
> unloaded, those threads will be running code from the void. That's a
> problem.
> 
> Being able to update the library, without getting rid of the original
> (i.e. by unlinking the original file first), allows the program to
> gracefully update loaded plugins without a full restart.
> 

Again, this is a library issue that should be dealt with by the API the
library provides (such as a cleanup handler to synchronize with or
cancel its threads).
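
A minimal sketch of what such a cleanup handler might look like on the
plugin side, assuming a plugin with a single worker thread. The entry
points plugin_start and plugin_shutdown are hypothetical names, not part
of any existing plugin API; the host would call plugin_shutdown and wait
for it to return before calling dlclose, so that no thread is left
running code from the unmapped object.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_t worker;
static atomic_bool stop_requested;
static bool worker_running;

/* Worker loop: polls the stop flag roughly once per millisecond. */
static void *worker_main(void *arg)
{
    (void)arg;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };
    while (!atomic_load(&stop_requested))
        nanosleep(&pause, NULL);
    return NULL;
}

/* Hypothetical entry point the host calls after dlopen.
   Returns 0 on success, -1 if already started or on thread failure. */
int plugin_start(void)
{
    if (worker_running)
        return -1;
    atomic_store(&stop_requested, false);
    if (pthread_create(&worker, NULL, worker_main, NULL) != 0)
        return -1;
    worker_running = true;
    return 0;
}

/* Hypothetical entry point the host calls before dlclose.
   Returns only after the worker thread has exited. */
int plugin_shutdown(void)
{
    if (!worker_running)
        return -1;
    atomic_store(&stop_requested, true);
    if (pthread_join(worker, NULL) != 0)
        return -1;
    worker_running = false;
    return 0;
}
```

The start and shutdown calls themselves are assumed to be made from a
single host thread; a real plugin would also have to cover any detached
threads and callbacks it has registered elsewhere.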

In the scenario you are describing, you will end up with the library
loaded in two different mappings with potentially two different versions
of the code (since you are updating the library and dlclose might be a
no-op).  I really don't see this as an improvement; it is rather a
potential trigger for subtle issues, especially if the threads try to
synchronize on process-shared resources.


* Re: inode-based dlopen caching
  2021-06-05 13:59 inode-based dlopen caching Soni L.
  2021-06-07 21:53 ` Adhemerval Zanella
@ 2021-06-08 16:56 ` Florian Weimer
  2021-06-08 17:19   ` Adhemerval Zanella
  1 sibling, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2021-06-08 16:56 UTC (permalink / raw)
  To: Soni L. via Libc-help

* Soni L. via Libc-help:

> Currently dlopen caching is based on filenames, it'd be nice if it was
> based on inodes to support better "re"loading (aka loading a new module
> with the same name because unloading modules with threads is never a
> good idea). This is good for stuff that deals with plugins.

It's an interesting idea.  We'd probably also want a flag that hides the
symbols from general binding and makes them available for direct dlsym
lookups using the handle returned by dlopen (otherwise the old
definitions stick around).
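
For reference, handle-based lookup with the existing API looks roughly
like the sketch below. RTLD_LOCAL is the existing flag that keeps a
library's symbols out of the global scope; it is not necessarily the
flag proposed above, and it does not by itself allow a second copy to
be loaded. The helper name call_via_handle is invented for the example,
and libm.so.6 with cos is just an assumed, readily available target.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef double (*unary_fn)(double);

/* Hypothetical helper: load `lib` with RTLD_LOCAL, look up `symbol`
   through the returned handle, call it on `x` and store the result in
   *out. Returns 0 on success, -1 on failure. */
int call_via_handle(const char *lib, const char *symbol, double x,
                    double *out)
{
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error state before dlsym */
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || addr == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* Object-to-function pointer conversion as in the dlopen(3) example. */
    unary_fn fn;
    *(void **)(&fn) = addr;
    *out = fn(x);

    /* fn must not be used after this point. */
    dlclose(handle);
    return 0;
}
```

For instance, call_via_handle("libm.so.6", "cos", 0.0, &r) should leave
1.0 in r on a glibc system.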

The tricky question is what to do about dependencies.  A behavioral
change for just one level is not too hard, but everything that goes
further is difficult.

Vivek Das Mohapatra's RTLD_SHARED patches may help with isolating
dependencies.

Thanks,
Florian



* Re: inode-based dlopen caching
  2021-06-08 16:56 ` Florian Weimer
@ 2021-06-08 17:19   ` Adhemerval Zanella
  2021-06-08 17:20     ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: Adhemerval Zanella @ 2021-06-08 17:19 UTC (permalink / raw)
  To: Florian Weimer, Libc-help



On 08/06/2021 13:56, Florian Weimer via Libc-help wrote:
> * Soni L. via Libc-help:
> 
>> Currently dlopen caching is based on filenames, it'd be nice if it was
>> based on inodes to support better "re"loading (aka loading a new module
>> with the same name because unloading modules with threads is never a
>> good idea). This is good for stuff that deals with plugins.
> 
> It's an interesting idea.  We'd probably also want a flag that hides the
> symbols from general binding and makes them available for direct dlsym
> lookups using the handle returned by dlopen (otherwise the old
> definitions stick around).
> 
> The tricky question is what to do about dependencies.  A behavioral
> change for just one level is not too hard, but everything that goes
> further is difficult.
> 
> Vivek Das Mohapatra's RTLD_SHARED patches may help with isolating
> dependencies.

RTLD_SHARED together with -Wl,-z,unique might help with the dependencies,
but it would require the caller to properly set the linker flag on
the specific shared libraries.

But my main reservation about the idea of reloading the new module is
that, although symbol resolution happens in a different namespace, it
still shares the same process resources.  It would require a lot of
careful code within the library so that it can run as multiple instances;
we see the potential issues with our static dlopen support.


* Re: inode-based dlopen caching
  2021-06-08 17:19   ` Adhemerval Zanella
@ 2021-06-08 17:20     ` Florian Weimer
  2021-06-08 18:10       ` Soni L.
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2021-06-08 17:20 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Libc-help

* Adhemerval Zanella:

> But my main reservation about the idea of reloading the new module is
> that, although symbol resolution happens in a different namespace, it
> still shares the same process resources.  It would require a lot of
> careful code within the library so that it can run as multiple instances;
> we see the potential issues with our static dlopen support.

It would work quite well for a lot of stateless JNI wrappers or Python
extensions, though.  That alone looks like a relevant use case to me.

Thanks,
Florian



* Re: inode-based dlopen caching
  2021-06-08 17:20     ` Florian Weimer
@ 2021-06-08 18:10       ` Soni L.
  2021-06-08 18:17         ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: Soni L. @ 2021-06-08 18:10 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella; +Cc: Libc-help



On 2021-06-08 2:20 p.m., Florian Weimer via Libc-help wrote:
> * Adhemerval Zanella:
> 
>> But my main reservation about the idea of reloading the new module is
>> that, although symbol resolution happens in a different namespace, it
>> still shares the same process resources.  It would require a lot of
>> careful code within the library so that it can run as multiple instances;
>> we see the potential issues with our static dlopen support.
> 
> It would work quite well for a lot of stateless JNI wrappers or Python
> extensions, though.  That alone looks like a relevant use case to me.
> 
> Thanks,
> Florian
> 

The motivating use-case is hexchat plugins. Especially when they're
written in Rust. Rust is supposed to provide memory safety but some
edge-cases with dlopen break that, like truncating the shared library,
or closing a shared library that has spawned threads.


* Re: inode-based dlopen caching
  2021-06-08 18:10       ` Soni L.
@ 2021-06-08 18:17         ` Florian Weimer
  2021-06-08 19:25           ` Adhemerval Zanella
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2021-06-08 18:17 UTC (permalink / raw)
  To: Soni L.; +Cc: Adhemerval Zanella, Libc-help

* Soni L.:

> The motivating use-case is hexchat plugins. Especially when they're
> written in Rust. Rust is supposed to provide memory safety but some
> edge-cases with dlopen break that, like truncating the shared library,
> or closing a shared library that has spawned threads.

Truncating the shared object will always cause problems until the kernel
implements MAP_COPY, or we stop mapping code in the dynamic loader.

Another option (implemented by GHC and others) is to have a custom
loader.  Except for initial-exec memory and symbol interposition, there
is nothing magic at all about dlopen.  Applications certainly can
implement their own object code loading mechanisms.

Thanks,
Florian



* Re: inode-based dlopen caching
  2021-06-08 18:17         ` Florian Weimer
@ 2021-06-08 19:25           ` Adhemerval Zanella
  0 siblings, 0 replies; 12+ messages in thread
From: Adhemerval Zanella @ 2021-06-08 19:25 UTC (permalink / raw)
  To: Florian Weimer, Soni L.; +Cc: Libc-help



On 08/06/2021 15:17, Florian Weimer wrote:
> * Soni L.:
> 
>> The motivating use-case is hexchat plugins. Especially when they're
>> written in Rust. Rust is supposed to provide memory safety but some
>> edge-cases with dlopen break that, like truncating the shared library,
>> or closing a shared library that has spawned threads.
> 
> Truncating the shared object will always cause problems until the kernel
> implements MAP_COPY, or we stop mapping code in the dynamic loader.

Which, as I recall from some discussion by Linus, is not going to happen
(unless he has changed his mind; the discussion is some years old already).

> 
> Another option (implemented by GHC and others) is to have a custom
> loader.  Except for initial-exec memory and symbol interposition, there
> is nothing magic at all about dlopen.  Applications certainly can
> implement their own object code loading mechanisms.

I think the main problem is implementing all the ELF idiosyncrasies
correctly, assuming that ELF is used in the first place.


end of thread, other threads:[~2021-06-08 19:25 UTC | newest]

Thread overview: 12+ messages
2021-06-05 13:59 inode-based dlopen caching Soni L.
2021-06-07 21:53 ` Adhemerval Zanella
2021-06-07 22:50   ` Soni L.
2021-06-08 13:14     ` Adhemerval Zanella
2021-06-08 16:26       ` Soni L.
2021-06-08 16:51         ` Adhemerval Zanella
2021-06-08 16:56 ` Florian Weimer
2021-06-08 17:19   ` Adhemerval Zanella
2021-06-08 17:20     ` Florian Weimer
2021-06-08 18:10       ` Soni L.
2021-06-08 18:17         ` Florian Weimer
2021-06-08 19:25           ` Adhemerval Zanella
