public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Jan Kratochvil <jkratochvil@azul.com>
Cc: Florian Weimer <fweimer@redhat.com>,
	libc-alpha@sourceware.org, Anton Kozlov <akozlov@azul.com>,
	Siddhesh Poyarekar <siddhesh@gotplt.org>
Subject: Re: [PATCH] RFC: Provide a function to reset IFUNC PLTs
Date: Mon, 20 Mar 2023 13:47:53 -0300	[thread overview]
Message-ID: <a37d306c-1fb2-0214-df42-c7179ebe590f@linaro.org> (raw)
In-Reply-To: <ZBMp6HwLS5Q6afx+@host2.jankratochvil.net>



On 16/03/23 11:38, Jan Kratochvil wrote:
> On Thu, 09 Mar 2023 16:47:49 +0100, Adhemerval Zanella Netto wrote:
>> I am not sure how the kernel would enumerate new tasks that are created
>> while iterating over /proc/self/task.
> 
> I have updated the code as you have found a race there. Now this is no longer
> relevant as all known tasks are already verified as stopped. So there is no
> more running task to create another task. While iterating /proc/self/task:
> (1) either there must be already an unstopped task when opendir() was called,
>     in such case the iteration of /proc/self/task will be retried anyway.
> (2) or all tasks are stopped and therefore no task can create any new task.
> 
> 
>> On closefrom Linux fallback we have
>> a similar problem, where the code iterates over /proc/self/fd, and everytime
>> it closes a file descriptor it lseeks back to beginning.  It works because
>> eventually there will be no more entries on /proc/self/fd, so either you
>> will need to certify that kernel adds new tasks at the end of the getdents
>> call (used by readdir, or lseek and keep track of all tasks already signaled.
> 
> That is not needed, see above.
> 
> 
>> While it might work on the JVM where you can not fully control who change 
>> SIGUSR1  disposition (and I am not sure JVM would prevent a JNI call to do so),
>> so you can't really make it generic without explicit reserve a signal to do so, 
>> similar to what glibc does for SIGCANCEL and SIGSETXID (used to synchronize 
>> setuid functions over threads).  Meaning that callers of sigaction can't 
>> not explicit set such reserved signal.
>>
>> This is similar to what we do for SIGSETXID, so I think a proper way to
>> do it would to do always install a new signal handler to this on pthread_create,
>> on signal handle synchronize with proper async-signal-safe interface
>> (pthread_mutex_lock is not, you might accomplish with sem_post but most likely
>> you will need a atomic+futex way similar to a barrier), iterate over all 
>> dl_stack_used (so the interface can work without access to procfs), issue the 
>> signal handler or each thread, operate on the maps, then synchronize to resume
>> threads.  We can't really make it generic without accessing the internal
>> glibc thread states.
> 
> Good to know, if the patch gets a serious consideration for upstreaming
> I understand the signal number needs to be handled better.
> 

For libc point of view I don't think this interface make sense without it itself
do the stop-the-world; since potentially calling while ifunc might be resolved 
adds another kind of concurrency that is far from ideal (we already have to handle
lazy binding, dl_profile, and other concurrent notes that touch the GOT and
associated data for symbol resolution).

> 
>> And you will also need to reallocate not only glibc, but potentially *all*
>> libraries (since ifunc can be used by any function).
> 
> This is what the patch already does by _dl_relocate_object().
> 
> 
>>> So the only remaining option is that all the programs will be doing
>>> setenv("GLIBC_TUNABLES=glibc.cpu.hwcaps=...") and re-exec(). That is
>>> a peformance kill and definitely not nice compared to any method of an IFUNC
>>> reset.
>>
>> Assuming you don't reset env variable on process spawning, you can set it as 
>> default for the session.
> 
> The /usr/bin/java program needs to setenv("GLIBC_TUNABLES=glibc.cpu.hwcaps=...")
> and then it can either system("itself") or exec("itself"). This is what you
> mean by the session?

I was thinking invoking java itself with GLIBC_TUNABLES, but I think it would
make sense to add a way to globally set tunables without the need to setting
the environment variable.

We already some ideas in mind [1], where I have in mind a system defined file
(as glibc already have for preload and ld.so.cache) that sets the defaults.
Siddhesh idea is to have a global one that is *not* overwritten by users,
while my idea is the opposite where the global sets the default, and users
can override it.  This is open to discussion on how this would work, if we
we will need an extra flag to disable it (similar to DF_1_NODEFLIB), how 
user define GLIBC_TUNABLES will interact with it, etc.

[1] https://sourceware.org/pipermail/libc-alpha/2023-March/146405.html

> 
> 
>> Another option would to deploy a glibc built with 
>> --disable-multi-arch; it will disable ifunc generation.
> 
> That is not an option. OpenJDK must be compatible with normal existing Linux
> OSes.
> 

I do not know exactly how criu works internally, but from you answer and if I
understood it correctly when criu restores a dump it will only restore private 
mappings from the process and not shared ones?  If so, how does it handle 
restoring on a system on an older glibc (or on a system with any dependency
older than the one that the process was dumped)? Does it only restore file
backed shared mappings based on naming?

Using a custom built glibc with --disable-multi-arch along with rpath, you
have a dissociated glibc from the system.  You can even tune to use an 
specific x86_64-vX abi if you see that the workload benefits from some
specific string/mem/math routine.

> 
>> And IMHO this is way nicer because this IFUNC reset as-is, without a proper 
>> stop-the-word support, is not safe and adds another corner case for the already
>> over-complicated ifunc interface.
> 
> stop-the-world is already implemented modulo possible bugfixes.
> 	https://github.com/openjdk/crac/pull/41/files#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR6029
> 

Again, from a Java perspective it might be fine.  But for a generic C interface
it still has a lot of corner cases.

>>> In Java world the other libraries (in general, there are some JNI exceptions)
>>> do not matter as they are a Java code JIT-compiled by JVM.
>>
>> And this won't be a Java specific interface, but rather a GNU extension for C
>> library.  So we must make it as concise as possible, without adding any other
>> security or undefined behavior.  
> 
> I agree. Handling IFUNC for other libraries is also possible but it has to be
> a next step. It does not make sense to handle IFUNC in other libraries when
> glibc still crashes first.
> 
> 
> On Thu, 09 Mar 2023 18:43:08 +0100, Adhemerval Zanella Netto wrote:
>> And the 'handler' signal handler has some potential shortcomings as well: 
>>
>>   * backtrace is not async-signal-safe: glibc implementation on first call
>>     issues dlopen, which calls malloc; and libgcc_eh.so *might* also calls malloc.
> 
> I do not see it in practice:
> 	Temporary breakpoint 1, main () at backtrace.c:5
> 	5	  backtrace(buf, sizeof(buf)/sizeof(*buf));
> 	(gdb) b dlopen
> 	Breakpoint 2 at 0x7ffff7c88f20: file dlopen.c, line 77.
> 	(gdb) c
> 	Continuing.
> 	[Inferior 1 (process 825695) exited normally]
> 
> And neither in the sources:
> glibc$ grep dlopen $(find -iname "*backtrace*")

Check misc/unwind-link.c:

 48   /* Initialize a copy of the data, so that we do not need about
 49      unlocking in case the dynamic loader somehow triggers
 50      unwinding.  */
 51   void *local_libgcc_handle = __libc_dlopen (LIBGCC_S_SO);
 52   if (local_libgcc_handle == NULL)
 53     {
 54       __libc_lock_unlock (lock);
 55       return NULL;
 56     }

The backtrace calls it through UNWIND_LINK_PTR macro.  You might not seeing it
because we cache libgcc_s.so loading and once the programs calls backtrace or
pthread_cancel, it won't call __libc_unwind_link_get anymore.

> 
> 
>>   * pthread calls are not async-signal-safe either.
> 
> There are no pthread_* calls, everything is based on kernel tasks.

The SIGUSR1 handler it installs on freeze() uses both mutexes and cond variables,
and this is quite easy to deadload if a thread sends a SIGUSR1 if its not suppose
to (and both pthread_mutex_lock and pthread_cond_signal are not async-signal-safe).

> 
> 
>>   * it only handles libc.so, other libraries that uses ifunc for function
>>     selection also fails.
> 
> You are right, I have mostly implemented this hard-coded "libc.so.6" to make
> it general (for any libraries containing at least one STT_GNU_IFUNC) although
> I haven't finished this implementation due to the last paragraph below.
> 
> 
>>   * the syscall heuristics do not handle partial results (for instance if
>>     write syscall returns do EINTR).
> 
> I do not think EINTR would matter. The syscall heuristics is there expecting
> that any library function which contains syscalls is not an IFUNC function.
> 
> 
>> So this code has the potential of deadlock, specially if you have another
>> thread issuing malloc.
> 
> I may have missed something but I do not see it so according to the answers
> above.
> 



  reply	other threads:[~2023-03-20 16:47 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06  8:04 Jan Kratochvil
2023-03-07  8:40 ` Florian Weimer
2023-03-07 13:07   ` Adhemerval Zanella Netto
2023-03-08 10:21     ` Jan Kratochvil
2023-03-08 13:04       ` Adhemerval Zanella Netto
2023-03-09 11:32         ` Jan Kratochvil
2023-03-09 15:47           ` Adhemerval Zanella Netto
2023-03-09 17:43             ` Adhemerval Zanella Netto
2023-03-16 14:38             ` Jan Kratochvil
2023-03-20 16:47               ` Adhemerval Zanella Netto [this message]
2023-03-29 12:12                 ` Jan Kratochvil
2023-03-29 13:14                   ` Adhemerval Zanella Netto
2023-03-13 13:59           ` Florian Weimer
2023-03-14 12:55             ` Jan Kratochvil
2023-03-14 14:49               ` Florian Weimer
2023-03-14 15:06                 ` Jan Kratochvil
2023-03-08 10:23   ` Jan Kratochvil
2023-03-08 10:44     ` Florian Weimer
2023-03-08 11:03       ` Jan Kratochvil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a37d306c-1fb2-0214-df42-c7179ebe590f@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=akozlov@azul.com \
    --cc=fweimer@redhat.com \
    --cc=jkratochvil@azul.com \
    --cc=libc-alpha@sourceware.org \
    --cc=siddhesh@gotplt.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).