From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oo1-xc2c.google.com (mail-oo1-xc2c.google.com [IPv6:2607:f8b0:4864:20::c2c]) by sourceware.org (Postfix) with ESMTPS id 028513858417 for ; Mon, 20 Mar 2023 16:47:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 028513858417 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-oo1-xc2c.google.com with SMTP id n3-20020a4ad403000000b0053351dadc20so2003333oos.13 for ; Mon, 20 Mar 2023 09:47:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1679330877; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=ZvooLPBtikDrDuB5PaJTGisa9YxCR6apI/wxlJSIZ2g=; b=qCvrl1WOR6ifKqVul6n1cy4Tna8Ovt3BUw1F59D8F1u94xpkABPAXTLxFZoEQWUsT4 Vy+2aP3YcPHO6UtHfa5odadpvnXX4b9DtE9LNl65Ko7bEJ7PnPYDOFh6WIkoe+RFZLCu 4ltPAtr7SOEpWT/gpFyRBWtgUKXofvF9yCFwrRM3jghpRZhGnQrZFcuygcqCzSgrpeC9 3RBg4rwBN37ZfkvCAZje0Wo7nNHchP8KHYubrqhntUDHlGyEaarcTxLjDuhT9Ce2JOGC SKZrXhWnMCyt3trWa/jjn22J91XVkV6LaML7jt19felJmV1PTuT9Yv649cd6y7A7Y3WS HU7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679330877; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ZvooLPBtikDrDuB5PaJTGisa9YxCR6apI/wxlJSIZ2g=; b=4wA3X4wffNikeDzZjR3iDB9O5TphqbQCLXn1KObcdw2EH4J4MVj5DdFvIdB2vJwYve qjH5y+iQkOp7YprPVR87t3vfKUU8fz8IHce/eknV5OB/n0Bf1ftfVnFmKMXFjWblv35F RRJSjc0u0dLX2c/IFBCAsuHrgmDiiXWLQ3+DWKpZs/m4+jo0L2qaXuerYWcsZcWAow+H kHO8VzifG0Fy8QANFVWkAQvibafnQNx5Er+T4tHYg33w9A1/xnGmm8BrPCd0epISnPCH n1rh5WO2LNGSaGYU6VP342wldXI8IWspUdTy0jIxnJ6aiPjj8A75Ev9BslIRcUPUP2Zo p5Pg== X-Gm-Message-State: AO0yUKXSOXD9vpK+YWoKlYWBewTFByUUwOzSpB9E6POu3JC73lQWc9mY i76Pnsxx71dz97mq99TO/PO9WQ== X-Google-Smtp-Source: AK7set+5rK5Qv0mlQLZtDm0n41SK7zHWDBbfrJ1RZICo3AE3VWM90QMJE+0PA/YMwB1CyTSZAwUazA== X-Received: by 2002:a4a:3715:0:b0:533:c6b7:27dc with SMTP id r21-20020a4a3715000000b00533c6b727dcmr155036oor.0.1679330877047; Mon, 20 Mar 2023 09:47:57 -0700 (PDT) Received: from ?IPV6:2804:1b3:a7c0:c260:d4b6:6c90:6159:ac3d? ([2804:1b3:a7c0:c260:d4b6:6c90:6159:ac3d]) by smtp.gmail.com with ESMTPSA id b1-20020a4a98c1000000b00524f47b4682sm3944864ooj.10.2023.03.20.09.47.54 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 20 Mar 2023 09:47:56 -0700 (PDT) Message-ID: Date: Mon, 20 Mar 2023 13:47:53 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.9.0 Subject: Re: [PATCH] RFC: Provide a function to reset IFUNC PLTs Content-Language: en-US To: Jan Kratochvil Cc: Florian Weimer , libc-alpha@sourceware.org, Anton Kozlov , Siddhesh Poyarekar References: From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 16/03/23 11:38, Jan Kratochvil wrote: > On Thu, 09 Mar 2023 16:47:49 +0100, Adhemerval Zanella Netto wrote: >> I am not sure how the kernel would enumerate new tasks that are created >> while iterating over /proc/self/task. > > I have updated the code as you have found a race there. Now this is no longer > relevant as all known tasks are already verified as stopped. So there is no > more running task to create another task. While iterating /proc/self/task: > (1) either there must be already an unstopped task when opendir() was called, > in such case the iteration of /proc/self/task will be retried anyway. > (2) or all tasks are stopped and therefore no task can create any new task. > > >> On closefrom Linux fallback we have >> a similar problem, where the code iterates over /proc/self/fd, and everytime >> it closes a file descriptor it lseeks back to beginning. It works because >> eventually there will be no more entries on /proc/self/fd, so either you >> will need to certify that kernel adds new tasks at the end of the getdents >> call (used by readdir, or lseek and keep track of all tasks already signaled. > > That is not needed, see above. > > >> While it might work on the JVM where you can not fully control who change >> SIGUSR1 disposition (and I am not sure JVM would prevent a JNI call to do so), >> so you can't really make it generic without explicit reserve a signal to do so, >> similar to what glibc does for SIGCANCEL and SIGSETXID (used to synchronize >> setuid functions over threads). Meaning that callers of sigaction can't >> not explicit set such reserved signal. >> >> This is similar to what we do for SIGSETXID, so I think a proper way to >> do it would to do always install a new signal handler to this on pthread_create, >> on signal handle synchronize with proper async-signal-safe interface >> (pthread_mutex_lock is not, you might accomplish with sem_post but most likely >> you will need a atomic+futex way similar to a barrier), iterate over all >> dl_stack_used (so the interface can work without access to procfs), issue the >> signal handler or each thread, operate on the maps, then synchronize to resume >> threads. We can't really make it generic without accessing the internal >> glibc thread states. > > Good to know, if the patch gets a serious consideration for upstreaming > I understand the signal number needs to be handled better. > For libc point of view I don't think this interface make sense without it itself do the stop-the-world; since potentially calling while ifunc might be resolved adds another kind of concurrency that is far from ideal (we already have to handle lazy binding, dl_profile, and other concurrent notes that touch the GOT and associated data for symbol resolution). > >> And you will also need to reallocate not only glibc, but potentially *all* >> libraries (since ifunc can be used by any function). > > This is what the patch already does by _dl_relocate_object(). > > >>> So the only remaining option is that all the programs will be doing >>> setenv("GLIBC_TUNABLES=glibc.cpu.hwcaps=...") and re-exec(). That is >>> a peformance kill and definitely not nice compared to any method of an IFUNC >>> reset. >> >> Assuming you don't reset env variable on process spawning, you can set it as >> default for the session. > > The /usr/bin/java program needs to setenv("GLIBC_TUNABLES=glibc.cpu.hwcaps=...") > and then it can either system("itself") or exec("itself"). This is what you > mean by the session? I was thinking invoking java itself with GLIBC_TUNABLES, but I think it would make sense to add a way to globally set tunables without the need to setting the environment variable. We already some ideas in mind [1], where I have in mind a system defined file (as glibc already have for preload and ld.so.cache) that sets the defaults. Siddhesh idea is to have a global one that is *not* overwritten by users, while my idea is the opposite where the global sets the default, and users can override it. This is open to discussion on how this would work, if we we will need an extra flag to disable it (similar to DF_1_NODEFLIB), how user define GLIBC_TUNABLES will interact with it, etc. [1] https://sourceware.org/pipermail/libc-alpha/2023-March/146405.html > > >> Another option would to deploy a glibc built with >> --disable-multi-arch; it will disable ifunc generation. > > That is not an option. OpenJDK must be compatible with normal existing Linux > OSes. > I do not know exactly how criu works internally, but from you answer and if I understood it correctly when criu restores a dump it will only restore private mappings from the process and not shared ones? If so, how does it handle restoring on a system on an older glibc (or on a system with any dependency older than the one that the process was dumped)? Does it only restore file backed shared mappings based on naming? Using a custom built glibc with --disable-multi-arch along with rpath, you have a dissociated glibc from the system. You can even tune to use an specific x86_64-vX abi if you see that the workload benefits from some specific string/mem/math routine. > >> And IMHO this is way nicer because this IFUNC reset as-is, without a proper >> stop-the-word support, is not safe and adds another corner case for the already >> over-complicated ifunc interface. > > stop-the-world is already implemented modulo possible bugfixes. > https://github.com/openjdk/crac/pull/41/files#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR6029 > Again, from a Java perspective it might be fine. But for a generic C interface it still has a lot of corner cases. >>> In Java world the other libraries (in general, there are some JNI exceptions) >>> do not matter as they are a Java code JIT-compiled by JVM. >> >> And this won't be a Java specific interface, but rather a GNU extension for C >> library. So we must make it as concise as possible, without adding any other >> security or undefined behavior. > > I agree. Handling IFUNC for other libraries is also possible but it has to be > a next step. It does not make sense to handle IFUNC in other libraries when > glibc still crashes first. > > > On Thu, 09 Mar 2023 18:43:08 +0100, Adhemerval Zanella Netto wrote: >> And the 'handler' signal handler has some potential shortcomings as well: >> >> * backtrace is not async-signal-safe: glibc implementation on first call >> issues dlopen, which calls malloc; and libgcc_eh.so *might* also calls malloc. > > I do not see it in practice: > Temporary breakpoint 1, main () at backtrace.c:5 > 5 backtrace(buf, sizeof(buf)/sizeof(*buf)); > (gdb) b dlopen > Breakpoint 2 at 0x7ffff7c88f20: file dlopen.c, line 77. > (gdb) c > Continuing. > [Inferior 1 (process 825695) exited normally] > > And neither in the sources: > glibc$ grep dlopen $(find -iname "*backtrace*") Check misc/unwind-link.c: 48 /* Initialize a copy of the data, so that we do not need about 49 unlocking in case the dynamic loader somehow triggers 50 unwinding. */ 51 void *local_libgcc_handle = __libc_dlopen (LIBGCC_S_SO); 52 if (local_libgcc_handle == NULL) 53 { 54 __libc_lock_unlock (lock); 55 return NULL; 56 } The backtrace calls it through UNWIND_LINK_PTR macro. You might not seeing it because we cache libgcc_s.so loading and once the programs calls backtrace or pthread_cancel, it won't call __libc_unwind_link_get anymore. > > >> * pthread calls are not async-signal-safe either. > > There are no pthread_* calls, everything is based on kernel tasks. The SIGUSR1 handler it installs on freeze() uses both mutexes and cond variables, and this is quite easy to deadload if a thread sends a SIGUSR1 if its not suppose to (and both pthread_mutex_lock and pthread_cond_signal are not async-signal-safe). > > >> * it only handles libc.so, other libraries that uses ifunc for function >> selection also fails. > > You are right, I have mostly implemented this hard-coded "libc.so.6" to make > it general (for any libraries containing at least one STT_GNU_IFUNC) although > I haven't finished this implementation due to the last paragraph below. > > >> * the syscall heuristics do not handle partial results (for instance if >> write syscall returns do EINTR). > > I do not think EINTR would matter. The syscall heuristics is there expecting > that any library function which contains syscalls is not an IFUNC function. > > >> So this code has the potential of deadlock, specially if you have another >> thread issuing malloc. > > I may have missed something but I do not see it so according to the answers > above. >