From: Fangrui Song
Date: Fri, 12 May 2023 12:50:01 -0700
Subject: Re: global pointer gets overwritten with dlopen(3) on RISC-V
To: Lukasz Stelmach
Cc: libc-alpha@sourceware.org, schwab@suse.de, fweimer@redhat.com,
    palmer@dabbelt.com, adhemerval.zanella@linaro.org,
    joseph@codesourcery.com, binutils@sourceware.org, Marek Pikula,
    Marek Szyprowski, Karol Lewandowski

On 2023-05-12, Lukasz Stelmach wrote:
>Hi,
>
>We've encountered an issue of a program misbehaving due to its gp value
>being overwritten. Let me present our setup and the exact sequence of
>events.
>
>We've got a program (the testee) written in C that we test with another
>one (a testing harness, the tester) written in C++ with gtest. So far,
>so good. To make the testing and inspection of the internal state of the
>testee easier, the tester does not start the testee as a separate process
>but loads it with dlopen(3) and calls the testee's main() function.

As Florian replied, dlopening an executable is disallowed.
dlopen will fail, and dlerror() will report "cannot dynamically load
executable" or "cannot dynamically load position-independent executable".

>
>Data structures of the testee get initialised but main() exits (as
>desired) due to some unmet requirements. But this is fine. The code of
>the testee remains usable and the tester starts calling it function by
>function.
>
>Alas, this is the point where things go south. What is worse, they do so
>in a semi-random fashion. We've seen several different behaviours; they
>were consistent between runs, but sometimes changed after recompilation.
>Long story short, both the tester and testee were compiled and linked
>with relaxed relocations turned on. The two chunks of code assumed
>different values of the gp register, of course.
>
>What happens, step by step:
>
>1. The tester starts and sets its gp value in _start (see
>   sysdeps/riscv/start.S).
>
>2. The tester loads the testee with dlopen(3).
>
>3. dlopen(3) calls the testee's load_gp via preinit_array (see
>   sysdeps/riscv/start.S).
>
>4. The testee's code works fine, because the gp register holds the value
>   loaded by the testee's load_gp.
>
>5. The tester's code fails in many curious ways (e.g. stdio doesn't work,
>   functions other than the intended ones are called because of
>   overwritten GOT entries, etc.). Even when the tester didn't fail
>   before the end of its main(), it always caught SIGSEGV in
>   __do_global_dtors_aux().
>
>Our fix was to link the tester with the -mno-relax option, and it
>worked. However, it took us a few days to understand all the details, and
>we think something needs to be done to avoid the confusing semi-random
>failure mode, even though we recognise our use case is somewhat unusual.
>
>Possible general solutions:
>
>1. Make -mno-relax the default for ld(1) (on Linux?).
>   We have no benchmarks whatsoever, but global variables aren't very
>   popular in application code these days and the gp register allows
>   access to a single memory page (4kB) only. No big deal really.

I do agree that --no-relax-gp is a sensible default choice for GNU ld.
https://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation#global-pointer-relaxation

Perhaps you can start a separate topic on binutils? :)

According to a doc from SiFive about -static -mcpu=sifive-u74 builds,
https://docs.google.com/spreadsheets/d/14V7cPbyc80AcGHzsMaw9hYb232dzRbGCmTApnxj-SpU/edit#gid=0
global pointer relaxation saves at most 0.5% in size (I guess that refers
to .text; if we count all allocable sections, the percentage is likely
even smaller).

I understand that global pointer relaxation may be more useful for
certain embedded use cases, but its savings in many other scenarios are
probably not significant enough, and using gp (x3) for platform-specific
purposes (https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/371)
may provide a larger benefit.

>2. Make dlopen(3) (or any appropriate piece of code deep down in glibc)
>   recognise the situation where the gp has been set and may be
>   overwritten, and report an error. Neither overwriting the gp nor
>   loading a binary without running load_gp (e.g. by removing it from
>   preinit_array; why is it there in the first place?) would give us
>   working code.
>
>The above solutions aren't mutually exclusive and implementing both of
>them seems like a good idea.
>
>Are there any other ways to avoid misbehaviour when a process dlopens an
>executable binary and calls its code?
>
>Kind regards,
>-- 
>Łukasz Stelmach
>Samsung R&D Institute Poland
>Samsung Electronics
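To make steps 4 and 5 of the report more concrete, here is a minimal C
model of what global pointer relaxation bakes into the code. It is a
sketch under stated assumptions, not the linker's or glibc's code: it
assumes a relaxed access encodes a signed 12-bit offset (-2048..2047)
from the module's own __global_pointer$, as RISC-V I-type and S-type
immediates do, and the helper names (gp_relaxable, gp_offset,
effective_address) are hypothetical. The point it illustrates is that
the offset is fixed at link time against one gp value, while the CPU
adds it to whatever gp holds at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range of a signed 12-bit immediate, as used by RISC-V I-type and
 * S-type instructions such as addi, ld and sd. */
#define IMM12_MIN (-2048)
#define IMM12_MAX 2047

/* Link time (model): a gp-relative form is only possible when the
 * symbol lies within the 12-bit window around this module's
 * __global_pointer$, passed in as link_gp. */
static bool gp_relaxable(uint64_t link_gp, uint64_t sym)
{
    int64_t off = (int64_t)(sym - link_gp);
    return off >= IMM12_MIN && off <= IMM12_MAX;
}

/* Link time (model): the offset encoded into the relaxed instruction.
 * It is computed once, against link_gp, and never revisited. */
static int32_t gp_offset(uint64_t link_gp, uint64_t sym)
{
    assert(gp_relaxable(link_gp, sym));
    return (int32_t)(int64_t)(sym - link_gp);
}

/* Run time (model): the CPU adds the encoded offset to whatever value
 * gp holds at that moment; nothing checks that it is the value the
 * linker assumed. */
static uint64_t effective_address(uint64_t runtime_gp, int32_t off)
{
    return runtime_gp + (uint64_t)(int64_t)off;
}
```

For example, with a made-up tester __global_pointer$ of 0x12800 and a
variable at 0x12100, gp_offset() yields -0x700 and the access resolves
correctly. If the testee's load_gp later sets gp to, say, 0x7f0000800,
the same instruction addresses 0x7f0000100 instead, silently reading or
writing unrelated memory, which is consistent with the kinds of failures
described in step 5.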