public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Fangrui Song <maskray@google.com>
To: Lukasz Stelmach <l.stelmach@samsung.com>
Cc: libc-alpha@sourceware.org, schwab@suse.de, fweimer@redhat.com,
	 palmer@dabbelt.com, adhemerval.zanella@linaro.org,
	joseph@codesourcery.com,  binutils@sourceware.org,
	Marek Pikula <m.pikula@partner.samsung.com>,
	 Marek Szyprowski <m.szyprowski@samsung.com>,
	Karol Lewandowski <k.lewandowsk@samsung.com>
Subject: Re: global pointer gets overwritten with dlopen(3) on RISC-V
Date: Fri, 12 May 2023 12:50:01 -0700	[thread overview]
Message-ID: <CAFP8O3KetKH8fMZoA=KPbtW6m4_KORwUF+HzD+btxUMdkiCubA@mail.gmail.com> (raw)
In-Reply-To: <oypijda5y97hre.fsf%l.stelmach@samsung.com>

On 2023-05-12, Lukasz Stelmach wrote:
>Hi,
>
>We've encountered an issue of a program misbehaving due to its gp value
>being overwritten. Let me present our setup and the exact sequence of
>events.
>
>We've got a program (the testee) written in C that we test with another
>one (a testing harness, the tester) written in C++ with gtest. So far,
>so good. To make the testing and inspection of the internal state of the
>testee easier the tester does not start the testee as a separate process
>but loads it with dlopen(3) and calls the testee's main() function.

As Florian replied, dlopening an executable is disallowed.
dlopen will give an error (dlerror()):
"cannot dynamically load executable" or
"cannot dynamically load position-independent executable".

>
>Data
>structures of the testee get initialised but the main() exits (as
>desired) due to some unmet requirements. But this is fine. The code of
>the testee remains usable and the tester starts calling it function by
>function.
>
>Alas, this is the point where things go south. What is worse they do so
>in a semi-random fashion. We've seen several different behaviours they
>were consistent between runs, but sometimes changed after compilation.
>Long story short, both the tester and testee were compiled and linked
>with relaxed relocations turned on. Both chunks of code assumed
>different value of the gp register, of course.
>
>What happens — step by step:
>
>1. The tester starts and sets its the gp value in _start (see sysdeps/riscv/start.S)
>
>2. The tester loads the testee with dlopen(3)
>
>3. The dlopen(3) calls load_gp via preinit_array (see sysdeps/riscv/start.S)
>
>4. The testee's code works fine, because the the gp register holds the value
>   from loaded with the testee's load_gp.
>
>5. The tester's code fails in many curious ways (e.g. stdio doesn't work,
>   different functions are called than were supposed to because
>   ofoverwrittent GOT entries etc.) Even in situations when the tester
>   didn't fail until the end of its main(), it always caught SIGSEGV in
>   __do_global_dtors_aux().
>
>Our fix was to link the tester with -mno-relax option set. And it
>worked. However, it took us a few days to understand all the details and
>we think something needs to be done to avoid the confusing semi-random
>failure mode even though we recognise our use-case is somewhat unusual.
>
>Possible general solutions:
>
>1. Make -mno-relax the default for ld(1) (on Linux?). We have no
>benchmarks whatsoever, but global variables aren't very popular in
>application code these days and the gp register allows access to a
>single memory page (4kB) only. No big deal really.

I do agree that --no-relax-gp is a sensible default choice for GNU ld.
https://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation#global-pointer-relaxation

Perhaps you can start a separate topic on binutils? :)

According to a doc from SiFive about -static -mcpu=sifive-u74 builds,
https://docs.google.com/spreadsheets/d/14V7cPbyc80AcGHzsMaw9hYb232dzRbGCmTApnxj-SpU/edit#gid=0
global pointer relaxation saves at best 0.5% size (I guess that refers
to .text. If we count all allocable sections, the percentage is likely
even smaller.)

I understand that global pointer relaxation may be more useful for
certain embedded use cases, but its saving for many other scenarios is
probably not significant enough, and using gp (x3) for platform
specific purposes
(https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/371)
may provide a larger benefit.


>2. Make dlopen(3) (or any appropriate piece of code deep down in glibc)
>recognise the situation where the gp has been set and may be overwritten and
>report error. Neither overwriting the the gp nor loading a binary without
>(e.g. removing load_gp from preinit_array. why is it there in the first
>place?) would give us a working code.
>
>The above solutions aren't mutually exclusive and implementing both of
>them seems like a good idea.
>
>Are there any other ways to avoid misbehaviour when a process dlopens an
>executable binary and calls its code?
>
>Kind regards,
>--
>Łukasz Stelmach
>Samsung R&D Institute Poland
>Samsung Electronics

  parent reply	other threads:[~2023-05-12 19:50 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20230512142114eucas1p112d969a89ad2480a0c10a532bd6d8440@eucas1p1.samsung.com>
2023-05-12 14:21 ` Lukasz Stelmach
2023-05-12 15:13   ` Florian Weimer
2023-05-15 13:47     ` Palmer Dabbelt
     [not found]       ` <CGME20230516065316eucas1p17bffcd25209bb441b9a9f4d263aa8b3c@eucas1p1.samsung.com>
2023-05-16  6:53         ` Lukasz Stelmach
2023-05-16  7:59           ` Szabolcs Nagy
2023-05-17  0:11             ` Palmer Dabbelt
2023-05-12 19:50   ` Fangrui Song [this message]
2023-05-12 20:11     ` Florian Weimer
2023-05-12 20:33       ` Palmer Dabbelt
2023-05-12 21:09         ` Fangrui Song
2023-05-12 21:57           ` Palmer Dabbelt
2023-05-12 22:34             ` Fangrui Song
2023-05-12 22:47               ` Palmer Dabbelt
2023-05-13  0:05                 ` Fangrui Song
2023-05-13  0:35                   ` Palmer Dabbelt
2023-05-16  3:56                     ` Fangrui Song
2023-05-16 22:51                       ` Palmer Dabbelt
2023-05-16 23:21                         ` Palmer Dabbelt
2023-05-12 20:35       ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFP8O3KetKH8fMZoA=KPbtW6m4_KORwUF+HzD+btxUMdkiCubA@mail.gmail.com' \
    --to=maskray@google.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=binutils@sourceware.org \
    --cc=fweimer@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=k.lewandowsk@samsung.com \
    --cc=l.stelmach@samsung.com \
    --cc=libc-alpha@sourceware.org \
    --cc=m.pikula@partner.samsung.com \
    --cc=m.szyprowski@samsung.com \
    --cc=palmer@dabbelt.com \
    --cc=schwab@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).