public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Florian Weimer <fweimer@redhat.com>
To: Jonathon Anderson <janderson@rice.edu>
Cc: John Mellor-Crummey <johnmc@rice.edu>,
	 Xiaozhu Meng <xm13@rice.edu>,
	"Mark W. Krentel" <krentel@rice.edu>, <libc-alpha@sourceware.org>
Subject: Re: Fwd: [PATCH v5 00/22] Some rtld-audit fixes
Date: Wed, 17 Nov 2021 21:42:29 +0100	[thread overview]
Message-ID: <87bl2i345m.fsf@oldenburg.str.redhat.com> (raw)
In-Reply-To: <0D3F0C5F-2586-42F9-916D-2F327432AF13@rice.edu> (John Mellor-Crummey via Libc-alpha's message of "Wed, 17 Nov 2021 12:08:17 -0600")

Thank you for the feedback.  At this time, I want to comment on the
l_addr aspect.

> >   3. For non-PIE executables the base address listed in link_map->l_addr
> >      for the main application binary is 0, even though dladdr is able to
> >      recover the correct offset. La_objopen is affected by this.
> >
> >   This would require to change an internal semantic for link_map->l_addr.
> >   This is not straighfoward and I am not sure about the direct gains.
> 
> Again, we are wholly sympathetic to the difficulty of refactoring
> complex code!
> 
> The motivation for providing a consistent link_map->l_addr value is
> to unify the handling for the main executable with any other binary
> and to allow access to the ELF header of the main executable (which
> provides fields not available anywhere else: type, ABI, entry
> point...). An alternative would be to re-open the file from its path
> (link_map->l_name), however this is a serious performance concern for
> large-scale executions (metadata servers are known to be a bottleneck
> of parallel filesystems).

It also has its security issues because you might get back the binary
you expect.

> dladdr is not always an option for
> auditors, as noted by another of our 'Tier 2' issues. Right now, we
> only require the program headers which we can obtain from
> getauxval(AT_PHDR), however this technique has questionable
> portability and robustness (getauxval returns an unsigned long, not a
> pointer).

A glibc port to an architecture where a long value cannot hold all
pointer values will have to provide an alternative interface similar to
getauxval, but that returns pointer values.  Of course that's not the
only interface with this problem (ElfW(Addr) is an integer as well).  It
makes the Morello glibc port quite interesting.  So I think *something*
like getauxval (AT_PHDR) will always be available, with pretty much
identical semantics.

> From an outside perspective the current l_addr semantic is fairly
> undocumented, the dladdr and dlinfo man pages define it vaguely as
> the "difference between the address in the ELF file and the address
> in memory." That sounds (to me at least) like l_addr should point to
> byte 0 in the file (the ELF header), and that seems to be correct in
> all but the non-PIE case.

I have struggled with this in the past.  I agree that it is confusing.
l_addr is the offset between virtual addresses in the program header of
the ELF object and the actual addresses in the process image.  This
offset happens to be 0 for ET_EXEC objects, and only there.

I think the sort-of-official way out of this conundrum is to call
dl_iterate_phdr and then look at the program headers.  This is what
_Unwind_Find_FDE in libgcc does to find the object boundaries and the
PT_GNU_EH_FRAME segment, see libgcc/unwind-dw2-fde-dip.c in the GCC
sources:

  […]
  # define __RELOC_POINTER(ptr, base) ((ptr) + (base))
  […]
  _Unwind_Ptr load_base;
  […]
  load_base = info->dlpi_addr;
  […]
  /* See if PC falls into one of the loaded segments.  Find the eh_frame
     segment at the same time.  */
  for (n = info->dlpi_phnum; --n >= 0; phdr++)
    {
      if (phdr->p_type == PT_LOAD)
        {
          _Unwind_Ptr vaddr = (_Unwind_Ptr)
            __RELOC_POINTER (phdr->p_vaddr, load_base);
          if (data->pc >= vaddr && data->pc < vaddr + phdr->p_memsz)
            {
              match = 1;
              pc_low = vaddr;
              pc_high =  vaddr + phdr->p_memsz;
            }
        }
  […]

This is not even glibc-specific, other systems have dl_iterate_phdr as
well.

getauxval (AT_PHDR) is definitely the more direct, Linux-specific
approach.  It's also much faster because it does not involve locking.

> dladdr gets its value from link_map->l_map_start instead of l_addr,
> so the semantic we want is already present in a private field. It
> seems to me these two fields could be swapped with little issue, if
> altering the public semantic is not acceptable we could also be sated
> if l_map_start was made public.

Applications which know about the current semantics of l_addr will
break, though.  l_addr is also exposed to debuggers via the _r_debug
interface.  I really do not think we can make changes to l_addr.

We have a similar issue around l_name being "" for the main program, and
unfortuantely I will have to argue quite strongly against changing that.

We should perhaps collect these grievances and work on better interfaces
for the future.  However, that will be quite a long-term investment
because it will take years until these new interfaces will be available
on your users' systems.  Bug fixes we can roll out more quickly, but
glibc interface changes typically happen at major distribution release
boundaries only.  For the time being, it has to be workarounds for
missing interfaces, I'm afraid.

Thanks,
Florian


  reply	other threads:[~2021-11-17 20:42 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <EA69A62D-7C01-4536-B551-2609226053F2@rice.edu>
2021-11-17 18:08 ` John Mellor-Crummey
2021-11-17 20:42   ` Florian Weimer [this message]
2021-11-18 21:55     ` Jonathon Anderson
2021-11-19 19:18       ` Florian Weimer
2021-11-19 19:56         ` Adhemerval Zanella
2021-11-19 20:31           ` Florian Weimer
2021-11-23 16:36             ` Adhemerval Zanella
2021-11-22 17:46 jma14
2021-11-23 13:58 ` Adhemerval Zanella
2021-11-23 14:02   ` Florian Weimer
2021-11-23 16:25     ` Adhemerval Zanella
2021-11-23 16:50       ` Florian Weimer
2021-11-23 21:13         ` Jonathon Anderson
2021-11-25 17:56           ` Adhemerval Zanella
2021-11-22 17:46 jma14

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bl2i345m.fsf@oldenburg.str.redhat.com \
    --to=fweimer@redhat.com \
    --cc=janderson@rice.edu \
    --cc=johnmc@rice.edu \
    --cc=krentel@rice.edu \
    --cc=libc-alpha@sourceware.org \
    --cc=xm13@rice.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).