public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Florian Weimer <fweimer@redhat.com>
To: Chung-Lin Tang <cltang@codesourcery.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	 Carlos O'Donell <carlos@redhat.com>
Subject: Re: [PATCH v4 2/2] BZ #17645, fix slow DSO sorting behavior in dynamic loader -- Algorithm changes
Date: Mon, 18 Jan 2021 21:28:25 +0100	[thread overview]
Message-ID: <87o8hmkn12.fsf@oldenburg.str.redhat.com> (raw)
In-Reply-To: <85ab188b-64c2-d1fd-33fa-d7d2e6fb2d97@codesourcery.com> (Chung-Lin Tang's message of "Sun, 3 Jan 2021 19:22:48 +0800")

* Chung-Lin Tang:

> +/* We use a recursive function due to its better clarity and ease of
> +   implementation, as well as faster execution speed. We already use
> +   alloca() for list allocation during the breadth-first search of
> +   dependencies in _dl_map_object_deps(), and this should be on the
> +   same order of worst-case stack usage.  */
> +
> +static void
> +dfs_traversal (struct link_map ***rpo, struct link_map *map,
> +	       bool *do_reldeps)
> +{

Could you please add a comment describing the rpo, that it points past
the end of the destination array?  That's not obvious based on the
interface.

Although I'm quite happy how simple the code is to read. 8-)

> +  if (map->l_visited)
> +    return;
> +
> +  map->l_visited = 1;
> +
> +  if (map->l_initfini)
> +    {
> +      for (int i = 0; map->l_initfini[i] != NULL; i++)
> +	{
> +	  struct link_map *dep = map->l_initfini[i];
> +	  if (dep->l_visited == 0)
> +	    dfs_traversal (rpo, dep, do_reldeps);
> +	}
> +    }

First DT_NEEDED dependency is added first …
> +
> +  if (__glibc_unlikely (do_reldeps != NULL && map->l_reldeps != NULL))
> +    {
> +      /* Indicate that we encountered relocation dependencies during
> +	 traversal.  */
> +      *do_reldeps = true;
> +
> +      for (int m = map->l_reldeps->act - 1; m >= 0; m--)
> +	{
> +	  struct link_map *dep = map->l_reldeps->list[m];
> +	  if (dep->l_visited == 0)
> +	    dfs_traversal (rpo, dep, do_reldeps);
> +	}
> +    }
> +
> +  *rpo -= 1;
> +  **rpo = map;
> +}

… and the depending object is added last.  The l_visited check at the
start (and in the loops) ensures that this doesn't happen before.

> +/* Topologically sort array MAPS according to dependencies of the contained
> +   objects.  */
> +
> +static void
> +_dl_sort_maps_dfs (struct link_map **maps, unsigned int nmaps,
> +		   unsigned int skip __attribute__ ((unused)), bool for_fini)
> +{

> +  for (int i = nmaps - 1; i >= 0; i--)
> +    {
> +      dfs_traversal (&rpo_head, maps[i], do_reldeps_ref);
> +
> +      /* We can break early if all objects are already placed.  */
> +      if (rpo_head == rpo)
> +	goto end;
> +    }
> +  assert (rpo_head == rpo);

Should “goto end” be a “break”?  The compiler elides the assert for that
case, I expect.

> +  /* Here we may do a second pass of sorting, using only l_initfini[]
> +     static dependency links. This is avoided if !FOR_FINI or if we didn't
> +     find any reldeps in the first DFS traversal.
> +
> +     The reason we do this is: while it is unspecified how circular
> +     dependencies should be handled, the presumed reasonable behavior is to
> +     have destructors to respect static dependency links as much as possible,
> +     overriding reldeps if needed. And the first sorting pass, which takes
> +     l_initfini/l_reldeps links equally, may not preserve this priority.
> +
> +     Hence we do a 2nd sorting pass, taking only DT_NEEDED links into account
> +     (see how the do_reldeps argument to dfs_traversal() is NULL below).  */
> +  if (do_reldeps)
> +    {
> +      for (int i = nmaps - 1; i >= 0; i--)
> +	rpo[i]->l_visited = 0;
> +
> +      struct link_map **maps_head = &maps[nmaps];
> +      for (int i = nmaps - 1; i >= 0; i--)
> +	{
> +	  dfs_traversal (&maps_head, rpo[i], NULL);
> +
> +	  /* We can break early if all objects are already placed.
> +	     The below memcpy is not needed in the do_reldeps case here,
> +	     since we wrote back to maps[] during DFS traversal.  */
> +	  if (maps_head == maps)
> +	    return;
> +	}
> +      assert (maps_head == maps);
> +      return;
> +    }

About that second sort: I *suspect* the original goal was to use reldeps
for constructor ordering as well.  There were actually two sorting calls
in the original implementation of shared object loading, but the second
happend *before* relocation processing, so it never encountered any
collected relocations.  They only mattered on dlclose/process
termination, and this is not actually what users want.

At this point, I am quite convinced that we should get rid of reldeps
sorting completely.  If we reorder the link map chaining at load time, I
think we won't need to have to do *any* sorting at closing time.  This
means that we also don't have to allocate any memory for it.

Does anyone have concerns about moving in that direction?

I do believe that we should make one dependency sorting change in one
release, and not do this transition piecewise.

> +void
> +_dl_sort_maps (struct link_map **maps, unsigned int nmaps,
> +	       unsigned int skip, bool for_fini)
> +{
> +  /* Index code for sorting algorithm currently in use.  */
> +  static int32_t algorithm = 0;
> +  if (__glibc_unlikely (algorithm == 0))
> +    algorithm = TUNABLE_GET (glibc, rtld, dynamic_sort,
> +			     int32_t, NULL);
> +
> +  /* It can be tempting to use a static function pointer to store and call
> +     the current selected sorting algorithm routine, but experimentation
> +     shows that current processors still do not handle indirect branches
> +     that efficiently, plus a static function pointer will involve
> +     PTR_MANGLE/DEMANGLE, further impairing performance of small, common
> +     input cases. A simple if-case with direct function calls appears to
> +     be the fastest.  */
> +  if (__glibc_likely (algorithm == 1))
> +    _dl_sort_maps_original (maps, nmaps, skip, for_fini);
> +  else if (algorithm == 2)
> +    _dl_sort_maps_dfs (maps, nmaps, skip, for_fini);
> +  else
> +    __builtin_unreachable ();
> +}

Some nitpicking: We could declare the function pointer attribute_relro,
and since this function is necessarily called before final relocation,
it will be set before RELRO because active, so writing to it still
permitted.  The flag-based solution is of course fine, so I suggest just
to remove the comment.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


  parent reply	other threads:[~2021-01-18 20:28 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-03 11:22 Chung-Lin Tang
2021-01-13 12:00 ` Florian Weimer
2021-01-18 20:28 ` Florian Weimer [this message]
2021-01-18 20:46   ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o8hmkn12.fsf@oldenburg.str.redhat.com \
    --to=fweimer@redhat.com \
    --cc=carlos@redhat.com \
    --cc=cltang@codesourcery.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).