Hi Carlos, here's the v3 patch for the actual sorting changes. On 2020/6/30 5:29 AM, Carlos O'Donell wrote: > On 5/19/20 9:25 AM, Chung-Lin Tang wrote: >> Hi Carlos, >> This part is the actual code changes in elf/, changes since the last version[1] are: > > I have 1 remaining question. Please see "QUESTION:" below. Please see below. > I have 1 remaining nit, which is just to use named arguments in ldsodefs.h. Done. > I have 1 code change regarding PTR_MANGLE/PTR_DEMANGLE to obfuscate the stored > function pointer. I have been performance testing the patch, and have concluded that, although just in the hundreds of cycles as measured by hp-timing.h facilities, the speed of using a simple if-case and direct calls is faster than calling through a function pointer (current CPUs still doesn't do indirect branches efficiently enough), and adding pointer mangle/demangle will just add on to the overhead. So in this version I've changed to using just a static int32_t code instead of function pointer. (this matters a bit because as I've tested, while the new DFS algorithm is more scalable, for the common small-number-of-deps case the new _dl_sort_maps routine is actually slightly slower than the old algorithm; trying to recover this gap as much as possible) > Please post v3. One way to answer my question is to post a v3 with a large enough > comment in that section that explains *why* we resort without taking l_reldeps into > account. > > Looking very good. Thanks for all the work here. > >> 1. As you suggested, the use of a glibc tunable (glibc.rtld.dynamic_sort) >> to allow specifying which sorting algorithm to use. "1" for the current algorithm, >> "2" for the new DFS based one. The default is 1 to be conservative. > > Thank you. I think that overall I'd want to take the following strategy: > > (1) Test glibc.rtld.dynamic_sort=2 in Fedora Rawhide for 6 months. > (2) Switch glibc.rtld.dynamic_sort default to 2 in glibc 2.33, and declare > the version 1 deprecated. > (3) In glibc 2.35 or glibc 2.36 remove version 1. That gives us a 1.5 to 2 year > window where we have deprecated and then removed the old algorithm. Users > will have deployed environments at this point with the default set correctly > and we'll have seen the most egregious problems sorted out. > > Not everyone may agree with this strategy. However, I'm hesitant to take a more > strict approach to this problem. We need a good transitional period with both > options. I personally think this is fine, nothing to object. >> +static void >> +_dl_sort_maps_dfs (struct link_map **maps, unsigned int nmaps, >> + unsigned int skip __attribute__ ((unused)), bool for_fini) >> +{ >> + for (int i = 0; i < nmaps; i++) >> + maps[i]->l_visited = 0; The two loops where we initialize the l_visited bits have been adjusted to 'for(int i = nmaps-1; i >= 0; i--)'. The change compare against zero generally saves one register pressure in the loop body. > >> + To summarize, just passing in the full list, and iterating from back >> + to front makes things much more straightforward. */ >> + >> + struct link_map *rpo[nmaps]; > > OK. Array to hold rpo pointers. > >> + struct link_map **rpo_head = &rpo[nmaps]; > > Required: Please add a comment that we start 1 past the end of this list because > of the *pro -= 1; we use in dfs_traveral(). Done. >> + end: >> + /* This is avoided if !FOR_FINI or if we didn't find any reldeps in >> + the first DFS traversal. */ > > OK. Even if we find everything sorted we might have to check reldeps. > You could adjust the logic in dfs_traversal to get rid of this here, but > it would complicate the logic there. > >> + if (do_reldeps) >> + { >> + for (int i = 0; i < nmaps; i++) >> + rpo[i]->l_visited = 0; > > OK. Clear rpo in order setting l_visited to 0, and we have to sort all over again. > >> + >> + struct link_map **maps_head = &maps[nmaps]; >> + for (int i = nmaps - 1; i >= 0; i--) >> + { >> + dfs_traversal (&maps_head, rpo[i], NULL); > > QUESTION: > > We saw symbol dependencies. > > The objects have to already be in maps[] or it's an unsatisfied symbol dependency. > > We already took into account the l_reldeps in dfs_traversal() the first time. > > Now we sort again *without* taking them into account. > > However, we use the rpo[] maps list which includes the sorted order of the relocation > dependencies taken into account. > > Why are we doing this? Is it to take DT_NEEDED as a preference over symbol > relocation dependencies? Yes, the discussion of #15311 was that in the destructor ordering, static DT_NEEDED dependencies should not be violated in the face of relocation dependencies, and a second pass of sorting using only static deps should be a viable solution to enforce this. To illustrate, given the tst-bz15311 test description in elf/dso-sort-tests-1.def in the testing patch: Test description: {+a;+e;+f;+g;+d;%d;-d;-g;-f;-e;-a};a->b->c->d;d=>[ba];c=>a;b=>e=>a;c=>f=>b;d=>g=>c (1) Output of the current "old" algorithm: {+a[d>c>b>a>];+e[e>];+f[f>];+g[g>];+d[];%d(b(e(a()))a()g(c(a()f(b(e(a()))))));-d[];-g[];-f[];-e[];-a[c>b>a>];+e[e>];+f[f>];+g[g>];+d[];%d(b(e(a()))a()g(c(a()f(b(e(a()))))));-d[];-g[];-f[];-e[];-a[c>b>a>];+e[e>];+f[f>];+g[g>];+d[];%d(b(e(a()))a()g(c(a()f(b(e(a()))))));-d[];-g[];-f[];-e[];-a[b->c->d" static dependency restrictions, even in the face of circular dependency ambiguity. Currently, only (2) does this. But you did mention the test is actually underlinked, and the proper fix is to properly link it. So I think there's also a question of: should #15311 be fixed at all? Is the current "beyond specification" handling of this testcase desired, or should we just accept such a testcase as undefined behavior? (if we don't need to handle this, the code could be leaner) FYI, the tst-bz15311 testcase is the only test that is currently affected by this second pass sorting, i.e. removing it does not cause any regressions in other elf/ tests. >> +void >> +_dl_sort_maps (struct link_map **maps, unsigned int nmaps, >> + unsigned int skip, bool for_fini) >> +{ >> + /* Function pointer to sorting algorithm currently in use. */ >> + static void (*sort_maps) (struct link_map **, unsigned int, >> + unsigned int, bool) = NULL; >> + if (__glibc_unlikely (sort_maps == NULL)) >> + { >> + /* Select sorting algorithm based on tunable value. */ >> + int32_t algorithm = TUNABLE_GET (glibc, rtld, dynamic_sort, >> + int32_t, NULL); >> + if (__glibc_likely (algorithm == 1)) >> + sort_maps = _dl_sort_maps_original; >> + else if (algorithm == 2) >> + sort_maps = _dl_sort_maps_dfs; > > Both of these need function pointer encryption e.g. PTR_MANGLE. This part has been changed to (pasting changed fragment here to ease review): /* Index code for sorting algorithm currently in use. */ static int32_t algorithm = 0; if (__glibc_unlikely (algorithm == 0)) algorithm = TUNABLE_GET (glibc, rtld, dynamic_sort, int32_t, NULL); /* It can be tempting to use a static function pointer to store and call the current selected sorting algorithm routine, but experimentation shows that current processors still do not handle indirect branches that efficiently, plus a static function pointer will involve PTR_MANGLE/DEMANGLE, further impairing performance of small, common input cases. A simple if-case with direct function calls appears to be the fastest. */ if (__glibc_likely (algorithm == 1)) _dl_sort_maps_original (maps, nmaps, skip, for_fini); else if (algorithm == 2) _dl_sort_maps_dfs (maps, nmaps, skip, for_fini); else __builtin_unreachable (); The reasons of change are mainly performance related as mentioned above, I don't think it will affect understanding though. >> unsigned int l_phdr_allocated:1; /* Nonzero if the data structure pointed >> to by `l_phdr' is allocated. */ >> unsigned int l_soname_added:1; /* Nonzero if the SONAME is for sure in >> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h >> index 5ff4a2831b..a0f60df81c 100644 >> --- a/sysdeps/generic/ldsodefs.h >> +++ b/sysdeps/generic/ldsodefs.h >> @@ -1017,8 +1017,8 @@ extern void _dl_init (struct link_map *main_map, int argc, char **argv, >> extern void _dl_fini (void) attribute_hidden; >> >> /* Sort array MAPS according to dependencies of the contained objects. */ >> -extern void _dl_sort_maps (struct link_map **maps, unsigned int nmaps, >> - char *used, bool for_fini) attribute_hidden; >> +extern void _dl_sort_maps (struct link_map **, unsigned int, unsigned int, >> + bool) attribute_hidden; > > Please name the arguments. They should be matching names to the implementation > and help the reader understand the arguments. Done. The sorting part of the v3 patch is attached, please see if this is ready to commit. Thanks, Chung-Lin