From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 21527 invoked by alias); 26 Nov 2014 21:56:49 -0000 Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: glibc-bugs-owner@sourceware.org Received: (qmail 21492 invoked by uid 48); 26 Nov 2014 21:56:43 -0000 From: "paulo.cesar.pereira.de.andrade at gmail dot com" To: glibc-bugs@sourceware.org Subject: [Bug dynamic-link/17645] RFE: Improve performance of dynamic loader for deeply nested DSO dependencies. Date: Wed, 26 Nov 2014 21:56:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: glibc X-Bugzilla-Component: dynamic-link X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: paulo.cesar.pereira.de.andrade at gmail dot com X-Bugzilla-Status: NEW X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2014-11/txt/msg00207.txt.bz2 https://sourceware.org/bugzilla/show_bug.cgi?id=17645 --- Comment #5 from Paulo Andrade --- (In reply to Carlos O'Donell from comment #4) > (In reply to Paulo Andrade from comment #1) > > Created attachment 7972 [details] > > Proposed patch > > > > This patch implements a simple stable topological sort that moves > > an entry at most once, by keeping track of, and resetting the weight > > of the entries after every move. It breaks loops by choosing the > > ones that appear first with the lowest weight, that is, less > > dependencies. > > If there is a loop, the "c" variable in the "sort" function will > > be larger than 0. > > At the high-level your patch is almost ready. > > You refactor the sorting code out into a static function that the compiler > can inline and you use the same pattern in all places where sorting happens. > This is good. The code is small enough that this way (static functions) should be better. But it also cannot be really shared, without extra parameters, as there are 3 slightly different variants of the loop. > You are missing two important things: > > * More code comments. You should comment each sort routine with a high-level > comment about what you're doing and why. This is particularly important for > the weight functions. Ok. I will work on that. A quick explanation is somewhat like this: """ The algorithm implements a simple stable topological sort. The algorithm is expected to be fast and break cycles in a reasonable way. If there are no cycles, it does a plain topological sort, that does not reorder the input, that is, it is stable. Every link_map instance has a l_idx field, that in special cases, currently only when a DSO is being unloaded, can be less than zero. If the l_idx is not less than zero, and not in the dl_sort_fini call that already initializes the l_idx entries, the sort function initializes the l_idx field of every link_map pointer to an unique identifier. The sort is done in-place, by decrementing the insert position (the "k" variable) in the maps vector. Sorting is done by swapping an entry with the current insert position, and decrementing the insert position. The "sort" function allocates in the stack the "seen" vector, that works as a boolean flag for every link_map, to know if it has been already moved. Moving, actually swapping, is done at most once, and not done if it is already in the proper position. The "sort" function also allocates in stack the "w" vector, that is a counter of how many, not self and not already "seen" direct references exist to a link_map entry. The "weight" function resets the "w" entries at every call. Note that other than the first call, every call reduces the number of entries by one, as the last moved link_map pointer does not need to be verified again. In the "weight" function, once a link_map entry l_initfini vector does not find any dependency, either the entry is a self dependency, the object is being unloaded or the dependency has been moved (in which case a cyclic dependency was just broken), the "w" entry for it will be 0. Note that the "w" vector is indexed by position in the input/output maps vector, while the "seen" vector is indexed by the "l_idx" field of the "struct link_map" representing the DSO. Every loop, after calling the "weight" function, the "sort" searches all "w" entries for the first entry matching the "c" variable. The "c" variable is initialized as zero, and reset to zero after an entry is moved. If it scans all "w" entries and none is equal to "c", it increments "c"; if "c" is larger than zero it means there is a loop, in which case it chooses the first one with the lowest weight. Once an entry is moved, it is marked as "seen", and if it was part of a cycle, the cycle is now broken, as it will no longer influence its direct dependencies. Recognized possible problems: o The algorithm is not expected to scale well beyond a few thousand entries, due to the need to reinitialize the "w" vector. o If there are multiple "solutions", it will frequently not match sorting done by the previous algorithm. But, it guarantees an stable sorting, that is, does not change input order. o It does not attempt to be smart when finding loops, a better solution probably would be to break the smallest loop first, when there are multiple entries with the same weight, instead of choosing the first entry. """ > * Tests. The existing set of tests are insufficient to cover the changes you > are proposing. While it seems unfair, and I understand that, this kind of > change has not been undertaken because of the lack of testing. You need to > add more tests. The *best* testing is actually a test which > deterministically generates a sequence of permutations of linked objects and > verifies they are sorted correctly by the algorithms in question. This way > we can show that the sorting works for a large set of object dependency > orderings. When we get a bug report, and we will, we can enlarge the set of > automated testing to include the case the bug report claims and see what > went wrong. I will work on test cases. So far my tests were glibc "make check" and running rawhide and rhel-7 with the patch. I am particularly interested in knowing if there is a test case for this commit: https://sourceware.org/git/?p=glibc.git;a=blame;f=elf/dl-fini.c;h=c35577565eb66ac241365c05efe7c7fb791e0df2#l102 line 102, because it is another special variant of the loop, and what I did to address it is to loop over the entries, and increase the "w" vector of the dependencies, to ensure the dependencies are not moved before the entry, but on very complex cycles, and if having conflicting rules, it cannot guarantee it will really " preserve a link time dependency". But I suspect the same is true for the current algorithm; worse if the link time dependencies generate conflicting dependencies in cycles. > If you can cover those two things, then you're ready to post the patch to > libc-alpha for more formal review. Ok. I followed wiki instructions and sent the first version to libc-alpha, but will now wait for a more concrete situation based on bugzilla. > This looks really good though and you're making great progress in a > difficult part of the dynamic loader. -- You are receiving this mail because: You are on the CC list for the bug.