From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 78567 invoked by alias); 27 Dec 2019 12:37:26 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 78559 invoked by uid 89); 27 Dec 2019 12:37:26 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-4.6 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 spammy=excited, consequences, sake, took X-HELO: us-smtp-delivery-1.mimecast.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1577450242; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=cQ3rqXm4GO0hpP0FuauLoSns1ExFzmSMadSUSknuyXo=; b=RkdrlYhxXjUXNDXnwuT77fEqDw/wfYsjqqTDgetIj5htO352Hj8wen+DJw24ZLN+1niOyr 3+niD4Qf04IS3iczaAb4NocXBXc5c8Reqw35Dt2nr+ZqrhuIf2p1unR9cDwBt2qxj3hRoq lXp3CqzGz4ELVwvPEswOwD5b5yQ475Y= Return-Path: To: Chung-Lin Tang , libc-alpha From: Carlos O'Donell Subject: Comments on the ld.so optimization patches. Message-ID: <4364bdcb-cae0-3631-d4ca-b5074feb016a@redhat.com> Date: Fri, 27 Dec 2019 12:37:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-SW-Source: 2019-12/txt/msg00753.txt.bz2 Chung-Lin, I have broken up my comments into two groupings. First are the high level comments about the changes. The suggested changes are things which I think we can actually achieve because they are largely structural (though point 2 may be difficult). Overall I am very excited and enthusiastic about this kind of change. I think it is the right direction to both cleanup and simplify the code in the dynamic loader that handles dependency ordering for loaded ELF objects. The fact that you have chosen *not* to go with something as complicated as Tarjan's SCC algorithm is a relief to me as reviewer. I agree with you that for the sake of the long-term maintenance of the dyanmic loader the first fix to the algorithm should be to keep it simple and avoid the worst of the behaviour we currently have. I am *very* impressed with the concept of creating a DSM for auto-generating the more complex dependency tests. I enjoyed working through the various tricky examples and getting to learn the syntax. I was quickly creating test cases that were more complex than I would have otherwise and refactoring them was a breeze (as opposed to editing dozens of files with sed scripts). I admit that _manually_ working through the example for bug 15311 did not yield the expect results of either the old or new sorting code, but I think I need to go through that a couple more times slowly. At a high level I see a few things that we need to organize in order to make this patch something which we can include in 2.31. (1) Remove test data from Makefiles I *like* how you auto-generate temporary Makefiles and include them into the elf/Makefile to allow you to maximally parallelize these tests and make sure the objects are built as expected. What I think is problematic is putting the test data directly in the Makefile. Some of this data is going to get bigger and messier, and that will impact the readability and editing of the Makefile. Conceptually I'd like to avoid putting test data directly into the Makefile. We have compile and link test setup data in the Makefiles, and that makes sense. We also have environment variables in the Makefiles, and I wish we didn't (we should have a distinct file-per-test with runtime metadata the test driver can use). Your initial concept puts the data of the tests directly in the Makefiles. Is there any way we can move this out of the Makefiles? I'd like to be able to generate much larger tests and it will make the Makefiles quite messy. In summary: I think the auto-generated dependencies are clever and cool, but the test data should live outside of the Makefile. (2) Support old and new sortings Ask Florian about broken legacy applications and standard IO hardening and you'll get a very long story about how impactful it can be to change core parts of the library. We have lots of users probably running lots of old code that depends in some ways on the existing broken sorting algorithm. If I had to make a suggestion here it would look like this: a. Split the core loader changes into two pieces. - The l_visited cleanup and usage. - Converting the old algorithm to use l_visited (if possible?) b. Add a new sorting algorithm. - Put the old sorting algorithm into dl-sort-maps-v1.c - Put the new sorting algorithm into dl-sort-maps-v2.c - Since the change in sorting is only for non-conforming cases where there are cyclic dependencies add a tunable to pick the case: glibc.ld.dynamic_sort=1 (for v1, default) glibc.ld.dynamic_sort=2 (for v2) - Then in Fedora Rawhide we can set glibc.ld.dynamic_sort to 2 and run the entire distro for 6 months, get feedback, and flip the default to 2 in glibc 2.32. (3) One microbenchmark One thing I think we do need to address is to get at least a single microbenchmark, driven by dso-ordering-test.py into the microbenchmark. A large enough test, as noted in (1), get unruly to put into the makefile which is why it would be nice if we could make up a framework to drive it from a test data file. When making changes that improve performance we should look to contribute a microbenchmark that shows a performance improvement. In this case I think we can come up with this relatively easily. My expectation is that we have at least 1 microbenchmark, either hand crafted or as the output of a really long input data sequence for dso-ordering-test.py. (4) dso-ordering-test.py improvements. (4.a) Support 1000's of DSOs. We need to support more than single character names for DSOs. I would like to be able to initially put in some very large tests for the microbenchmark on the order of 1000+ DSOs. This will require a way to disambiguate larger DSO names. May I suggest "|" or "/" ? e.g. a100|a101|a102... (4.b) Textual difference between constructor destructor. When writing and debugging tests with dso-ordering-test.py I found that sometimes when I had lots of DSOs it became difficult to tell the start of the destructor sequences at a glance (multi-line wrapping etc.). May I also suggest some textual difference for "running constructor" vs. "running destructor" e.g. a1 vs. ~a1 or something similar which will be immediately familiar to those writing C++? The alternative would be to use something like a comma ',' to distinguish between the various entries when they run into eachother. --- At an implementation level I see a few things that we need to cleanup in order to make this patch something which we can include in 2.31. (5) Specifically call out the use of l_reserved. All internal loader state should be called out and used for a specific purpose to avoid any issues with ruse of the same storage. To that end I'd like to see l_reserved changed to l_used/l_done and a comment explaining that they are used for the dependency sorting algorithm to avoid the need to allocate any storage for tracking this information. (6) Test case for bug 15311 and map interleaving. When running test for 15311, the libc.so.6 and ld.so scopes are oddly out of place with the rest of the scopes for the dependent objects. e.g. scope 1: /home/carlos/build/glibc-ldso/elf/tst-bz15311-a.so /home/carlos/build/glibc-ldso/elf/tst-bz15311-b.so /home/carlos/build/glibc-ldso/libc.so.6 /home/carlos/build/glibc-ldso/elf/tst-bz15311-c.so /home/carlos/build/glibc-ldso/elf/ld.so /home/carlos/build/glibc-ldso/elf/tst-bz15311-d.so Notice how with your patch the maps for libc.so.6 and ld.so are interleaved. Why is that? (7) Test run vs. test pass. Is there any reason to split the dependency sorting tests into two tests? Currently your code runs the test to generate output, and then runs a *-cmp* tests to compare the output. Could rework this so there is just one test? I think you should be able to do this by avoiding adding the tests to the tests Make var and just run it as a tests-special test with the appropriate dependencies triggering the binary being built (like other tests-special tests). I need to further debug the tst-bz15311 differences, but I wnated to get this email sent to you before the new year to give you a chance to look over my comments. I apologize that this took so long. Thank you for your patience. Please tell me if you have any comments on the above, or if you think anything here is unreasonable. I'm particularly worried about point (2) and compatibility and my suggestion to use a tunable here (are there security consequences to sorting differently?). Thank you for your work on this complex and brittle area of code. -- Cheers, Carlos.