* Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks @ 2017-09-23 7:08 Will Hawkins 2017-09-23 13:04 ` Carlos O'Donell 0 siblings, 1 reply; 17+ messages in thread From: Will Hawkins @ 2017-09-23 7:08 UTC (permalink / raw) To: libc-help; +Cc: Will Hawkins Hello! Thank you all for your willingness to be part of the community of glibc developers and for being so helpful when a novice like me has a stupid question! I was looking at the source code for getaddrinfo_a and noticed that it calls __gai_enqueue_request. The former takes the __gai_requests_mutex lock at the beginning of its work. The latter does the exact same thing. I understand that this will not deadlock because __gai_requests_mutex is declared as a recursive lock, but I can't see the reason for locking in both places. I could imagine that it would be necessary if getaddrinfo_a or __gai_enqueue_request called itself recursively or both the functions were publicly accessible but neither of those seems to be the case. It seems that getaddrinfo_a is the only function that calls __gai_enqueue_request. Is this somehow related to the "removal of internal function" change that Mr. Weimer committed a few months ago (commit ca2085)? In other words, is this meant to future proof the possibility that someone (a non-internal user of glibc) calls __gai_enqueue_request directly? I am sure that there is a very subtle explanation for what is going on here, but my mind is too simple to get it. I know that the code works, but I am curious about why. If you can shed any light on the situation, I'd really appreciate it! Thanks in advance! Will ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-23 7:08 Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks Will Hawkins @ 2017-09-23 13:04 ` Carlos O'Donell 2017-09-23 14:44 ` Will Hawkins 0 siblings, 1 reply; 17+ messages in thread From: Carlos O'Donell @ 2017-09-23 13:04 UTC (permalink / raw) To: Will Hawkins, libc-help; +Cc: Will Hawkins On 09/23/2017 01:08 AM, Will Hawkins wrote: > I am sure that there is a very subtle explanation for what is going on > here, but my mind is too simple to get it. I know that the code works, > but I am curious about why. > > If you can shed any light on the situation, I'd really appreciate it! It is a design principle. Acquire the lock if you inspect data that could be concurrently modified. You are correct that it looks like the list lock acquisition could be removed from __gai_enqueue_request. That would be an optimization. The comments even say so themselves. Patches welcome! :-) Please don't be too scared about the contribution checklist, we can help with any step along the way: https://sourceware.org/glibc/wiki/Contribution%20checklist -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-23 13:04 ` Carlos O'Donell @ 2017-09-23 14:44 ` Will Hawkins 2017-09-25 16:13 ` Carlos O'Donell 0 siblings, 1 reply; 17+ messages in thread From: Will Hawkins @ 2017-09-23 14:44 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-help, Will Hawkins On Sat, Sep 23, 2017 at 9:04 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 09/23/2017 01:08 AM, Will Hawkins wrote: >> I am sure that there is a very subtle explanation for what is going on >> here, but my mind is too simple to get it. I know that the code works, >> but I am curious about why. >> >> If you can shed any light on the situation, I'd really appreciate it! > > It is a design principle. Acquire the lock if you inspect data that could > be concurrently modified. Thank you! This makes perfect sense, of course. Obviously with an asynchronous function that is implemented with concurrent access to shared data structures in a multithreaded world, mutexes are our friends. :-) > > You are correct that it looks like the list lock acquisition could be > removed from __gai_enqueue_request. That would be an optimization. > The comments even say so themselves. This is what confused me -- thank you for the confirmation that I was not going crazy! After an hour of trying to figure out how the code did not deadlock every time, I finally looked at the initialization/definition of the mutex and saw that it was recursive. > > Patches welcome! :-) > This will be awesome! I'd love to contribute. I am in/out for the next several days but I will submit it asap! > Please don't be too scared about the contribution checklist, we can > help with any step along the way: > > https://sourceware.org/glibc/wiki/Contribution%20checklist > Thank you so much for taking the time to reply and encouraging my participation. Talk to you soon! Will > -- > Cheers, > Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-23 14:44 ` Will Hawkins @ 2017-09-25 16:13 ` Carlos O'Donell 2017-09-25 17:05 ` Will Hawkins ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Carlos O'Donell @ 2017-09-25 16:13 UTC (permalink / raw) To: Will Hawkins; +Cc: libc-help, Will Hawkins On 09/23/2017 08:44 AM, Will Hawkins wrote: > This is what confused me -- thank you for the confirmation that I was > not going crazy! After an hour of trying to figure out how the code > did not deadlock every time, I finally looked at the > initialization/definition of the mutex and saw that it was recursive. We call this a belt-and-suspenders approach, particularly if there were multiple callers in the past that might or might no have taken the lock. In this case it looks like there is just one caller so the appropriate patch would have to be: * Remove locking from helper function. * Add comments to helper function explaining that lock X must be held before calling. * Add comments in caller stating that lock must be taken before calling helper function. * Look to see if have a reasonable test that exercises this code path, and if not add one (I think there are tests already that cover this area of code). That's just a rough sketch of the details. This fix doesn't need a public bug since it's not a user visible feature, just an optimization. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-25 16:13 ` Carlos O'Donell @ 2017-09-25 17:05 ` Will Hawkins 2017-09-25 17:23 ` Hal Murray 2017-09-25 23:00 ` Will Hawkins 2 siblings, 0 replies; 17+ messages in thread From: Will Hawkins @ 2017-09-25 17:05 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-help, Will Hawkins On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote: > On 09/23/2017 08:44 AM, Will Hawkins wrote: >> This is what confused me -- thank you for the confirmation that I was >> not going crazy! After an hour of trying to figure out how the code >> did not deadlock every time, I finally looked at the >> initialization/definition of the mutex and saw that it was recursive. > > We call this a belt-and-suspenders approach, particularly if there were > multiple callers in the past that might or might no have taken the lock. > > In this case it looks like there is just one caller so the appropriate > patch would have to be: > > * Remove locking from helper function. > * Add comments to helper function explaining that lock X must be held > before calling. > * Add comments in caller stating that lock must be taken before calling > helper function. > * Look to see if have a reasonable test that exercises this code path, > and if not add one (I think there are tests already that cover this > area of code). > > That's just a rough sketch of the details. This fix doesn't need a > public bug since it's not a user visible feature, just an optimization. Mr. O'Donnell, This is a great plan of attack and I'll follow it. Once I've made the necessary changes I will go back and read the information about submitting patches and then go from there. Thanks for your mentorship with this trivial patch. Will > > -- > Cheers, > Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-25 16:13 ` Carlos O'Donell 2017-09-25 17:05 ` Will Hawkins @ 2017-09-25 17:23 ` Hal Murray 2017-09-25 19:06 ` Carlos O'Donell 2017-09-25 20:39 ` Florian Weimer 2017-09-25 23:00 ` Will Hawkins 2 siblings, 2 replies; 17+ messages in thread From: Hal Murray @ 2017-09-25 17:23 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-help, hmurray carlos@redhat.com said: > * Look to see if have a reasonable test that exercises this code path, > and if not add one (I think there are tests already that cover this > area of code). Are there any tools that scan source code to check for this sort of thing? I think you would need a way to annotate helper functions as needing a lock and specify the lock. It may also take annotations in the caller function. -- These are my opinions. I hate spam. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-25 17:23 ` Hal Murray @ 2017-09-25 19:06 ` Carlos O'Donell 2017-09-25 20:38 ` Florian Weimer 2017-09-25 20:39 ` Florian Weimer 1 sibling, 1 reply; 17+ messages in thread From: Carlos O'Donell @ 2017-09-25 19:06 UTC (permalink / raw) To: Hal Murray; +Cc: libc-help On 09/25/2017 11:23 AM, Hal Murray wrote: > > carlos@redhat.com said: >> * Look to see if have a reasonable test that exercises this code path, >> and if not add one (I think there are tests already that cover this >> area of code). > > Are there any tools that scan source code to check for this sort of thing? > > I think you would need a way to annotate helper functions as needing a lock > and specify the lock. It may also take annotations in the caller function. For glibc you can't use gcov (since some of the pieces of dependent functionality are implemented by glibc), but you can use perf e.g. perf record, perf report. That way you can get coverage. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-25 19:06 ` Carlos O'Donell @ 2017-09-25 20:38 ` Florian Weimer 0 siblings, 0 replies; 17+ messages in thread From: Florian Weimer @ 2017-09-25 20:38 UTC (permalink / raw) To: Carlos O'Donell, Hal Murray; +Cc: libc-help On 09/25/2017 09:06 PM, Carlos O'Donell wrote: > For glibc you can't use gcov (since some of the pieces of dependent functionality > are implemented by glibc), We have special builds with profiling enabled. They should give function-level coverage information at least. Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-25 17:23 ` Hal Murray 2017-09-25 19:06 ` Carlos O'Donell @ 2017-09-25 20:39 ` Florian Weimer 2017-09-26 6:29 ` Hal Murray 1 sibling, 1 reply; 17+ messages in thread From: Florian Weimer @ 2017-09-25 20:39 UTC (permalink / raw) To: Hal Murray, Carlos O'Donell; +Cc: libc-help On 09/25/2017 07:23 PM, Hal Murray wrote: > I think you would need a way to annotate helper functions as needing a lock > and specify the lock. It may also take annotations in the caller function. Do you mean annotations to find resource leaks? As far as free software tools are concerned, using C++ and RAII looks like the only feasible solution (with strong tool support) right now. Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-25 20:39 ` Florian Weimer @ 2017-09-26 6:29 ` Hal Murray 0 siblings, 0 replies; 17+ messages in thread From: Hal Murray @ 2017-09-26 6:29 UTC (permalink / raw) To: Florian Weimer; +Cc: Hal Murray, libc-help > Do you mean annotations to find resource leaks? No. I was looking for a compiler like package that would scan all the source code to verify that the appropriate lock was held whenever an inner function was called. The inner function would need an annotation to say that it was an inner function and which lock was needed. The outer functions would need an annotation to say when it was locking or unlocking a lock. It might work to recognize the POSIX functions. That probably doesn't work if you lock in one function and unlock in a different function. I don't think I've ever worked with code like that. -- These are my opinions. I hate spam. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-25 16:13 ` Carlos O'Donell 2017-09-25 17:05 ` Will Hawkins 2017-09-25 17:23 ` Hal Murray @ 2017-09-25 23:00 ` Will Hawkins 2017-09-26 4:42 ` Carlos O'Donell 2 siblings, 1 reply; 17+ messages in thread From: Will Hawkins @ 2017-09-25 23:00 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-help, Will Hawkins On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote: > On 09/23/2017 08:44 AM, Will Hawkins wrote: >> This is what confused me -- thank you for the confirmation that I was >> not going crazy! After an hour of trying to figure out how the code >> did not deadlock every time, I finally looked at the >> initialization/definition of the mutex and saw that it was recursive. > > We call this a belt-and-suspenders approach, particularly if there were > multiple callers in the past that might or might no have taken the lock. > > In this case it looks like there is just one caller so the appropriate > patch would have to be: > > * Remove locking from helper function. > * Add comments to helper function explaining that lock X must be held > before calling. > * Add comments in caller stating that lock must be taken before calling > helper function. > * Look to see if have a reasonable test that exercises this code path, > and if not add one (I think there are tests already that cover this > area of code). > There appears to be a related test in the file ga_test.c in resolv/. However, I can't figure out how the check/tests target is building this file. In fact, I can't get the file to build. The target that seems to control building this file is guarded by two conditionals build-shared and have-thread-library. I have verified that those flags are both set in my build environment. With that control passed, it looks like it is supposed to only then use the ga_test as the only test target. This appears to be confimed by the comments in the Changelog (Changelog.old/Changelog.12:5501 and commit log (e10546cb67fe0b5366fede3661b6c15363e4f54f) for when that file/test was incorporated into the repository. The target, then, looks like this: $(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) which, in turn, ensures that libanl.so is built. However, that target has no other actions. I would presume that somewhere there is a pattern matching target that will build this, but I'm too stupid to find it. If anyone can help me track this down, I'd really appreciate it. Thank you so much! Will > That's just a rough sketch of the details. This fix doesn't need a > public bug since it's not a user visible feature, just an optimization. > > -- > Cheers, > Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-25 23:00 ` Will Hawkins @ 2017-09-26 4:42 ` Carlos O'Donell 2017-09-26 6:06 ` Will Hawkins 0 siblings, 1 reply; 17+ messages in thread From: Carlos O'Donell @ 2017-09-26 4:42 UTC (permalink / raw) To: Will Hawkins; +Cc: libc-help, Will Hawkins On 09/25/2017 05:00 PM, Will Hawkins wrote: > On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote: >> On 09/23/2017 08:44 AM, Will Hawkins wrote: >>> This is what confused me -- thank you for the confirmation that I was >>> not going crazy! After an hour of trying to figure out how the code >>> did not deadlock every time, I finally looked at the >>> initialization/definition of the mutex and saw that it was recursive. >> >> We call this a belt-and-suspenders approach, particularly if there were >> multiple callers in the past that might or might no have taken the lock. >> >> In this case it looks like there is just one caller so the appropriate >> patch would have to be: >> >> * Remove locking from helper function. >> * Add comments to helper function explaining that lock X must be held >> before calling. >> * Add comments in caller stating that lock must be taken before calling >> helper function. >> * Look to see if have a reasonable test that exercises this code path, >> and if not add one (I think there are tests already that cover this >> area of code). >> > > There appears to be a related test in the file ga_test.c in resolv/. > However, I can't figure out how the check/tests target is building > this file. In fact, I can't get the file to build. The file is not being built because it's not listed in one of the test targets. It *could* be added to tests (normal tests), or tests-internal (tests that need access to libc internals), or xtests (tests that need special access, usually root). However, it's not clear that this test was ever designed to be run in the glibc test harness. It looks like an example of how to use the interfaces, but is not a test itself. > The target that seems to control building this file is guarded by two > conditionals build-shared and have-thread-library. I have verified > that those flags are both set in my build environment. With that > control passed, it looks like it is supposed to only then use the > ga_test as the only test target. This appears to be confimed by the > comments in the Changelog (Changelog.old/Changelog.12:5501 and commit > log (e10546cb67fe0b5366fede3661b6c15363e4f54f) for when that file/test > was incorporated into the repository. > > The target, then, looks like this: > > $(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) > > which, in turn, ensures that libanl.so is built. Right. The tests target depends on this, which then depends on these two files, and that overrides the normal build process for ga_test.c, and just ensures that libanl.so and libpthread.so are built. > However, that target has no other actions. I would presume that > somewhere there is a pattern matching target that will build this, but > I'm too stupid to find it. No, you are not too stupid, you are correct, the file ga_test.c is never built, and looks like it was never designed to be run as a test. The targets are bogus though. Can you test removing them and ga_test.c? diff --git a/resolv/Makefile b/resolv/Makefile index ec7e4fd..8a822e0 100644 --- a/resolv/Makefile +++ b/resolv/Makefile @@ -101,10 +101,6 @@ routines += $(libnss_dns-routines) $(libresolv-routines) static-only-routines += $(libnss_dns-routines) $(libresolv-routines) endif -ifeq (yesyes,$(build-shared)$(have-thread-library)) -tests: $(objpfx)ga_test -endif - ifeq ($(run-built-tests),yes) ifneq (no,$(PERL)) tests-special += $(objpfx)mtrace-tst-leaks.out @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so # The asynchronous name lookup code needs the thread library. $(objpfx)libanl.so: $(shared-thread-library) -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) - $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library) tst-res_hconf_reorder-ENV = RESOLV_REORDER=on --- I think you could write a much better test than ga_test.c to get better test coverage for your suggested lock changes. Then to add the test all you need to do is plunk a tst-foo.c into resolv/ and add it to the tests variable. Example: diff --git a/resolv/Makefile b/resolv/Makefile index ec7e4fd..48dc25e 100644 --- a/resolv/Makefile +++ b/resolv/Makefile @@ -53,6 +53,7 @@ tests += \ tst-resolv-network \ tst-resolv-res_init-multi \ tst-resolv-search \ + tst-foo # These tests need libdl. ifeq (yes,$(build-shared)) --- That is enough to build and run tst-foo from tst-foo.c in the resolv subdir, and you'd use the support/README-test.c as the template. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-26 4:42 ` Carlos O'Donell @ 2017-09-26 6:06 ` Will Hawkins 2017-09-26 18:39 ` Will Hawkins 0 siblings, 1 reply; 17+ messages in thread From: Will Hawkins @ 2017-09-26 6:06 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-help, Will Hawkins On Tue, Sep 26, 2017 at 12:42 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 09/25/2017 05:00 PM, Will Hawkins wrote: >> On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote: >>> On 09/23/2017 08:44 AM, Will Hawkins wrote: >>>> This is what confused me -- thank you for the confirmation that I was >>>> not going crazy! After an hour of trying to figure out how the code >>>> did not deadlock every time, I finally looked at the >>>> initialization/definition of the mutex and saw that it was recursive. >>> >>> We call this a belt-and-suspenders approach, particularly if there were >>> multiple callers in the past that might or might no have taken the lock. >>> >>> In this case it looks like there is just one caller so the appropriate >>> patch would have to be: >>> >>> * Remove locking from helper function. >>> * Add comments to helper function explaining that lock X must be held >>> before calling. >>> * Add comments in caller stating that lock must be taken before calling >>> helper function. >>> * Look to see if have a reasonable test that exercises this code path, >>> and if not add one (I think there are tests already that cover this >>> area of code). >>> >> >> There appears to be a related test in the file ga_test.c in resolv/. >> However, I can't figure out how the check/tests target is building >> this file. In fact, I can't get the file to build. > > The file is not being built because it's not listed in one of the > test targets. > > It *could* be added to tests (normal tests), or tests-internal (tests > that need access to libc internals), or xtests (tests that need special > access, usually root). > > However, it's not clear that this test was ever designed to be run > in the glibc test harness. It looks like an example of how to use the > interfaces, but is not a test itself. Thank you for your reply! What are you saying is that, because this tests: target does not have any recipe associated with it, it becomes just another way to add a prerequisite for the tests: target whose recipe is, ultimately, defined in the "master" Makefile? An example of the recipe here: https://www.gnu.org/software/make/manual/html_node/Multiple-Rules.html > >> The target that seems to control building this file is guarded by two >> conditionals build-shared and have-thread-library. I have verified >> that those flags are both set in my build environment. With that >> control passed, it looks like it is supposed to only then use the >> ga_test as the only test target. This appears to be confimed by the >> comments in the Changelog (Changelog.old/Changelog.12:5501 and commit >> log (e10546cb67fe0b5366fede3661b6c15363e4f54f) for when that file/test >> was incorporated into the repository. >> >> The target, then, looks like this: >> >> $(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) >> >> which, in turn, ensures that libanl.so is built. > > Right. The tests target depends on this, which then depends on these > two files, and that overrides the normal build process for ga_test.c, > and just ensures that libanl.so and libpthread.so are built. > >> However, that target has no other actions. I would presume that >> somewhere there is a pattern matching target that will build this, but >> I'm too stupid to find it. > > No, you are not too stupid, you are correct, the file ga_test.c is never > built, and looks like it was never designed to be run as a test. > > The targets are bogus though. > > Can you test removing them and ga_test.c? > > diff --git a/resolv/Makefile b/resolv/Makefile > index ec7e4fd..8a822e0 100644 > --- a/resolv/Makefile > +++ b/resolv/Makefile > @@ -101,10 +101,6 @@ routines += $(libnss_dns-routines) $(libresolv-routines) > static-only-routines += $(libnss_dns-routines) $(libresolv-routines) > endif > > -ifeq (yesyes,$(build-shared)$(have-thread-library)) > -tests: $(objpfx)ga_test > -endif > - > ifeq ($(run-built-tests),yes) > ifneq (no,$(PERL)) > tests-special += $(objpfx)mtrace-tst-leaks.out > @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so > # The asynchronous name lookup code needs the thread library. > $(objpfx)libanl.so: $(shared-thread-library) > > -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) > - > $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library) > tst-res_hconf_reorder-ENV = RESOLV_REORDER=on > > --- > I am testing this as we speak but don't think that I will know the answer before I head to sleep tonight. I will respond as soon as my tests have completed their execution. > I think you could write a much better test than ga_test.c > to get better test coverage for your suggested lock changes. > > Then to add the test all you need to do is plunk a tst-foo.c into > resolv/ and add it to the tests variable. > > Example: > > diff --git a/resolv/Makefile b/resolv/Makefile > index ec7e4fd..48dc25e 100644 > --- a/resolv/Makefile > +++ b/resolv/Makefile > @@ -53,6 +53,7 @@ tests += \ > tst-resolv-network \ > tst-resolv-res_init-multi \ > tst-resolv-search \ > + tst-foo > > # These tests need libdl. > ifeq (yes,$(build-shared)) > --- > > That is enough to build and run tst-foo from tst-foo.c in the > resolv subdir, and you'd use the support/README-test.c as the > template. > As with everything that you've said so far, this makes sense. The ga_test.c code did not seem to be an effective test for this. I only went down this rabbit hold because it was the only "test" that I could find that seemed to be even remotely related to the getaddrinfo_a. When I realized that even that was not being built, I started chasing my tail trying to answer the questions above. As before, thank you! Will > -- > Cheers, > Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-26 6:06 ` Will Hawkins @ 2017-09-26 18:39 ` Will Hawkins 2017-09-27 5:54 ` Carlos O'Donell 0 siblings, 1 reply; 17+ messages in thread From: Will Hawkins @ 2017-09-26 18:39 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-help, Will Hawkins On Tue, Sep 26, 2017 at 2:06 AM, Will Hawkins <whh8b@virginia.edu> wrote: > On Tue, Sep 26, 2017 at 12:42 AM, Carlos O'Donell <carlos@redhat.com> wrote: >> On 09/25/2017 05:00 PM, Will Hawkins wrote: >>> On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote: >>>> On 09/23/2017 08:44 AM, Will Hawkins wrote: >>>>> This is what confused me -- thank you for the confirmation that I was >>>>> not going crazy! After an hour of trying to figure out how the code >>>>> did not deadlock every time, I finally looked at the >>>>> initialization/definition of the mutex and saw that it was recursive. >>>> >>>> We call this a belt-and-suspenders approach, particularly if there were >>>> multiple callers in the past that might or might no have taken the lock. >>>> >>>> In this case it looks like there is just one caller so the appropriate >>>> patch would have to be: >>>> >>>> * Remove locking from helper function. >>>> * Add comments to helper function explaining that lock X must be held >>>> before calling. >>>> * Add comments in caller stating that lock must be taken before calling >>>> helper function. >>>> * Look to see if have a reasonable test that exercises this code path, >>>> and if not add one (I think there are tests already that cover this >>>> area of code). >>>> >>> >>> There appears to be a related test in the file ga_test.c in resolv/. >>> However, I can't figure out how the check/tests target is building >>> this file. In fact, I can't get the file to build. >> >> The file is not being built because it's not listed in one of the >> test targets. >> >> It *could* be added to tests (normal tests), or tests-internal (tests >> that need access to libc internals), or xtests (tests that need special >> access, usually root). >> >> However, it's not clear that this test was ever designed to be run >> in the glibc test harness. It looks like an example of how to use the >> interfaces, but is not a test itself. > > Thank you for your reply! > > What are you saying is that, because this tests: target does not have > any recipe associated with it, it becomes just another way to add a > prerequisite for the tests: target whose recipe is, ultimately, > defined in the "master" Makefile? An example of the recipe here: > https://www.gnu.org/software/make/manual/html_node/Multiple-Rules.html > >> >>> The target that seems to control building this file is guarded by two >>> conditionals build-shared and have-thread-library. I have verified >>> that those flags are both set in my build environment. With that >>> control passed, it looks like it is supposed to only then use the >>> ga_test as the only test target. This appears to be confimed by the >>> comments in the Changelog (Changelog.old/Changelog.12:5501 and commit >>> log (e10546cb67fe0b5366fede3661b6c15363e4f54f) for when that file/test >>> was incorporated into the repository. >>> >>> The target, then, looks like this: >>> >>> $(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) >>> >>> which, in turn, ensures that libanl.so is built. >> >> Right. The tests target depends on this, which then depends on these >> two files, and that overrides the normal build process for ga_test.c, >> and just ensures that libanl.so and libpthread.so are built. >> >>> However, that target has no other actions. I would presume that >>> somewhere there is a pattern matching target that will build this, but >>> I'm too stupid to find it. >> >> No, you are not too stupid, you are correct, the file ga_test.c is never >> built, and looks like it was never designed to be run as a test. >> >> The targets are bogus though. >> >> Can you test removing them and ga_test.c? >> >> diff --git a/resolv/Makefile b/resolv/Makefile >> index ec7e4fd..8a822e0 100644 >> --- a/resolv/Makefile >> +++ b/resolv/Makefile >> @@ -101,10 +101,6 @@ routines += $(libnss_dns-routines) $(libresolv-routines) >> static-only-routines += $(libnss_dns-routines) $(libresolv-routines) >> endif >> >> -ifeq (yesyes,$(build-shared)$(have-thread-library)) >> -tests: $(objpfx)ga_test >> -endif >> - >> ifeq ($(run-built-tests),yes) >> ifneq (no,$(PERL)) >> tests-special += $(objpfx)mtrace-tst-leaks.out >> @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so >> # The asynchronous name lookup code needs the thread library. >> $(objpfx)libanl.so: $(shared-thread-library) >> >> -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) >> - >> $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library) >> tst-res_hconf_reorder-ENV = RESOLV_REORDER=on >> >> --- >> > > I am testing this as we speak but don't think that I will know the > answer before I head to sleep tonight. I will respond as soon as my > tests have completed their execution. Testing of this 'patch' is complete. I applied the change then used the instructions on https://sourceware.org/glibc/wiki/Testing/Testsuite#Testing_just_one_test to test. Here's how I did it: From the build directory, I did rm -f resolv/*.out make subdirs=resolv check and everything ran the way I expected (the same number of test failures and passes). Thanks again for all your help! Will > >> I think you could write a much better test than ga_test.c >> to get better test coverage for your suggested lock changes. >> >> Then to add the test all you need to do is plunk a tst-foo.c into >> resolv/ and add it to the tests variable. >> >> Example: >> >> diff --git a/resolv/Makefile b/resolv/Makefile >> index ec7e4fd..48dc25e 100644 >> --- a/resolv/Makefile >> +++ b/resolv/Makefile >> @@ -53,6 +53,7 @@ tests += \ >> tst-resolv-network \ >> tst-resolv-res_init-multi \ >> tst-resolv-search \ >> + tst-foo >> >> # These tests need libdl. >> ifeq (yes,$(build-shared)) >> --- >> >> That is enough to build and run tst-foo from tst-foo.c in the >> resolv subdir, and you'd use the support/README-test.c as the >> template. >> > > As with everything that you've said so far, this makes sense. The > ga_test.c code did not seem to be an effective test for this. I only > went down this rabbit hold because it was the only "test" that I could > find that seemed to be even remotely related to the getaddrinfo_a. > When I realized that even that was not being built, I started chasing > my tail trying to answer the questions above. > > As before, thank you! > > Will > >> -- >> Cheers, >> Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-26 18:39 ` Will Hawkins @ 2017-09-27 5:54 ` Carlos O'Donell 2017-09-27 6:02 ` Will Hawkins 0 siblings, 1 reply; 17+ messages in thread From: Carlos O'Donell @ 2017-09-27 5:54 UTC (permalink / raw) To: Will Hawkins; +Cc: libc-help, Will Hawkins On 09/26/2017 12:39 PM, Will Hawkins wrote: > On Tue, Sep 26, 2017 at 2:06 AM, Will Hawkins <whh8b@virginia.edu> wrote: >> On Tue, Sep 26, 2017 at 12:42 AM, Carlos O'Donell <carlos@redhat.com> wrote: >>> Can you test removing them and ga_test.c? >>> >>> diff --git a/resolv/Makefile b/resolv/Makefile >>> index ec7e4fd..8a822e0 100644 >>> --- a/resolv/Makefile >>> +++ b/resolv/Makefile >>> @@ -101,10 +101,6 @@ routines += $(libnss_dns-routines) $(libresolv-routines) >>> static-only-routines += $(libnss_dns-routines) $(libresolv-routines) >>> endif >>> >>> -ifeq (yesyes,$(build-shared)$(have-thread-library)) >>> -tests: $(objpfx)ga_test >>> -endif >>> - >>> ifeq ($(run-built-tests),yes) >>> ifneq (no,$(PERL)) >>> tests-special += $(objpfx)mtrace-tst-leaks.out >>> @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so >>> # The asynchronous name lookup code needs the thread library. >>> $(objpfx)libanl.so: $(shared-thread-library) >>> >>> -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) >>> - >>> $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library) >>> tst-res_hconf_reorder-ENV = RESOLV_REORDER=on >>> >>> --- >>> >> >> I am testing this as we speak but don't think that I will know the >> answer before I head to sleep tonight. I will respond as soon as my >> tests have completed their execution. > > Testing of this 'patch' is complete. I applied the change then used > the instructions on > https://sourceware.org/glibc/wiki/Testing/Testsuite#Testing_just_one_test > to test. Here's how I did it: > > From the build directory, I did > rm -f resolv/*.out > make subdirs=resolv check > > and everything ran the way I expected (the same number of test > failures and passes). > > Thanks again for all your help! Awesome, thanks. I'll see about posting a patch to do that, and include you in the testing notes. Alternatively you can post the patch yourself and I'll approve it, the changes are mechanical and below the copyright limit, but any future work would require copyright assignment :-) -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-27 5:54 ` Carlos O'Donell @ 2017-09-27 6:02 ` Will Hawkins 2017-09-27 6:05 ` Carlos O'Donell 0 siblings, 1 reply; 17+ messages in thread From: Will Hawkins @ 2017-09-27 6:02 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-help, Will Hawkins On Wed, Sep 27, 2017 at 1:53 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 09/26/2017 12:39 PM, Will Hawkins wrote: >> On Tue, Sep 26, 2017 at 2:06 AM, Will Hawkins <whh8b@virginia.edu> wrote: >>> On Tue, Sep 26, 2017 at 12:42 AM, Carlos O'Donell <carlos@redhat.com> wrote: >>>> Can you test removing them and ga_test.c? >>>> >>>> diff --git a/resolv/Makefile b/resolv/Makefile >>>> index ec7e4fd..8a822e0 100644 >>>> --- a/resolv/Makefile >>>> +++ b/resolv/Makefile >>>> @@ -101,10 +101,6 @@ routines += $(libnss_dns-routines) $(libresolv-routines) >>>> static-only-routines += $(libnss_dns-routines) $(libresolv-routines) >>>> endif >>>> >>>> -ifeq (yesyes,$(build-shared)$(have-thread-library)) >>>> -tests: $(objpfx)ga_test >>>> -endif >>>> - >>>> ifeq ($(run-built-tests),yes) >>>> ifneq (no,$(PERL)) >>>> tests-special += $(objpfx)mtrace-tst-leaks.out >>>> @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so >>>> # The asynchronous name lookup code needs the thread library. >>>> $(objpfx)libanl.so: $(shared-thread-library) >>>> >>>> -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library) >>>> - >>>> $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library) >>>> tst-res_hconf_reorder-ENV = RESOLV_REORDER=on >>>> >>>> --- >>>> >>> >>> I am testing this as we speak but don't think that I will know the >>> answer before I head to sleep tonight. I will respond as soon as my >>> tests have completed their execution. >> >> Testing of this 'patch' is complete. I applied the change then used >> the instructions on >> https://sourceware.org/glibc/wiki/Testing/Testsuite#Testing_just_one_test >> to test. Here's how I did it: >> >> From the build directory, I did >> rm -f resolv/*.out >> make subdirs=resolv check >> >> and everything ran the way I expected (the same number of test >> failures and passes). >> >> Thanks again for all your help! > Awesome, thanks. I'll see about posting a patch to do that, and include > you in the testing notes. Alternatively you can post the patch yourself > and I'll approve it, the changes are mechanical and below the copyright > limit, but any future work would require copyright assignment :-) If you aren't in any hurry to have this added, then I'd like to do it. It will give me practice at the submission process while I work on the other patch for removing the locks held by the callee function. Just let me know whether it would be okay to wait for a few days on this patch. If so, I'll take a crack at it! I would certainly not mind you moving ahead if you wanted to move quicker. I just thought I'd offer to do it for the experience and to keep you from having to use your valuable time! Thanks again for all the help! Will > > -- > Cheers, > Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks 2017-09-27 6:02 ` Will Hawkins @ 2017-09-27 6:05 ` Carlos O'Donell 0 siblings, 0 replies; 17+ messages in thread From: Carlos O'Donell @ 2017-09-27 6:05 UTC (permalink / raw) To: Will Hawkins; +Cc: libc-help, Will Hawkins On 09/27/2017 12:01 AM, Will Hawkins wrote: > On Wed, Sep 27, 2017 at 1:53 AM, Carlos O'Donell <carlos@redhat.com> wrote: >> Awesome, thanks. I'll see about posting a patch to do that, and include >> you in the testing notes. Alternatively you can post the patch yourself >> and I'll approve it, the changes are mechanical and below the copyright >> limit, but any future work would require copyright assignment :-) > > If you aren't in any hurry to have this added, then I'd like to do it. > It will give me practice at the submission process while I work on the > other patch for removing the locks held by the callee function. Just > let me know whether it would be okay to wait for a few days on this > patch. If so, I'll take a crack at it! I would certainly not mind you > moving ahead if you wanted to move quicker. I just thought I'd offer > to do it for the experience and to keep you from having to use your > valuable time! I'm not in a hurry. Please feel free to take your time and post the patch to libc-alpha when you are ready. I've got a lot of other things to work on ;-) -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-09-27 6:05 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-09-23 7:08 Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks Will Hawkins 2017-09-23 13:04 ` Carlos O'Donell 2017-09-23 14:44 ` Will Hawkins 2017-09-25 16:13 ` Carlos O'Donell 2017-09-25 17:05 ` Will Hawkins 2017-09-25 17:23 ` Hal Murray 2017-09-25 19:06 ` Carlos O'Donell 2017-09-25 20:38 ` Florian Weimer 2017-09-25 20:39 ` Florian Weimer 2017-09-26 6:29 ` Hal Murray 2017-09-25 23:00 ` Will Hawkins 2017-09-26 4:42 ` Carlos O'Donell 2017-09-26 6:06 ` Will Hawkins 2017-09-26 18:39 ` Will Hawkins 2017-09-27 5:54 ` Carlos O'Donell 2017-09-27 6:02 ` Will Hawkins 2017-09-27 6:05 ` Carlos O'Donell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).