public inbox for libc-help@sourceware.org
 help / color / mirror / Atom feed
* Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
@ 2017-09-23  7:08 Will Hawkins
  2017-09-23 13:04 ` Carlos O'Donell
  0 siblings, 1 reply; 17+ messages in thread
From: Will Hawkins @ 2017-09-23  7:08 UTC (permalink / raw)
  To: libc-help; +Cc: Will Hawkins

Hello!

Thank you all for your willingness to be part of the community of
glibc developers and for being so helpful when a novice like me has a
stupid question!

I was looking at the source code for getaddrinfo_a and noticed that it
calls __gai_enqueue_request. The former takes the __gai_requests_mutex
lock at the beginning of its work. The latter does the exact same
thing.

I understand that this will not deadlock because __gai_requests_mutex
is declared as a recursive lock, but I can't see the reason for
locking in both places.

I could imagine that it would be necessary if getaddrinfo_a or
__gai_enqueue_request called itself recursively or both the functions
were publicly accessible but neither of those seems to be the case. It
seems that getaddrinfo_a is the only function that calls
__gai_enqueue_request.

Is this somehow related to the "removal of internal function" change
that Mr. Weimer committed a few months ago (commit ca2085)? In other
words, is this meant to future proof the possibility that someone (a
non-internal user of glibc) calls __gai_enqueue_request directly?

I am sure that there is a very subtle explanation for what is going on
here, but my mind is too simple to get it. I know that the code works,
but I am curious about why.

If you can shed any light on the situation, I'd really appreciate it!

Thanks in advance!

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-23  7:08 Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks Will Hawkins
@ 2017-09-23 13:04 ` Carlos O'Donell
  2017-09-23 14:44   ` Will Hawkins
  0 siblings, 1 reply; 17+ messages in thread
From: Carlos O'Donell @ 2017-09-23 13:04 UTC (permalink / raw)
  To: Will Hawkins, libc-help; +Cc: Will Hawkins

On 09/23/2017 01:08 AM, Will Hawkins wrote:
> I am sure that there is a very subtle explanation for what is going on
> here, but my mind is too simple to get it. I know that the code works,
> but I am curious about why.
> 
> If you can shed any light on the situation, I'd really appreciate it!

It is a design principle. Acquire the lock if you inspect data that could
be concurrently modified.

You are correct that it looks like the list lock acquisition could be
removed from __gai_enqueue_request. That would be an optimization.
The comments even say so themselves.

Patches welcome! :-)

Please don't be too scared about the contribution checklist, we can
help with any step along the way:

https://sourceware.org/glibc/wiki/Contribution%20checklist

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-23 13:04 ` Carlos O'Donell
@ 2017-09-23 14:44   ` Will Hawkins
  2017-09-25 16:13     ` Carlos O'Donell
  0 siblings, 1 reply; 17+ messages in thread
From: Will Hawkins @ 2017-09-23 14:44 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help, Will Hawkins

On Sat, Sep 23, 2017 at 9:04 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/23/2017 01:08 AM, Will Hawkins wrote:
>> I am sure that there is a very subtle explanation for what is going on
>> here, but my mind is too simple to get it. I know that the code works,
>> but I am curious about why.
>>
>> If you can shed any light on the situation, I'd really appreciate it!
>
> It is a design principle. Acquire the lock if you inspect data that could
> be concurrently modified.

Thank you! This makes perfect sense, of course. Obviously with an
asynchronous function that is implemented with concurrent access to
shared data structures in a multithreaded world, mutexes are our
friends. :-)

>
> You are correct that it looks like the list lock acquisition could be
> removed from __gai_enqueue_request. That would be an optimization.
> The comments even say so themselves.

This is what confused me -- thank you for the confirmation that I was
not going crazy! After an hour of trying to figure out how the code
did not deadlock every time, I finally looked at the
initialization/definition of the mutex and saw that it was recursive.

>
> Patches welcome! :-)
>

This will be awesome! I'd love to contribute. I am in/out for the next
several days but I will submit it asap!

> Please don't be too scared about the contribution checklist, we can
> help with any step along the way:
>
> https://sourceware.org/glibc/wiki/Contribution%20checklist
>

Thank you so much for taking the time to reply and encouraging my
participation. Talk to you soon!

Will

> --
> Cheers,
> Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-23 14:44   ` Will Hawkins
@ 2017-09-25 16:13     ` Carlos O'Donell
  2017-09-25 17:05       ` Will Hawkins
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Carlos O'Donell @ 2017-09-25 16:13 UTC (permalink / raw)
  To: Will Hawkins; +Cc: libc-help, Will Hawkins

On 09/23/2017 08:44 AM, Will Hawkins wrote:
> This is what confused me -- thank you for the confirmation that I was
> not going crazy! After an hour of trying to figure out how the code
> did not deadlock every time, I finally looked at the
> initialization/definition of the mutex and saw that it was recursive.

We call this a belt-and-suspenders approach, particularly if there were
multiple callers in the past that might or might no have taken the lock.

In this case it looks like there is just one caller so the appropriate
patch would have to be:

* Remove locking from helper function.
* Add comments to helper function explaining that lock X must be held
  before calling.
* Add comments in caller stating that lock must be taken before calling
  helper function.
* Look to see if have a reasonable test that exercises this code path,
  and if not add one (I think there are tests already that cover this
  area of code).

That's just a rough sketch of the details. This fix doesn't need a
public bug since it's not a user visible feature, just an optimization.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-25 16:13     ` Carlos O'Donell
@ 2017-09-25 17:05       ` Will Hawkins
  2017-09-25 17:23       ` Hal Murray
  2017-09-25 23:00       ` Will Hawkins
  2 siblings, 0 replies; 17+ messages in thread
From: Will Hawkins @ 2017-09-25 17:05 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help, Will Hawkins

On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/23/2017 08:44 AM, Will Hawkins wrote:
>> This is what confused me -- thank you for the confirmation that I was
>> not going crazy! After an hour of trying to figure out how the code
>> did not deadlock every time, I finally looked at the
>> initialization/definition of the mutex and saw that it was recursive.
>
> We call this a belt-and-suspenders approach, particularly if there were
> multiple callers in the past that might or might no have taken the lock.
>
> In this case it looks like there is just one caller so the appropriate
> patch would have to be:
>
> * Remove locking from helper function.
> * Add comments to helper function explaining that lock X must be held
>   before calling.
> * Add comments in caller stating that lock must be taken before calling
>   helper function.
> * Look to see if have a reasonable test that exercises this code path,
>   and if not add one (I think there are tests already that cover this
>   area of code).
>
> That's just a rough sketch of the details. This fix doesn't need a
> public bug since it's not a user visible feature, just an optimization.

Mr. O'Donnell,

This is a great plan of attack and I'll follow it. Once I've made the
necessary changes I will go back and read the information about
submitting patches and then go from there.

Thanks for your mentorship with this trivial patch.

Will

>
> --
> Cheers,
> Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use  recursive locks
  2017-09-25 16:13     ` Carlos O'Donell
  2017-09-25 17:05       ` Will Hawkins
@ 2017-09-25 17:23       ` Hal Murray
  2017-09-25 19:06         ` Carlos O'Donell
  2017-09-25 20:39         ` Florian Weimer
  2017-09-25 23:00       ` Will Hawkins
  2 siblings, 2 replies; 17+ messages in thread
From: Hal Murray @ 2017-09-25 17:23 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help, hmurray


carlos@redhat.com said:
> * Look to see if have a reasonable test that exercises this code path,
>   and if not add one (I think there are tests already that cover this
>   area of code). 

Are there any tools that scan source code to check for this sort of thing?

I think you would need a way to annotate helper functions as needing a lock 
and specify the lock.  It may also take annotations in the caller function.


-- 
These are my opinions.  I hate spam.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-25 17:23       ` Hal Murray
@ 2017-09-25 19:06         ` Carlos O'Donell
  2017-09-25 20:38           ` Florian Weimer
  2017-09-25 20:39         ` Florian Weimer
  1 sibling, 1 reply; 17+ messages in thread
From: Carlos O'Donell @ 2017-09-25 19:06 UTC (permalink / raw)
  To: Hal Murray; +Cc: libc-help

On 09/25/2017 11:23 AM, Hal Murray wrote:
> 
> carlos@redhat.com said:
>> * Look to see if have a reasonable test that exercises this code path,
>>   and if not add one (I think there are tests already that cover this
>>   area of code). 
> 
> Are there any tools that scan source code to check for this sort of thing?
> 
> I think you would need a way to annotate helper functions as needing a lock 
> and specify the lock.  It may also take annotations in the caller function.

For glibc you can't use gcov (since some of the pieces of dependent functionality
are implemented by glibc), but you can use perf e.g. perf record, perf report.
That way you can get coverage.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-25 19:06         ` Carlos O'Donell
@ 2017-09-25 20:38           ` Florian Weimer
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Weimer @ 2017-09-25 20:38 UTC (permalink / raw)
  To: Carlos O'Donell, Hal Murray; +Cc: libc-help

On 09/25/2017 09:06 PM, Carlos O'Donell wrote:

> For glibc you can't use gcov (since some of the pieces of dependent functionality
> are implemented by glibc),

We have special builds with profiling enabled.  They should give 
function-level coverage information at least.

Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-25 17:23       ` Hal Murray
  2017-09-25 19:06         ` Carlos O'Donell
@ 2017-09-25 20:39         ` Florian Weimer
  2017-09-26  6:29           ` Hal Murray
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2017-09-25 20:39 UTC (permalink / raw)
  To: Hal Murray, Carlos O'Donell; +Cc: libc-help

On 09/25/2017 07:23 PM, Hal Murray wrote:
> I think you would need a way to annotate helper functions as needing a lock
> and specify the lock.  It may also take annotations in the caller function.

Do you mean annotations to find resource leaks?

As far as free software tools are concerned, using C++ and RAII looks 
like the only feasible solution (with strong tool support) right now.

Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-25 16:13     ` Carlos O'Donell
  2017-09-25 17:05       ` Will Hawkins
  2017-09-25 17:23       ` Hal Murray
@ 2017-09-25 23:00       ` Will Hawkins
  2017-09-26  4:42         ` Carlos O'Donell
  2 siblings, 1 reply; 17+ messages in thread
From: Will Hawkins @ 2017-09-25 23:00 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help, Will Hawkins

On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/23/2017 08:44 AM, Will Hawkins wrote:
>> This is what confused me -- thank you for the confirmation that I was
>> not going crazy! After an hour of trying to figure out how the code
>> did not deadlock every time, I finally looked at the
>> initialization/definition of the mutex and saw that it was recursive.
>
> We call this a belt-and-suspenders approach, particularly if there were
> multiple callers in the past that might or might no have taken the lock.
>
> In this case it looks like there is just one caller so the appropriate
> patch would have to be:
>
> * Remove locking from helper function.
> * Add comments to helper function explaining that lock X must be held
>   before calling.
> * Add comments in caller stating that lock must be taken before calling
>   helper function.
> * Look to see if have a reasonable test that exercises this code path,
>   and if not add one (I think there are tests already that cover this
>   area of code).
>

There appears to be a related test in the file ga_test.c in resolv/.
However, I can't figure out how the check/tests target is building
this file. In fact, I can't get the file to build.

The target that seems to control building this file is guarded by two
conditionals build-shared and have-thread-library. I have verified
that those flags are both set in my build environment. With that
control passed, it looks like it is supposed to only then use the
ga_test as the only test target. This appears to be confimed by the
comments in the Changelog (Changelog.old/Changelog.12:5501 and commit
log (e10546cb67fe0b5366fede3661b6c15363e4f54f) for when that file/test
was incorporated into the repository.

The target, then, looks like this:

$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)

which, in turn, ensures that libanl.so is built.

However, that target has no other actions. I would presume that
somewhere there is a pattern matching target that will build this, but
I'm too stupid to find it.

If anyone can help me track this down, I'd really appreciate it. Thank
you so much!

Will


> That's just a rough sketch of the details. This fix doesn't need a
> public bug since it's not a user visible feature, just an optimization.
>
> --
> Cheers,
> Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-25 23:00       ` Will Hawkins
@ 2017-09-26  4:42         ` Carlos O'Donell
  2017-09-26  6:06           ` Will Hawkins
  0 siblings, 1 reply; 17+ messages in thread
From: Carlos O'Donell @ 2017-09-26  4:42 UTC (permalink / raw)
  To: Will Hawkins; +Cc: libc-help, Will Hawkins

On 09/25/2017 05:00 PM, Will Hawkins wrote:
> On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 09/23/2017 08:44 AM, Will Hawkins wrote:
>>> This is what confused me -- thank you for the confirmation that I was
>>> not going crazy! After an hour of trying to figure out how the code
>>> did not deadlock every time, I finally looked at the
>>> initialization/definition of the mutex and saw that it was recursive.
>>
>> We call this a belt-and-suspenders approach, particularly if there were
>> multiple callers in the past that might or might no have taken the lock.
>>
>> In this case it looks like there is just one caller so the appropriate
>> patch would have to be:
>>
>> * Remove locking from helper function.
>> * Add comments to helper function explaining that lock X must be held
>>   before calling.
>> * Add comments in caller stating that lock must be taken before calling
>>   helper function.
>> * Look to see if have a reasonable test that exercises this code path,
>>   and if not add one (I think there are tests already that cover this
>>   area of code).
>>
> 
> There appears to be a related test in the file ga_test.c in resolv/.
> However, I can't figure out how the check/tests target is building
> this file. In fact, I can't get the file to build.

The file is not being built because it's not listed in one of the
test targets.

It *could* be added to tests (normal tests), or tests-internal (tests
that need access to libc internals), or xtests (tests that need special
access, usually root).

However, it's not clear that this test was ever designed to be run
in the glibc test harness. It looks like an example of how to use the
interfaces, but is not a test itself.

> The target that seems to control building this file is guarded by two
> conditionals build-shared and have-thread-library. I have verified
> that those flags are both set in my build environment. With that
> control passed, it looks like it is supposed to only then use the
> ga_test as the only test target. This appears to be confimed by the
> comments in the Changelog (Changelog.old/Changelog.12:5501 and commit
> log (e10546cb67fe0b5366fede3661b6c15363e4f54f) for when that file/test
> was incorporated into the repository.
> 
> The target, then, looks like this:
> 
> $(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)
> 
> which, in turn, ensures that libanl.so is built.

Right. The tests target depends on this, which then depends on these
two files, and that overrides the normal build process for ga_test.c,
and just ensures that libanl.so and libpthread.so are built.

> However, that target has no other actions. I would presume that
> somewhere there is a pattern matching target that will build this, but
> I'm too stupid to find it.

No, you are not too stupid, you are correct, the file ga_test.c is never
built, and looks like it was never designed to be run as a test.

The targets are bogus though.

Can you test removing them and ga_test.c?

diff --git a/resolv/Makefile b/resolv/Makefile
index ec7e4fd..8a822e0 100644
--- a/resolv/Makefile
+++ b/resolv/Makefile
@@ -101,10 +101,6 @@ routines                += $(libnss_dns-routines) $(libresolv-routines)
 static-only-routines    += $(libnss_dns-routines) $(libresolv-routines)
 endif
 
-ifeq (yesyes,$(build-shared)$(have-thread-library))
-tests: $(objpfx)ga_test
-endif
-
 ifeq ($(run-built-tests),yes)
 ifneq (no,$(PERL))
 tests-special += $(objpfx)mtrace-tst-leaks.out
@@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so
 # The asynchronous name lookup code needs the thread library.
 $(objpfx)libanl.so: $(shared-thread-library)
 
-$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)
-
 $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library)
 tst-res_hconf_reorder-ENV = RESOLV_REORDER=on
 
---

I think you could write a much better test than ga_test.c
to get better test coverage for your suggested lock changes.

Then to add the test all you need to do is plunk a tst-foo.c into
resolv/ and add it to the tests variable.

Example:

diff --git a/resolv/Makefile b/resolv/Makefile
index ec7e4fd..48dc25e 100644
--- a/resolv/Makefile
+++ b/resolv/Makefile
@@ -53,6 +53,7 @@ tests += \
   tst-resolv-network \
   tst-resolv-res_init-multi \
   tst-resolv-search \
+  tst-foo
 
 # These tests need libdl.
 ifeq (yes,$(build-shared))
---

That is enough to build and run tst-foo from tst-foo.c in the
resolv subdir, and you'd use the support/README-test.c as the
template.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-26  4:42         ` Carlos O'Donell
@ 2017-09-26  6:06           ` Will Hawkins
  2017-09-26 18:39             ` Will Hawkins
  0 siblings, 1 reply; 17+ messages in thread
From: Will Hawkins @ 2017-09-26  6:06 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help, Will Hawkins

On Tue, Sep 26, 2017 at 12:42 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/25/2017 05:00 PM, Will Hawkins wrote:
>> On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> On 09/23/2017 08:44 AM, Will Hawkins wrote:
>>>> This is what confused me -- thank you for the confirmation that I was
>>>> not going crazy! After an hour of trying to figure out how the code
>>>> did not deadlock every time, I finally looked at the
>>>> initialization/definition of the mutex and saw that it was recursive.
>>>
>>> We call this a belt-and-suspenders approach, particularly if there were
>>> multiple callers in the past that might or might no have taken the lock.
>>>
>>> In this case it looks like there is just one caller so the appropriate
>>> patch would have to be:
>>>
>>> * Remove locking from helper function.
>>> * Add comments to helper function explaining that lock X must be held
>>>   before calling.
>>> * Add comments in caller stating that lock must be taken before calling
>>>   helper function.
>>> * Look to see if have a reasonable test that exercises this code path,
>>>   and if not add one (I think there are tests already that cover this
>>>   area of code).
>>>
>>
>> There appears to be a related test in the file ga_test.c in resolv/.
>> However, I can't figure out how the check/tests target is building
>> this file. In fact, I can't get the file to build.
>
> The file is not being built because it's not listed in one of the
> test targets.
>
> It *could* be added to tests (normal tests), or tests-internal (tests
> that need access to libc internals), or xtests (tests that need special
> access, usually root).
>
> However, it's not clear that this test was ever designed to be run
> in the glibc test harness. It looks like an example of how to use the
> interfaces, but is not a test itself.

Thank you for your reply!

What are you saying is that, because this tests: target does not have
any recipe associated with it, it becomes just another way to add a
prerequisite for the tests: target whose recipe is, ultimately,
defined in the "master" Makefile? An example of the recipe here:
https://www.gnu.org/software/make/manual/html_node/Multiple-Rules.html

>
>> The target that seems to control building this file is guarded by two
>> conditionals build-shared and have-thread-library. I have verified
>> that those flags are both set in my build environment. With that
>> control passed, it looks like it is supposed to only then use the
>> ga_test as the only test target. This appears to be confimed by the
>> comments in the Changelog (Changelog.old/Changelog.12:5501 and commit
>> log (e10546cb67fe0b5366fede3661b6c15363e4f54f) for when that file/test
>> was incorporated into the repository.
>>
>> The target, then, looks like this:
>>
>> $(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)
>>
>> which, in turn, ensures that libanl.so is built.
>
> Right. The tests target depends on this, which then depends on these
> two files, and that overrides the normal build process for ga_test.c,
> and just ensures that libanl.so and libpthread.so are built.
>
>> However, that target has no other actions. I would presume that
>> somewhere there is a pattern matching target that will build this, but
>> I'm too stupid to find it.
>
> No, you are not too stupid, you are correct, the file ga_test.c is never
> built, and looks like it was never designed to be run as a test.
>
> The targets are bogus though.
>
> Can you test removing them and ga_test.c?
>
> diff --git a/resolv/Makefile b/resolv/Makefile
> index ec7e4fd..8a822e0 100644
> --- a/resolv/Makefile
> +++ b/resolv/Makefile
> @@ -101,10 +101,6 @@ routines                += $(libnss_dns-routines) $(libresolv-routines)
>  static-only-routines    += $(libnss_dns-routines) $(libresolv-routines)
>  endif
>
> -ifeq (yesyes,$(build-shared)$(have-thread-library))
> -tests: $(objpfx)ga_test
> -endif
> -
>  ifeq ($(run-built-tests),yes)
>  ifneq (no,$(PERL))
>  tests-special += $(objpfx)mtrace-tst-leaks.out
> @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so
>  # The asynchronous name lookup code needs the thread library.
>  $(objpfx)libanl.so: $(shared-thread-library)
>
> -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)
> -
>  $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library)
>  tst-res_hconf_reorder-ENV = RESOLV_REORDER=on
>
> ---
>

I am testing this as we speak but don't think that I will know the
answer before I head to sleep tonight. I will respond as soon as my
tests have completed their execution.

> I think you could write a much better test than ga_test.c
> to get better test coverage for your suggested lock changes.
>
> Then to add the test all you need to do is plunk a tst-foo.c into
> resolv/ and add it to the tests variable.
>
> Example:
>
> diff --git a/resolv/Makefile b/resolv/Makefile
> index ec7e4fd..48dc25e 100644
> --- a/resolv/Makefile
> +++ b/resolv/Makefile
> @@ -53,6 +53,7 @@ tests += \
>    tst-resolv-network \
>    tst-resolv-res_init-multi \
>    tst-resolv-search \
> +  tst-foo
>
>  # These tests need libdl.
>  ifeq (yes,$(build-shared))
> ---
>
> That is enough to build and run tst-foo from tst-foo.c in the
> resolv subdir, and you'd use the support/README-test.c as the
> template.
>

As with everything that you've said so far, this makes sense. The
ga_test.c code did not seem to be an effective test for this. I only
went down this rabbit hold because it was the only "test" that I could
find that seemed to be even remotely related to the getaddrinfo_a.
When I realized that even that was not being built, I started chasing
my tail trying to answer the questions above.

As before, thank you!

Will

> --
> Cheers,
> Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use  recursive locks
  2017-09-25 20:39         ` Florian Weimer
@ 2017-09-26  6:29           ` Hal Murray
  0 siblings, 0 replies; 17+ messages in thread
From: Hal Murray @ 2017-09-26  6:29 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Hal Murray, libc-help

> Do you mean annotations to find resource leaks?

No.  I was looking for a compiler like package that would scan all the source 
code to verify that the appropriate lock was held whenever an inner function 
was called.

The inner function would need an annotation to say that it was an inner 
function and which lock was needed.

The outer functions would need an annotation to say when it was locking or 
unlocking a lock.  It might work to recognize the POSIX functions.

That probably doesn't work if you lock in one function and unlock in a 
different function.  I don't think I've ever worked with code like that.


-- 
These are my opinions.  I hate spam.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-26  6:06           ` Will Hawkins
@ 2017-09-26 18:39             ` Will Hawkins
  2017-09-27  5:54               ` Carlos O'Donell
  0 siblings, 1 reply; 17+ messages in thread
From: Will Hawkins @ 2017-09-26 18:39 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help, Will Hawkins

On Tue, Sep 26, 2017 at 2:06 AM, Will Hawkins <whh8b@virginia.edu> wrote:
> On Tue, Sep 26, 2017 at 12:42 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 09/25/2017 05:00 PM, Will Hawkins wrote:
>>> On Mon, Sep 25, 2017 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>>> On 09/23/2017 08:44 AM, Will Hawkins wrote:
>>>>> This is what confused me -- thank you for the confirmation that I was
>>>>> not going crazy! After an hour of trying to figure out how the code
>>>>> did not deadlock every time, I finally looked at the
>>>>> initialization/definition of the mutex and saw that it was recursive.
>>>>
>>>> We call this a belt-and-suspenders approach, particularly if there were
>>>> multiple callers in the past that might or might no have taken the lock.
>>>>
>>>> In this case it looks like there is just one caller so the appropriate
>>>> patch would have to be:
>>>>
>>>> * Remove locking from helper function.
>>>> * Add comments to helper function explaining that lock X must be held
>>>>   before calling.
>>>> * Add comments in caller stating that lock must be taken before calling
>>>>   helper function.
>>>> * Look to see if have a reasonable test that exercises this code path,
>>>>   and if not add one (I think there are tests already that cover this
>>>>   area of code).
>>>>
>>>
>>> There appears to be a related test in the file ga_test.c in resolv/.
>>> However, I can't figure out how the check/tests target is building
>>> this file. In fact, I can't get the file to build.
>>
>> The file is not being built because it's not listed in one of the
>> test targets.
>>
>> It *could* be added to tests (normal tests), or tests-internal (tests
>> that need access to libc internals), or xtests (tests that need special
>> access, usually root).
>>
>> However, it's not clear that this test was ever designed to be run
>> in the glibc test harness. It looks like an example of how to use the
>> interfaces, but is not a test itself.
>
> Thank you for your reply!
>
> What are you saying is that, because this tests: target does not have
> any recipe associated with it, it becomes just another way to add a
> prerequisite for the tests: target whose recipe is, ultimately,
> defined in the "master" Makefile? An example of the recipe here:
> https://www.gnu.org/software/make/manual/html_node/Multiple-Rules.html
>
>>
>>> The target that seems to control building this file is guarded by two
>>> conditionals build-shared and have-thread-library. I have verified
>>> that those flags are both set in my build environment. With that
>>> control passed, it looks like it is supposed to only then use the
>>> ga_test as the only test target. This appears to be confimed by the
>>> comments in the Changelog (Changelog.old/Changelog.12:5501 and commit
>>> log (e10546cb67fe0b5366fede3661b6c15363e4f54f) for when that file/test
>>> was incorporated into the repository.
>>>
>>> The target, then, looks like this:
>>>
>>> $(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)
>>>
>>> which, in turn, ensures that libanl.so is built.
>>
>> Right. The tests target depends on this, which then depends on these
>> two files, and that overrides the normal build process for ga_test.c,
>> and just ensures that libanl.so and libpthread.so are built.
>>
>>> However, that target has no other actions. I would presume that
>>> somewhere there is a pattern matching target that will build this, but
>>> I'm too stupid to find it.
>>
>> No, you are not too stupid, you are correct, the file ga_test.c is never
>> built, and looks like it was never designed to be run as a test.
>>
>> The targets are bogus though.
>>
>> Can you test removing them and ga_test.c?
>>
>> diff --git a/resolv/Makefile b/resolv/Makefile
>> index ec7e4fd..8a822e0 100644
>> --- a/resolv/Makefile
>> +++ b/resolv/Makefile
>> @@ -101,10 +101,6 @@ routines                += $(libnss_dns-routines) $(libresolv-routines)
>>  static-only-routines    += $(libnss_dns-routines) $(libresolv-routines)
>>  endif
>>
>> -ifeq (yesyes,$(build-shared)$(have-thread-library))
>> -tests: $(objpfx)ga_test
>> -endif
>> -
>>  ifeq ($(run-built-tests),yes)
>>  ifneq (no,$(PERL))
>>  tests-special += $(objpfx)mtrace-tst-leaks.out
>> @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so
>>  # The asynchronous name lookup code needs the thread library.
>>  $(objpfx)libanl.so: $(shared-thread-library)
>>
>> -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)
>> -
>>  $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library)
>>  tst-res_hconf_reorder-ENV = RESOLV_REORDER=on
>>
>> ---
>>
>
> I am testing this as we speak but don't think that I will know the
> answer before I head to sleep tonight. I will respond as soon as my
> tests have completed their execution.

Testing of this 'patch' is complete. I applied the change then used
the instructions on
https://sourceware.org/glibc/wiki/Testing/Testsuite#Testing_just_one_test
to test. Here's how I did it:

From the build directory, I did
rm -f resolv/*.out
make subdirs=resolv check

and everything ran the way I expected (the same number of test
failures and passes).

Thanks again for all your help!
Will


>
>> I think you could write a much better test than ga_test.c
>> to get better test coverage for your suggested lock changes.
>>
>> Then to add the test all you need to do is plunk a tst-foo.c into
>> resolv/ and add it to the tests variable.
>>
>> Example:
>>
>> diff --git a/resolv/Makefile b/resolv/Makefile
>> index ec7e4fd..48dc25e 100644
>> --- a/resolv/Makefile
>> +++ b/resolv/Makefile
>> @@ -53,6 +53,7 @@ tests += \
>>    tst-resolv-network \
>>    tst-resolv-res_init-multi \
>>    tst-resolv-search \
>> +  tst-foo
>>
>>  # These tests need libdl.
>>  ifeq (yes,$(build-shared))
>> ---
>>
>> That is enough to build and run tst-foo from tst-foo.c in the
>> resolv subdir, and you'd use the support/README-test.c as the
>> template.
>>
>
> As with everything that you've said so far, this makes sense. The
> ga_test.c code did not seem to be an effective test for this. I only
> went down this rabbit hold because it was the only "test" that I could
> find that seemed to be even remotely related to the getaddrinfo_a.
> When I realized that even that was not being built, I started chasing
> my tail trying to answer the questions above.
>
> As before, thank you!
>
> Will
>
>> --
>> Cheers,
>> Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-26 18:39             ` Will Hawkins
@ 2017-09-27  5:54               ` Carlos O'Donell
  2017-09-27  6:02                 ` Will Hawkins
  0 siblings, 1 reply; 17+ messages in thread
From: Carlos O'Donell @ 2017-09-27  5:54 UTC (permalink / raw)
  To: Will Hawkins; +Cc: libc-help, Will Hawkins

On 09/26/2017 12:39 PM, Will Hawkins wrote:
> On Tue, Sep 26, 2017 at 2:06 AM, Will Hawkins <whh8b@virginia.edu> wrote:
>> On Tue, Sep 26, 2017 at 12:42 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> Can you test removing them and ga_test.c?
>>>
>>> diff --git a/resolv/Makefile b/resolv/Makefile
>>> index ec7e4fd..8a822e0 100644
>>> --- a/resolv/Makefile
>>> +++ b/resolv/Makefile
>>> @@ -101,10 +101,6 @@ routines                += $(libnss_dns-routines) $(libresolv-routines)
>>>  static-only-routines    += $(libnss_dns-routines) $(libresolv-routines)
>>>  endif
>>>
>>> -ifeq (yesyes,$(build-shared)$(have-thread-library))
>>> -tests: $(objpfx)ga_test
>>> -endif
>>> -
>>>  ifeq ($(run-built-tests),yes)
>>>  ifneq (no,$(PERL))
>>>  tests-special += $(objpfx)mtrace-tst-leaks.out
>>> @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so
>>>  # The asynchronous name lookup code needs the thread library.
>>>  $(objpfx)libanl.so: $(shared-thread-library)
>>>
>>> -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)
>>> -
>>>  $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library)
>>>  tst-res_hconf_reorder-ENV = RESOLV_REORDER=on
>>>
>>> ---
>>>
>>
>> I am testing this as we speak but don't think that I will know the
>> answer before I head to sleep tonight. I will respond as soon as my
>> tests have completed their execution.
> 
> Testing of this 'patch' is complete. I applied the change then used
> the instructions on
> https://sourceware.org/glibc/wiki/Testing/Testsuite#Testing_just_one_test
> to test. Here's how I did it:
> 
> From the build directory, I did
> rm -f resolv/*.out
> make subdirs=resolv check
> 
> and everything ran the way I expected (the same number of test
> failures and passes).
> 
> Thanks again for all your help!
Awesome, thanks. I'll see about posting a patch to do that, and include
you in the testing notes. Alternatively you can post the patch yourself
and I'll approve it, the changes are mechanical and below the copyright
limit, but any future work would require copyright assignment :-)

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-27  5:54               ` Carlos O'Donell
@ 2017-09-27  6:02                 ` Will Hawkins
  2017-09-27  6:05                   ` Carlos O'Donell
  0 siblings, 1 reply; 17+ messages in thread
From: Will Hawkins @ 2017-09-27  6:02 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help, Will Hawkins

On Wed, Sep 27, 2017 at 1:53 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/26/2017 12:39 PM, Will Hawkins wrote:
>> On Tue, Sep 26, 2017 at 2:06 AM, Will Hawkins <whh8b@virginia.edu> wrote:
>>> On Tue, Sep 26, 2017 at 12:42 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>>> Can you test removing them and ga_test.c?
>>>>
>>>> diff --git a/resolv/Makefile b/resolv/Makefile
>>>> index ec7e4fd..8a822e0 100644
>>>> --- a/resolv/Makefile
>>>> +++ b/resolv/Makefile
>>>> @@ -101,10 +101,6 @@ routines                += $(libnss_dns-routines) $(libresolv-routines)
>>>>  static-only-routines    += $(libnss_dns-routines) $(libresolv-routines)
>>>>  endif
>>>>
>>>> -ifeq (yesyes,$(build-shared)$(have-thread-library))
>>>> -tests: $(objpfx)ga_test
>>>> -endif
>>>> -
>>>>  ifeq ($(run-built-tests),yes)
>>>>  ifneq (no,$(PERL))
>>>>  tests-special += $(objpfx)mtrace-tst-leaks.out
>>>> @@ -134,8 +130,6 @@ $(objpfx)libnss_dns.so: $(objpfx)libresolv.so
>>>>  # The asynchronous name lookup code needs the thread library.
>>>>  $(objpfx)libanl.so: $(shared-thread-library)
>>>>
>>>> -$(objpfx)ga_test: $(objpfx)libanl.so $(shared-thread-library)
>>>> -
>>>>  $(objpfx)tst-res_hconf_reorder: $(libdl) $(shared-thread-library)
>>>>  tst-res_hconf_reorder-ENV = RESOLV_REORDER=on
>>>>
>>>> ---
>>>>
>>>
>>> I am testing this as we speak but don't think that I will know the
>>> answer before I head to sleep tonight. I will respond as soon as my
>>> tests have completed their execution.
>>
>> Testing of this 'patch' is complete. I applied the change then used
>> the instructions on
>> https://sourceware.org/glibc/wiki/Testing/Testsuite#Testing_just_one_test
>> to test. Here's how I did it:
>>
>> From the build directory, I did
>> rm -f resolv/*.out
>> make subdirs=resolv check
>>
>> and everything ran the way I expected (the same number of test
>> failures and passes).
>>
>> Thanks again for all your help!
> Awesome, thanks. I'll see about posting a patch to do that, and include
> you in the testing notes. Alternatively you can post the patch yourself
> and I'll approve it, the changes are mechanical and below the copyright
> limit, but any future work would require copyright assignment :-)

If you aren't in any hurry to have this added, then I'd like to do it.
It will give me practice at the submission process while I work on the
other patch for removing the locks held by the callee function. Just
let me know whether it would be okay to wait for a few days on this
patch. If so, I'll take a crack at it! I would certainly not mind you
moving ahead if you wanted to move quicker. I just thought I'd offer
to do it for the experience and to keep you from having to use your
valuable time!

Thanks again for all the help!
Will

>
> --
> Cheers,
> Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks
  2017-09-27  6:02                 ` Will Hawkins
@ 2017-09-27  6:05                   ` Carlos O'Donell
  0 siblings, 0 replies; 17+ messages in thread
From: Carlos O'Donell @ 2017-09-27  6:05 UTC (permalink / raw)
  To: Will Hawkins; +Cc: libc-help, Will Hawkins

On 09/27/2017 12:01 AM, Will Hawkins wrote:
> On Wed, Sep 27, 2017 at 1:53 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> Awesome, thanks. I'll see about posting a patch to do that, and include
>> you in the testing notes. Alternatively you can post the patch yourself
>> and I'll approve it, the changes are mechanical and below the copyright
>> limit, but any future work would require copyright assignment :-)
> 
> If you aren't in any hurry to have this added, then I'd like to do it.
> It will give me practice at the submission process while I work on the
> other patch for removing the locks held by the callee function. Just
> let me know whether it would be okay to wait for a few days on this
> patch. If so, I'll take a crack at it! I would certainly not mind you
> moving ahead if you wanted to move quicker. I just thought I'd offer
> to do it for the experience and to keep you from having to use your
> valuable time!

I'm not in a hurry. Please feel free to take your time and post the
patch to libc-alpha when you are ready. I've got a lot of other things
to work on ;-)

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-09-27  6:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-23  7:08 Understanding why getaddrinfo_a and __gai_enqueue_request use recursive locks Will Hawkins
2017-09-23 13:04 ` Carlos O'Donell
2017-09-23 14:44   ` Will Hawkins
2017-09-25 16:13     ` Carlos O'Donell
2017-09-25 17:05       ` Will Hawkins
2017-09-25 17:23       ` Hal Murray
2017-09-25 19:06         ` Carlos O'Donell
2017-09-25 20:38           ` Florian Weimer
2017-09-25 20:39         ` Florian Weimer
2017-09-26  6:29           ` Hal Murray
2017-09-25 23:00       ` Will Hawkins
2017-09-26  4:42         ` Carlos O'Donell
2017-09-26  6:06           ` Will Hawkins
2017-09-26 18:39             ` Will Hawkins
2017-09-27  5:54               ` Carlos O'Donell
2017-09-27  6:02                 ` Will Hawkins
2017-09-27  6:05                   ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).