public inbox for glibc-bugs@sourceware.org
* [Bug nss/20874] getaddrinfo_a segfault
From: rainhard.driessler at artech dot at @ 2021-06-21 12:45 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=20874

Rainhard Driessler <rainhard.driessler at artech dot at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rainhard.driessler at artech dot at

--- Comment #10 from Rainhard Driessler <rainhard.driessler at artech dot at> ---
Created attachment 13505
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13505&action=edit
Fix getaddrinfo_a / gai_suspend race condition

I worked on this bug in glibc 2.23 and 2.26. After checking the current
version, I'm confident the bug is still present.

From my point of view, there are two issues here:
* The async_waitlist allocated in getaddrinfo_a is freed in gai_notify, but not
removed from the request's waitlist, leaving a dangling pointer behind
* Pointers to a gai_suspend call's local waitlist structures may persist in
requests' waitlists after gai_suspend has returned, leaving invalid pointers
into the stack behind

Although the combination of getaddrinfo_a (without signals) + gai_suspend
usually works, there seems to be a thread execution order that causes invalid
waitlist pointers to be dereferenced in gai_notify, specifically in
(--*waitlist->counterp).
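
To make the failing access concrete, here is a minimal model of a
notification walk over a waitlist. It is a sketch only: struct waiter and
notify_waiters are hypothetical names, the struct is a simplified stand-in
rather than glibc's definition, and locking and condition variables are
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a waitlist entry (assumption, not glibc's struct). */
struct waiter {
    struct waiter *next;
    volatile unsigned int *counterp;  /* counter owned by the waiting caller */
};

/* Walk the list and decrement each waiter's counter once.
   Returns the number of entries visited.
   Every entry must still be live: if an entry was freed, or lived in a
   stack frame that has already returned, the decrement below is the
   invalid access. */
static unsigned int notify_waiters(struct waiter *head)
{
    unsigned int visited = 0;
    for (struct waiter *w = head; w != NULL; w = w->next) {
        assert(w->counterp != NULL);
        --*w->counterp;
        ++visited;
    }
    return visited;
}
```

The walk itself has no way to tell a live entry from a stale one, so the
list must never contain an entry whose storage has been released.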

I don't have definitive proof so far, but I could reliably trigger the
segmentation fault as follows:
- Turn off the local DNS cache
- Use a fast, local DNS server for resolution
- Simulate delay using "tc qdisc add dev <ethX> root netem delay Xms 10ms 25%"
- Loop calls of getaddrinfo_a, each followed by gai_suspend with a timeout of
Xms
- The more instances of the looping binary run, the faster the segfault happens
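
The loop in the steps above could look roughly like the following sketch.
It is not the original test binary: lookup_once and lookup_loop are
hypothetical helper names, and the host name, timeout and iteration count
are left to the caller. On glibc versions before 2.34 it needs to be linked
with -lanl.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Issue one asynchronous lookup and wait for at most timeout_ms.
   Returns the gai_suspend() result, or the getaddrinfo_a() error if the
   request could not be enqueued. */
static int lookup_once(const char *host, long timeout_ms)
{
    struct gaicb req;
    memset(&req, 0, sizeof req);
    req.ar_name = host;

    struct gaicb *list[1] = { &req };
    /* NULL sigevent: no signal or thread notification is requested. */
    int rc = getaddrinfo_a(GAI_NOWAIT, list, 1, NULL);
    if (rc != 0)
        return rc;

    struct timespec timeout = {
        .tv_sec = timeout_ms / 1000,
        .tv_nsec = (timeout_ms % 1000) * 1000000L
    };
    const struct gaicb *const wait_list[1] = { &req };
    rc = gai_suspend(wait_list, 1, &timeout);

    /* req lives on this stack frame, so it must not be in flight when we
       return: try to cancel, then block until the request has finished. */
    if (gai_error(&req) == EAI_INPROGRESS) {
        gai_cancel(&req);
        while (gai_error(&req) == EAI_INPROGRESS)
            gai_suspend(wait_list, 1, NULL);
    }
    if (req.ar_result != NULL)
        freeaddrinfo(req.ar_result);
    return rc;
}

/* Repeat the lookup; returns how many iterations timed out (EAI_AGAIN). */
static int lookup_loop(const char *host, long timeout_ms, int iterations)
{
    assert(iterations >= 0);
    int timeouts = 0;
    for (int i = 0; i < iterations; ++i)
        if (lookup_once(host, timeout_ms) == EAI_AGAIN)
            ++timeouts;
    return timeouts;
}
```

Choosing the timeout close to the simulated network delay makes it more
likely that a timeout and a completion notification happen at nearly the
same moment, which is the window the steps above aim for.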

The attached patch fixes the issue for me: I unconditionally remove the
waitlist entries added within gai_suspend and furthermore remove the
async_waitlist entry freed within gai_notify from the list.
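
The idea behind both changes, unlinking an entry before its storage goes
away, can be sketched on a simplified list. This is not the attached patch:
struct waiter and all function names here are hypothetical, and the locking
that real code would need around list updates is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a waitlist entry (assumption, not glibc's struct). */
struct waiter {
    struct waiter *next;
    volatile unsigned int *counterp;
};

/* Unlink entry from the list rooted at *head.
   Returns 1 if it was found and removed, 0 if it was not on the list. */
static int waiter_remove(struct waiter **head, struct waiter *entry)
{
    assert(head != NULL && entry != NULL);
    for (struct waiter **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == entry) {
            *pp = entry->next;
            entry->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Model of a suspend call: the entry lives on this stack frame, so it is
   removed unconditionally before returning, whatever the wait outcome. */
static void suspend_model(struct waiter **head)
{
    unsigned int pending = 1;
    struct waiter local = { *head, &pending };
    *head = &local;
    /* ... real code would wait for completion or a timeout here ... */
    (void) waiter_remove(head, &local);
}

/* Model of the heap case: unlink the entry first, then free it. */
static void release_heap_waiter(struct waiter **head, struct waiter *entry)
{
    (void) waiter_remove(head, entry);
    free(entry);
}
```

Removing unconditionally costs one list walk per call, but it means the
list never outlives the storage of any entry it points to.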

Let me know what you think.
