public inbox for glibc-bugs@sourceware.org
* [Bug libc/30081] New: libresolv: timeout when running in single-request mode
From: thiago at kde dot org @ 2023-02-05  4:10 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30081

            Bug ID: 30081
           Summary: libresolv: timeout when running in single-request mode
           Product: glibc
           Version: 2.39
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: thiago at kde dot org
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Short description:
send_dg() from resolv/res_send.c times out when it sends the IPv4 and IPv6
queries sequentially, instead of in parallel, and the first reply is SERVFAIL.
This happens because it waits for the second reply to arrive without having
sent the second query.

Sequential sending can be triggered by the reply to the second request
(IPv6) having timed out at any point during execution. For non-local DNS
servers and a long-running application, this is not inconceivable. From that
point on, every resolution that includes a SERVFAIL takes 10 seconds.

This was noticed when attempting to resolve multi-label ".local" names. The
nsswitch.conf on the machine in question was:

hosts:          files mdns_minimal [NOTFOUND=return] dns mdns4

Because the name in question had more than 2 labels, mdns_minimal refused
operation, so the query was sent to libnss_dns. The DNS server (a
systemd-resolved stub resolver with mDNS support turned off) responded with
SERVFAIL. That triggered the timeout.
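
The label check that makes mdns_minimal decline such a name can be sketched
roughly as below. This is only an illustration, not code from nss-mdns:
count_labels() and mdns_minimal_would_handle() are hypothetical helpers, and
the "at most two labels, ending in .local" rule is an assumption taken from
the behaviour described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the dot-separated labels in a host name,
   ignoring one trailing dot ("host.local." counts like "host.local"). */
static size_t count_labels(const char *name)
{
    size_t len = strlen(name);
    if (len > 0 && name[len - 1] == '.')
        len--;
    if (len == 0)
        return 0;

    size_t labels = 1;
    for (size_t i = 0; i < len; i++)
        if (name[i] == '.')
            labels++;
    return labels;
}

/* Assumed rule: the name must end in ".local" (compared case-sensitively
   here for brevity) and must not have more than two labels. */
static bool mdns_minimal_would_handle(const char *name)
{
    size_t len = strlen(name);
    if (len > 0 && name[len - 1] == '.')
        len--;
    if (len < 6 || strncmp(name + len - 6, ".local", 6) != 0)
        return false;
    return count_labels(name) <= 2;
}
```

With this sketch, mdns_minimal_would_handle("printer.local") is true, while a
three-label name such as "foo.bar.local" is rejected and falls through to the
next service in the hosts line.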

Affects:
2.17 to 2.36, at least. I don't have any older ones to try. My guess is that it
has existed since Happy Eyeballs was introduced to support parallel IPv6 and
IPv4 queries.


Testcase:
===
#include <netdb.h>
#include <resolv.h>

int main()
{
    struct addrinfo *ai;
    int res;

    // simulate send_dg() having timed out once
    res_init();
    _res.options |= RES_SNGLKUP;

    res = getaddrinfo("foobar", "*", NULL, &ai);
}
===

This works best with the systemd-resolved stub resolver, on 127.0.0.53. It'll
reply SERVFAIL to the "foobar" name:

03:54:30.338732 IP 127.0.0.1.56636 > 127.0.0.53.53: 57854+ [1au] A? foobar.lan. (39)
03:54:30.340472 IP 127.0.0.53.53 > 127.0.0.1.56636: 57854 NXDomain 0/0/1 (39)
03:54:30.340707 IP 127.0.0.1.56636 > 127.0.0.53.53: 17656+ [1au] AAAA? foobar.lan. (39)
03:54:30.341484 IP 127.0.0.53.53 > 127.0.0.1.56636: 17656 NXDomain 0/0/1 (39)
03:54:30.341716 IP 127.0.0.1.59512 > 127.0.0.53.53: 11803+ [1au] A? foobar. (35)
03:54:30.341754 IP 127.0.0.53.53 > 127.0.0.1.59512: 11803 ServFail* 0/0/1 (35)
03:54:35.346469 IP 127.0.0.1.59512 > 127.0.0.53.53: 11803+ [1au] A? foobar. (35)
03:54:35.346665 IP 127.0.0.53.53 > 127.0.0.1.59512: 11803 ServFail* 0/0/1 (35)

Note the 5-second gap between the first ServFail reply (03:54:30) and the
retransmission of the same query (03:54:35).
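
To put a number on that gap, the two timestamps from the capture can be
subtracted. The helpers below are a small sketch with hypothetical names
(parse_usec, gap_usec); they assume tcpdump's default "HH:MM:SS.uuuuuu" format
with exactly six fractional digits and both timestamps falling on the same day.

```c
#include <stdio.h>

/* Parse "HH:MM:SS.uuuuuu" into microseconds since midnight.
   Returns -1 if the text does not match that shape. */
static long long parse_usec(const char *ts)
{
    int h, m, s;
    long us;
    if (sscanf(ts, "%2d:%2d:%2d.%6ld", &h, &m, &s, &us) != 4)
        return -1;
    return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}

/* Difference in microseconds between two valid same-day timestamps. */
static long long gap_usec(const char *earlier, const char *later)
{
    return parse_usec(later) - parse_usec(earlier);
}
```

For the ServFail reply at 03:54:30.341754 and the retransmission at
03:54:35.346469, gap_usec() gives 5004715 microseconds, which is the 5000 ms
poll() timeout plus a few milliseconds of overhead.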

strace of those last two queries was:
===
03:54:30.341599 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
03:54:30.341613 setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
03:54:30.341667 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, 16) = 0
03:54:30.341695 poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
03:54:30.341708 sendto(3, ".\33\1 \0\1\0\0\0\0\0\1\6foobar\0\0\1\0\1\0\0)\4\260\0\0\0"..., 35, MSG_NOSIGNAL, NULL, 0) = 35
03:54:30.341725 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
03:54:30.341765 ioctl(3, FIONREAD, [35]) = 0
03:54:30.341777 recvfrom(3, ".\33\205\202\0\1\0\0\0\0\0\1\6foobar\0\0\1\0\1\0\0)\377\326\0\0\0"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, [28 => 16]) = 35
03:54:30.341793 poll([{fd=3, events=POLLIN}], 1, 4999) = 0 (Timeout)
03:54:35.346127 poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
03:54:35.346370 sendto(3, ".\33\1 \0\1\0\0\0\0\0\1\6foobar\0\0\1\0\1\0\0)\4\260\0\0\0"..., 35, MSG_NOSIGNAL, NULL, 0) = 35
03:54:35.346642 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
03:54:35.346701 ioctl(3, FIONREAD, [35]) = 0
03:54:35.346732 recvfrom(3, ".\33\205\202\0\1\0\0\0\0\0\1\6foobar\0\0\1\0\1\0\0)\377\326\0\0\0"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, [28 => 16]) = 35
03:54:35.346779 poll([{fd=3, events=POLLIN}], 1, 4999) = 0 (Timeout)
03:54:40.351137 close(3)                = 0
===

Note the two poll() calls with a 4999 ms timeout that timed out. Each happens
after a successful recvfrom() of the reply that we wanted.
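
The same pattern can be reproduced outside the resolver. The sketch below is
not glibc code: it uses a local AF_UNIX datagram socketpair as a stand-in for
the DNS socket, and poll_after_only_reply() is a hypothetical name. It sends
one datagram, receives it, and then polls again for a reply that nobody was
asked to send.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the result of the final poll(): 0 means it timed out.
   Returns -1 if the setup or the first exchange failed. */
static int poll_after_only_reply(int timeout_ms)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    int result = -1;
    char buf[16];
    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };

    /* The peer answers exactly once. */
    if (send(sv[1], "reply", 5, 0) == 5
        && poll(&pfd, 1, timeout_ms) == 1          /* data is ready */
        && recv(sv[0], buf, sizeof buf, 0) == 5)   /* consume the only reply */
    {
        pfd.revents = 0;
        /* No further request was made, so nothing else can arrive. */
        result = poll(&pfd, 1, timeout_ms);
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling poll_after_only_reply(50) should return 0 after about 50 ms, mirroring
the "poll(..., 4999) = 0 (Timeout)" lines in the strace output.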

Code analysis:
The recvfrom() happens on line 1179 of resolv/res_send.c (2.37):

                *thisresplenp = __recvfrom (pfd[0].fd, (char *) *thisansp,
                                            *thisanssizp, 0,
                                            (struct sockaddr *) &from,
                                            &fromlen);

The __libc_res_queriesmatch() test on line 1204 passes (I tested with a full
strace packet payload), setting matching_query = 1:

                if (!recvresp1
                    && anhp->id == hp->id
                    && __libc_res_queriesmatch (buf, buf + buflen,
                                                *thisansp,
                                                *thisansp + *thisanssizp))
                  matching_query = 1;

Since this reply is a SERVFAIL, the if on line 1222 is entered:

                if (anhp->rcode == SERVFAIL ||
                    anhp->rcode == NOTIMP ||
                    anhp->rcode == REFUSED) {
                next_ns:
                        if (recvresp1 || (buf2 != NULL && recvresp2)) {
                          *resplen2 = 0;
                          return resplen;
                        }
                        if (buf2 != NULL)
                          {
                            /* No data from the first reply.  */
                            resplen = 0;
                            /* We are waiting for a possible second reply.  */
                            if (matching_query == 1)
                              recvresp1 = 1;
                            else
                              recvresp2 = 1;

                            goto wait;
                          }

At this point, both recvresp1 and recvresp2 are still false, so the first
condition is skipped. As this is a Happy Eyeballs request, buf2 is not null, so
the second condition is entered, where the comment says "We are waiting for a
possible second reply" even though the second request has not been sent yet.
Execution then proceeds to "goto wait" and times out.
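
The branch quoted above can be modelled in isolation to make the problem
visible. This is a simplified sketch, not the glibc source: struct dg_state,
on_error_rcode() and waits_without_outstanding_query() are hypothetical names,
sent_query2 is an extra flag added only for the illustration, and
TRY_NEXT_SERVER is a placeholder because the excerpt does not show what happens
when buf2 is NULL.

```c
#include <stdbool.h>

enum action { RETURN_TO_CALLER, WAIT_FOR_REPLY, TRY_NEXT_SERVER };

struct dg_state {
    bool have_buf2;    /* stands for buf2 != NULL */
    bool sent_query2;  /* illustration only: was the second query sent? */
    bool recvresp1;
    bool recvresp2;
};

/* Mirrors the SERVFAIL/NOTIMP/REFUSED branch from the excerpt. */
static enum action on_error_rcode(struct dg_state *st, bool matching_query)
{
    if (st->recvresp1 || (st->have_buf2 && st->recvresp2))
        return RETURN_TO_CALLER;
    if (st->have_buf2) {
        if (matching_query)
            st->recvresp1 = true;
        else
            st->recvresp2 = true;
        return WAIT_FOR_REPLY;      /* the "goto wait" */
    }
    return TRY_NEXT_SERVER;         /* placeholder, not shown in the excerpt */
}

/* True when every query that was actually sent has already been answered,
   so waiting for another reply can only end in a timeout. */
static bool waits_without_outstanding_query(const struct dg_state *st)
{
    bool q1_outstanding = !st->recvresp1;  /* query 1 is always sent first */
    bool q2_outstanding = st->have_buf2 && st->sent_query2 && !st->recvresp2;
    return !q1_outstanding && !q2_outstanding;
}
```

In single-request mode (have_buf2 true, sent_query2 false), a SERVFAIL that
matches the first query yields WAIT_FOR_REPLY while
waits_without_outstanding_query() is true, which is the stall seen in the
strace. In parallel mode (sent_query2 true) the same reply leaves the second
query outstanding, so waiting is justified.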

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug libc/30081] libresolv: timeout when running in single-request mode
From: fweimer at redhat dot com @ 2024-06-13  7:38 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30081

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-
                 CC|                            |fweimer at redhat dot com

* [Bug libc/30081] libresolv: timeout when running in single-request mode
From: fweimer at redhat dot com @ 2024-06-13  7:38 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30081

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                          |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org |fweimer at redhat dot com
   Last reconfirmed|                                 |2024-06-13
             Status|UNCONFIRMED                      |ASSIGNED
     Ever confirmed|0                                |1

* [Bug network/30081] libresolv: timeout when running in single-request mode
From: fweimer at redhat dot com @ 2024-06-13  7:38 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30081

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|libc                        |network
