From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id E85573858C54; Sun, 5 Feb 2023 04:10:48 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E85573858C54 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1675570248; bh=DTd0UA+/8IxBe67Dh7yfFmPAiKTZn3hchAKatZDQinI=; h=From:To:Subject:Date:From; b=ioXpbQHljtK06I1DZsjFYqRNxllOiAdssuQFBSmUPvThlEeuisQZIEUGBYZxRti0p eAOsotQceuirWVpiB+vUZjCo0Rv2Y/K6hf/Xq0C0loVvGHeiwyUN9sTsB/qOy6N/53 1JY+tTLDjVkFZnpaKKlKfxj4rCi9DHjj3AxPQkPE= From: "thiago at kde dot org" To: glibc-bugs@sourceware.org Subject: [Bug libc/30081] New: libresolv: timeout when running in single-request mode Date: Sun, 05 Feb 2023 04:10:47 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: glibc X-Bugzilla-Component: libc X-Bugzilla-Version: 2.39 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: thiago at kde dot org X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc target_milestone Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://sourceware.org/bugzilla/show_bug.cgi?id=3D30081 Bug ID: 30081 Summary: libresolv: timeout when running in single-request mode Product: glibc Version: 2.39 Status: UNCONFIRMED Severity: normal Priority: P2 Component: libc Assignee: unassigned at sourceware dot org Reporter: thiago at kde dot org CC: drepper.fsp at gmail dot com Target Milestone: --- Short description: send_dg() from resolv/res_send.c times out when sending IPv6 and IPv4 sequentially, instead of in parallel, when the reply is SERVFAIL. This happ= ens because it waits for the second reply to arrive without having sent a query. The sequential sending can be triggered by the reply to the second request (IPv6) having timed out at any point in the execution. For non-local DNS servers and a long-running application, this is not inconceivable. From that point on, every resolution that includes a SERVFAIL will be 10-seconds long. This was noticed when attempting to resolve a multi-label ".local" names. T= he nsswitch.conf in the machine in question was: hosts: files mdns_minimal [NOTFOUND=3Dreturn] dns mdns4 Because the name in question had more than 2 labels, mdns_minimal refused operation, so the query was sent to libnss_dns. The DNS server (a systemd-resolved stub resolver with mDNS support turned off) responded with SERVFAIL. That triggered the timeout. Affects: 2.17 to 2.36, at least. I don't have any older ones to try. My guess is tha= t it has existed since Happy Eyeballs was introduced to support parallel IPv6 and IPv4 queries. Testcase: =3D=3D=3D #include #include int main() { struct addrinfo *ai; int res; // simulate send_dg() having timed out once res_init(); _res.options |=3D RES_SNGLKUP; res =3D getaddrinfo("foobar", "*", NULL, &ai); } =3D=3D=3D This works best with the systemd-resolved stub resolver, on 127.0.0.53. It'= ll reply SERVFAIL to the "foobar" name: 03:54:30.338732 IP 127.0.0.1.56636 > 127.0.0.53.53: 57854+ [1au] A? foobar.= lan. (39) 03:54:30.340472 IP 127.0.0.53.53 > 127.0.0.1.56636: 57854 NXDomain 0/0/1 (3= 9) 03:54:30.340707 IP 127.0.0.1.56636 > 127.0.0.53.53: 17656+ [1au] AAAA? foobar.lan. (39) 03:54:30.341484 IP 127.0.0.53.53 > 127.0.0.1.56636: 17656 NXDomain 0/0/1 (3= 9) 03:54:30.341716 IP 127.0.0.1.59512 > 127.0.0.53.53: 11803+ [1au] A? foobar. (35) 03:54:30.341754 IP 127.0.0.53.53 > 127.0.0.1.59512: 11803 ServFail* 0/0/1 (= 35) 03:54:35.346469 IP 127.0.0.1.59512 > 127.0.0.53.53: 11803+ [1au] A? foobar. (35) 03:54:35.346665 IP 127.0.0.53.53 > 127.0.0.1.59512: 11803 ServFail* 0/0/1 (= 35) Note the 5 seconds. strace of those last two queries was: =3D=3D=3D 03:54:30.341599 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) =3D 3 03:54:30.341613 setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) =3D 0 03:54:30.341667 connect(3, {sa_family=3DAF_INET, sin_port=3Dhtons(53), sin_addr=3Dinet_addr("127.0.0.53")}, 16) =3D 0 03:54:30.341695 poll([{fd=3D3, events=3DPOLLOUT}], 1, 0) =3D 1 ([{fd=3D3, revents=3DPOLLOUT}]) 03:54:30.341708 sendto(3, ".\33\1 \0\1\0\0\0\0\0\1\6foobar\0\0\1\0\1\0\0)\4\260\0\0\0"..., 35, MSG_NOSIGNAL, NULL, 0) =3D 35 03:54:30.341725 poll([{fd=3D3, events=3DPOLLIN}], 1, 5000) =3D 1 ([{fd=3D3, revents=3DPOLLIN}]) 03:54:30.341765 ioctl(3, FIONREAD, [35]) =3D 0 03:54:30.341777 recvfrom(3, ".\33\205\202\0\1\0\0\0\0\0\1\6foobar\0\0\1\0\1\0\0)\377\326\0\0\0"..., 204= 8, 0, {sa_family=3DAF_INET, sin_port=3Dhtons(53), sin_addr=3Dinet_addr("127.0.= 0.53")}, [28 =3D> 16]) =3D 35 03:54:30.341793 poll([{fd=3D3, events=3DPOLLIN}], 1, 4999) =3D 0 (Timeout) 03:54:35.346127 poll([{fd=3D3, events=3DPOLLOUT}], 1, 0) =3D 1 ([{fd=3D3, revents=3DPOLLOUT}]) 03:54:35.346370 sendto(3, ".\33\1 \0\1\0\0\0\0\0\1\6foobar\0\0\1\0\1\0\0)\4\260\0\0\0"..., 35, MSG_NOSIGNAL, NULL, 0) =3D 35 03:54:35.346642 poll([{fd=3D3, events=3DPOLLIN}], 1, 5000) =3D 1 ([{fd=3D3, revents=3DPOLLIN}]) 03:54:35.346701 ioctl(3, FIONREAD, [35]) =3D 0 03:54:35.346732 recvfrom(3, ".\33\205\202\0\1\0\0\0\0\0\1\6foobar\0\0\1\0\1\0\0)\377\326\0\0\0"..., 204= 8, 0, {sa_family=3DAF_INET, sin_port=3Dhtons(53), sin_addr=3Dinet_addr("127.0.= 0.53")}, [28 =3D> 16]) =3D 35 03:54:35.346779 poll([{fd=3D3, events=3DPOLLIN}], 1, 4999) =3D 0 (Timeout) 03:54:40.351137 close(3) =3D 0 Note the two poll() for 4999 ms that timed out. Those happen after a succes= sful recvfrom() of the reply that we wanted. Code analysis: The recvfrom() happens on line 1179 of resolv/res_send.c (2.37) *thisresplenp =3D __recvfrom (pfd[0].fd, (char *) *thisansp, *thisanssizp, 0, (struct sockaddr *) &from, &fromlen); The __libc_res_queriesmatch() test on line 1204 passes (I tested with a full strace packet payload), setting matching_query =3D 1: if (!recvresp1 && anhp->id =3D=3D hp->id && __libc_res_queriesmatch (buf, buf + buflen, *thisansp, *thisansp + *thisanssizp)) matching_query =3D 1; Since this reply is a SERVFAIL, the if on line 1222 is entered: if (anhp->rcode =3D=3D SERVFAIL || anhp->rcode =3D=3D NOTIMP || anhp->rcode =3D=3D REFUSED) { next_ns: if (recvresp1 || (buf2 !=3D NULL && recvresp2)) { *resplen2 =3D 0; return resplen; } if (buf2 !=3D NULL) { /* No data from the first reply. */ resplen =3D 0; /* We are waiting for a possible second reply. = */ if (matching_query =3D=3D 1) recvresp1 =3D 1; else recvresp2 =3D 1; goto wait; } At this point, both recvresp1 and recvresp2 are still false, so the first condition is skipped. As this is a Happy Eyeballs request, buf2 is not null= , so the second condition is entered, where the comments say "We are waiting for= a possible second reply" but no request has yet been sent. This proceeds to "= goto wait" and will time out. --=20 You are receiving this mail because: You are on the CC list for the bug.=