Date: Mon, 11 Mar 2024 09:08:26 +0000
From: John Levon
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com
Subject: Issue with stale resolv.conf state

I have
an intermittent issue where a getaddrinfo()-using application uses stale
nameservers. That is, /etc/resolv.conf has been updated and the original
nameservers are not reachable at all, but the application never notices.
Note that this only reproduces very occasionally, so it is difficult for me
to distill into a simple test case.

This is with glibc 2.35, but from a quick look I didn't see any changes in
master that would help.

I confirmed that glibc never stat()s the file, and this is because we are
here:

    /* Initialize *RESP if RES_INIT is not yet set in RESP->options, or if
       res_init in some other thread requested re-initializing.  */
    static __attribute__ ((warn_unused_result)) bool
    maybe_init (struct resolv_context *ctx, bool preinit)
    {
      struct __res_state *resp = ctx->resp;
      if (resp->options & RES_INIT)
        {
          if (resp->options & RES_NORELOAD)
            /* Configuration reloading was explicitly disabled.  */
            return true;

          /* If there is no associated resolv_conf object despite the
             initialization, something modified *ctx->resp.  Do not
             override those changes.  */
          if (ctx->conf != NULL && replicated_configuration_matches (ctx))

And "replicated_configuration_matches()" is false. Thus we never examine the
file for any changes and continue using the old version indefinitely.

I don't understand the first part of the comment, but indeed, ->resp doesn't
match. In particular:

      return ctx->resp->options == ctx->conf->options

and ctx->resp (aka _resp) has 0x47002c1 whereas ctx->conf has 0x41002c1.

I'm not sure, but I suspect the additional RES_SNGLKUP|RES_SNGLKUPREOP may be
due to this code:

    /* There are quite a few broken name servers out
       there which don't handle two outstanding
       requests from the same source.  There are also
       broken firewall settings.  If we time out after
       having received one answer switch to the mode
       where we send the second request only once we
       have received the first answer.
       */
    if (!single_request)
      {
        statp->options |= RES_SNGLKUP;
        single_request = true;
        *gotsomewhere = save_gotsomewhere;
        goto retry;
      }
    else if (!single_request_reopen)
      {
        statp->options |= RES_SNGLKUPREOP;
        single_request_reopen = true;
        *gotsomewhere = save_gotsomewhere;
        __res_iclose (statp, false);
        goto retry_reopen;
      }

I'm guessing these got set when the VPN dropped routing to the old
nameservers, but before the next getaddrinfo() came in, thus leading to the
match failing.

I can't see where the application code itself can be at fault here, but I'm
not 100% confident about the above analysis either.

Any thoughts?

thanks
john
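
For reference, the two option words quoted above (0x47002c1 in ctx->resp,
0x41002c1 in ctx->conf) can be compared bit by bit to check the
RES_SNGLKUP|RES_SNGLKUPREOP suspicion. The following is only a minimal
sketch: the helper names extra_option_bits and describe_bits are
hypothetical, and the two flag values are copied from glibc's <resolv.h> as
fallbacks, so they are worth checking against the installed header.

```c
#include <stdio.h>
#include <string.h>

/* Fallback definitions; values as found in glibc's <resolv.h>.
   Assumption: verify against your own header before relying on them.  */
#ifndef RES_SNGLKUP
# define RES_SNGLKUP     0x00200000
#endif
#ifndef RES_SNGLKUPREOP
# define RES_SNGLKUPREOP 0x00400000
#endif

/* Hypothetical helper: bits set in the live resolver state (resp)
   that are absent from the stored configuration (conf).  */
static unsigned long
extra_option_bits (unsigned long resp_options, unsigned long conf_options)
{
  return resp_options & ~conf_options;
}

/* Hypothetical helper: write the names of the single-request flags
   present in BITS into BUF (of size LEN, which must be nonzero).  */
static void
describe_bits (unsigned long bits, char *buf, size_t len)
{
  buf[0] = '\0';
  if (bits & RES_SNGLKUP)
    strncat (buf, "RES_SNGLKUP ", len - strlen (buf) - 1);
  if (bits & RES_SNGLKUPREOP)
    strncat (buf, "RES_SNGLKUPREOP ", len - strlen (buf) - 1);
}
```

Calling extra_option_bits (0x47002c1, 0x41002c1) gives 0x600000, which is
exactly RES_SNGLKUP | RES_SNGLKUPREOP with those values, so the only
difference between the two option words is the pair of flags set in the
timeout path quoted above.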