From: John Levon <>
Subject: Issue with stale resolv.conf state
Date: Mon, 11 Mar 2024 09:08:26 +0000
Message-ID: <>

I have an intermittent issue where an application that uses getaddrinfo() keeps
using stale nameservers. That is, /etc/resolv.conf has been updated and the
original nameservers are not reachable at all, but the application never notices.
Note that this reproduces only very occasionally, so it is difficult for me to
distill it into a simple test case.

This is with glibc 2.35 but from a quick look I didn't see any changes in master
that would help.

I confirmed that glibc never stat()s the file, and this is because we are here:

    /* Initialize *RESP if RES_INIT is not yet set in RESP->options, or if
       res_init in some other thread requested re-initializing.  */
    static __attribute__ ((warn_unused_result)) bool
    maybe_init (struct resolv_context *ctx, bool preinit)
    {
      struct __res_state *resp = ctx->resp;
      if (resp->options & RES_INIT)
        {
          if (resp->options & RES_NORELOAD)
            /* Configuration reloading was explicitly disabled.  */
            return true;
          /* If there is no associated resolv_conf object despite the
             initialization, something modified *ctx->resp.  Do not
             override those changes.  */
          if (ctx->conf != NULL && replicated_configuration_matches (ctx))

And replicated_configuration_matches() returns false. Thus we never examine the
file for any changes and continue using the old configuration indefinitely.

I don't understand the first part of the comment, but indeed, ->resp doesn't
match. In particular:

    return ctx->resp->options == ctx->conf->options

and ctx->resp (aka _resp) has 0x47002c1 whereas ctx->conf has 0x41002c1.

I'm not sure, but I suspect the additional RES_SNGLKUP|RES_SNGLKUPREOP bits may
be due to this code:

    /* There are quite a few broken name servers out
       there which don't handle two outstanding
       requests from the same source.  There are also
       broken firewall settings.  If we time out after
       having received one answer switch to the mode
       where we send the second request only once we
       have received the first answer.  */
    if (!single_request)
      {
        statp->options |= RES_SNGLKUP;
        single_request = true;
        *gotsomewhere = save_gotsomewhere;
        goto retry;
      }
    else if (!single_request_reopen)
      {
        statp->options |= RES_SNGLKUPREOP;
        single_request_reopen = true;
        *gotsomewhere = save_gotsomewhere;
        __res_iclose (statp, false);
        goto retry_reopen;
      }

I'm guessing these got set when the VPN dropped routing to the old nameservers,
but before the next getaddrinfo() came in, thus leading to the match failing.

I can't see where the application code itself can be at fault here, but I'm not
100% confident about the above analysis either. Any thoughts?


Thread overview: 8+ messages
2024-03-11  9:08 John Levon [this message]
2024-03-11 10:51 ` Florian Weimer
2024-03-12  0:51   ` Cristian Rodríguez
2024-03-12  6:45     ` Florian Weimer
2024-03-12  9:09       ` Philip Sanetra
2024-03-12 10:04       ` John Levon
2024-03-12 14:25       ` Cristian Rodríguez
2024-03-12 14:30         ` Florian Weimer
