* Issue with stale resolv.conf state
@ 2024-03-11 9:08 John Levon
2024-03-11 10:51 ` Florian Weimer
0 siblings, 1 reply; 8+ messages in thread
From: John Levon @ 2024-03-11 9:08 UTC
To: libc-alpha; +Cc: fweimer
I have an intermittent issue where a getaddrinfo()-using application uses stale
nameservers. That is, /etc/resolv.conf has been updated, the original
nameservers are not reachable at all, but the application doesn't ever notice.
Note that this only reproduces very occasionally, so it is difficult for me to distill
into a simple test case.
This is with glibc 2.35, but from a quick look I didn't see any changes in master
that would help.
I confirmed that glibc never stat()s the file, and this is because we are here:
/* Initialize *RESP if RES_INIT is not yet set in RESP->options, or if
   res_init in some other thread requested re-initializing. */
static __attribute__ ((warn_unused_result)) bool
maybe_init (struct resolv_context *ctx, bool preinit)
{
  struct __res_state *resp = ctx->resp;
  if (resp->options & RES_INIT)
    {
      if (resp->options & RES_NORELOAD)
        /* Configuration reloading was explicitly disabled. */
        return true;

      /* If there is no associated resolv_conf object despite the
         initialization, something modified *ctx->resp. Do not
         override those changes. */
      if (ctx->conf != NULL && replicated_configuration_matches (ctx))
And "replicated_configuration_matches()" is false. Thus we never examine the
file for any changes and continue using the old version indefinitely.
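To make the control flow concrete, here is a minimal C model (not glibc code) of the gate in the maybe_init() excerpt above. It assumes only the options word is compared, as in the line quoted below; the real replicated_configuration_matches() may check other fields too. model_reload_decision() and the enum names are hypothetical.

```c
#include <resolv.h>
#include <stdbool.h>

/* Fallbacks with glibc's values, in case <resolv.h> lacks them.  */
#ifndef RES_INIT
# define RES_INIT 0x00000001
#endif
#ifndef RES_NORELOAD
# define RES_NORELOAD 0x02000000
#endif

/* Hypothetical outcomes of the check at the top of maybe_init.  */
enum reload_decision
{
  DECISION_INITIALIZE,    /* RES_INIT clear: load the configuration */
  DECISION_NO_RELOAD,     /* RES_NORELOAD set: reloading disabled */
  DECISION_KEEP_MODIFIED, /* state looks modified: keep it, no stat () */
  DECISION_CHECK_FILE     /* state matches: look for resolv.conf changes */
};

/* LIVE stands in for ctx->resp->options, HAVE_CONF for
   ctx->conf != NULL, and CONF_OPTIONS for ctx->conf->options.  */
static enum reload_decision
model_reload_decision (unsigned long live, bool have_conf,
                       unsigned long conf_options)
{
  if (!(live & RES_INIT))
    return DECISION_INITIALIZE;
  if (live & RES_NORELOAD)
    return DECISION_NO_RELOAD;
  if (have_conf && live == conf_options)
    return DECISION_CHECK_FILE;
  /* Any mismatch, including bits the resolver set by itself, ends up
     here: resolv.conf is not examined, and if the mismatch persists,
     it never will be.  */
  return DECISION_KEEP_MODIFIED;
}
```

With the values from this report, model_reload_decision (0x47002c1, true, 0x41002c1) yields DECISION_KEEP_MODIFIED, i.e. the "no stat()" path.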
I don't understand the first part of the comment, but indeed, ->resp doesn't
match. In particular:
  return ctx->resp->options == ctx->conf->options
and ctx->resp (aka _resp) has 0x47002c1 whereas ctx->conf has 0x41002c1.
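A quick way to check which bits differ is to mask the live options with the complement of the loaded ones. The helpers below are a sketch, not glibc code (extra_option_bits() and print_extra_bits() are hypothetical names). Assuming glibc's values from <resolv.h>, RES_SNGLKUP is 0x00200000 and RES_SNGLKUPREOP is 0x00400000, so the extra 0x00600000 is exactly those two flags.

```c
#include <resolv.h>
#include <stdio.h>

/* Fallbacks with glibc's values, in case <resolv.h> lacks them.  */
#ifndef RES_SNGLKUP
# define RES_SNGLKUP 0x00200000     /* options single-request */
#endif
#ifndef RES_SNGLKUPREOP
# define RES_SNGLKUPREOP 0x00400000 /* options single-request-reopen */
#endif

/* Bits set in ACTUAL (e.g. _res.options) but not in EXPECTED
   (e.g. the options recorded when resolv.conf was loaded).  */
static unsigned long
extra_option_bits (unsigned long actual, unsigned long expected)
{
  return actual & ~expected;
}

/* Print EXTRA, naming the single-request flags if present.  */
static void
print_extra_bits (unsigned long extra)
{
  const unsigned long known = RES_SNGLKUP | RES_SNGLKUPREOP;

  printf ("extra bits: %#lx\n", extra);
  if (extra & RES_SNGLKUP)
    puts ("  RES_SNGLKUP");
  if (extra & RES_SNGLKUPREOP)
    puts ("  RES_SNGLKUPREOP");
  if (extra & ~known)
    printf ("  other: %#lx\n", extra & ~known);
}
```

For the values in this report, print_extra_bits (extra_option_bits (0x47002c1, 0x41002c1)) prints "extra bits: 0x600000" followed by the two flag names.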
I'm not sure, but I suspect the additional RES_SNGLKUP|RES_SNGLKUPREOP bits may be due
to this code:
/* There are quite a few broken name servers out
   there which don't handle two outstanding
   requests from the same source. There are also
   broken firewall settings. If we time out after
   having received one answer switch to the mode
   where we send the second request only once we
   have received the first answer. */
if (!single_request)
  {
    statp->options |= RES_SNGLKUP;
    single_request = true;
    *gotsomewhere = save_gotsomewhere;
    goto retry;
  }
else if (!single_request_reopen)
  {
    statp->options |= RES_SNGLKUPREOP;
    single_request_reopen = true;
    *gotsomewhere = save_gotsomewhere;
    __res_iclose (statp, false);
    goto retry_reopen;
  }
I'm guessing these got set when the VPN dropped routing to the old nameservers,
but before the next getaddrinfo() came in, thus leading to the match failing.
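The guess above can be replayed with a toy model. The C sketch below is not glibc code: struct model_resolver, model_timeout_fallback() and model_options_match() are hypothetical, and the fallback is simplified to the two flag updates in the excerpt (it omits the socket close and retry). It shows that once the fallback writes into the live options, the options comparison keeps failing on every later call.

```c
#include <resolv.h>
#include <stdbool.h>

/* Fallbacks with glibc's values, in case <resolv.h> lacks them.  */
#ifndef RES_SNGLKUP
# define RES_SNGLKUP 0x00200000
#endif
#ifndef RES_SNGLKUPREOP
# define RES_SNGLKUPREOP 0x00400000
#endif

/* Two copies of the options word: the live per-thread state and the
   snapshot taken when resolv.conf was loaded.  */
struct model_resolver
{
  unsigned long live;    /* stands in for ctx->resp->options */
  unsigned long loaded;  /* stands in for ctx->conf->options */
};

/* One "timed out after a partial answer" event.  Like the excerpt,
   it enables RES_SNGLKUP first, then RES_SNGLKUPREOP, and writes the
   result back into the live options.  Returns false when there is
   nothing left to downgrade.  */
static bool
model_timeout_fallback (struct model_resolver *r)
{
  if (!(r->live & RES_SNGLKUP))
    {
      r->live |= RES_SNGLKUP;
      return true;
    }
  if (!(r->live & RES_SNGLKUPREOP))
    {
      r->live |= RES_SNGLKUPREOP;
      return true;
    }
  return false;
}

/* The options check quoted earlier: does the live state still match
   the loaded configuration?  */
static bool
model_options_match (const struct model_resolver *r)
{
  return r->live == r->loaded;
}
```

Starting from live == loaded == 0x41002c1, two fallback events leave live at 0x47002c1, the value in this report, and nothing in the model ever clears those bits, so the mismatch is permanent.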
I can't see where the application code itself can be at fault here, but I'm not
100% confident about the above analysis either. Any thoughts?
thanks
john
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Issue with stale resolv.conf state
2024-03-11 9:08 Issue with stale resolv.conf state John Levon
@ 2024-03-11 10:51 ` Florian Weimer
2024-03-12 0:51 ` Cristian Rodríguez
0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2024-03-11 10:51 UTC
To: John Levon; +Cc: libc-alpha
* John Levon:
> I don't understand the first part of the comment, but indeed, ->resp doesn't
> match. In particular:
>
>   return ctx->resp->options == ctx->conf->options
>
> and ctx->resp (aka _resp) has 0x47002c1 whereas ctx->conf has 0x41002c1.
>
> I'm not sure but I suspect the additional RES_SNGLKUP|RES_SNGLKUPREOP
> may be due to this code:
>
> /* There are quite a few broken name servers out
>    there which don't handle two outstanding
>    requests from the same source. There are also
>    broken firewall settings. If we time out after
>    having received one answer switch to the mode
>    where we send the second request only once we
>    have received the first answer. */
> if (!single_request)
>   {
>     statp->options |= RES_SNGLKUP;
>     single_request = true;
>     *gotsomewhere = save_gotsomewhere;
>     goto retry;
>   }
> else if (!single_request_reopen)
>   {
>     statp->options |= RES_SNGLKUPREOP;
>     single_request_reopen = true;
>     *gotsomewhere = save_gotsomewhere;
>     __res_iclose (statp, false);
>     goto retry_reopen;
>   }
That's a very good point. Yes, the current reloading code does not take
into account that we change _res.options dynamically based on network
behavior.
That automatic configuration change based on temporary network glitches
is problematic in other contexts as well (it may further trigger bugs in
dual query processing).
Maybe we should just remove the automatic downgrade, that is, stop
persisting it across queries.
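One reading of "not persisting across queries" is to keep the fallback in per-query state only. A C sketch under that assumption (hypothetical names, not a proposed patch): each query copies the configured flags at the start, downgrades only its local copy, and never writes back, so the live options keep matching the loaded configuration.

```c
#include <resolv.h>
#include <stdbool.h>

/* Fallbacks with glibc's values, in case <resolv.h> lacks them.  */
#ifndef RES_SNGLKUP
# define RES_SNGLKUP 0x00200000
#endif
#ifndef RES_SNGLKUPREOP
# define RES_SNGLKUPREOP 0x00400000
#endif

/* Hypothetical per-query fallback state.  */
struct model_query
{
  bool single_request;
  bool single_request_reopen;
};

/* Start a query from the configured OPTIONS, which are only read.  */
static struct model_query
model_query_begin (unsigned long options)
{
  struct model_query q;
  q.single_request_reopen = (options & RES_SNGLKUPREOP) != 0;
  /* Assumption of this sketch: reopen mode implies single-request.  */
  q.single_request = (options & RES_SNGLKUP) != 0 || q.single_request_reopen;
  return q;
}

/* Downgrade for the rest of this query only.  Returns false when
   there is nothing left to downgrade.  */
static bool
model_query_fallback (struct model_query *q)
{
  if (!q->single_request)
    {
      q->single_request = true;
      return true;
    }
  if (!q->single_request_reopen)
    {
      q->single_request_reopen = true;
      return true;
    }
  return false;
}
```

The cost of this design is that every new query starts in parallel mode again and may hit the same timeout before falling back.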
Thanks,
Florian
* Re: Issue with stale resolv.conf state
2024-03-11 10:51 ` Florian Weimer
@ 2024-03-12 0:51 ` Cristian Rodríguez
2024-03-12 6:45 ` Florian Weimer
0 siblings, 1 reply; 8+ messages in thread
From: Cristian Rodríguez @ 2024-03-12 0:51 UTC
To: Florian Weimer; +Cc: John Levon, libc-alpha
On Mon, Mar 11, 2024 at 7:51 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * John Levon:
>
> [...]
>
> Maybe we should just remove the automatic downgrade, basically not
> persist this across queries anymore.
Yeah, +1. Users of those broken nameservers at least deserve to notice
that something is wrong, if such systems are really still around.
* Re: Issue with stale resolv.conf state
2024-03-12 0:51 ` Cristian Rodríguez
@ 2024-03-12 6:45 ` Florian Weimer
2024-03-12 9:09 ` Philip Sanetra
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Florian Weimer @ 2024-03-12 6:45 UTC
To: Cristian Rodríguez; +Cc: John Levon, libc-alpha
* Cristian Rodríguez:
> On Mon, Mar 11, 2024 at 7:51 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * John Levon:
>>
> [...]
>>
>> Maybe we should just remove the automatic downgrade, basically not
>> persist this across queries anymore.
>
> Yeah. +1. Users of those broken nameservers deserve at least noticing
> they are wrong if such systems are really still around ..
I filed:
Automatic activation of single-request options break resolv.conf reloading
<https://sourceware.org/bugzilla/show_bug.cgi?id=31476>
On the other hand, we have this request:
| Change resolv.conf default to single-request
| […]
| We have the year 2022 and these issues still occur, so it was not some
| kind of issue that went away by time as it was possibly expected when
| glibc 2.10 was released.
<https://sourceware.org/bugzilla/show_bug.cgi?id=29017>
So the solution might not be so straightforward.
Thanks,
Florian
* Re: Issue with stale resolv.conf state
2024-03-12 6:45 ` Florian Weimer
@ 2024-03-12 9:09 ` Philip Sanetra
2024-03-12 10:04 ` John Levon
2024-03-12 14:25 ` Cristian Rodríguez
2 siblings, 0 replies; 8+ messages in thread
From: Philip Sanetra @ 2024-03-12 9:09 UTC
To: Florian Weimer; +Cc: Cristian Rodríguez, John Levon, libc-alpha
Hi,
I think removing the automatic downgrade without also making the single-request option the default behavior would break a lot of systems.
I know of at least two environments in different companies where the default behavior results in 5-second timeouts, and only the automatic downgrade improves performance for subsequent DNS lookups.
I would appreciate making single-request the default, as proposed in https://sourceware.org/bugzilla/show_bug.cgi?id=29017
Regards,
Philip Sanetra
On Tuesday, 12 March 2024 at 7:45 AM, Florian Weimer <fweimer@redhat.com> wrote:
>
>
> * Cristian Rodríguez:
>
> > On Mon, Mar 11, 2024 at 7:51 AM Florian Weimer fweimer@redhat.com wrote:
> >
> > > * John Levon:
> >
> > [...]
> >
> > > Maybe we should just remove the automatic downgrade, basically not
> > > persist this across queries anymore.
> >
> > Yeah. +1. Users of those broken nameservers deserve at least noticing
> > they are wrong if such systems are really still around ..
>
>
> I filed:
>
> Automatic activation of single-request options break resolv.conf reloading
> https://sourceware.org/bugzilla/show_bug.cgi?id=31476
>
>
> On the other hand, we have this request:
>
> | Change resolv.conf default to single-request
> | […]
> | We have the year 2022 and these issues still occur, so it was not some
> | kind of issue that went away by time as it was possibly expected when
> | glibc 2.10 was released.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=29017
>
>
> So the solution might not be so straightforward.
>
> Thanks,
> Florian
* Re: Issue with stale resolv.conf state
2024-03-12 6:45 ` Florian Weimer
2024-03-12 9:09 ` Philip Sanetra
@ 2024-03-12 10:04 ` John Levon
2024-03-12 14:25 ` Cristian Rodríguez
2 siblings, 0 replies; 8+ messages in thread
From: John Levon @ 2024-03-12 10:04 UTC
To: Florian Weimer; +Cc: Cristian Rodríguez, libc-alpha
On Tue, Mar 12, 2024 at 07:45:23AM +0100, Florian Weimer wrote:
> * Cristian Rodríguez:
>
> > On Mon, Mar 11, 2024 at 7:51 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * John Levon:
> >>
> > [...]
> >>
> >> Maybe we should just remove the automatic downgrade, basically not
> >> persist this across queries anymore.
> >
> > Yeah. +1. Users of those broken nameservers deserve at least noticing
> > they are wrong if such systems are really still around ..
>
> I filed:
>
> Automatic activation of single-request options break resolv.conf reloading
> <https://sourceware.org/bugzilla/show_bug.cgi?id=31476>
Probably a stupid suggestion, but would filtering out these specific flags in
the replicated_configuration_matches() comparison help? Then we'd still go down
the "stat() /etc/resolv.conf" path.
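The suggestion can be sketched in C, assuming only the options word matters (the real replicated_configuration_matches() may compare other fields; options_match_ignoring_fallback() and FALLBACK_OPTIONS are hypothetical names). A possible downside: an application that deliberately sets or clears these two flags in _res would no longer be detected as having modified the state, so a reload could override its change.

```c
#include <resolv.h>
#include <stdbool.h>

/* Fallbacks with glibc's values, in case <resolv.h> lacks them.  */
#ifndef RES_SNGLKUP
# define RES_SNGLKUP 0x00200000
#endif
#ifndef RES_SNGLKUPREOP
# define RES_SNGLKUPREOP 0x00400000
#endif

/* Bits the send path may set on its own after timeouts.  */
#define FALLBACK_OPTIONS ((unsigned long) (RES_SNGLKUP | RES_SNGLKUPREOP))

/* Compare LIVE (ctx->resp->options) with LOADED (ctx->conf->options),
   ignoring the automatically set single-request bits.  */
static bool
options_match_ignoring_fallback (unsigned long live, unsigned long loaded)
{
  return (live & ~FALLBACK_OPTIONS) == (loaded & ~FALLBACK_OPTIONS);
}
```

With the values from the first message, options_match_ignoring_fallback (0x47002c1, 0x41002c1) returns true, so the stat() path would be taken again, while any other difference (for example a cleared RES_INIT bit) still counts as a mismatch.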
regards
john
* Re: Issue with stale resolv.conf state
2024-03-12 6:45 ` Florian Weimer
2024-03-12 9:09 ` Philip Sanetra
2024-03-12 10:04 ` John Levon
@ 2024-03-12 14:25 ` Cristian Rodríguez
2024-03-12 14:30 ` Florian Weimer
2 siblings, 1 reply; 8+ messages in thread
From: Cristian Rodríguez @ 2024-03-12 14:25 UTC
To: Florian Weimer; +Cc: John Levon, libc-alpha
On Tue, Mar 12, 2024 at 3:45 AM Florian Weimer <fweimer@redhat.com> wrote:
> So the solution might not be so straightforward.
I'm not being sarcastic or anything, but if the standards do not
recommend an approach for this, I strongly suggest doing it the way
Windows does.
* Re: Issue with stale resolv.conf state
2024-03-12 14:25 ` Cristian Rodríguez
@ 2024-03-12 14:30 ` Florian Weimer
0 siblings, 0 replies; 8+ messages in thread
From: Florian Weimer @ 2024-03-12 14:30 UTC
To: Cristian Rodríguez; +Cc: John Levon, libc-alpha
* Cristian Rodríguez:
> On Tue, Mar 12, 2024 at 3:45 AM Florian Weimer <fweimer@redhat.com> wrote:
>
>> So the solution might not be so straightforward.
>
> I 'm not being sarcastic or anything.. but if the standards do not
> recommend an approach for this.. I strongly suggest doing it like
> Windows does.
I think Windows has a totally different architecture: their DNS stub
resolver isn't in-process, but system-wide. So it's easier for them to
share state across multiple requests and processes.
Thanks,
Florian
Thread overview: 8+ messages
2024-03-11 9:08 Issue with stale resolv.conf state John Levon
2024-03-11 10:51 ` Florian Weimer
2024-03-12 0:51 ` Cristian Rodríguez
2024-03-12 6:45 ` Florian Weimer
2024-03-12 9:09 ` Philip Sanetra
2024-03-12 10:04 ` John Levon
2024-03-12 14:25 ` Cristian Rodríguez
2024-03-12 14:30 ` Florian Weimer