From: Tarun Tej K
Date: Tue, 07 Jan 2020 06:01:00 -0000
Subject: getaddrinfo() fails to use latest DNS address - v2.27
To: libc-help@sourceware.org

Hi,

Environment:
glibc version - v2.27
platform - NXP's iMX6
cross-compiler - arm-poky-linux-gnueabi-gcc
Built using the Yocto recipes

As part of long-term testing of our system, I have a setup of automatic network
switching between different interfaces such as Ethernet, WLAN and PPP. During
this automation, the DNS addresses in /etc/resolv.conf keep changing because
the active network interface (WLAN/Ethernet/PPP) keeps changing.

Issue Description:
The issue might be related to
https://sourceware.org/bugzilla/show_bug.cgi?id=984

Once in a while, after running for 5 hours or so, getaddrinfo() fails to
resolve addresses and keeps returning EAI_AGAIN ('Temporary failure in name
resolution'). The 'strace' output of the failing process shows that
getaddrinfo() does neither a stat64() nor an openat() on /etc/resolv.conf (to
check for the latest DNS change) while the process is in this state. This may
be why it is not updating the global configuration (resolv_conf_global) with
the correct DNS values.

I have not yet found steps to reproduce this issue easily. I have tried a
simple application which just calls getaddrinfo() based on user input. That
application always does a stat64() of /etc/resolv.conf, and an openat() when
the time, size or inode of /etc/resolv.conf has changed. But I am not sure
what causes my actual application to get into a state where it does not even
stat64() /etc/resolv.conf after running for some time.

I have gone through the glibc code and have a question about the part below,
from the function maybe_init() in resolv/resolv_context.c:

  if (ctx->conf != NULL && replicated_configuration_matches (ctx))
    {
      struct resolv_conf *current = __resolv_conf_get_current ();
      if (current == NULL)
        return false;

      /* Check if the configuration changed.  */
      if (current != ctx->conf)
        {
          /* This call will detach the extended resolver state.  */
          if (resp->nscount > 0)
            __res_iclose (resp, true);
          /* Reattach the current configuration.  */
          if (__resolv_conf_attach (ctx->resp, current))
            {
              __resolv_conf_put (ctx->conf);
              /* ctx takes ownership, so we do not release current.  */
              ctx->conf = current;
            }
        }
      else
        /* No change.
           Drop the reference count for current.  */
        __resolv_conf_put (current);
    }
  return true;

Here the return value is 'true' even when the condition

  if (ctx->conf != NULL && replicated_configuration_matches (ctx))

fails. I think this is one case where neither __resolv_conf_get_current() nor
__resolv_conf_load() would be called, and so no stat64() or openat() would be
done on /etc/resolv.conf. Why does maybe_init() return 'true' when the
condition (ctx->conf != NULL && replicated_configuration_matches (ctx))
fails?

Note: One thing about /etc/resolv.conf, in case it helps. Depending on the
type of the active network interface, the application changes the file type
of /etc/resolv.conf: it is sometimes a regular file and sometimes a symlink
to /var/run/resolv.conf. Could /etc/resolv.conf being a symlink cause a
problem like this?

Thanks,
Tarun
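
For reference, here is a minimal sketch of instrumentation that could be added
to the simple test application described above, so that each lookup is logged
next to the state of /etc/resolv.conf. It records the time, size and inode
mentioned earlier, plus whether the path itself is a symlink. The names
conf_stamp, conf_stamp_read(), conf_stamp_equal() and resolve_once() are
hypothetical helpers made up for this sketch; they are not glibc interfaces
and do not reproduce glibc's internal change detection. Only lstat(), stat(),
getaddrinfo(), gai_strerror() and freeaddrinfo() are real library calls.

```c
/* Sketch only: conf_stamp and the helper functions are hypothetical names,
   not part of glibc.  */
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* What stat() reports about a configuration file, plus whether the
   path itself is a symbolic link.  */
struct conf_stamp
{
  dev_t dev;
  ino_t ino;
  off_t size;
  struct timespec mtime;
  bool is_symlink;
};

/* Fill *OUT for PATH.  Returns 0 on success, -1 on error (errno is set
   by lstat or stat, for example when a symlink is dangling).  */
static int
conf_stamp_read (const char *path, struct conf_stamp *out)
{
  struct stat link_st, target_st;
  if (lstat (path, &link_st) != 0 || stat (path, &target_st) != 0)
    return -1;
  out->dev = target_st.st_dev;
  out->ino = target_st.st_ino;
  out->size = target_st.st_size;
  out->mtime = target_st.st_mtim;
  out->is_symlink = S_ISLNK (link_st.st_mode);
  return 0;
}

/* True if A and B describe the same file contents as far as stat()
   can tell.  The is_symlink flag is deliberately not compared.  */
static bool
conf_stamp_equal (const struct conf_stamp *a, const struct conf_stamp *b)
{
  return a->dev == b->dev && a->ino == b->ino && a->size == b->size
         && a->mtime.tv_sec == b->mtime.tv_sec
         && a->mtime.tv_nsec == b->mtime.tv_nsec;
}

/* Resolve HOST once with the given ai_flags.  Prints the gai_strerror()
   text on failure and returns the getaddrinfo() return code.  */
static int
resolve_once (const char *host, int flags)
{
  struct addrinfo hints, *res = NULL;
  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = flags;

  int rc = getaddrinfo (host, NULL, &hints, &res);
  if (rc != 0)
    fprintf (stderr, "%s: %s\n", host, gai_strerror (rc));
  else
    freeaddrinfo (res);
  return rc;
}
```

Calling conf_stamp_read() before every resolve_once() and comparing the result
with the previous stamp would show, in the application's own log, whether a
failing lookup coincides with a change of /etc/resolv.conf (including a switch
between regular file and symlink) that the resolver did not pick up.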