Message-ID: <34c6e548-ac58-85f2-c2d5-098bccc5a52a@linaro.org>
Date: Mon, 7 Feb 2022 09:01:19 -0300
Subject: Re: [PATCH] linux: fix accuracy of get_nprocs and get_nprocs_conf [BZ #28865]
To: "Dmitry V. Levin"
Cc: libc-alpha@sourceware.org
References: <20220205212402.GA5233@altlinux.org> <2f8633c5-6335-b7aa-e735-65dc36322d7f@linaro.org> <20220207115113.GA29197@altlinux.org>
From: Adhemerval Zanella
In-Reply-To: <20220207115113.GA29197@altlinux.org>
List-Id: Libc-alpha mailing list

On 07/02/2022 08:51, Dmitry V. Levin wrote:
> Hi,
>
> On Mon, Feb 07, 2022 at 08:25:11AM -0300, Adhemerval Zanella via Libc-alpha wrote:
>> On 05/02/2022 18:24, Dmitry V. Levin wrote:
>>> get_nprocs() and get_nprocs_conf() use various methods to obtain an
>>> accurate number of processors.  Re-introduce __get_nprocs_sched() as
>>> a source of information, and fix the order in which these methods are
>>> used to return the most accurate information.  The primary source of
>>> information used in both functions remains unchanged.
>>>
>>> This also changes __get_nprocs_sched() error return value from 2 to 0,
>>> but all its users are already prepared to handle that.
>>>
>>> Old behavior:
>>>   get_nprocs:
>>>     /sys/devices/system/cpu/online -> /proc/stat -> 2
>>>   get_nprocs_conf:
>>>     /sys/devices/system/cpu/ -> /proc/stat -> 2
>>>
>>> New behavior:
>>>   get_nprocs:
>>>     /sys/devices/system/cpu/online -> sched_getaffinity -> /proc/stat -> 2
>>>   get_nprocs_conf:
>>>     /sys/devices/system/cpu/ -> /proc/stat -> sched_getaffinity -> 2
>>>
>>> Fixes: 342298278e ("linux: Revert the use of sched_getaffinity on get_nproc")
>>> Closes: BZ #28865
>>
>> I think we are circling back on this, on BZ#27645 [1] we changed get_nprocs
>> to use sched_getaffinity and then we have to revert it with BZ#28310 [2] because
>> it introduced regression on some monitoring tools [3].
>>
>> In fact from BZ#27645 and BZ#28624 [4] discussion I think we can't reliable use
>> sched_getaffinity because since some container environment returns a synthetic
>> mask that might break some programs.  Also, sched_getaffinity returns a
>> 'per-process' mask instead of system-wide as we discussed in previous threads.
>> It should be ok to get adjusting internal tuning (as for malloc).
>>
>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=27645
>> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=28310
>> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=27645#c5
>> [4] https://sourceware.org/bugzilla/show_bug.cgi?id=28624
>
> Is there any realistic case when 2 is a more accurate estimation for the
> number of processors than sched_getaffinity?  I suppose there are no such
> cases.  Also, /sys is consulted first anyway.

I am not sure, but my impression is that in some environments
sched_getaffinity returns a synthetic value that might not represent the
CPUs the system actually supports.  At least, that was my impression from
the bug reports, where it did break some programs.
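
To make the per-process versus system-wide distinction concrete, here is
a minimal sketch (illustration only, not glibc code; both helper names
are hypothetical).  It counts the CPUs in the calling thread's affinity
mask and, separately, asks sysconf for the online count.  Under taskset,
cgroup cpusets, or some container setups the two values can differ.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* For sched_getaffinity and CPU_COUNT.  */
#endif
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: number of CPUs in the calling thread's affinity
   mask, or 0 on error.  A fixed-size cpu_set_t is assumed to be large
   enough; on very large machines the call can fail with EINVAL.  */
static int
affinity_cpu_count (void)
{
  cpu_set_t set;
  if (sched_getaffinity (0, sizeof set, &set) != 0)
    return 0;
  return CPU_COUNT (&set);
}

/* Hypothetical helper: online CPU count as reported by sysconf.  */
static long
online_cpu_count (void)
{
  return sysconf (_SC_NPROCESSORS_ONLN);
}
```

Running a caller of these two helpers under something like
"taskset -c 0" is one way to see the affinity count drop while the
sysconf value stays the same.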

>
> I wish I saw commit 342298278e earlier to raise objections before it was
> committed.
>
> Please note that BZ #28865 is a real regression we had to patch, this
> means glibc must behave properly in that environment without any
> additional tuning.
>
> I suggest to install this fix and see what could be done later
> in an unlikely case anything else breaks.

I think in this case we should use sched_getaffinity as the last fallback
in get_nprocs as well, so we first use either sysfs or procfs and only
then fall back to sched_getaffinity.
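
A rough sketch of that ordering, for illustration only: sysfs first,
then /proc/stat, then sched_getaffinity, and finally the hard-coded 2.
All function names below are hypothetical, the parsing is simplified,
and this is not the actual glibc implementation.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* For sched_getaffinity and CPU_COUNT.  */
#endif
#include <ctype.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count the CPUs in a kernel CPU list such as "0-3,5,7-9".
   Returns 0 if the string is empty or malformed.  */
static int
count_cpu_list (const char *s)
{
  int total = 0;
  while (*s != '\0' && *s != '\n')
    {
      char *end;
      if (!isdigit ((unsigned char) *s))
        return 0;
      long lo = strtol (s, &end, 10);
      long hi = lo;
      if (*end == '-')
        {
          const char *p = end + 1;
          if (!isdigit ((unsigned char) *p))
            return 0;
          hi = strtol (p, &end, 10);
        }
      if (hi < lo)
        return 0;
      total += (int) (hi - lo + 1);
      if (*end == ',')
        end++;
      else if (*end != '\0' && *end != '\n')
        return 0;
      s = end;
    }
  return total;
}

/* First choice: the online CPU list exported by sysfs.  */
static int
nprocs_from_sysfs (void)
{
  char buf[4096];
  FILE *f = fopen ("/sys/devices/system/cpu/online", "r");
  if (f == NULL)
    return 0;
  int n = fgets (buf, sizeof buf, f) != NULL ? count_cpu_list (buf) : 0;
  fclose (f);
  return n;
}

/* Second choice: count the per-CPU "cpuN" lines in /proc/stat.  */
static int
nprocs_from_procstat (void)
{
  char line[4096];
  int n = 0;
  FILE *f = fopen ("/proc/stat", "r");
  if (f == NULL)
    return 0;
  while (fgets (line, sizeof line, f) != NULL)
    if (strncmp (line, "cpu", 3) == 0 && isdigit ((unsigned char) line[3]))
      n++;
  fclose (f);
  return n;
}

/* Last resort before the constant: the per-thread affinity mask.  */
static int
nprocs_from_affinity (void)
{
  cpu_set_t set;
  if (sched_getaffinity (0, sizeof set, &set) != 0)
    return 0;
  return CPU_COUNT (&set);
}

/* Try each source in order; each helper returns 0 on failure.  */
static int
sketch_get_nprocs (void)
{
  int n = nprocs_from_sysfs ();
  if (n == 0)
    n = nprocs_from_procstat ();
  if (n == 0)
    n = nprocs_from_affinity ();
  return n != 0 ? n : 2;
}
```

Having every helper return 0 on failure keeps the chain uniform, which
is the same convention the patch proposes for __get_nprocs_sched().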