Date: Mon, 6 Jul 2020 10:49:38 -0400 (EDT)
From: Mathieu Desnoyers
To: Florian Weimer
Cc: carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul, Boqun Feng, Will Deacon, Paul Turner, linux-kernel, linux-api
Message-ID: <942999672.22574.1594046978937.JavaMail.zimbra@efficios.com>
In-Reply-To: <877dvg4ud4.fsf@oldenburg2.str.redhat.com>
References: <20200629190036.26982-1-mathieu.desnoyers@efficios.com> <20200629190036.26982-3-mathieu.desnoyers@efficios.com> <877dvg4ud4.fsf@oldenburg2.str.redhat.com>
Subject: Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)

----- On Jul 6, 2020, at 9:59 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
>
>> When available, use the cpu_id field from __rseq_abi on Linux to
>> implement sched_getcpu(). Fall-back on the vgetcpu vDSO if
>> unavailable.
>
> I've pushed this to glibc master, but unfortunately it looks like this
> exposes a kernel bug related to affinity mask changes.
>
> After building and testing glibc, this
>
>   for x in {1..2000} ; do posix/tst-affinity-static & done
>
> produces some “error:” lines for me:
>
> error: Unexpected CPU 2, expected 0
> error: Unexpected CPU 2, expected 0
> error: Unexpected CPU 2, expected 0
> error: Unexpected CPU 2, expected 0
> error: Unexpected CPU 138, expected 0
> error: Unexpected CPU 138, expected 0
> error: Unexpected CPU 138, expected 0
> error: Unexpected CPU 138, expected 0
>
> “expected 0” is a result of how the test has been written, it bails out
> on the first failure, which happens with CPU ID 0.
>
> Smaller systems can use a smaller count than 2000 to reproduce this. It
> also happens sporadically when running the glibc test suite itself
> (which is why it took further testing to reveal this issue).
>
> I can reproduce this with the Debian 4.19.118-2+deb10u1 kernel, the
> Fedora 5.6.19-300.fc32 kernel, and the Red Hat Enterprise Linux kernel
> 4.18.0-193.el8 (all x86_64).
>
> As to the cause, I'd guess that the exit path in the sched_setaffinity
> system call fails to update the rseq area, so that userspace can observe
> the outdated CPU ID there.

Hi Florian,

We have a similar test in Linux, see tools/testing/selftests/rseq/basic_test.c.
That test does not trigger this issue, even when executed repeatedly.

I'll investigate further what is happening within the glibc test.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
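
For readers following the thread, the kind of check being discussed can be sketched roughly as below. This is only an illustration, not the code of posix/tst-affinity-static or of the kernel selftest: the function name count_cpu_mismatches is hypothetical, and unlike the glibc test described above (which bails out on the first failure) this sketch counts every mismatch. It pins the calling thread to each CPU of its initial affinity mask in turn and compares the result of sched_getcpu() with the CPU it was just pinned to.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper, not taken from glibc or the kernel selftests.
   Pin the calling thread to each CPU in its initial affinity mask in
   turn and compare sched_getcpu() with the CPU it was just pinned to.
   Returns the number of mismatches, or -1 if the initial mask cannot
   be read.  */
int
count_cpu_mismatches (void)
{
  cpu_set_t initial;
  if (sched_getaffinity (0, sizeof (initial), &initial) != 0)
    return -1;

  int mismatches = 0;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
      if (!CPU_ISSET (cpu, &initial))
        continue;

      /* Build a mask containing only this CPU and switch to it.  */
      cpu_set_t one;
      CPU_ZERO (&one);
      CPU_SET (cpu, &one);
      if (sched_setaffinity (0, sizeof (one), &one) != 0)
        continue;               /* Could not pin to this CPU; skip it.  */

      /* After a successful sched_setaffinity, the thread is expected
         to be running on the only CPU left in its mask.  */
      int seen = sched_getcpu ();
      if (seen != cpu)
        {
          printf ("error: Unexpected CPU %d, expected %d\n", seen, cpu);
          ++mismatches;
        }
    }

  /* Restore the original affinity mask (best effort).  */
  sched_setaffinity (0, sizeof (initial), &initial);
  return mismatches;
}
```

Calling count_cpu_mismatches() from a small main() and running many instances in parallel, as in the shell loop quoted above, would be one way to look for a stale CPU ID. Whether it reproduces the reported behaviour depends on the kernel and on whether the installed glibc implements sched_getcpu() via rseq.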