From: Sergio Durigan Junior
To: Adhemerval Zanella Netto
Cc: Joseph Myers, libc-alpha@sourceware.org
Subject: Re: [PATCH] sysdeps: Clear O_CREAT|O_ACCMODE when trying again on sem_open
In-Reply-To: (Adhemerval Zanella Netto's message of "Wed, 1 Nov 2023 10:27:11 -0300")
References: <20230823042129.3955131-1-sergiodj@sergiodj.net> <4755cc77-54d0-00e6-62d5-2a90d0c35af2@linaro.org> <87il6qjxqy.fsf@sergiodj.net>
Date: Wed, 01 Nov 2023 18:14:20 -0400
Message-ID: <87o7gdktoj.fsf@sergiodj.net>

On Wednesday, November 01 2023, Adhemerval Zanella Netto wrote:

> On 28/10/23 17:30, Sergio Durigan Junior wrote:
>> On Wednesday, August 23 2023, Adhemerval Zanella Netto wrote:
>>
>>> On 23/08/23 01:21, Sergio Durigan Junior via Libc-alpha wrote:
>>>> When invoking sem_open with O_CREAT as one of its flags, we'll end up
>>>> in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
>>>> & O_EXCL) == 0)", which means that we don't expect the semaphore file
>>>> to exist.
>>>>
>>>> In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
>>>> | O_CLOEXEC" and there's an attempt to open(2) the file, which will
>>>> likely fail because it won't exist. After that first (expected)
>>>> failure, some cleanup is done and we go back to the label "try_again",
>>>> which lives in the first part of the aforementioned "if".
>>>>
>>>> The problem is that, in that part of the code, we expect the semaphore
>>>> file to exist, and as such O_CREAT (this time the flag we pass to
>>>> open(2)) needs to be cleared from open_flags, otherwise we'll see
>>>> another failure (this time unexpected) when trying to open the file,
>>>> which will lead the call to sem_open to fail as well.
>>>>
>>>> This can cause very strange bugs, especially with OpenMPI, which makes
>>>> extensive use of semaphores.
>>>>
>>>> The fix here is to actually make sure that the O_CREAT|O_ACCMODE flags
>>>> are clear after we enter "try_again".
>>>>
>>>> See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
>>>
>>> This needs a bug report and, if possible, a regression check (I grant
>>> you that it might be tricky since it is a racy condition).
>>
>> Hi folks,
>>
>> It took me much longer than I intended to get back to this thread, so I
>> apologize. I'm afraid I don't have very exciting news either: I still
>> don't have a testcase to exercise the fix.
>>
>> After talking to Adhemerval during the last Cauldron, we have agreed
>> that (a) creating a testcase for this bug is indeed tricky (and may even
>> introduce false positives), and (b) we should likely move forward as is.
>>
>> I would still like to point out that it is possible to have a reliable
>> reproducer if you follow the steps I outlined on
>> https://sourceware.org/bugzilla/show_bug.cgi?id=30789#c1, but
>> unfortunately this is not acceptable as a glibc test, so there's that.
>>
>> Either way, I'd like to know if you consider it OK to proceed. I can
>> replicate this explanation in the commit message if you think it's
>> necessary.
>
> Joseph, would adding the following to the commit message be enough to
> install this patch:
>
>   A regression test for this issue would require complex and
>   CPU-time-consuming logic, since triggering the wrong code path is
>   not straightforward due to the racy condition.
>
> Sergio, could you resend the patch with either the following or a more
> extended explanation, along with BZ# 30789 in the title?

Sure. I'll send it as a reply to your message.

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF 31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
https://sergiodj.net/