From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 24037 invoked by alias); 9 Dec 2014 18:25:02 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 23655 invoked by uid 89); 9 Dec 2014 18:25:01 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.0 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Subject: Re: [PATCH 1/3] Use reliable sem_wait interruption in nptl/tst-sem6. From: Torvald Riegel To: =?UTF-8?Q?Ond=C5=99ej_B=C3=ADlka?= Cc: GLIBC Devel In-Reply-To: <20141209165033.GA20499@domone> References: <1417804668.22797.108.camel@triegel.csb> <1417805577.25868.4.camel@triegel.csb> <20141206135040.GA16212@domone> <1418038997.25868.34.camel@triegel.csb> <20141208222857.GB13499@domone> <1418120160.25868.132.camel@triegel.csb> <20141209165033.GA20499@domone> Content-Type: text/plain; charset="UTF-8" Date: Tue, 09 Dec 2014 18:25:00 -0000 Message-ID: <1418149489.25868.230.camel@triegel.csb> Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-SW-Source: 2014-12/txt/msg00236.txt.bz2 On Tue, 2014-12-09 at 17:50 +0100, Ondřej Bílka wrote: > On Tue, Dec 09, 2014 at 11:16:00AM +0100, Torvald Riegel wrote: > > On Mon, 2014-12-08 at 23:28 +0100, Ondřej Bílka wrote: > > > On Mon, Dec 08, 2014 at 12:43:17PM +0100, Torvald Riegel wrote: > > > > On Sat, 2014-12-06 at 14:50 +0100, Ondřej Bílka wrote: > > > > > On Fri, Dec 05, 2014 at 07:52:57PM +0100, Torvald Riegel wrote: > > > > > > alarm (1); > > > > > > > > > > > > int res = sem_wait (&s); > > > > > > - if (res == 0) > > > > > > - { > > > > > > - puts ("wait succeeded"); > > > > > > - return 1; > > > > > > - } > > > > > > - if (res != -1 || errno != EINTR) > > > > > > + /* We accept all allowed behavior: Implementations that return EINTR and > > > > > > + those that rely on correct interruption through signals to use sem_post > > > > > > + in the signal handler. */ > > > > > > + if (res != 0 && !(res == -1 && errno == EINTR)) > > > > > > { > > > > > > - puts ("wait didn't fail with EINTR"); > > > > > > + puts ("wait neiter succeeded nor failed with EINTR"); > > > > > > return 1; > > > > > > } > > > > > > > > > > > > > > > > That does change test logic as it originally failed when wait succeeded. > > > > > > > > Yes, but why do you think that this is inconsistent? The previous test > > > > didn't add a token in the signal handler, so if wait succeeded, then the > > > > test should fail. > > > > > > > > However, the correct way to interrupt the semaphore with a signal is to > > > > add a token. My patch does that. Second, if we do not want to make > > > > timing assumptions (which the existing test would do if we add a token > > > > to the semaphore in the signal handler), then we need to accept that the > > > > (first) signal handler execution might happen before sem_wait actually > > > > executes. Therefore, we must not fail in this case. > > > > > > > > We have to correctly interrupt with signals because as the futex > > > > documentation stands (allowing return of EINTR on spurious wake-ups), > > > > there's no way to implement the specific behavior of the existing > > > > implementation (which assumes EINTR is returned *only* on signals). > > > > > > > > IOW, the existing test does white-box testing with timing assumptions; > > > > with this patch, we do make a slightly different black-box test with no > > > > timing assumptions. > > > > > > Which misses point of test which is to find bugs. > > > > Please read the POSIX spec. It allows both outcomes, and without timing > > assumptions etc., we can't drive executions towards just one of the > > outcomes. > > > Which does not answer my objection. What extra bugs could this test catch, > compared to say tst-sem2? If there is no such bug you could just delete > that file. tst-sem2 tests that spurious wake-ups and such don't return anything but -1 and errno==EINTR, in particular that 0 isn't returned. After the patch, tst-sem6 tests that a signal handler that posts a token will make sem_wait return. It *also* allows for sem_wait to return -1 and errno==EINTR in that case. Thus, one possible error that the patched tst-sem6 will catch is if the sem_wait itself just retries the futex_wait after the futex_wait returned EINTR, instead of looking for whether there is an available token. > > > > What if in new fooarch > > > assembly one forget to return -1 on interrupt which breaks user application > > > which will assume that semaphore is locked, > > > > I can strengthen the test on res==0, checking whether there is no token > > left. I don't think it buys us much though. > > > > Another option would be to disallow failure; however, this only works if > > sem_assume_only_signals_cause_futex_EINTR remains 0 and not set to a > > different value by a distribution (see Patch 3/3). > > > > > one can deal with rare false > > > positives in tests. > > > > I think you'd be creating false negatives with what you have in mind. > > The false positive would be not failing when 0 is returned, incorrectly, > > ie your example. > > > Depends whats your default. Usually false positive means that test shows > there is disease/bug but not in reality. I disagree regarding "usually", and would say that a negative result for a test is if the test fails, so false negative being a reportedly failed test that was not caused by an actual fault. Nonetheless, we seem to be on the same page regarding what we want to avoid. > > False negatives are a pain. You can deal with them of course, but it > > stands in the way of doing continuous integration and such. what we de > > facto do is just ignore all tests that we know can have false negatives, > > which doesn't make the test useful at all. > > > > > One does not need justify this fact by forward progress/fairness > > > assumptions. Just assumption that OS which keeps all CPU idle for one > > > second while there is suitable task is simply broken. > > > > Well, that *is* a timing assumption. There is nothing broken about this > > in general. Remember that we do have tests failing now and then just > > because of that. Which is awful for testing. > > > Without that assumption there is no guarantee that it will take more > than week to run test suite. That would make it useless. There are timing assumptions that state something like "after a second, the thread will have been scheduled". Those are bad, because they state a concrete bound (1s in this example). The forward progress guarantees that make sense for us, in particular in non-real-time systems, are instead of the form "something good will *eventually* happen". In other words, there's no concrete bound promised, but that there is a (finite) bound. That's an important difference. > > Also, please review the actual background for this change. See patch > > 3/3 for that. The change to the test is not ideal, but please see the > > trade-offs. > > The rarity of problem bugged me, as to trigger that behaviour one would > need run also highly parallel program so it looked unlikely that anybody > would report that. As it couldn't detect some bugs it previously could > its hard to see what is better. Let me try to summarize the background behind this change again: 1) Linux documents futex_wait to return EINTR on signals *or* on spurious wake-ups. 2) If we treat 1) as true -- which we should to unless getting confirmation otherwise -- sem_wait must not return EINTR to the caller anymore if futex_wait returned EINTR. 3) Because of 2), the behavior that is tested in tst-sem6 before my patch cannot be implemented anymore. Thus, we need to do *something*. I proposed this patch, and variations. If you don't see other alternatives, then I guess we'll have to pick from the options I gave. If you disagree with 2), then please comment on Patch 3/3, because that's where this belongs.